Wednesday, December 20, 2006

Thought compression - part deux: A python perspective

Since I last wrote about thought compression, the Hutter prize was launched and somebody has already found a way to earn part of that prize. The concept is to find a compression algorithm that compresses a 100 MB from Wikipedia in a lossless fashion to less than 17 MB. 17 MB is the benchmark value for the best "dumb" compressors like zip...The idea is that using relationships between words, one is very likely to be able to compress further a text as opposed to relying on a smart but really dumb way of compressing words of a dictionary based on the frequency use. In the end, the sense is that artificial intelligence can benefit from this because it would permit the "comparison" between statements using distances such as the ones devised by Rudi Cilibrasi.

While compressing text is fine, another area of compression still impresses me. The use of a very expressive programming language that nearly reads like pseudo-code. Python continues on providing examples as to why it falls in this category of expressive languages. For instance, the p2p example was nice but others are similarly intriguing like:

- the pageRank algorithm

- an IRC bot

- a Markov Chain Monte Carlo solver (MCMC solver) or

- Autonomous car programming in the past and in the present.

The MCMC code one is more intriguing than anything else. Because of their ability to compute multidimensional integrals, MCMC methods have allowed the use of Bayesian statistics and Bayesian programming in problems of inferences, a hallmark of artificial intelligence that the Hutter prize is seeking to enable.

No comments:

Printfriendly