A few months ago I wrote a review of the Oculus Rift. There is another similar product coming out soon which I am even more excited about. The Avegant Glyph uses a revolutionary technology called virtual retinal display (VRD). My prediction is that 50 years from now VRD will be the primary way most people interact with computers.
The first version of the Glyph will be available on Kickstarter on January 22nd for $499. I am tempted to order one. I don't like the fact that headphones/batteries are embedded in the headset, making it much bulkier than it needs to be.
Monday, January 06, 2014
cmix
I spent most of my christmas break working on a data compression project. It is a lossless compression program called "cmix" written in C++. It is aimed entirely at optimizing compression ratio (at the cost of high CPU/memory usage). I have the very ambitious goal of eventually surpassing the compression performance of PAQ8. PAQ8 is currently state of the art in most compression benchmarks. My master's thesis was related to PAQ8. Realistically, it is unlikely that I will be able to surpass PAQ8 (given the amount of work/expertise that has gone into its development).
As I develop the program, I have been tracking my progress on some standard compression benchmarks. You can see my progress at https://sites.google.com/site/cmixbenchmark/. This tracks two compression benchmarks: enwik8 (Wikipedia) and calgary corpus (a mixture of files). The tables show the compression size of some other compression programs for reference. The time line shows the performance of cmix over time, as I add new features.
cmix currently compresses the calgary corpus to 667,558 bytes. The record is 580,170 bytes by a variant of PAQ8 called PAQAR. There have been hundreds of versions of PAQ, but here are some major milestones in its version history:
So cmix currently does slightly better than PAQ4. cmix shares a similar architecture to PAQ. It is a streaming algorithm - it compresses data by predicting the probability of each bit given the previous bits in the input. It combines the predictions of a large number of independent models into a single prediction for each bit (i.e. ensemble learning). The main developer of PAQ wrote an online book about data compression: Data Compression Explained. It is an excellent reference - I have been following it closely in the development of cmix.
As I develop the program, I have been tracking my progress on some standard compression benchmarks. You can see my progress at https://sites.google.com/site/cmixbenchmark/. This tracks two compression benchmarks: enwik8 (Wikipedia) and calgary corpus (a mixture of files). The tables show the compression size of some other compression programs for reference. The time line shows the performance of cmix over time, as I add new features.
cmix currently compresses the calgary corpus to 667,558 bytes. The record is 580,170 bytes by a variant of PAQ8 called PAQAR. There have been hundreds of versions of PAQ, but here are some major milestones in its version history:
Version | Calgary | Date |
---|---|---|
P5 | 992,902 | 2000 |
PAQ1 | 716,704 | 2002 |
PAQ2 | 702,382 | May 2003 |
PAQ3 | 696,616 | Sep 2003 |
PAQ4 | 672,134 | Nov 2003 |
PAQ5 | 661,811 | Dec 2003 |
PAQ6 | 648,892 | Jan 2004 |
PAQ7 | 611,684 | Dec 2005 |
PAQ8L | 595,586 | Mar 2007 |
So cmix currently does slightly better than PAQ4. cmix shares a similar architecture to PAQ. It is a streaming algorithm - it compresses data by predicting the probability of each bit given the previous bits in the input. It combines the predictions of a large number of independent models into a single prediction for each bit (i.e. ensemble learning). The main developer of PAQ wrote an online book about data compression: Data Compression Explained. It is an excellent reference - I have been following it closely in the development of cmix.