A few months ago I wrote a review of the Oculus Rift. There is another similar product coming out soon which I am even more excited about: the Avegant Glyph. It uses a revolutionary technology called virtual retinal display (VRD). My prediction is that 50 years from now, VRD will be the primary way most people interact with computers.
The first version of the Glyph will be available on Kickstarter on January 22nd for $499. I am tempted to order one, but I don't like that the headphones and batteries are embedded in the headset, making it much bulkier than it needs to be.
Monday, January 06, 2014
cmix
I spent most of my Christmas break working on a data compression project: a lossless compression program called "cmix", written in C++. It is aimed entirely at optimizing compression ratio, at the cost of high CPU and memory usage. I have the very ambitious goal of eventually surpassing the compression performance of PAQ8, which is currently state of the art on most compression benchmarks. My master's thesis was related to PAQ8. Realistically, it is unlikely that I will be able to surpass it, given the amount of work and expertise that has gone into its development.
As I develop the program, I have been tracking my progress on some standard compression benchmarks. You can see my progress at https://sites.google.com/site/cmixbenchmark/. This tracks two compression benchmarks: enwik8 (Wikipedia) and the Calgary corpus (a mixture of files). The tables show the compressed sizes from some other compression programs for reference. The timeline shows the performance of cmix over time, as I add new features.
cmix currently compresses the Calgary corpus to 667,558 bytes. The record is 580,170 bytes, by a variant of PAQ8 called PAQAR. There have been hundreds of versions of PAQ, but here are some major milestones in its version history:
Version | Calgary corpus (bytes) | Date
---|---|---
P5 | 992,902 | 2000
PAQ1 | 716,704 | 2002
PAQ2 | 702,382 | May 2003
PAQ3 | 696,616 | Sep 2003
PAQ4 | 672,134 | Nov 2003
PAQ5 | 661,811 | Dec 2003
PAQ6 | 648,892 | Jan 2004
PAQ7 | 611,684 | Dec 2005
PAQ8L | 595,586 | Mar 2007
So cmix currently does slightly better than PAQ4. cmix's architecture is similar to PAQ's. It is a streaming algorithm: it compresses data by predicting the probability of each bit given the previous bits in the input. It combines the predictions of a large number of independent models into a single prediction for each bit (i.e. ensemble learning). The main developer of PAQ wrote an online book about data compression, Data Compression Explained. It is an excellent reference - I have been following it closely in the development of cmix.