Byron's Blog: 2014

Thursday, December 04, 2014

Geometric Median

During a meeting with some coworkers we started debating if it was possible to find a better meeting location. Given the starting location of all the meeting participants, could we find a point which minimizes the total travel distance?

It turns out that it is not trivial to solve. The average of the participant locations is not necessarily the optimal point. The actual optimal point is called the geometric median.

I made a HTML5 page which visualizes the geometric median, along with the convex cost function: http://www.byronknoll.com/median.html

Thursday, August 21, 2014

Cardboard

I finally own a VR device: Google Cardboard. It works surprisingly well! I am using it with a Nexus 5, which has a nice display. Cardboard doesn't work quite as well as the first developer edition of the Oculus Rift (see my review here), but it is pretty close. The demo apps for Cardboard are currently quite limited, so there isn't really much to do on it after the first 10 minutes. I am still waiting for my kickstarter edition of the Avegant Glyph: they have delayed the delivery date until next year :(

Saturday, July 26, 2014

Wargames

I recently started a fun activity called wargaming. Wargames are a set of computer science related challenges/puzzles. So far I have completed three of the wargames on overthewire.org: Bandit, Krypton, and Leviathan. I track my progress on wechall.net. wechall.net tracks scores across dozens of wargaming sites and has user rankings/leaderboards. The wargames I have seen touch many different areas of computer science: cryptography, network security, linux tools, C++ debugging, etc. Very educational.

Wednesday, July 16, 2014

Dwarf Fortress

I recently beat GTA V, so I was looking for a new game to play. GTA V is the most expensive video game ever made and has amazing production value. I eventually decided on Slaves to Armok: God of Blood Chapter II: Dwarf Fortress as the next game I would try. It is on the opposite-end of the game spectrum compared to GTA V: it is free, made by a single developer, and has a text-based interface. I am a fan of NetHack and it looked like DF might have some similarities.

I have been playing DF for three days and so far it is incredible. The first day was tough since there is such a steep learning curve. My first fortress was a mess because I didn't understand how stairs/slopes worked - all my dwarves got trapped underground in a pit. I ended up watching a youtube tutorial to figure out the basics. After the tutorial I started up a second fortress and for the last two days I have managed to survive in this.

I had one near disaster when my fortress ran out of food. My dwarves were so hungry that they started ignoring my orders so I couldn't get them to work on actually finding new food sources. Some of my dwarves started dying and one went crazy and started attacking everyone. I thought I was doomed until suddenly a group of immigrants arrived. I immediately put the healthy immigrants to work on making/collecting food. Success!

Today I worked on creating a reliable water source for my fortress. I looked up how wells work on the DF wiki and starting building one. It ended up taking me two hours to finish the well. My fortress is underground and the most convenient water source was from a river on the surface. Channeling water around is dangerous because it could end up flooding my fortress or drowning the workers mining the tunnels. I built a tunnel from the river to a reservoir below my fortress. The well is connected to the reservoir. To control the flow of water I created two floodgates and connected them to levers in my fortress. I could then use the levers to control the flow of water from a safe distance, once the dwarves had finished digging and evacuating the reservoir. In the screenshot above you can see one level of my fortress and a party my dwarves organized in celebration of the new well.

Sunday, July 06, 2014

visibility_polygon.js Version 1.5

I have released a new version of my visibility polygon library. One of the constraints in the old version was that none of the input line segments could intersect. In this new version, I added a function which breaks apart any intersecting line segments into non-intersecting components.

Demo: http://www.byronknoll.com/visibility.html

Thursday, June 26, 2014

cmix v3

I have released cmix version 3. This version is over twice as fast, uses less memory, and gets better compression ratio.

Sunday, June 15, 2014

Backpropagation

Today I derived backpropagation for a logistic activation function and a cross entropy loss function. This has been done before, but I couldn't find the full set of formulas for it online. I am using the variable notation from here: http://neuralnetworksanddeeplearning.com/chap2.html

Thursday, May 29, 2014

cmix version 2

I have released cmix version 2. It is now first place on the Large Text Compression Benchmark.

Wednesday, April 16, 2014

cmix version 1

I have released cmix version 1. It is now ranked first place on the Silesia Benchmark and second place on the Large Text Compression Benchmark.

Saturday, March 01, 2014

Vector Earth

I added a new HTML5 experiment to my website: Vector Earth. Inspired by the xkcd now comic, I downloaded some vector map data and tried experimenting with making some of my own map projections. The first projection I implemented was the Azimuthal equidistant projection, similar to what was used for the xkcd comic. If you are interested in map projections, Jason Davies has implemented some very cool demos.

While I was implementing the map projection, I encountered a bug where there was an unnatural vertical line cutting through Siberia. I went to Google Maps to see what Siberia was supposed to look like, and was amused to find that Google Maps seemed to have the same bug!

In both the vector and satellite data, there are strange discontinuities along this line. It turns out that this line is the threshold where 180 degrees longitude meets -180 degrees.

Wednesday, February 26, 2014

xkcd now clock

I liked the clock in today's xkcd. However, it only has a 15 minute granularity. Well, this is something I can fix in HTML5 canvas! I checked to see if anyone else has done this and I think I am the first.

Saturday, February 22, 2014

Texture

Can you guess what this is?

It is a zoomed in picture of the lid to a container of yogurt I ate yesterday. It reminded me of the patterns produced by the Belousov-Zhabotinsky reaction. I have no idea why the lid has such a strange pattern on it.

cmix update

cmix has now surpassed paq8l on both the calgary corpus and enwik8. cmix is probably now one of the most powerful general purpose compression programs. cmix is about the same speed as paq8l, but uses much more memory (which gives it an advantage on large files such as enwik8).

cmix currently uses an ensemble of 112 independent predictors (paq8l uses 552). cmix uses a more extensive neural network to mix the models than paq8l. cmix uses a four layer network with 528,496 neurons (paq8l uses a three layer network with 3,633 neurons).

For both calgary corpus and enwik8, there is still a significant gap until cmix reaches state of the art. PAQAR and decomp8 are both specialized versions of paq8, which use dictionary preprocessing specific to the dataset (i.e. they are not very good at general purpose compression). I think my first official benchmark submission will probably be on the large text compression benchmark. cmix would currently be ranked as 4th out of 183 submissions. However, I still have a number of ideas which should make cmix significantly better, so I'll hold off on submitting until I reach first place!

Saturday, February 15, 2014

Nothing To Hide

There is a cool game being developed called Nothing To Hide. It uses the visibility polygon library that I wrote. Review at rockpapershotgun.

Monday, January 06, 2014

Avegant Glyph

A few months ago I wrote a review of the Oculus Rift. There is another similar product coming out soon which I am even more excited about. The Avegant Glyph uses a revolutionary technology called virtual retinal display (VRD). My prediction is that 50 years from now VRD will be the primary way most people interact with computers.

The first version of the Glyph will be available on Kickstarter on January 22nd for $499. I am tempted to order one. I don't like the fact that headphones/batteries are embedded in the headset, making it much bulkier than it needs to be.

cmix

I spent most of my christmas break working on a data compression project. It is a lossless compression program called "cmix" written in C++. It is aimed entirely at optimizing compression ratio (at the cost of high CPU/memory usage). I have the very ambitious goal of eventually surpassing the compression performance of PAQ8. PAQ8 is currently state of the art in most compression benchmarks. My master's thesis was related to PAQ8. Realistically, it is unlikely that I will be able to surpass PAQ8 (given the amount of work/expertise that has gone into its development).

As I develop the program, I have been tracking my progress on some standard compression benchmarks. You can see my progress at https://sites.google.com/site/cmixbenchmark/. This tracks two compression benchmarks: enwik8 (Wikipedia) and calgary corpus (a mixture of files). The tables show the compression size of some other compression programs for reference. The time line shows the performance of cmix over time, as I add new features.

cmix currently compresses the calgary corpus to 667,558 bytes. The record is 580,170 bytes by a variant of PAQ8 called PAQAR. There have been hundreds of versions of PAQ, but here are some major milestones in its version history:

Version	Calgary	Date
P5	992,902	2000
PAQ1	716,704	2002
PAQ2	702,382	May 2003
PAQ3	696,616	Sep 2003
PAQ4	672,134	Nov 2003
PAQ5	661,811	Dec 2003
PAQ6	648,892	Jan 2004
PAQ7	611,684	Dec 2005
PAQ8L	595,586	Mar 2007

So cmix currently does slightly better than PAQ4. cmix shares a similar architecture to PAQ. It is a streaming algorithm - it compresses data by predicting the probability of each bit given the previous bits in the input. It combines the predictions of a large number of independent models into a single prediction for each bit (i.e. ensemble learning). The main developer of PAQ wrote an online book about data compression: Data Compression Explained. It is an excellent reference - I have been following it closely in the development of cmix.