Monday, January 04, 2021

Doodle Code


I have created a type of barcode called Doodle Code. Similar to QR codes, it can be used to store data.

Sunday, October 25, 2020

Turing Paint

 

I have created my own programming language called Turing Paint. It is a visual programming language: programs are represented using images.

Monday, July 20, 2020

tensorflow-compress

I just released a new open source compression program: tensorflow-compress. It performs lossless data compression using neural networks. Its compression rate is significantly worse than cmix, but it has some nice features:
  • It can run significantly faster using GPUs
  • It is written as a Colab notebook and designed to be run through a web browser
I am planning to continue working on it to improve performance and compression rate.

Tuesday, August 07, 2018

Android App

Today I published my first Android app: Squishy Earth.

This was based on an HTML5 demo I made a few years ago.

Wednesday, March 07, 2018

cmix updates

There was recently a new winning entry (called phda) for the Hutter Prize. The Hutter Prize is a contest for compressing the first 100 MB of Wikipedia (called enwik8). Here is how all of the previous versions of cmix performed on enwik8. phda is incredible: it uses far less CPU and memory than cmix while getting a similar compression rate. Unfortunately it is closed source, so many of its implementation details are hidden.

phda has pushed cmix v14 into second place on the Large Text Compression Benchmark. cmix currently compresses enwik8 to 15113248 bytes, so I am guessing the next cmix release should make it back to first place. Unfortunately there is a bug which is preventing cmix from compressing enwik9 properly, so I need to fix that before the v15 release. This bug is difficult to fix because I have only seen it happen when compressing enwik9, which takes over one week of CPU time for each debugging pass.

Last year I got funding for cmix through the AI Grant. This helped speed up progress by giving me access to more computational resources. I bought a new desktop and can use many virtual machines simultaneously on Google Compute Engine. The new resources let me run cmix on the Lossless Photo Compression Benchmark, getting first place using over 6 months of CPU time!

Thursday, December 14, 2017

Dollar Cost Averaging

Dollar cost averaging (DCA) is an investment strategy where you invest a fixed amount of money on a regular schedule over time. If you have some money, let's say $100, which of these strategies is better?

Strategy #1: invest $100 on day 1
Strategy #2: invest $1 every day for 100 days (i.e. DCA)

You convert your investment back to dollars at some future date, say after 1000 days. As expected, there is a risk vs reward trade-off. Both strategies are expected to earn a positive return. Many people I talk to claim that DCA is a valid strategy: although it has a lower expected reward than strategy #1, it also has lower risk. They argue that you should choose strategy #1 or #2 based on your risk vs reward preference.

I have always had an uneasy feeling about this argument. Intuitively I feel like DCA is never a good strategy. In the past I have had various arguments with people about why, but I have never been able to make a convincing case. Out of the people I have talked to, it also feels like I am the only one who has this opinion.

I have never done any research into this topic or taken any relevant classes. Today I had another argument with some friends, and I finally decided to just run some computer simulations comparing various strategies under different conditions. Figuring out the math is hard, but running computer simulations and looking at the statistics is easy.

One insight with strategy #1 is that you can adjust your risk vs reward trade-off by just investing less money. Less investment = lower reward and lower risk. We can introduce this as a new strategy:

Strategy #3: invest $X (e.g. $75) on day 1 and keep $(100-X) in your wallet.
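To make the "less investment = lower reward and lower risk" point concrete, here is a short derivation. The symbols are my own notation, not something from the simulation: V is the random value on the final day of $1 invested on day 1, and W_X is the final wealth under strategy #3, assuming the money left in the wallet earns nothing.

```latex
\begin{align}
W_X &= X \cdot V + (100 - X) \\
\mathbb{E}[W_X] &= 100 + X \, (\mathbb{E}[V] - 1) \\
\operatorname{Var}(W_X) &= X^2 \operatorname{Var}(V),
\qquad \operatorname{sd}(W_X) = X \cdot \operatorname{sd}(V)
\end{align}
```

So under these assumptions the expected profit and the standard deviation both scale linearly with X, and the variance scales with X squared. Setting X = 100 recovers strategy #1.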

With DCA you can adjust the risk vs reward trade-off using different investment schedules. Investing $10 every day for 10 days will have higher risk and higher reward than investing $1 every day for 100 days.

For completeness, we can compare a few other strategies:

Strategy #4: invest $100 on day 50
Strategy #5: invest $100 on day 100

My simulation adds random fluctuations to the investment, so we can measure the expected return and variance of the different strategies using millions of simulated investments. I also experimented with various parameters of the simulations (e.g. how much the investment fluctuates, the duration of the investment, the growth rate of the investment, etc.).
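A simulation along these lines could be sketched as follows. This is not the code I actually ran: the price model (a geometric random walk with normally distributed daily log-returns), the default parameter values, and all of the function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_prices(days, daily_drift, daily_volatility, rng):
    """Assumed price model: a geometric random walk.

    Each day's log-return is drawn from Normal(daily_drift, daily_volatility).
    Returns prices for day 1..days, with the day-1 price fixed at 1.0.
    """
    prices = [1.0]
    for _ in range(days - 1):
        step = rng.gauss(daily_drift, daily_volatility)
        prices.append(prices[-1] * math.exp(step))
    return prices


def final_value(prices, schedule, budget=100.0):
    """Dollar value on the last day.

    schedule maps a 1-based day to the dollars invested on that day.
    Money that is never invested stays in the wallet and earns nothing.
    """
    units = sum(dollars / prices[day - 1] for day, dollars in schedule.items())
    return units * prices[-1] + (budget - sum(schedule.values()))


# The five strategies from the post, written as investment schedules.
STRATEGIES = {
    "#1 lump sum on day 1": {1: 100.0},
    "#2 DCA, $1/day for 100 days": {day: 1.0 for day in range(1, 101)},
    "#3 invest $75 on day 1": {1: 75.0},
    "#4 lump sum on day 50": {50: 100.0},
    "#5 lump sum on day 100": {100: 100.0},
}


def compare(trials=1000, days=1000, daily_drift=0.0003,
            daily_volatility=0.01, seed=0):
    """Return {strategy name: (mean final value, standard deviation)}."""
    rng = random.Random(seed)
    results = {name: [] for name in STRATEGIES}
    for _ in range(trials):
        # Every strategy is evaluated on the same simulated price path.
        prices = simulate_prices(days, daily_drift, daily_volatility, rng)
        for name, schedule in STRATEGIES.items():
            results[name].append(final_value(prices, schedule))
    return {name: (statistics.mean(values), statistics.pstdev(values))
            for name, values in results.items()}


if __name__ == "__main__":
    for name, (mean, stdev) in compare().items():
        print(f"{name:30s} mean=${mean:8.2f}  stdev=${stdev:7.2f}")
```

To compare strategy #3 against DCA at a matched level of risk, one option is to sweep X for strategy #3 and the schedule length for DCA, then compare the mean final values at equal standard deviation. Raising daily_volatility is the knob for exploring what happens when the fluctuations get large.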

Under normal conditions:
- Strategy #1 has the highest expected return and highest variance.
- Strategies #4 and #5 are never useful. You can get the same expected return with lower variance using DCA or strategy #3.
- Strategy #3 is better than DCA. At any level of preferred variance, strategy #3 can get higher expected return. At any level of preferred expected return, strategy #3 can get lower variance.

Surprisingly, there actually are some scenarios where the situation reverses and DCA becomes superior to strategy #3. When the size of the random fluctuations becomes large enough, DCA becomes better. The size of the fluctuations has to become pretty huge - much larger variation than I would expect from normal stock index funds. It might make sense to use DCA for investing in something like bitcoin.

Friday, July 14, 2017

Rock Paper Scissors Using LSTM

Recently I have been doing a lot of research into using LSTM for data compression (in cmix, lstm-compress, and tensorflow-compress). In 2011 I made a website about Rock Paper Scissors AI. I realized that LSTM should be good at playing RPS, so today I made a small demo to do that: http://www.byronknoll.com/lstm.html