Thursday, January 24, 2013


In response to my last post, Jackie asks:
Not sure I understand your point about selling in as short a time window as possible. The day-to-day fluctuations of a stock's price is greater than its fluctuation over a week or so, right? But it might not be worth the extra transaction costs to sell over a time window.
I am replying here so the response is more widely visible. I claim that a stock should have much greater fluctuation at the end of a week than a day. In order to have less variance after a week, the stock would have to revert towards some sort of mean after the daily fluctuation. If this was true, I could use a simple strategy to make huge amounts of money in a short amount of time. I would look at a rolling average of one day for all stocks on the market. I would then buy the lowest stocks relative to their mean and then sell when their price hits the mean. The reason this strategy won't work is because stocks will not revert to a mean in the short-term. The reason they don't revert to a mean is because other investors in the market (especially high-frequency traders) would pick up on such an obvious pattern and take advantage of it. These traders would have a smoothing effect on stock price, ensuring that the price never reverts from the mean. Therefore I think it is necessary that variance increases the further into the future you consider.

Wednesday, January 23, 2013

Optimal Investment Strategies

I have recently become interested in the topic of investment. More specifically, I want to know:
  1. When should I sell my Google stock? Should I sell it all in one chunk or spread out the sales over a time window?
  2. How should I invest money that I currently have sitting in a bank?
I have been reading The Four Pillars of Investment for advice (as recommended by this post).

I was surprised to learn that there are actually provably optimal strategies when making these investment decisions. If you had asked me about this a month ago, I would have shrugged and said that it would be too complex to calculate an optimal strategy and that different experts in the field would probably give different answers. However, now I can say:
If you currently own stock in a single company (or a small set of companies), the optimal strategy is to sell immediately. The sale should be made as a single lump sum instead of spread out over a time window. You should then consider investing that money in an index fund.
The goal of investing money is to maximize expected profit for a given level of risk. The function of expected profit vs risk is monotonically increasing: with more risk you can always find an investment with higher expected profit. This fact can be used to make optimal investment decisions. If one investment has lower risk and higher expected reward than another investment, the optimal choice is clear. So, how can you calculate risk and reward?

It turns out that when investing in stocks, there is one fact that makes computing risk and expected reward much easier: regardless of how smart you are, you can not predict what a stock will do in the immediate future. You have zero information that is not already reflected in the stock price of a company. A company's stock price is a weighted average of a large number of expert investors. As with ensemble methods in machine learning, an individual can not compete with a weighted average of many expert predictors. Any investor who claims to be good at actively choosing a small set of stocks to invest in is wrong - even if they have an incredibly profitable investment history. In a room full of monkeys choosing to invest in random stocks, some of those monkeys would become rich due to natural variance in the profit distribution. I find this fact comforting - I am as good at predicting stock values as the best wall street investors (i.e. I can't).

There are certain exceptions to this:
  • When new public information is released, you can try racing other investors to take advantage of the information (i.e. high-frequency trading) using an algorithm.
  • In the long term, you know that the stock market as a whole will tend to revert to a mean value. It is profitable to invest more heavily during a bear market and sell during a bull market. Reverting to a mean does not apply to individual stocks in the short-term (otherwise it would be very easy to make money in the stock market).
  • In the very long term, you know that all stocks eventually go to zero (i.e. all companies eventually go bankrupt).
It is known that different classes of stock have different levels of risk. A company on the brink of bankruptcy has high risk. As expected, riskier stocks tend to have higher expected profit (this can be verified by analyzing historical data, as done in The Four Pillars). So, let's assume you are interested in investing in a particular "class" of stocks (i.e. a set risk level). Within this class, you do not have the ability to pick out good stocks - the expected profit of all these stocks is the same. So, should you invest in a small set of stocks or a large set? Well, given a set expected profit, you should try to minimize risk. You can do this by decreasing the variance of your investment - splitting your investment among many companies has a lower profit variance than investing in few. So, the optimal strategy here is always to invest in a large number of companies (i.e. index funds).

So, back to my problem of selling stock in a single company. The optimal decision is to sell immediately since I can get the same expected profit with lower risk using an index fund. Now the question of whether to sell in a lump sum or to sell over a time window. Once again, the expected profit of these options is roughly the same since I can't predict what the stock will do in the future. Which option has lower risk? I conducted a poll among some friends and most of them argued that spreading the sale over a time window has lower risk since it will smooth out fluctuations in the stock price. However, I argue that a lump sum payment has much lower risk. The "fluctuations" in a stock price are not centered about some short-term mean - if they were it would be easy to take advantage of this to predict stock price. Instead, the longer your time window, the more variance there is in the expected stock price distribution (which has an expected value equal to the current stock price). This means the longer your time window, the more risk you incur. Therefore the optimal strategy is to minimize the time window (i.e. make a single lump-sum payment).

Now on to my second question of what to do with money in a bank. Even if I am completely risk averse, there exist investments with a higher expected value and lower risk than storing money in an account without interest rate. Due to inflation, I think most bank accounts actually have a negative expected value. As for my level of risk-aversion, I will probably work on constructing a portfolio with a mix of different classes of index funds and bonds. Once I have constructed a portfolio, the only maintenance it needs over time is rebalancing.

So, does anything here sound controversial to you? If so, please let me know - I am still new to this and could be completely wrong. But as far as I can tell, I can compute the optimal investment strategy when comparing a small set of options.

Tuesday, January 22, 2013

Space-Saving Algorithm

Peter Norvig recently posted an interesting article about the frequency of different ngrams in a massive data set. A few months ago I worked on a similar problem.

Inspired by Peter Norvig's post, I decided to release my code for the space-saving algorithm as an open source project:

I optimized the speed of the program and tried testing it on Wikipedia. The file I used was a XML dump of Wikipedia which is 42,544,528,073 bytes (39.6 GiB). I used k=100,000,000 (100 million internal buckets). Here are the results: Unlike last time, the counts reported here are exact. After an initial pass with the space-saving algorithm my program makes a second pass to compute the exact counts for the top buckets.

I ended up implementing my own hash table from scratch for this project. I figure that since this problem has some unique properties (such as a constant number of elements in the hash table), it would be easier to optimize my own hash table rather than using something more generic. I used linear probing to resolve hash collisions.

I found a way to improve upon one aspect of the standard linear probing approach. Typically deleted entries are marked with a special label, so a large number of <delete,insert> swaps would eventually fill the table up with useless deletion entries (requiring an expensive resize). My implementation doesn't use deletion labels and never needs to be resized. Instead, when an entry is removed I propagate the empty spot down until it reaches a free slot. Performing this propagation operation requires computing the desired position of subsequent elements (to know whether the element can be moved) to maintain consistency. This can be done efficiently in my hash table since the elements are integers - computing the desired position just requires a mod operation.