This is a follow-up on my last post. Here are some comparisons of two symbols generated from the same class and displayed at the same orientation. I can adjust parameters on how much variance there is between symbols within a class - the parameters I chose are fairly conservative so that most symbols should appear very similar:
Sunday, March 25, 2012
A New Kind of CAPTCHA
My goal for today was to create my own CAPTCHA system. I find CAPTCHAs fascinating because of their goal of distinguishing human intelligence from machines. Surprisingly, most major CAPTCHA systems (currently used by companies such as Google, Microsoft, Facebook, etc.) are routinely solved by computers with over 15% success rates (at least according to the brief research I did on the topic today).
Most CAPTCHA systems provide an image of a sequence of characters and ask the user to type in the text. Software that attempts to solve these typically first segment the image into characters and then classify each character individually. Character classification is a solved problem - so the only security behind current CAPTCHAs relies on the fact that character segmentation is difficult. In my attempt at creating a CAPTCHA system I will try to make both the segmentation and classification tasks as difficult as possible.
One of the reasons English letter classification is so easy is because there is a massive amount of training data available for each letter. In my CAPTCHA system, I will create my own symbols and provide only *one* training example for each symbol class. My hypothesis is that humans are better than computers at extrapolating from tiny training sets. Another benefit of creating new symbols from scratch is that it makes the system internationalizable (i.e. not dependent on the character set of a particular language).
Here is an example output from my CAPTCHA program:
The answer in this case is "1428736". First, I create 10 symbol classes and present one example from each class to the user. I then generate 7 symbols from random classes (no repetition) and arrange them from left to right (allowing overlaps). The user has to enter the symbol classes in the right order. There is a 1 in 604,800 chance of guessing correct randomly, or a 1 in 10,000,000 chance if symbol repetition is allowed. The symbols are displayed in random orientation.
The symbol from a particular class will look different between the training example and the main CAPTCHA. Each class is defined by a small set random parameters. These parameters determine features such as branching factor, curviness, length, etc. Once I have generated the random class parameters, each symbol is also created via a stochastic process - but its general appearance will remain visually similar to the other symbols in its class due to the class parameters.
Well, I suspect that this CAPTCHA system will not work well in practice because it is too tedious for humans to solve. On first impression I think this task seems more difficult to solve with a computer than typical CAPTCHA systems. It would be easy to evaluate the time/precision of humans solving these CAPTCHAs with a user-study. However, evaluating the performance of software solvers is more difficult. One idea I had for evaluating software solvers is to host a contest (say on TopCoder or Kaggle) and offer a big cash prize as incentive to make competitive AIs (I won't actually do this :P).
Here are a few more random CAPTCHAs from the program. I have posted the answers at the end of this post - how many can you solve without looking at the answers?
The answers are: 0718936, 9467251, 0197283, and 7634290 respectively.
Most CAPTCHA systems provide an image of a sequence of characters and ask the user to type in the text. Software that attempts to solve these typically first segment the image into characters and then classify each character individually. Character classification is a solved problem - so the only security behind current CAPTCHAs relies on the fact that character segmentation is difficult. In my attempt at creating a CAPTCHA system I will try to make both the segmentation and classification tasks as difficult as possible.
One of the reasons English letter classification is so easy is because there is a massive amount of training data available for each letter. In my CAPTCHA system, I will create my own symbols and provide only *one* training example for each symbol class. My hypothesis is that humans are better than computers at extrapolating from tiny training sets. Another benefit of creating new symbols from scratch is that it makes the system internationalizable (i.e. not dependent on the character set of a particular language).
Here is an example output from my CAPTCHA program:
The answer in this case is "1428736". First, I create 10 symbol classes and present one example from each class to the user. I then generate 7 symbols from random classes (no repetition) and arrange them from left to right (allowing overlaps). The user has to enter the symbol classes in the right order. There is a 1 in 604,800 chance of guessing correct randomly, or a 1 in 10,000,000 chance if symbol repetition is allowed. The symbols are displayed in random orientation.
The symbol from a particular class will look different between the training example and the main CAPTCHA. Each class is defined by a small set random parameters. These parameters determine features such as branching factor, curviness, length, etc. Once I have generated the random class parameters, each symbol is also created via a stochastic process - but its general appearance will remain visually similar to the other symbols in its class due to the class parameters.
Well, I suspect that this CAPTCHA system will not work well in practice because it is too tedious for humans to solve. On first impression I think this task seems more difficult to solve with a computer than typical CAPTCHA systems. It would be easy to evaluate the time/precision of humans solving these CAPTCHAs with a user-study. However, evaluating the performance of software solvers is more difficult. One idea I had for evaluating software solvers is to host a contest (say on TopCoder or Kaggle) and offer a big cash prize as incentive to make competitive AIs (I won't actually do this :P).
Here are a few more random CAPTCHAs from the program. I have posted the answers at the end of this post - how many can you solve without looking at the answers?
The answers are: 0718936, 9467251, 0197283, and 7634290 respectively.
Sunday, March 11, 2012
An API for Intelligence
Over the last few years I have made many attempts at creating artificial intelligence. By "artificial intelligence" I mean a general purpose system that can recognise and predict patterns in spatio-temporal data. I have written about this topic in some previous posts.
Spatio-temporal data is any data that has a temporal dimension and one or more spatial dimensions. Everything your brain perceives is a stream of spatial data over time. Any type of sensor (e.g. microphone, camera, thermometer) can create a stream of spatial data over time. The goal of artificial intelligence/machine learning/data compression is to look for patterns in this type of data and predict what the data will be in the future.
All of my AI projects have had essentially the same API. For those of you who speak Java: "double[][] perceive (double[][] inputs)". That is, a single function that takes a matrix of floating point numbers and returns another matrix of floating point numbers. The input represents spatial data at a particular moment in time and the return value from the function represents a prediction for the next input (the next time step).
A magic black box:
Let us imagine that I have a magic black box that does a great job at implementing this function. What could I do with it? Well the most obvious thing I can do is use it to predict the future. Let's use it to find out what the stock prices will be five years from now:
Of course, these estimates wouldn't be very accurate because in reality stock prices depend on a vast number of different types of data (which I didn't give to the black box as input). If I gave the black box additional data in the input matrix (such as local news stories, earning reports, weather sensors, etc.) it would do a better job at predicting stock prices.
What else could I do with the black box? If I want to have a conversation with it or make it control a robot, I would need to give it some mechanism to perform actions. Let's imagine I naively plug a light bulb into a random cell in the output matrix of the black box (and the light bulb turns on/off depending on the value of that prediction). This light bulb is now an actuator - it gives the black box a mechanism to interact with the environment. There is now a feedback loop where the black box can influence the value of future sensor readings by changing the light bulb prediction. How would the black box choose which action to take? Well, the only "goal" of the box is to minimise the difference between its predictions and what actually happens in the future. Maybe turning on the light bulb allows the black box to make more accurate predictions of future sensor values, so it "decides" to keep the light on. If I hook up a speaker and microphone to the black box, maybe it will decide that having a conversation with me will also allow it to make more accurate predictions about the future (which is probably true).
The Magic Number:
Hopefully in the previous section I convinced you that if this one function was implemented correctly, it would result in something most people would consider true AI. So, why did I choose a matrix of floating point numbers? Why not simplify the API by just making it a single array instead? Why not a higher dimensional matrix? The answer is because I think that two (as in a two-dimensional matrix) is the magic number that makes a reasonable trade-off between competing factors. Using a higher dimensional space would make implementing the function infeasible due to computational complexity. Using a lower dimensional space loses critical information about the spatial relationships between sensors. As evidence of this information loss, imagine converting a two-dimensional image into a one-dimensional array of pixels. The image would become meaningless to you because you have lost the information of how the pixels are spatially arranged.
Let us consider the most intelligent system we know of today: the human brain. I think that the human brain is complying to the same API I described above. In fact, we only need to look at a part of the brain known as the neocortex. The neocortex is a thin sheet of neurons on the outer surface of the brain. It is responsible for essentially all higher-level thought and what makes humans intelligent. Since the neocortex is fundamentally a two-dimensional surface, it uses topographic maps to project higher-dimensional signals onto a two-dimensional space. For example, the three-dimensional touch sensors on your body are mapped onto a two-dimensional homunculus on your neocortex, where regions that are neighbouring in 3D space are also neighbouring on the homunculus (and regions which are more important are mapped to larger regions on the homunculus).
So, how close are we to being able to implement AI? I think so far the most successful efforts come from the field of data compression. A compression algorithm called PAQ8 does an amazing job of implementing "boolean perceive (boolean input)". However, it doesn't scale well to continuous numbers or higher-dimensional spaces. Another promising attempt at creating AI is coming from Numenta in the form of hierarchical temporal memory. So far their attempts have been unsuccessful, but I think at a higher level their approach to the problem is the best I have seen.
Spatio-temporal data is any data that has a temporal dimension and one or more spatial dimensions. Everything your brain perceives is a stream of spatial data over time. Any type of sensor (e.g. microphone, camera, thermometer) can create a stream of spatial data over time. The goal of artificial intelligence/machine learning/data compression is to look for patterns in this type of data and predict what the data will be in the future.
All of my AI projects have had essentially the same API. For those of you who speak Java: "double[][] perceive (double[][] inputs)". That is, a single function that takes a matrix of floating point numbers and returns another matrix of floating point numbers. The input represents spatial data at a particular moment in time and the return value from the function represents a prediction for the next input (the next time step).
A magic black box:
Let us imagine that I have a magic black box that does a great job at implementing this function. What could I do with it? Well the most obvious thing I can do is use it to predict the future. Let's use it to find out what the stock prices will be five years from now:
// Training.
double[][] stockPrices;
for (Time t = TimeOfFirstData(); t < Now(); ++t) {
stockPrices = GetHistoricalStock(t);
stockPrices = blackBox.perceive(data);
}
// Predict the future.
for (Time t = Now(); t <= FiveYearsFromNow(); ++t) {
stockPrices = blackBox.perceive(stockPrices);
}
return stockPrices;
Of course, these estimates wouldn't be very accurate because in reality stock prices depend on a vast number of different types of data (which I didn't give to the black box as input). If I gave the black box additional data in the input matrix (such as local news stories, earning reports, weather sensors, etc.) it would do a better job at predicting stock prices.
What else could I do with the black box? If I want to have a conversation with it or make it control a robot, I would need to give it some mechanism to perform actions. Let's imagine I naively plug a light bulb into a random cell in the output matrix of the black box (and the light bulb turns on/off depending on the value of that prediction). This light bulb is now an actuator - it gives the black box a mechanism to interact with the environment. There is now a feedback loop where the black box can influence the value of future sensor readings by changing the light bulb prediction. How would the black box choose which action to take? Well, the only "goal" of the box is to minimise the difference between its predictions and what actually happens in the future. Maybe turning on the light bulb allows the black box to make more accurate predictions of future sensor values, so it "decides" to keep the light on. If I hook up a speaker and microphone to the black box, maybe it will decide that having a conversation with me will also allow it to make more accurate predictions about the future (which is probably true).
The Magic Number:
Hopefully in the previous section I convinced you that if this one function was implemented correctly, it would result in something most people would consider true AI. So, why did I choose a matrix of floating point numbers? Why not simplify the API by just making it a single array instead? Why not a higher dimensional matrix? The answer is because I think that two (as in a two-dimensional matrix) is the magic number that makes a reasonable trade-off between competing factors. Using a higher dimensional space would make implementing the function infeasible due to computational complexity. Using a lower dimensional space loses critical information about the spatial relationships between sensors. As evidence of this information loss, imagine converting a two-dimensional image into a one-dimensional array of pixels. The image would become meaningless to you because you have lost the information of how the pixels are spatially arranged.
Let us consider the most intelligent system we know of today: the human brain. I think that the human brain is complying to the same API I described above. In fact, we only need to look at a part of the brain known as the neocortex. The neocortex is a thin sheet of neurons on the outer surface of the brain. It is responsible for essentially all higher-level thought and what makes humans intelligent. Since the neocortex is fundamentally a two-dimensional surface, it uses topographic maps to project higher-dimensional signals onto a two-dimensional space. For example, the three-dimensional touch sensors on your body are mapped onto a two-dimensional homunculus on your neocortex, where regions that are neighbouring in 3D space are also neighbouring on the homunculus (and regions which are more important are mapped to larger regions on the homunculus).
So, how close are we to being able to implement AI? I think so far the most successful efforts come from the field of data compression. A compression algorithm called PAQ8 does an amazing job of implementing "boolean perceive (boolean input)". However, it doesn't scale well to continuous numbers or higher-dimensional spaces. Another promising attempt at creating AI is coming from Numenta in the form of hierarchical temporal memory. So far their attempts have been unsuccessful, but I think at a higher level their approach to the problem is the best I have seen.