Saturday, October 20, 2012
laservex
I have started working on a new project: www.laservex.com. This will be a remake of a Java game I made a few years ago: http://byronknoll.com/lasers.html. This time I will implement it in HTML5 and support user-generated levels.
Wednesday, October 10, 2012
The Top-k Element Problem
I have been working on the problem of efficiently finding the most common items in a large stream of data. This problem shows up in many applications - I am working on it for a data compression project. One example application: given a stream of user queries to a search engine, find the top k queries which occur most often. Some Google employees published a fascinating algorithm for this problem called count sketch.
After reading through papers on the problem, I decided to use an algorithm called space-saving. I implemented it in C++ and have been testing it on the following task:
Problem: find the k most common substrings of length N in the Wikipedia database.
I haven't fully optimized my program yet, so I estimate it would currently take 4-5 days to process the 38GiB Wikipedia file I downloaded. I have some preliminary results on a subset of Wikipedia though - here is the output for k=1000 and N=1...30:
Since the time complexity of this algorithm is completely independent of k, I actually chose k=1 million and then took the top 1000 from the million buckets. The heuristic I used for choosing k was to use the largest value possible such that the memory used by the program did not exceed the available RAM on my computer.
After reading through papers on the problem, I decided to use an algorithm called space-saving. I implemented it in C++ and have been testing it on the following task:
Problem: find the k most common substrings of length N in the Wikipedia database.
I haven't fully optimized my program yet, so I estimate it would currently take 4-5 days to process the 38GiB Wikipedia file I downloaded. I have some preliminary results on a subset of Wikipedia though - here is the output for k=1000 and N=1...30:
- N=1
- N=2
- N=3
- N=4
- N=5
- N=6
- N=7
- N=8
- N=9
- N=10
- N=11
- N=12
- N=13
- N=14
- N=15
- N=16
- N=17
- N=18
- N=19
- N=20
- N=21
- N=22
- N=23
- N=24
- N=25
- N=26
- N=27
- N=28
- N=29
- N=30
Since the time complexity of this algorithm is completely independent of k, I actually chose k=1 million and then took the top 1000 from the million buckets. The heuristic I used for choosing k was to use the largest value possible such that the memory used by the program did not exceed the available RAM on my computer.
Tuesday, October 02, 2012
Five Player Go First Dice
In a previous post I proposed a challenge to come up with a set of go first dice for five players. I thought my prize would be safe since this seems like a difficult unsolved problem. Paul Vanetti has successfully won the challenge with the following solution:
Conjecture: Given any valid set of N go first dice with number of faces x1, x2, ..., xN, it is always possible to generate a new set of go first dice with exactly LCM(x1, x2, ..., xN) faces per die.
This can be done using the following algorithm:
Suppose you have go first dice with the following labels:
- 0, 1, 2, 3, 4, 5, 6, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 259, 260, 261, 262, 263, 264, 265
- 7, 20, 21, 22, 35, 36, 37, 50, 83, 96, 97, 98, 111, 112, 113, 126, 139, 152, 153, 154, 167, 168, 169, 182, 215, 228, 229, 230, 243, 244, 245, 258
- 8, 12, 13, 14, 15, 19, 23, 27, 28, 29, 30, 34, 38, 42, 43, 44, 45, 49, 84, 88, 89, 90, 91, 95, 99, 103, 104, 105, 106, 110, 114, 118, 119, 120, 121, 125, 140, 144, 145, 146, 147, 151, 155, 159, 160, 161, 162, 166, 170, 174, 175, 176, 177, 181, 216, 220, 221, 222, 223, 227, 231, 235, 236, 237, 238, 242, 246, 250, 251, 252, 253, 257
- 9, 11, 16, 18, 24, 26, 31, 33, 39, 41, 46, 48, 85, 87, 92, 94, 100, 102, 107, 109, 115, 117, 122, 124, 141, 143, 148, 150, 156, 158, 163, 165, 171, 173, 178, 180, 217, 219, 224, 226, 232, 234, 239, 241, 247, 249, 254, 256
- 10, 17, 25, 32, 40, 47, 86, 93, 101, 108, 116, 123, 142, 149, 157, 164, 172, 179, 218, 225, 233, 240, 248, 255
- 90
- 32
- 72
- 48
- 24
Conjecture: Given any valid set of N go first dice with number of faces x1, x2, ..., xN, it is always possible to generate a new set of go first dice with exactly LCM(x1, x2, ..., xN) faces per die.
This can be done using the following algorithm:
Suppose you have go first dice with the following labels:
- a1, a2, ..., aQ
- b1, b2, ..., bR
- ...
- y1, y2, ..., yZ
- m*a1, m*a1 + 1, ..., m*a1 + (m/Q) - 1, m*a2, m*a2 + 1, ..., m*a2 + (m/Q) - 1, m*aQ, m*aQ + 1, ..., m*aQ + (m/Q) - 1
- m*b1, m*b1 + 1, ..., m*b1 + (m/R) - 1, m*b2, m*b2 + 1, ..., m*b2 + (m/R) - 1, m*bR, m*bR + 1, ..., m*bR + (m/R) - 1
- ...
- m*y1, m*y1 + 1, ..., m*y1 + (m/Z) - 1, m*y2, m*y2 + 1, ..., m*y2 + (m/Z) - 1, m*yZ, m*yZ + 1, ..., m*yZ + (m/Z) - 1
- 1,2,4
- 3,5
- 6*1, 6*1 + 1, 6*2, 6*2 + 1, 6*4, 6*4 + 1
- 6*3, 6*3 + 1, 6*3 + 2, 6*5, 6*5 + 1, 6*5 + 2
- 6, 7, 12, 13, 24, 25
- 18, 19, 20, 30, 31, 32
- 1, 2, 3, 4, 8, 9
- 5, 6, 7, 10, 11, 12