## Saturday, October 20, 2012

### laservex

I have started working on a new project: www.laservex.com. This will be a remake of a Java game I made a few years ago: http://byronknoll.com/lasers.html. This time I will implement it in HTML5 and support user-generated levels.

## Wednesday, October 10, 2012

### The Top-k Element Problem

I have been working on the problem of efficiently finding the most common items in a large stream of data. This problem shows up in many applications - I am working on it for a data compression project. One example application: given a stream of user queries to a search engine, find the top

After reading through papers on the problem, I decided to use an algorithm called space-saving. I implemented it in C++ and have been testing it on the following task:

I haven't fully optimized my program yet, so I estimate it would currently take 4-5 days to process the 38GiB Wikipedia file I downloaded. I have some preliminary results on a subset of Wikipedia though - here is the output for

Since the time complexity of this algorithm is completely independent of

*k*queries which occur most often. Some Google employees published a fascinating algorithm for this problem called count sketch.After reading through papers on the problem, I decided to use an algorithm called space-saving. I implemented it in C++ and have been testing it on the following task:

**Problem**: find the*k*most common substrings of length*N*in the Wikipedia database.I haven't fully optimized my program yet, so I estimate it would currently take 4-5 days to process the 38GiB Wikipedia file I downloaded. I have some preliminary results on a subset of Wikipedia though - here is the output for

*k*=1000 and*N*=1...30:- N=1
- N=2
- N=3
- N=4
- N=5
- N=6
- N=7
- N=8
- N=9
- N=10
- N=11
- N=12
- N=13
- N=14
- N=15
- N=16
- N=17
- N=18
- N=19
- N=20
- N=21
- N=22
- N=23
- N=24
- N=25
- N=26
- N=27
- N=28
- N=29
- N=30

Since the time complexity of this algorithm is completely independent of

*k*, I actually chose*k*=1 million and then took the top 1000 from the million buckets. The heuristic I used for choosing*k*was to use the largest value possible such that the memory used by the program did not exceed the available RAM on my computer.## Tuesday, October 02, 2012

### Five Player Go First Dice

In a previous post I proposed a challenge to come up with a set of go first dice for five players. I thought my prize would be safe since this seems like a difficult unsolved problem. Paul Vanetti has successfully won the challenge with the following solution:

This can be done using the following algorithm:

Suppose you have go first dice with the following labels:

- 0, 1, 2, 3, 4, 5, 6, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 259, 260, 261, 262, 263, 264, 265
- 7, 20, 21, 22, 35, 36, 37, 50, 83, 96, 97, 98, 111, 112, 113, 126, 139, 152, 153, 154, 167, 168, 169, 182, 215, 228, 229, 230, 243, 244, 245, 258
- 8, 12, 13, 14, 15, 19, 23, 27, 28, 29, 30, 34, 38, 42, 43, 44, 45, 49, 84, 88, 89, 90, 91, 95, 99, 103, 104, 105, 106, 110, 114, 118, 119, 120, 121, 125, 140, 144, 145, 146, 147, 151, 155, 159, 160, 161, 162, 166, 170, 174, 175, 176, 177, 181, 216, 220, 221, 222, 223, 227, 231, 235, 236, 237, 238, 242, 246, 250, 251, 252, 253, 257
- 9, 11, 16, 18, 24, 26, 31, 33, 39, 41, 46, 48, 85, 87, 92, 94, 100, 102, 107, 109, 115, 117, 122, 124, 141, 143, 148, 150, 156, 158, 163, 165, 171, 173, 178, 180, 217, 219, 224, 226, 232, 234, 239, 241, 247, 249, 254, 256
- 10, 17, 25, 32, 40, 47, 86, 93, 101, 108, 116, 123, 142, 149, 157, 164, 172, 179, 218, 225, 233, 240, 248, 255

- 90
- 32
- 72
- 48
- 24

**Conjecture**: Given any valid set of N go first dice with number of faces x1, x2, ..., xN, it is always possible to generate a new set of go first dice with exactly LCM(x1, x2, ..., xN) faces per die.This can be done using the following algorithm:

Suppose you have go first dice with the following labels:

- a1, a2, ..., aQ
- b1, b2, ..., bR
- ...
- y1, y2, ..., yZ

- m*a1, m*a1 + 1, ..., m*a1 + (m/Q) - 1, m*a2, m*a2 + 1, ..., m*a2 + (m/Q) - 1, m*aQ, m*aQ + 1, ..., m*aQ + (m/Q) - 1
- m*b1, m*b1 + 1, ..., m*b1 + (m/R) - 1, m*b2, m*b2 + 1, ..., m*b2 + (m/R) - 1, m*bR, m*bR + 1, ..., m*bR + (m/R) - 1
- ...
- m*y1, m*y1 + 1, ..., m*y1 + (m/Z) - 1, m*y2, m*y2 + 1, ..., m*y2 + (m/Z) - 1, m*yZ, m*yZ + 1, ..., m*yZ + (m/Z) - 1

- 1,2,4
- 3,5

- 6*1, 6*1 + 1, 6*2, 6*2 + 1, 6*4, 6*4 + 1
- 6*3, 6*3 + 1, 6*3 + 2, 6*5, 6*5 + 1, 6*5 + 2

- 6, 7, 12, 13, 24, 25
- 18, 19, 20, 30, 31, 32

- 1, 2, 3, 4, 8, 9
- 5, 6, 7, 10, 11, 12

Subscribe to:
Posts (Atom)