Saturday, January 25, 2014
Candle Controller
My prototype digital candle looked really good. The prototype used a separate power board wired to the Teensy 3.0. That was cheap in parts but took a lot of labor and delicate soldering to complete. This new board swallows the Teensy and incorporates the power supply. Better still, it has a slot to swallow an XBee, which means this board will be able to communicate with the outside world. I'm not sure what I'm going to do with that just yet. I'll think of something.
Wednesday, January 15, 2014
Benchmarks
I got Cznic's code to work with very little trouble. I spent much more time thinking about how to make benchmark results comparable. The issues I considered are:
- The trees must be of the same order (I used 128)
- The trees must use the same key type (I used uint64)
- The trees must be pre-filled with a large number of elements
Let me take a moment to explain the third point. The benchmarking engine in Go works by calling your code with an iteration count. It times your code and then divides the elapsed time by the count to determine the number of nanoseconds per iteration. It starts with a small iteration count and increases it until the sample is statistically valid. This has a poor property for comparing b-trees: the more elements in the tree, the slower most operations run. Since I can't control the iteration counts, I instead pre-filled the trees with 4 million elements. Why 4 million? Because inserting another million elements is not likely to change the depth of the tree.
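For a sense of what these benchmarks look like, here's a minimal sketch of the pre-fill pattern. It's not my actual test code: Tree, NewTree and Put are stand-in names for whatever the real API provides.

package btree

import (
    "math/rand"
    "testing"
)

// Pre-fill once, at package init, so every timed iteration
// sees a tree of roughly the same depth.
var benchTree = buildBenchTree()

func buildBenchTree() *Tree {
    t := NewTree(128) // order 128; Tree, NewTree and Put are placeholder names
    src := rand.NewSource(1)
    for i := 0; i < 4000000; i++ {
        t.Put(uint64(src.Int63()), i)
    }
    return t
}

// go test calls this with increasing values of b.N until the timing is stable.
func BenchmarkRandomPut(b *testing.B) {
    src := rand.NewSource(2)
    for i := 0; i < b.N; i++ {
        benchTree.Put(uint64(src.Int63()), i)
    }
}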
The results of the benchmark:
BenchmarkRandomPut 200000 11836 ns/op
BenchmarkCznicRandomPut 1000000 1556 ns/op
BenchmarkRandomGet 5000000 652 ns/op
BenchmarkCznicRandomGet 2000000 797 ns/op
BenchmarkRandomDelete 200000 12581 ns/op
BenchmarkCznicRandomDelete 1000000 1228 ns/op
There's clearly a severe penalty for the Load() and Store() operations, making my implementation an order of magnitude slower for operations that alter the tree. The good news is that my fetch implementation is somewhat faster. Since the expected mix of operations on a b-tree heavily favors lookups, I believe this offsets my poor update performance somewhat.
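To put rough numbers on that, here's a back-of-the-envelope calculation using the ns/op figures above. The 90/10 read-to-write split is just an assumed workload, not something I measured.

package main

import "fmt"

func main() {
    // ns/op numbers copied from the benchmark output above.
    myPut, myGet := 11836.0, 652.0
    cznicPut, cznicGet := 1556.0, 797.0

    // Assume a read-heavy workload: 90% lookups, 10% inserts.
    const reads, writes = 0.9, 0.1

    mine := reads*myGet + writes*myPut
    theirs := reads*cznicGet + writes*cznicPut
    fmt.Printf("mine: %.0f ns/op  cznic: %.0f ns/op\n", mine, theirs)
    // Prints roughly: mine: 1770 ns/op  cznic: 873 ns/op
}

With that mix, the gap shrinks from roughly an order of magnitude on writes to about a factor of two overall.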
How do I know that Load() and Store() eat up a lot of time? Go has a built-in profiling tool. Here's how it works. First you add some profiling code to a test harness. This is what I did:
// imports used by this snippet
import (
    "os"
    "runtime/pprof"
)

file, _ := os.Create("insertions.out")
pprof.StartCPUProfile(file)
for i := 0; i < 5000000; i++ {
    tree.Put(uint64(src.Int63()), i)
}
pprof.StopCPUProfile()
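The fetch profile shown further down came from a nearly identical loop. Here's a sketch, assuming the tree exposes a Get method for lookups; the real method name may differ.

file, _ := os.Create("fetches.out")
pprof.StartCPUProfile(file)
for i := 0; i < 5000000; i++ {
    tree.Get(uint64(src.Int63())) // Get is a placeholder for the tree's lookup method
}
pprof.StopCPUProfile()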
You can then open your profile result (insertions.out) like this:
$ go tool pprof bin/main insertions.out
Welcome to pprof! For help, type 'help'.
(pprof)
In this case "bin/main" is the name of my executable. Much more information on how to use Go's pprof can be found here. The first command you're likely to use is the "top" command which shows you your top 10 CPU users.
(pprof) top
Total: 5291 samples
1289 24.4% 24.4% 2563 48.4% github.com/mike-matera/Go/btree.(*SimpleNode).Store
1069 20.2% 44.6% 1069 20.2% github.com/mike-matera/Go/btree.(*SimpleNode).Load
272 5.1% 49.7% 285 5.4% github.com/mike-matera/Go/btree.(*SimpleNode).Find
240 4.5% 54.2% 809 15.3% runtime.assertE2T
233 4.4% 58.6% 233 4.4% runtime.memcopy64
209 4.0% 62.6% 584 11.0% assertE2Tret
200 3.8% 66.4% 433 8.2% copyout
196 3.7% 70.1% 344 6.5% sweepspan
195 3.7% 73.8% 195 3.7% flushptrbuf
141 2.7% 76.4% 344 6.5% scanblock
You can see that the top CPU hogs were Store() and Load(). The rightmost percentage is the percentage of samples where those functions were somewhere on the call stack: between them, a whopping 68.6% of the time. The 'web' command creates a nice SVG image of the result. Below are the results of the loop above and a similar one that fetches values.
Insertions Profile
Fetches Profile