Wednesday, January 15, 2014

Benchmarks

I got Cznic's code to work with very little trouble. I spent much more time thinking about how to make benchmark results comparable. The issues I considered are:
  1. The trees must be of the same order (I used 128)
  2. The trees must use the same key type (I used uint64)
  3. The trees must be pre-filled with a large number of elements
Let me take a moment to explain #3. The benchmarking engine in Go works by calling your code with an iteration count. It times your code and then divides the time by the count to determine the number of nanoseconds per iteration. It starts with a small iteration count and increases it until the measurement runs long enough to be reliable. This has a poor property for comparing b-trees: the more elements in the tree, the slower most operations proceed. Since I can't control the iteration counts, I pre-filled the trees with 4 million elements instead. Why 4 million? Because inserting another million elements is not likely to change the depth of the tree: with order 128 each level multiplies the tree's capacity by roughly 128, so a tree holding 4 million keys is only about four levels deep and stays that deep well past 5 million keys.
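For reference, here's roughly what one of the Put benchmarks looks like. This is a sketch, not my exact code: NewTree is a stand-in for whatever constructor your tree has, and the Put signature mirrors the snippet later in this post. The important part is resetting the timer after the pre-fill so only the measured operations count.

package btree_test

import (
    "math/rand"
    "testing"

    "github.com/mike-matera/Go/btree"
)

func BenchmarkRandomPut(b *testing.B) {
    tree := btree.NewTree(128) // hypothetical constructor for an order-128 tree
    src := rand.NewSource(1)

    // Pre-fill with 4 million elements so the tree is at its working depth.
    for i := 0; i < 4000000; i++ {
        tree.Put(uint64(src.Int63()), i)
    }

    b.ResetTimer() // don't charge the pre-fill to the benchmark
    for i := 0; i < b.N; i++ {
        tree.Put(uint64(src.Int63()), i)
    }
}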

The results of the benchmark:

BenchmarkRandomPut            200000     11836 ns/op
BenchmarkCznicRandomPut      1000000      1556 ns/op
BenchmarkRandomGet           5000000       652 ns/op
BenchmarkCznicRandomGet      2000000       797 ns/op
BenchmarkRandomDelete         200000     12581 ns/op
BenchmarkCznicRandomDelete   1000000      1228 ns/op

There's clearly a severe penalty for the Load() and Store() operations, making my implementation an order of magnitude slower for operations that alter the tree. The good news is that my fetch implementation is somewhat faster. Since the expected mix of operations on a b-tree heavily favors lookups, I believe this offsets my poor update performance somewhat.
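To make the cost concrete, here's a simplified sketch of the kind of interface-based accessor that produces this penalty. This is not my actual SimpleNode code, just the shape of the problem: storing into an interface{} boxes the value, and reading it back requires a type assertion, which is the runtime.assertE2T call that shows up in the profile later in this post.

type slot struct {
    key   interface{}
    value interface{}
}

// Store boxes the uint64 key into an interface{}; the conversion
// copies the value and, depending on the Go version, can allocate.
func (s *slot) Store(key uint64, value interface{}) {
    s.key = key
    s.value = value
}

// Load must assert the key back to uint64; the checked assertion is
// what runtime.assertE2T does.
func (s *slot) Load() (uint64, interface{}) {
    return s.key.(uint64), s.value
}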

How do I know that Load() and Store() eat up a lot of time? Go has a built-in profiling tool. Here's how it works. First you add some profiling code to a test harness. This is what I did:

// Requires the "os" and "runtime/pprof" imports; src is a math/rand Source.
file, _ := os.Create("insertions.out") // error ignored for brevity
pprof.StartCPUProfile(file)
for i := 0; i < 5000000; i++ {
    tree.Put(uint64(src.Int63()), i)
}
pprof.StopCPUProfile()

You can then open your profile result (insertions.out) like this:

$ go tool pprof bin/main insertions.out
Welcome to pprof!  For help, type 'help'.
(pprof) 

In this case "bin/main" is the name of my executable. Much more information on how to use Go's pprof can be found here. The first command you're likely to use is the "top" command, which shows you your top 10 CPU users.

(pprof) top
Total: 5291 samples
    1289  24.4%  24.4%     2563  48.4% github.com/mike-matera/Go/btree.(*SimpleNode).Store
    1069  20.2%  44.6%     1069  20.2% github.com/mike-matera/Go/btree.(*SimpleNode).Load
     272   5.1%  49.7%      285   5.4% github.com/mike-matera/Go/btree.(*SimpleNode).Find
     240   4.5%  54.2%      809  15.3% runtime.assertE2T
     233   4.4%  58.6%      233   4.4% runtime.memcopy64
     209   4.0%  62.6%      584  11.0% assertE2Tret
     200   3.8%  66.4%      433   8.2% copyout
     196   3.7%  70.1%      344   6.5% sweepspan
     195   3.7%  73.8%      195   3.7% flushptrbuf
     141   2.7%  76.4%      344   6.5% scanblock

You can see that the top CPU hogs were Store() and Load(). The rightmost percentage is the percent of samples where those functions were on the call stack: a whopping 68.6% of the time. The 'web' command creates a nice SVG image of the result. Below are the results of the loop above and a similar one that fetches values.
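Generating each one was just:

(pprof) web

(pprof renders the graph with Graphviz's dot tool, so that needs to be installed.)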

Insertions Profile

Fetches Profile

