Code Tuning, a programming pearl in Ruby

After a few weeks at CERN of not much blogging, I have been mostly flat hunting at one of the most expensive cities in the world. Of course the real estate market is accordingly crazy. This means I spend a big chunk of my spare time on the trolley, which is okay because I borrowed one of the best programming books ever at CERN’s library, Programming Pearls. I like to read whichever column I feel like reading instead of reading it from cover to cover, and I stumbled upon “Code Tuning” today.

Being the practical guy I am, I thought that would make a good reading choice on my way back to help me focus on details. And hell it did. I would like to go over a very well known algorithm, Sequential Search, and introduce some cool twists that Jon Bentley shows on the book.

Given an array, a sequential search simply looks for a certain element of the array iterating over it. Easy right? A simple sequential search in ruby would be something like:

Can we do better? I thought not, but there are ways to improve this dramatically. Let’s benchmark for 1M, 10M and 50M elements.

It seems that it takes around ~103 (ns) * n on the worst case, I calculated this by dividing the benchmark for 1M by the number of elements, I ran all benchmarks for the last item of the array. Fairly normal, this will be the baseline for further improvements.

How can we slim down this loop a little bit? Easy, stop performing more tests than necessary. On an ideal version of a sequential search you should only test for the elements to be equal to the desired element. In reality every time this block executes itself, array.each is checking that we haven’t gotten to the end of the array.

Benchmarks for 1M, 10M and 50M.

Bentley dropped the run for a speedup of ~5%. It looks like tests are more computationally expensive in Ruby so now it runs in ~45 (ns) * n. This is a crazy good improvement of about 55% and still it looks pretty readable.

It looks like our bottom line (tests) cannot be improved, as you will always have to check against every element in the array until you get to it. Now what. Incrementing the index in every loop is an addition that shouldn’t cost much, but we can do better. Loop_unwinding/unrolling as I prefer to call it is the answer. I am not very positive that Ruby’s 1.9.2 default interpreter unrolls loops. And we’re going to find it right away, because if it doesn’t the reduction will be very sharp.

Benchmarks for 1M, 10M and 50M.

It went from ~45 (ns) * n to ~41 (ns) * n.  A slight improvement of 9%. I’m a bit disappointed after checking the book and seeing that reducing the overhead of incrementing the iterator led to a reduction of 56%. What puzzles me even more is that his argument is that loop unrolling can help to avoid pipeline stalls, reduce branches, and increase instruction-level parallelism.

Why is this clearly happening on Bentley’s benchmarks in C and if it’s happening, it doesn’t yield the same results for my Ruby implementation? No idea, so instead of keeping this private I’m going to share it so you guys can wander about it. I’ll just leave a quote of Don Knuth:

Premature optimization is the root of much programming evil; it can compromise the correctness, functionality and maintainability of programs. Save concern for efficiency for when it matters.