Ruby Learning by Reversing: Native Gems, Part 6

The first series of Learning by Reversing examines a Ruby native gem to understand how it works. Part 6 examines how the benchmark is run to compare the performance of the native version with the original Ruby version.


If you implement some functionality in a native C gem to improve performance, you will normally want to include a benchmark that shows the benefit. The fast_polylines gem does this, and here we look at how it is done. The ideas are not specific to native gems and work for anything that you want to benchmark.

Background

We have two interfaces that were implemented in native C code:

  • encode
  • decode

If you were implementing a performance comparison by yourself, this is what you would do in a Ruby script:

  1. Require any specific gems that you need
  2. Run the original code a number of times with some test input to get the execution time of the original code
  3. Run the optimised code a number of times with the same test input to get the execution time of the optimised code
  4. Calculate the relative speedup (or slow down!)
  5. Output the results
  6. Repeat steps 2 – 5 for every method that you want to benchmark
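
The steps above can be sketched by hand using nothing but a monotonic clock. This is only an illustration and not the gem's script: slow_sum and fast_sum are hypothetical stand-ins for the original and optimised implementations, and time_runs is a helper invented for this sketch.

```ruby
# Hypothetical stand-ins for the "original" and "optimised" implementations.
def slow_sum(numbers)
  total = 0
  numbers.each { |n| total += n }
  total
end

def fast_sum(numbers)
  numbers.sum
end

# Run the given block `runs` times and return the elapsed seconds.
def time_runs(runs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

INPUT = (1..10_000).to_a # the same test input for both versions
RUNS  = 200

original_time  = time_runs(RUNS) { slow_sum(INPUT) }
optimised_time = time_runs(RUNS) { fast_sum(INPUT) }

# A ratio above 1 is a speedup; a ratio below 1 is a slowdown.
speedup = original_time / optimised_time

puts format("original:  %.4fs", original_time)
puts format("optimised: %.4fs", optimised_time)
puts format("speedup:   %.2fx", speedup)
```

A fixed run count like this is easy to write but gives noisy numbers for very fast methods, which is one reason to reach for a benchmarking library instead.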

The code for running the benchmark can be found in perf/benchmark.rb and does almost exactly this.

In our case, the optimised code comes from fast_polylines and the original code comes from the polylines gem. Our comparison is:

Item           Original                                      Optimised
-------------  --------------------------------------------  ------------------------------
Gem            polylines                                     fast_polylines
Module         Polylines                                     FastPolylines
Encode method  Polylines::Encoder.encode_points(POINTS)      FastPolylines.encode(POINTS)
Decode method  Polylines::Decoder.decode_polyline(POLYLINE)  FastPolylines.decode(POLYLINE)

So, basically, we want to run each pair of methods in the table on the same input and compare how long they take.

The Actual Benchmark

Our secret weapon (OK, it’s not that secret) is the benchmarking support that is already available to us. We require benchmark and benchmark/ips at the top of the script; benchmark/ips comes from the benchmark-ips gem and adds the Benchmark.ips method.

Benchmark.ips takes care of the mundane parts of running a comparative benchmark. The general format for it is shown below:

Benchmark.ips do |x|
  x.report("Polylines") { Polylines::Encoder.encode_points(POINTS) }
  x.report("FastPolylines") { FastPolylines.encode(POINTS) }

  x.compare!
end

We add each item that we want in the comparison by calling x.report, as you see above. We pass it the name that will be shown in the report and a block containing the code to be benchmarked. Finally, we call x.compare! to calculate the comparison and show it after the individual results.

Before the code above, the script stores the points in POINTS and uses that as the common input to both methods. We ask for two reports (called “Polylines” and “FastPolylines”), each using the respective method to encode the points, and then ask for the performance comparison.

When we run this script, we see output like the following.

Warming up --------------------------------------
           Polylines   102.000  i/100ms
       FastPolylines    28.757k i/100ms
Calculating -------------------------------------
           Polylines    763.874  (±11.8%) i/s -      3.876k in   5.150165s
       FastPolylines    174.477k (±11.2%) i/s -    891.467k in   5.177265s

Comparison:
       FastPolylines:   174477.4 i/s
           Polylines:      763.9 i/s - 228.41x  (± 0.00) slower

We can now see what using Benchmark.ips did for us:

  • It ran the code for a short while to warm up the execution (which might help JIT compilers, etc.).
  • Then, it did the actual measurement. The output shows how many times the method ran and in what duration (e.g., 3.876k times in 5.15s).
  • Finally, it output the comparison. This is reported in i/s (iterations per second), i.e., how many times per second the method ran. It also shows how much slower the slower version is. In this case, we see that Polylines was 228.41x slower.
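
To make those three phases concrete, here is a much-simplified sketch of an iterations-per-second measurement. It is not how benchmark-ips is implemented (the real gem is more careful about timing and also reports an error margin); measure_ips, run_for, compare, and the two lambdas are hypothetical names made up for this illustration.

```ruby
# Monotonic clock reading in seconds.
def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Call `work` repeatedly for about `seconds`; return [iterations, elapsed].
def run_for(seconds, work)
  iterations = 0
  started = now
  while now - started < seconds
    work.call
    iterations += 1
  end
  [iterations, now - started]
end

# Warm up (result discarded), then measure and return iterations per second.
def measure_ips(work, warmup: 0.05, time: 0.2)
  run_for(warmup, work)
  iterations, elapsed = run_for(time, work)
  iterations / elapsed
end

# Sort fastest first and express every entry relative to the fastest one.
def compare(results)
  ranked = results.sort_by { |_label, ips| -ips }
  best = ranked.first.last
  ranked.map { |label, ips| [label, ips, best / ips] }
end

DATA = (1..2_000).to_a
candidates = {
  "each loop" => -> { total = 0; DATA.each { |n| total += n }; total },
  "Array#sum" => -> { DATA.sum }
}

results = candidates.map { |label, work| [label, measure_ips(work)] }
compare(results).each do |label, ips, slowdown|
  suffix = slowdown > 1 ? format(" - %.2fx slower", slowdown) : ""
  puts format("%12s: %12.1f i/s%s", label, ips, suffix)
end
```

The slowdown figure is just the ratio of the two rates. Applying the same division to the numbers in the output above gives 174477.4 / 763.874 ≈ 228.41, which matches the reported value.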

You might have noticed that we added the reports in the order Polylines, then FastPolylines. The ‘Warming up’ and ‘Calculating’ sections of the output follow that order, but the ‘Comparison’ section is sorted from fastest to slowest, which is why FastPolylines is shown first there.

The code for running the benchmark for the decode method is almost identical.

Benchmark.ips do |x|
  x.report("Polylines") { Polylines::Decoder.decode_polyline(POLYLINE) }
  x.report("FastPolylines") { FastPolylines.decode(POLYLINE) }

  x.compare!
end

That allows us to run the benchmark whenever we need it. You can read more about benchmark/ips on the benchmark-ips GitHub page.

One Extra Method

There is one more thing that the code does, which is purely aesthetic. It defines this method to make the separation between the output sections more visible and a bit prettier.

# Prints centered, bold and inverted text for better visibility.
def shout_out(string)
  puts "\n\e[7;1m#{string.center(75)}\e[27;0m\n\n"
end

This method is called just before the benchmark for each section is run, so the banner appears above that section’s results.
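
For readers unfamiliar with the escape codes in that string, the sketch below rebuilds the same banner from named pieces. The constant names and the sgr and banner helpers are made up for this illustration; the codes themselves (1 bold, 7 inverse video, 27 inverse off, 0 reset) are standard ANSI SGR codes.

```ruby
# ANSI SGR ("Select Graphic Rendition") codes used by shout_out.
BOLD        = 1
INVERSE     = 7
INVERSE_OFF = 27
RESET       = 0

# Build the escape sequence "\e[<codes>m" from a list of SGR codes.
def sgr(*codes)
  "\e[#{codes.join(';')}m"
end

# Build the same string that shout_out prints, but return it instead.
def banner(string, width = 75)
  "\n#{sgr(INVERSE, BOLD)}#{string.center(width)}#{sgr(INVERSE_OFF, RESET)}\n\n"
end

puts banner("ENCODING")
```

In a terminal that understands ANSI codes, this prints the title centered in a 75-column highlighted bar. If the output is redirected to a file, the escape codes show up as literal characters.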

Linking it up in Development

Ideally, in development, we want to be able to run the benchmark easily and have our code rebuilt first if something has changed. If this sounds like something we could use make for, you would be right. The gem does exactly this, and we discussed it in Part 4 – Understanding the Development Makefile, in the section about Running the benchmark.

Looking ahead

That brings us to the end of Part 6. We have only a little bit more left to explain and then we can see how to make some more changes to the gem. If you have any comments, please feel free to leave them below.
