The first series of Learning by Reversing examines a Ruby native gem to understand how it works. Part 6 examines how the benchmark is run to compare the performance of the native version with the original Ruby version.
Previously, in this series, we have had:
- Part 1 – Background to the gem we are looking at (including installing, changing and rebuilding it)
- Part 2 – How a native gem is loaded
- Part 3 – How the files get packaged so that the native extension is built during gem installation
- Part 4 – Understanding the Development Makefile
- Part 5 – The Ruby C API/ Interface
If you implement some functionality in a native C gem to improve performance, you will normally want to include a benchmark to show the benefit. The fast_polylines gem does this, and here we look at how it is done. The ideas are not specific to native gems and would work for anything that you want to benchmark.
Background
We have two interfaces that were implemented in native C code:
- encode
- decode
If you were implementing a performance comparison by yourself, this is what you would do in a Ruby script:
1. Require any specific gems that you need
2. Run the original code a number of times with some test input to get the execution time of the original code
3. Run the optimised code a number of times with the same test input to get the execution time of the optimised code
4. Calculate the relative speedup (or slow down!)
5. Output the results
6. Repeat steps 2 – 5 for every method that you want to benchmark
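As a rough illustration of these steps, here is a minimal sketch that uses only Ruby's standard benchmark library. The methods original_sum and optimised_sum, the helpers time_runs and speedup, and the INPUT and RUNS constants are all hypothetical stand-ins invented for this example; they are not part of fast_polylines or polylines.

```ruby
require 'benchmark'

# Hypothetical stand-in for the "original" code: sums an array by hand.
def original_sum(values)
  total = 0
  values.each { |v| total += v }
  total
end

# Hypothetical stand-in for the "optimised" code: uses the built-in Array#sum.
def optimised_sum(values)
  values.sum
end

# Runs the given block `runs` times and returns the elapsed time in seconds.
def time_runs(runs)
  Benchmark.realtime { runs.times { yield } }
end

# Relative speedup: above 1 means the optimised code is faster,
# below 1 means it is actually slower.
def speedup(original_seconds, optimised_seconds)
  original_seconds / optimised_seconds
end

INPUT = (1..10_000).to_a.freeze # common test input for both versions
RUNS = 200                      # how many times each version is run

original_time = time_runs(RUNS) { original_sum(INPUT) }
optimised_time = time_runs(RUNS) { optimised_sum(INPUT) }

puts format('original:  %.4fs', original_time)
puts format('optimised: %.4fs', optimised_time)
puts format('speedup:   %.2fx', speedup(original_time, optimised_time))
```

The timings will vary from run to run and from machine to machine, which is one reason to prefer a tool that handles warm-up and repetition for you.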
The code for running the benchmark can be found in perf/benchmark.rb and does almost exactly this.
In our case, the optimised code comes from fast_polylines and the original code comes from the polylines gem. Our comparison is:
Item | Original | Optimised |
---|---|---|
gem | polylines | fast_polylines |
Module | Polylines | FastPolylines |
Encode method | Polylines::Encoder.encode_points(POINTS) | FastPolylines.encode(POINTS) |
Decode method | Polylines::Decoder.decode_polyline(POLYLINE) | FastPolylines.decode(POLYLINE) |
So, basically, we want to run each pair of methods from the table on the same input and compare how fast they are.
The Actual Benchmark
Our secret weapon (OK, it’s not that secret) is the ready-made support that Ruby has for running benchmarks: the benchmark library and benchmark/ips, which comes from the benchmark-ips gem. To use them, we require benchmark and benchmark/ips at the top of the script.
Benchmark.ips takes care of the mundane parts of running a comparative benchmark. It takes a block and yields a reporter object, conventionally named x, to which we add the code to be measured.
We add each item on which we want a report to the comparison by calling x.report. We pass it the name that will be shown in the report and a block containing the code to be benchmarked. Finally, we call x.compare! to finish up, calculate the comparison and show the report.
Before calling Benchmark.ips, the script stores the points in POINTS and uses that as the common input to both methods. We ask it to run two reports (called “Polylines” and “FastPolylines”) using the respective methods to encode the points. Then, we finally ask it to compare the performance.
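The general shape of such a comparison can be sketched as follows. This is not the gem's perf/benchmark.rb: so that it runs without polylines or fast_polylines installed, it compares two hypothetical stand-in methods (slow_sum and fast_sum) on a hypothetical VALUES input. In the gem's benchmark, the two report blocks would instead hold Polylines::Encoder.encode_points(POINTS) and FastPolylines.encode(POINTS). The x.config line is also an addition of mine to keep the run short, and the script skips the comparison if the benchmark-ips gem is missing.

```ruby
require 'benchmark'

# benchmark/ips is provided by the benchmark-ips gem, which may not be installed.
ips_available =
  begin
    require 'benchmark/ips'
    true
  rescue LoadError
    false
  end

# Hypothetical stand-in for the original implementation.
def slow_sum(values)
  total = 0
  values.each { |v| total += v }
  total
end

# Hypothetical stand-in for the optimised implementation.
def fast_sum(values)
  values.sum
end

VALUES = (1..1_000).to_a.freeze # common input, playing the role of POINTS

if ips_available
  Benchmark.ips do |x|
    # Shorter than the defaults (2s warm-up, 5s measurement) to keep the demo quick.
    x.config(time: 1, warmup: 1)

    # One report per implementation: a label plus the block to measure.
    x.report('slow_sum') { slow_sum(VALUES) }
    x.report('fast_sum') { fast_sum(VALUES) }

    # Print the comparison, ordered from fastest to slowest.
    x.compare!
  end
else
  warn 'benchmark-ips is not installed; try: gem install benchmark-ips'
end
```

Both blocks must do the same work on the same input, otherwise the comparison says nothing useful about the two implementations.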
When we run this script, we see the output below.
Warming up --------------------------------------
Polylines 102.000 i/100ms
FastPolylines 28.757k i/100ms
Calculating -------------------------------------
Polylines 763.874 (±11.8%) i/s - 3.876k in 5.150165s
FastPolylines 174.477k (±11.2%) i/s - 891.467k in 5.177265s
Comparison:
FastPolylines: 174477.4 i/s
Polylines: 763.9 i/s - 228.41x (± 0.00) slower
We can now see what using Benchmark.ips did for us:
- It first ran the code for a short warm-up period (which might help JIT compilers, etc.) and reported how many iterations fit into 100ms.
- Then, it did the actual measurement. The output shows us how many times the method ran in what duration (e.g., 3.876k times in 5.15s).
- Finally, it output the comparison. This is reported in terms of i/s (iterations per second), i.e., how many times a second the method was run. It also shows how much slower the slower version is. In this case, we see that Polylines was 228.41x slower.
You might have noticed that we called it in the sequence of Polylines, then FastPolylines – this is shown in the ‘Warming up’ and ‘Calculating’ sections of the output, but the ‘Comparison’ is shown from fastest to slowest – that’s why FastPolylines is shown first.
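The “slower” figure appears to be simply the ratio of the two iterations-per-second values. As a quick check, using the already rounded numbers from the sample output (so the last digit could differ from what the tool computed internally):

```ruby
# Figures copied from the sample output (already rounded by the tool).
fast_ips = 174_477.4 # FastPolylines, from the Comparison section
slow_ips = 763.874   # Polylines, from the Calculating section

# How many times more iterations per second the faster version achieves.
slowdown = fast_ips / slow_ips

puts format('%.2fx slower', slowdown) # prints "228.41x slower"
```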
The code for running the benchmark for the decode method is almost identical.
That allows us to run the benchmark when we need it. You can read more about benchmark/ips on the benchmark-ips GitHub page.
One Extra Method
There is one more thing that the code does which is purely aesthetic. It has a small helper method that prints a separator between the output sections, to make the separation a bit more visible and a bit prettier.
This method is called just before the benchmark for each section is run and its results are output.
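The gem's own helper is not reproduced here. As an illustration only, a separator of this kind could look like the sketch below, where section_banner and print_section are hypothetical names and the banner style is my own choice:

```ruby
# Hypothetical helper: centres the title in a line of '=' characters.
def section_banner(title, width: 50)
  " #{title} ".center(width, '=')
end

# Hypothetical helper: prints the banner with a blank line above and below.
def print_section(title)
  puts
  puts section_banner(title)
  puts
end

print_section('ENCODING')
# ... the encode benchmark would run here ...
print_section('DECODING')
# ... the decode benchmark would run here ...
```

Keeping the string building separate from the printing makes the banner easy to check without capturing standard output.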
Linking it up in Development
Ideally, in development, we want to be able to run the benchmark easily and have our code rebuilt first if something has changed. If this sounds like something we could use make for, you would be right. The gem does exactly this and we discussed it in Part 4 – Understanding the Development Makefile in the section about Running the benchmark.
Looking ahead
That brings us to the end of Part 6. We have only a little bit more left to explain and then we can see how to make some more changes to the gem. If you have any comments, please feel free to leave them below.