Ruby Learning by Reversing: Native Gems, Part 2

In Part 1, we looked at the background to the gem that we are going to explore, saw how the gem works, and learnt how to change, rebuild and install it locally. Part 2 explains how the gem gets picked up and used by Ruby when you require it, and how Ruby knows what to do. In this part, we will jump across a number of files to figure out what is happening. We will not rebuild the gem or look at the Makefile, etc. in this post.

A quick note on using ‘Ruby’ and ‘Rubygems’

In this post, I often write “Ruby” as if Ruby itself does things. Strictly speaking, the correct expression would be “Ruby interpreter” or “Ruby runtime” or something similar. Likewise, when I refer to Rubygems, I actually mean the gem called Rubygems that provides the support for gems to be installed, found and loaded. It’s a convenience that allows me to focus on other things. If you think it could be expressed better, let me know!

What happens when we require a gem?

When you require a gem, two main things happen:

  • The path to the gem's lib directory is added to the $LOAD_PATH – this allows you to require other files from within that gem
  • The file that you `require` gets loaded, i.e., Ruby executes the file
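To make these two steps concrete, here is a minimal sketch that imitates them without any real gem. The temporary directory and the file name demo_feature.rb are hypothetical stand-ins for a gem's lib directory and its main file; Rubygems is not involved at all.

```ruby
require "tmpdir"

# Hypothetical stand-in for a gem's lib directory; no real gem is involved.
lib_dir = Dir.mktmpdir("demo_gem_lib")
File.write(File.join(lib_dir, "demo_feature.rb"), "DEMO_FEATURE_LOADED = true\n")

# Step 1: the directory goes onto the load path
# (for an installed gem, Rubygems does this for us).
$LOAD_PATH.unshift(lib_dir)

# Step 2: require finds the file through the load path and executes it.
puts require("demo_feature") # true: the file was found and executed
puts require("demo_feature") # false: it is already loaded, so nothing runs again

# The constant defined inside the file now exists, which shows the file ran.
puts DEMO_FEATURE_LOADED
```

The second call returning false is the reason a file is only executed once, no matter how many times it is required.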

Let’s walk through this first. Let’s start irb and simply output the $LOAD_PATH.

irb(main):001:0> puts $LOAD_PATH
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/timeout-0.3.0/lib
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/strscan-3.0.3/lib
C:/Ruby31-x64/lib/ruby/gems/3.1.0/extensions/x64-mingw-ucrt/3.1.0/strscan-3.0.3
C:/Ruby31-x64/lib/ruby/site_ruby/3.1.0
C:/Ruby31-x64/lib/ruby/site_ruby/3.1.0/x64-ucrt
C:/Ruby31-x64/lib/ruby/site_ruby
C:/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0
C:/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0/x64-ucrt
C:/Ruby31-x64/lib/ruby/vendor_ruby
C:/Ruby31-x64/lib/ruby/3.1.0
C:/Ruby31-x64/lib/ruby/3.1.0/x64-mingw-ucrt

Most of the paths point to generic places where you would find gems or other Ruby source files to include. Now, let’s require a gem that we have on our system. Since we have installed `fast_polylines`, let’s require that. This gives us the following $LOAD_PATH.

irb(main):005:0> puts $LOAD_PATH
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/timeout-0.3.0/lib
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/strscan-3.0.3/lib
C:/Ruby31-x64/lib/ruby/gems/3.1.0/extensions/x64-mingw-ucrt/3.1.0/strscan-3.0.3
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib
C:/Ruby31-x64/lib/ruby/gems/3.1.0/extensions/x64-mingw-ucrt/3.1.0/fast-polylines-2.2.2.1
C:/Ruby31-x64/lib/ruby/site_ruby/3.1.0
C:/Ruby31-x64/lib/ruby/site_ruby/3.1.0/x64-ucrt
C:/Ruby31-x64/lib/ruby/site_ruby
C:/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0
C:/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0/x64-ucrt
C:/Ruby31-x64/lib/ruby/vendor_ruby
C:/Ruby31-x64/lib/ruby/3.1.0
C:/Ruby31-x64/lib/ruby/3.1.0/x64-mingw-ucrt

Note that these two paths were added:

C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib
C:/Ruby31-x64/lib/ruby/gems/3.1.0/extensions/x64-mingw-ucrt/3.1.0/fast-polylines-2.2.2.1

In reality, we only need the lib path to be added but Ruby adds both.

So, this is what happened:

  • We did a require "fast_polylines", and that file was not found on the $LOAD_PATH
  • Then, Rubygems searched for that file in the installed gems
  • It was found in fast-polylines-2.2.2.1/lib
  • That path was added to the $LOAD_PATH and the file was loaded
  • Because the gem has a native extension, the extension's path was also added to the $LOAD_PATH
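If you would rather not compare the two listings by eye, a small helper can report what changed. This is only a sketch: the name load_path_additions is my own and is not part of Ruby or Rubygems.

```ruby
# Returns the entries that the given block added to $LOAD_PATH.
# (load_path_additions is a hypothetical helper name, not a Ruby API.)
def load_path_additions
  before = $LOAD_PATH.dup
  yield
  $LOAD_PATH - before
end

# With the gem installed and not yet required in this session, this should
# print the two fast-polylines paths discussed above:
#   puts load_path_additions { require "fast_polylines" }
```

If the gem has already been required in the same session, the helper returns an empty array, because nothing new is added the second time.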

There is more information about how Rubygems alters the $LOAD_PATH when a gem is required in the Rubygems require documentation and in the code that explains how the require is implemented within Rubygems.

So, what is available after require?

Ruby maintains a list of every loaded file in $LOADED_FEATURES and if you print these, you will see this in the list:

...
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib/fast_polylines/fast_polylines.so
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib/fast_polylines/version.rb
C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib/fast_polylines.rb
...

The interesting one is the lib/fast_polylines/fast_polylines.so – which is actually the native shared library that was also loaded by Ruby.
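Since $LOADED_FEATURES can be a long list, it helps to filter it. The sketch below uses the standard library's tempfile purely as a stand-in so that it runs without the gem installed; loaded_features_matching is a hypothetical helper name.

```ruby
require "tempfile" # stand-in library, chosen only so the sketch runs anywhere

# Returns the loaded files whose full path matches the given pattern.
def loaded_features_matching(pattern)
  $LOADED_FEATURES.grep(pattern)
end

puts loaded_features_matching(/tempfile/)

# After require "fast_polylines", the same call with /fast_polylines/
# should list the .so, version.rb and fast_polylines.rb entries.
```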

The documentation for the Kernel method require says this:

If the filename has the extension “.rb”, it is loaded as a source file; if the extension is “.so”, “.o”, or “.dll”, or the default shared library extension on the current platform, Ruby loads the shared library as a Ruby extension.

The last part applies to the native extension that was loaded. When Ruby loads a shared library as an extension, it does one more thing: it calls a function named Init_LIBRARY from that extension, where LIBRARY is the name of the extension. In our case, the extension is called fast_polylines, so Ruby will try to call a function called Init_fast_polylines from the compiled C source code.
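The naming convention can be written down in a line of Ruby. This is a simplified illustration of the rule described above, not the interpreter's actual lookup code, and init_function_name is a hypothetical helper.

```ruby
# Sketch of the convention: "Init_" followed by the file's base name
# with the extension (.so, .dll, ...) removed.
def init_function_name(extension_path)
  "Init_" + File.basename(extension_path, ".*")
end

puts init_function_name("lib/fast_polylines/fast_polylines.so")
# prints Init_fast_polylines
```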

If you look at the C code in the ext/fast_polylines directory, you will find this at the end of the file.

void Init_fast_polylines() {
	VALUE mFastPolylines = rb_define_module("FastPolylines");
	rb_define_module_function(mFastPolylines, "decode", rb_FastPolylines__decode, -1);
	rb_define_module_function(mFastPolylines, "encode", rb_FastPolylines__encode, -1);
}

The name matches, so the Ruby interpreter calls this function when it loads the extension. The function defines a module called FastPolylines and creates two methods under that module, decode and encode, each backed by a C function. Once the native extension is loaded, the FastPolylines module with the methods encode and decode is available to Ruby.
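For readers more at home in Ruby than in C, the following is a rough pure-Ruby analogue of what the Init function sets up. It is not the gem's code: the module name FastPolylinesSketch and the placeholder method bodies are hypothetical, and only the shape (a module with two module functions that take any number of arguments) mirrors the C calls.

```ruby
# Pure-Ruby stand-in for the structure created by Init_fast_polylines.
# rb_define_module          -> module ... end
# rb_define_module_function -> module_function
module FastPolylinesSketch
  module_function

  # The -1 passed in the C call means "variable number of arguments",
  # which corresponds to *args in Ruby.
  def decode(*args)
    raise NotImplementedError, "placeholder for the C function"
  end

  def encode(*args)
    raise NotImplementedError, "placeholder for the C function"
  end
end

puts FastPolylinesSketch.respond_to?(:encode)    # true
puts FastPolylinesSketch.method(:decode).arity   # -1
```

The real gem does the actual work in C; the point here is only that, from the Ruby side, the result looks like an ordinary module with two callable methods.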

Summarising what we learnt

It might not look like a lot, but this is what we have seen in this post so far:

  1. Once the gem is installed, we need some files to be in the correct place
    • The file we want to include from our code should be in lib within the gem folder. It can be in a sub-folder (e.g., lib/a), but then we must do a require "a/file.rb" instead
    • The compiled native extension should be in: lib/gem_name/gem_name.so
  2. When we do a require, Rubygems changes the load path and the main file is loaded
  3. The first file that is required should load the native shared library
  4. The native library needs to have an Init_LIBRARY function that sets up the bridge between Ruby and the C code
  5. Ruby will call that function when the extension file is loaded
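The file layout from point 1 can be sketched as a small helper. The helper name expected_layout and the example directory are hypothetical; RbConfig::CONFIG["DLEXT"] is the platform's default shared library extension, which is "so" on the setup used in this post but may differ elsewhere.

```ruby
require "rbconfig"

# Returns the two paths that point 1 of the summary expects to exist
# inside an installed native gem (hypothetical helper, for illustration).
def expected_layout(gem_dir, gem_name)
  dlext = RbConfig::CONFIG["DLEXT"] # e.g. "so"; platform dependent
  {
    entry_file: File.join(gem_dir, "lib", "#{gem_name}.rb"),
    extension: File.join(gem_dir, "lib", gem_name, "#{gem_name}.#{dlext}")
  }
end

layout = expected_layout("gems/fast-polylines-2.2.2.1", "fast_polylines")
puts layout[:entry_file]
puts layout[:extension]
```

Checking both paths with File.exist? is a quick first step when a native gem installs without errors but fails on require.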

This is shown in the picture below. For simplicity, some bits (like requiring the version.rb file) have been left out. In the picture below, I use:

  • Script – the actual running script that uses the native gem; it might even be irb
  • Ruby – as described earlier, this refers to the Ruby runtime/interpreter and includes everything other than the Script, Rubygems and the native gem
  • Rubygems – this refers to the Rubygems code
  • Gem refers to the specific native extension gem but is broken up into two parts:
    • Gem_Code: the main Ruby file that gets required by the Script
    • Gem_Extension: the shared library file that provides the native extension

The picture below shows how the $LOAD_PATH differs between the points marked as (1) and (2) in the sequence diagram.

At the point that is marked as (3), the extension is loaded correctly and gets added to the $LOADED_FEATURES:

...
"C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib/fast_polylines/fast_polylines.so"
...

Finally, just before the point that is marked as (4), the gem code is loaded correctly and gets added to the $LOADED_FEATURES:

...
 "C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib/fast_polylines/fast_polylines.so",
 "C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/fast-polylines-2.2.2.1/lib/fast_polylines.rb",
 ...

In reality, a lot of this is detail you will not need to worry about – Ruby and Rubygems will take care of it when the native gem is created, installed or required. However, understanding it at one level deeper might help you debug if you run into unexpected problems.

Looking ahead

With that, we come to the end of Part 2, where we looked at how the gem code and the native extension are loaded. In the next post, we will look at more specific details of our code.

I will add links and references later, possibly in the last post of the series. If you have any comments, please feel free to leave them below.
