Using DuckDB on Windows and with Ruby

DuckDB is a fast, open-source database for Online Analytics Processing (OLAP) workloads. It is similar to SQLite3 in the sense that it’s intended to be built directly into a host process and runs within it rather than a separate stand-alone server. Let’s look at setting this up and using it in Windows and also from Ruby.

Windows Command Line

We are going to follow two separate paths:

  • Windows, using the command line client
  • Ruby, using a Ruby gem to access and do many of the same things as the command line client.

Installation on Windows

For working with duckdb directly in Windows, we need to get the command line package since there is no server to install and use it. You can get it from the download page on the DuckDB site, or directly from the releases page on GitHub.

Once downloaded, extract duckdb.exe and put the file in a folder on your path. I use a folder called D:\Apps for these kinds of things and this is already on my PATH in Windows. This makes it immediately available to all my Windows command shells.

Using the Command Line Client

We start the command line client simply by running duckdb from the command line.

Before we get started, let’s do some basic checks on our setup by going to a command line and running the commands below. So, we see that we have version 1.1.3 on our PATH now.

$ where duckdb
D:\apps\duckdb.exe

$ duckdb -version
v1.1.3 19864453f7

For our first (and in this post, only) activity, we will work with CSV files. I have a folder that has 9 CSV files with roughly 1 million records in each file. We will start duckdb in that folder and run a query there to see how it works. Each prompt/ line in the command line client starts with "D" and we ran a query across the CSV files directly without loading anything into memory.

D select filename, COUNT(*) AS LINES FROM read_csv('out_*.csv', filename = true) GROUP by filename;
┌───────────────────────┬─────────┐
│       filename        │  LINES  │
│        varchar        │  int64  │
├───────────────────────┼─────────┤
│ out_extracted_h16.csv │ 1237490 │
│ out_extracted_h21.csv │ 1348620 │
│ out_extracted_h23.csv │ 1222750 │
│ out_extracted_h18.csv │ 1370403 │
│ out_extracted_h19.csv │ 1252072 │
│ out_extracted_h20.csv │ 1020644 │
│ out_extracted_h15.csv │ 1055910 │
│ out_extracted_h17.csv │ 1324663 │
│ out_extracted_h22.csv │ 1333127 │
└───────────────────────┴─────────┘

Let’s look at the query to understand what it does.

select filename, COUNT(*) AS LINES
FROM read_csv('out_*.csv', filename = true)
GROUP by filename;

We’ll start by looking at the middle line first since that’s the bit that is different from a regular SQL query that you might see. Instead of a table name, we have read_csv('out_*.csv', filename = true) as the source of data for our query. This does a few things:

  1. It reads one or more CSV files treating as column as a table column.
  2. The CSV files are specified using wildcards – so, it picks up any file that starts with out_ and ends with .csv as the file name.
  3. The parameter filename = true asks DuckDB to add a column to the new table that stores the name of the file.

The rest of the query simply does a count of number of records in each file and outputs the result to us, grouped by filename.

For approximately 11 million records in 9 different CSV files, it took around 1.8 seconds to run this, which feels fast for casual exploration. Note also that it worked directly on the CSV files as a data source and did not import anything into a separate file.

Ruby on Windows

When we use DuckDB from C or Rust, we would write a program that builds DuckDB into it and thena accesses it via the methods that you can call. The program hosts DuckDB with the same process. For accessing it from Ruby, we use bindings that connect from the Ruby program (running in the Ruby interpreter) to the DLL that has a pre-compiled version of DuckDB. We will use the precompiled DLL from DuckDB along with the ruby-duckdb gem. The GitHub page includes instructions that we can follow.

Installation on Ruby

  1. Download the following from: https://github.com/duckdb/duckdb/releases
    • libduckdb-windows-amd64.zip: we need files from here for the Ruby gem
  2. Extract the file to a folder that you will use to install and build the gem (in my case, it’s d:\apps\duckdb) from
  3. Copy duckdb.dll into C:\Windows\System32 – I initially tried to put this to a folder on the PATH but that is not enough on Windows due to how DLL loading works on Windows. So, it must be in C:\Windows\System32 for now.

We are using Ruby 3.4 for Windows x64. You probably do have Ruby installed, but if needed, you can refer to this page for installing Ruby on Windows.

$ ruby -v
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x64-mingw-ucrt]

I tried to install the gem but that failed.

$ gem install duckdb
Fetching duckdb-1.1.3.1.gem
Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions. This could take a while...
ERROR:  Error installing duckdb:
        ERROR: Failed to build gem native extension.

    current directory: D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/gems/duckdb-1.1.3.1/ext/duckdb
D:/stack/ruby/Ruby34-x64/bin/ruby.exe extconf.rb
checking for duckdb.h in /opt/homebrew/include,/opt/homebrew/opt/duckdb/include,/opt/local/include... no
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include=${opt-dir}/include
        --without-opt-include
        --with-opt-lib=${opt-dir}/lib
        --without-opt-lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=D:/stack/ruby/Ruby34-x64/bin/$(RUBY_BASE_NAME)
        --with-duckdb-dir
        --without-duckdb-dir
        --with-duckdb-include=${duckdb-dir}/include
        --without-duckdb-include
        --with-duckdb-lib=${duckdb-dir}/lib
        --without-duckdb-lib
extconf.rb:18:in 'Object#check_duckdb_header': duckdb.h is not found. Install duckdb.h of duckdb >= 1.0.0. (RuntimeError)
        from extconf.rb:58:in '<main>'

********************************************************************************
duckdb.h is not found. Install duckdb.h of duckdb >= 1.0.0.
********************************************************************************


To see why this extension failed to compile, please check the mkmf.log which can be found here:

  D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/extensions/x64-mingw-ucrt/3.4.0/duckdb-1.1.3.1/mkmf.log

extconf failed, exit code 1

Gem files will remain installed in D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/gems/duckdb-1.1.3.1 for inspection.
Results logged to D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/extensions/x64-mingw-ucrt/3.4.0/duckdb-1.1.3.1/gem_make.out

A new release of RubyGems is available: 3.6.2 → 3.6.3!
Run `gem update --system 3.6.3` to update your installation.

Since it did not work, we need to do a bit more. Since our extracted files are in a folder (D:\Apps\duckdb in my case), we open a command window and try to do this again, passing the path to the install command. So, now I did:

$ gem install duckdb -- --with-duckdb-include=d:/apps/duckdb --with-duckdb-lib=d:/apps/duckdb
Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions with: '--with-duckdb-include=d:/apps/duckdb --with-duckdb-lib=d:/apps/duckdb'
This could take a while...
Successfully installed duckdb-1.1.3.1
1 gem installed

Great! It works now. Let’s try to do the same example as earlier from Ruby.

Using DuckDB from Ruby

Let’s open up irb and try to see if we can do a simple require 'duckdb' first. If you had not copied the DLL to C:\Windows\System32, you will find that this dies.

$ irb
irb(main):001> require 'duckdb'
<internal:D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require': 126: The specified module could not be found.   - D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/gems/duckdb-1.1.3.1/lib/duckdb/duckdb_native.so (LoadError)
        from <internal:D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/gems/duckdb-1.1.3.1/lib/duckdb.rb:3:in '<top (required)>'
        from <internal:D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:141:in 'Kernel#require'
        from <internal:D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:141:in 'Kernel#require'
        from (irb):1:in '<main>'
        from <internal:kernel>:168:in 'Kernel#loop'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/gems/3.4.0/gems/irb-1.14.3/exe/irb:9:in '<top (required)>'
        from D:/stack/ruby/Ruby34-x64/bin/irb:33:in 'Kernel#load'
        from D:/stack/ruby/Ruby34-x64/bin/irb:33:in '<main>'
<internal:D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require': cannot load such file -- duckdb (LoadError)
        from <internal:D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require'
        from (irb):1:in '<main>'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb/workspace.rb:101:in 'Kernel#eval'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb/workspace.rb:101:in 'IRB::WorkSpace#evaluate'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb/context.rb:631:in 'IRB::Context#evaluate_expression'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb/context.rb:599:in 'IRB::Context#evaluate'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1053:in 'block (2 levels) in IRB::Irb#eval_input'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1365:in 'IRB::Irb#signal_status'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1045:in 'block in IRB::Irb#eval_input'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1124:in 'block in IRB::Irb#each_top_level_statement'
        from <internal:kernel>:168:in 'Kernel#loop'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1121:in 'IRB::Irb#each_top_level_statement'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1044:in 'IRB::Irb#eval_input'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1025:in 'block in IRB::Irb#run'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1024:in 'Kernel#catch'
        from D:/stack/ruby/Ruby34-x64/lib/ruby/3.4.0/irb.rb:1024:in 'IRB::Irb#run'
        ... 4 levels...

OK, that’s not great. The guide had told us to put the duckdb.dll in c:\windows\system32 and I had not done that. So, I properly followed the instructions and dropped it into c:\windows\system32 this time. Now, let’s quickly try that once to see if it helps. That really works.

$ irb
irb(main):001> require 'duckdb'
=> true

Now, we follow the sample from the GitHub repository to see if things are working, but adapted for our needs. We do this also in irb as below.

$ irb
irb(main):001> require 'duckdb'
=> true
irb(main):002> db = DuckDB::Database.open
=> #<DuckDB::Database:0x00000206b57c39a0>
irb(main):003> con = db.connect
=> #<DuckDB::Connection:0x00000206b61d1550>
irb(main):004> query = "select filename, COUNT(*) AS LINES FROM read_csv('out_*.csv', filename = true) GROUP by filename;"
=> "select filename, COUNT(*) AS LINES FROM read_csv('out_*.csv', filename = true) GROUP by filename;"
irb(main):005> result = con.query(query)
=> #<DuckDB::Result:0x00000206b288f350>
irb(main):006> result.each { p it }
["out_extracted_h16.csv", 1237490]
["out_extracted_h21.csv", 1348620]
["out_extracted_h23.csv", 1222750]
["out_extracted_h15.csv", 1055910]
["out_extracted_h17.csv", 1324663]
["out_extracted_h22.csv", 1333127]
["out_extracted_h20.csv", 1020644]
["out_extracted_h18.csv", 1370403]
["out_extracted_h19.csv", 1252072]
=> nil

So, we get the same response as before. We also established that our general approach is:

  • create a DB object
  • create a connection
  • get the result from running the query on the connection
  • use the result

That’s all for this post. We got DuckDB running on Windows and accessed it directly as well as from Ruby on Windows. For now, we need to remember to put the DLL into c:\windows\system32 so that it can be found and loaded.

Hope this helps you in getting started.