In the previous posts, we went through setting up DuckDB and exploring DuckDB for common tasks involving importing data, aggregate queries and type casting. In this post, we continue our exploration by looking at some more common things we need, especially around date and time functions.
We use the same dataset as before, so be sure to read either the full post, or at least the background to the sample data in that post, before jumping in. As before, most of these tips may apply to any language from which you execute queries, but I ran all of these from the official DuckDB command line application.
In this post, we will look at common things to do with:
- Working with Dates and Times
- CLI dot commands
- Command Line Options for the CLI client
Dates and Times in our Dataset
We have two places where we have dates and times/timestamps in our data:
- The first column of each row in the file has a 14-digit timestamp (YYYYmmddHHMMSS) that represents when the data was captured. This is imported as a BIGINT (int64) in our dataset.
- The second place we have a timestamp is in how the CSV files are named. The file name represents the time at which the data was exported from the system, and each file is named something like data-20250121-132000.csv, where the date component (8 digits) and time component (6 digits) are separated by a -. When we imported the CSV files with filename = true, we brought this name into the table for each record.
As we process the information, we will focus much of our effort on these two data fields.
Date and Time Handling
We will start by focusing our attention on the timestamp field in the original data and see what that allows us to do.
Getting a Timestamp from a Number
Our first task sounds really simple: we have the timestamp as a number that reads off as a date-time value, and we would like to interpret it as a proper timestamp so that we can use it for date and time calculations. Let’s get started.
D select * from data limit 5;
┌────────────────┬───────────┬───────┬──────────────┬──────────────────────────┐
│ Timestamp │ Sensor ID │ Value │ Description │ filename │
│ int64 │ int64 │ int64 │ varchar │ varchar │
├────────────────┼───────────┼───────┼──────────────┼──────────────────────────┤
│ 20250119130123 │ 108901 │ 34 │ Valve Open │ data-20250119-131500.csv │
│ 20250119130124 │ 108902 │ 64 │ NA │ data-20250119-131500.csv │
│ 20250119130129 │ 108901 │ 45 │ Valve Open │ data-20250119-131500.csv │
│ 20250119130203 │ 108901 │ 41 │ Valve Closed │ data-20250119-131500.csv │
│ 20250119130224 │ 108902 │ 60 │ NA │ data-20250119-131500.csv │
└────────────────┴───────────┴───────┴──────────────┴──────────────────────────┘
The typecasting documentation shows that we cannot cast a BIGINT to a date, time or timestamp value. However, we can convert it into a string with a cast and then apply the strptime function to it. As in other programming languages, strptime parses a string into a date/time using a template. The template that matches our YYYYmmddHHMMSS field is '%Y%m%d%H%M%S', so let’s try it.
D SELECT Timestamp,
· CAST (Timestamp AS VARCHAR) AS TextTime,
· strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' ) AS DtTimeStamp
· FROM data
· LIMIT 5;
┌────────────────┬────────────────┬─────────────────────┐
│ Timestamp │ TextTime │ DtTimeStamp │
│ int64 │ varchar │ timestamp │
├────────────────┼────────────────┼─────────────────────┤
│ 20250119130123 │ 20250119130123 │ 2025-01-19 13:01:23 │
│ 20250119130124 │ 20250119130124 │ 2025-01-19 13:01:24 │
│ 20250119130129 │ 20250119130129 │ 2025-01-19 13:01:29 │
│ 20250119130203 │ 20250119130203 │ 2025-01-19 13:02:03 │
│ 20250119130224 │ 20250119130224 │ 2025-01-19 13:02:24 │
└────────────────┴────────────────┴─────────────────────┘
That’s fantastic. We now have timestamp data that we can use for further date-time calculations. You can refer to the documentation of the format specifiers for other parsing options.
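As a small illustration of other specifiers, here is a quick sketch (the date strings are made-up examples, not from our dataset); literal separators in the template are matched exactly:
D SELECT strptime('2025-01-19 13:01', '%Y-%m-%d %H:%M') AS ts1,
·        strptime('19/01/2025', '%d/%m/%Y') AS ts2;
ts1 should parse to 2025-01-19 13:01:00 and ts2 to 2025-01-19 00:00:00, both as timestamp values (missing components default to zero).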
Max and Min on a Timestamp
You’ve probably guessed it, and you’d be right: min and max work on timestamps, so this does exactly what you expect.
D SELECT
· min (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS min_ts,
· max (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS max_ts
· from data;
┌─────────────────────┬─────────────────────┐
│ min_ts │ max_ts │
│ timestamp │ timestamp │
├─────────────────────┼─────────────────────┤
│ 2025-01-19 13:00:28 │ 2025-01-22 13:14:23 │
└─────────────────────┴─────────────────────┘
Calculating the difference between 2 Timestamps
You can simply subtract one timestamp from another to find the interval between them. In the query below, we compute the total range of the data as the difference between the max and min timestamps: 3 days, 13 minutes and 55 seconds. Note that this returns an interval data type.
D select min (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS min_ts,
· max (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS max_ts,
· max_ts - min_ts
· from data;
┌─────────────────────┬─────────────────────┬───────────────────┐
│ min_ts │ max_ts │ (max_ts - min_ts) │
│ timestamp │ timestamp │ interval │
├─────────────────────┼─────────────────────┼───────────────────┤
│ 2025-01-19 13:00:28 │ 2025-01-22 13:14:23 │ 3 days 00:13:55 │
└─────────────────────┴─────────────────────┴───────────────────┘
Difference between 2 Timestamps (in seconds)
The difference between the two timestamps above was calculated as an interval. To get the number of seconds, we apply the epoch function to the interval.
D select
· min (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS min_ts,
· max (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS max_ts,
· max_ts - min_ts AS range,
· epoch (max_ts - min_ts) AS range_sec
· from data;
┌─────────────────────┬─────────────────────┬─────────────────┬───────────┐
│ min_ts │ max_ts │ range │ range_sec │
│ timestamp │ timestamp │ interval │ double │
├─────────────────────┼─────────────────────┼─────────────────┼───────────┤
│ 2025-01-19 13:00:28 │ 2025-01-22 13:14:23 │ 3 days 00:13:55 │ 260035.0 │
└─────────────────────┴─────────────────────┴─────────────────┴───────────┘
We see the difference is 260035 seconds! epoch gives us the difference in seconds (including fractional seconds). As shown in this post, to get the difference in milliseconds you previously had to compute epoch(interval) * 1000 + milliseconds(interval), but newer versions of DuckDB let you call epoch_ms on the interval directly to get the result in milliseconds.
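As a quick sketch of that (assuming you are on a DuckDB version where epoch_ms accepts an interval, per the note above), the query becomes:
D select
· min (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS min_ts,
· max (strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS max_ts,
· epoch_ms (max_ts - min_ts) AS range_ms
· from data;
For the range above, range_ms should come out to 260035000, i.e. 260035 seconds times 1000.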
Truncating a Timestamp (reducing its precision)
We often want to reduce the precision of a timestamp, e.g., from per-second to per-minute. DuckDB has date_trunc, which allows us to truncate a date or timestamp value as required. Let’s try it.
D SELECT
· strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' ) AS DtTimeStamp,
· date_trunc ('minute', DtTimeStamp) As MinTimeStamp,
· date_trunc ('hour', DtTimeStamp) AS HrTimeStamp,
· date_trunc ('day', DtTimeStamp) As Date
· FROM data
· LIMIT 10;
┌─────────────────────┬─────────────────────┬─────────────────────┬────────────┐
│ DtTimeStamp │ MinTimeStamp │ HrTimeStamp │ Date │
│ timestamp │ timestamp │ timestamp │ date │
├─────────────────────┼─────────────────────┼─────────────────────┼────────────┤
│ 2025-01-19 13:01:23 │ 2025-01-19 13:01:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:01:24 │ 2025-01-19 13:01:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:01:29 │ 2025-01-19 13:01:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:02:03 │ 2025-01-19 13:02:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:02:24 │ 2025-01-19 13:02:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:02:28 │ 2025-01-19 13:02:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:02:52 │ 2025-01-19 13:02:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:03:42 │ 2025-01-19 13:03:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:03:58 │ 2025-01-19 13:03:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
│ 2025-01-19 13:08:09 │ 2025-01-19 13:08:00 │ 2025-01-19 13:00:00 │ 2025-01-19 │
├─────────────────────┴─────────────────────┴─────────────────────┴────────────┤
│ 10 rows 4 columns │
└──────────────────────────────────────────────────────────────────────────────┘
We reduced the precision to per-minute, per-hour and per-day by using minute, hour and day as the specifiers (you could also use month, year, quarter, etc., as listed in the documentation, depending on the analysis you are doing).
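For instance, here is a quick sketch with two of the other specifiers (our sample data spans only a few days in January 2025, so these will largely collapse to a single value):
D SELECT
· date_trunc ('week', strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS WeekStart,
· date_trunc ('month', strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS MonthStart
· FROM data
· LIMIT 5;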
Grouping and Counting by Time/ Date
With what we did in the previous step, we can quickly group the data to get useful statistics by minute, hour or day.
D SELECT
· date_trunc ('minute', strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS DtMinute,
· count(*)
· FROM data
· GROUP BY 1
· ORDER BY 1 DESC
· LIMIT 10;
┌─────────────────────┬──────────────┐
│ DtMinute │ count_star() │
│ timestamp │ int64 │
├─────────────────────┼──────────────┤
│ 2025-01-22 13:14:00 │ 1 │
│ 2025-01-22 13:13:00 │ 1 │
│ 2025-01-22 13:12:00 │ 2 │
│ 2025-01-22 13:11:00 │ 3 │
│ 2025-01-22 13:10:00 │ 1 │
│ 2025-01-22 13:09:00 │ 1 │
│ 2025-01-22 13:07:00 │ 2 │
│ 2025-01-22 13:06:00 │ 1 │
│ 2025-01-22 13:05:00 │ 2 │
│ 2025-01-22 13:04:00 │ 1 │
├─────────────────────┴──────────────┤
│ 10 rows 2 columns │
└────────────────────────────────────┘
You’ll see that we use GROUP BY 1 and ORDER BY 1 DESC, which means that we group and order by the first column of the result. That’s why we get what we expect, ordered from the latest minute to the earliest. We can change it to ORDER BY 2 DESC and we get:
D SELECT
· date_trunc ('minute', strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS DtMinute,
· count(*) AS DataCount
· FROM data
· GROUP BY 1
· ORDER BY 2 DESC
· LIMIT 10;
┌─────────────────────┬───────────┐
│ DtMinute │ DataCount │
│ timestamp │ int64 │
├─────────────────────┼───────────┤
│ 2025-01-21 13:10:00 │ 4 │
│ 2025-01-19 13:02:00 │ 4 │
│ 2025-01-19 13:16:00 │ 4 │
│ 2025-01-20 13:07:00 │ 3 │
│ 2025-01-19 13:19:00 │ 3 │
│ 2025-01-22 13:11:00 │ 3 │
│ 2025-01-19 13:18:00 │ 3 │
│ 2025-01-19 13:01:00 │ 3 │
│ 2025-01-19 13:14:00 │ 2 │
│ 2025-01-20 13:03:00 │ 2 │
├─────────────────────┴───────────┤
│ 10 rows 2 columns │
└─────────────────────────────────┘
In a similar way, we can get the counts by hour and by day.
D SELECT
· date_trunc ('hour', strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS DtHour,
· count(*) AS DataCount
· FROM data
· GROUP BY 1
· ORDER BY 2 DESC
· LIMIT 10;
┌─────────────────────┬───────────┐
│ DtHour │ DataCount │
│ timestamp │ int64 │
├─────────────────────┼───────────┤
│ 2025-01-20 13:00:00 │ 29 │
│ 2025-01-19 13:00:00 │ 27 │
│ 2025-01-22 13:00:00 │ 19 │
│ 2025-01-21 13:00:00 │ 12 │
└─────────────────────┴───────────┘
D
D SELECT
· date_trunc ('day', strptime ( CAST (Timestamp AS VARCHAR), '%Y%m%d%H%M%S' )) AS Date,
· count(*) AS DataCount
· FROM data
· GROUP BY 1
· ORDER BY 2 DESC
· LIMIT 10;
┌────────────┬───────────┐
│ Date │ DataCount │
│ date │ int64 │
├────────────┼───────────┤
│ 2025-01-20 │ 29 │
│ 2025-01-19 │ 27 │
│ 2025-01-22 │ 19 │
│ 2025-01-21 │ 12 │
└────────────┴───────────┘
Number of Files by Date
Our filename field also has a date and time embedded in it. In the previous post, we saw how to extract that portion as a text string. We can parse that text into a timestamp, apply date_trunc at the day level, and see how many sensor data items were reported each day. This just puts together several things that we have done so far.
D SELECT
· date_trunc ('day',
· strptime (substring(filename, 6, 8) || substring (filename, 15, 6), '%Y%m%d%H%M%S')
· ) AS file_date,
· count(*) as sensor_count
· FROM data
· GROUP BY 1
· ORDER BY 1 ASC;
┌────────────┬──────────────┐
│ file_date │ sensor_count │
│ date │ int64 │
├────────────┼──────────────┤
│ 2025-01-19 │ 27 │
│ 2025-01-20 │ 29 │
│ 2025-01-21 │ 12 │
│ 2025-01-22 │ 19 │
└────────────┴──────────────┘
The query below counts DISTINCT(filename) by date to see how many files were reported on each date.
D SELECT
· date_trunc ('day',
· strptime (substring(filename, 6, 8) || substring (filename, 15, 6), '%Y%m%d%H%M%S')
· ) AS file_date,
· count( distinct(filename) ) as file_count
· FROM data
· GROUP BY 1
· ORDER BY 1 ASC;
┌────────────┬────────────┐
│ file_date │ file_count │
│ date │ int64 │
├────────────┼────────────┤
│ 2025-01-19 │ 2 │
│ 2025-01-20 │ 2 │
│ 2025-01-21 │ 1 │
│ 2025-01-22 │ 1 │
└────────────┴────────────┘
In this case, we took the approach of using substrings to extract the timestamp, but you can also use a full format string directly, as in the query below. The literal characters data-, - and .csv are matched exactly, and the date-time portion is parsed into a timestamp.
D select filename, strptime(filename, 'data-%Y%m%d-%H%M%S.csv') AS FileDate from data limit 5;
┌──────────────────────────┬─────────────────────┐
│ filename │ FileDate │
│ varchar │ timestamp │
├──────────────────────────┼─────────────────────┤
│ data-20250119-131500.csv │ 2025-01-19 13:15:00 │
│ data-20250119-131500.csv │ 2025-01-19 13:15:00 │
│ data-20250119-131500.csv │ 2025-01-19 13:15:00 │
│ data-20250119-131500.csv │ 2025-01-19 13:15:00 │
│ data-20250119-131500.csv │ 2025-01-19 13:15:00 │
└──────────────────────────┴─────────────────────┘
Careful, since there’s a lot more!
Working with dates, times and intervals is not difficult, but it does need care: you have to put in the casts, parsing and so on required to string together more complicated queries. Step back for a moment, think of all the steps you would otherwise do manually, and it slowly falls into place. We focused mostly on timestamp functions, but there are numerous additional capabilities that you will discover as you explore further.
Revisiting CSV Import and Our Table
We spent a lot of time converting numbers into strings and strings into timestamps so that we could do some date-time related operations. This naturally raises the question of whether we could do it differently, rather than having to do the conversion every time we run a query. So, let’s look at a few things that we can do.
There are two things that we will look at:
- Directly parsing the field into a timestamp rather than starting with an int64
- Adding a new, derived field to the table as part of the import.
Import Data as Timestamp
We know that our source timestamp matches the format '%Y%m%d%H%M%S', and we can actually specify this during the import. The first step is to specify the column types, indicating that we are no longer relying on DuckDB to sniff the schema but are providing our own. This means passing columns as a struct with the type of each field specified in it. So, we try this, and it fails.
D select *
· from read_csv ('*.csv', header = true, filename = true,
· columns = { 'Timestamp': 'TIMESTAMP', 'Sensor ID': BIGINT, 'Value': INT, 'Description': VARCHAR}
· )
· LIMIT 5;
Conversion Error: CSV Error on Line: 2
Original Line: 20250122130103,108923,80,Light OFF
Error when converting column "Timestamp". Could not convert string "20250122130103" to 'TIMESTAMP'
Column Timestamp is being converted as type TIMESTAMP
This type was either manually set or derived from an existing table. Select a different type to correctly parse this column.
file = data-20250122-131500.csv
delimiter = , (Auto-Detected)
quote = " (Auto-Detected)
escape = " (Auto-Detected)
new_line = \r\n (Auto-Detected)
header = true (Set By User)
skip_rows = 0 (Auto-Detected)
comment = \0 (Auto-Detected)
date_format = (Auto-Detected)
timestamp_format = (Auto-Detected)
null_padding = 0
sample_size = 20480
ignore_errors = false
all_varchar = 0
The Column types set by the user do not match the ones found by the sniffer.
Column at position: 0 Set type: TIMESTAMP Sniffed type: BIGINT
Column at position: 2 Set type: INTEGER Sniffed type: BIGINT
The details give us insight into how DuckDB is looking at the data but, importantly, they show that:
Error when converting column "Timestamp". Could not convert string "20250122130103" to 'TIMESTAMP'
Column Timestamp is being converted as type TIMESTAMP
This type was either manually set or derived from an existing table. Select a different type to correctly parse this column.
To allow this to pass, we need to provide a timestamp format that matches our data. We know what that looks like, so we add it in using timestampformat = '%Y%m%d%H%M%S' and try again.
D select *
· from read_csv ('*.csv', header = true, filename = true,
· columns = { 'Timestamp': 'TIMESTAMP', 'Sensor ID': BIGINT, 'Value': INT, 'Description': VARCHAR},
· timestampformat = '%Y%m%d%H%M%S'
· )
· LIMIT 5;
┌─────────────────────┬───────────┬───────┬──────────────┬──────────────────────────┐
│ Timestamp │ Sensor ID │ Value │ Description │ filename │
│ timestamp │ int64 │ int32 │ varchar │ varchar │
├─────────────────────┼───────────┼───────┼──────────────┼──────────────────────────┤
│ 2025-01-22 13:01:03 │ 108923 │ 80 │ Light OFF │ data-20250122-131500.csv │
│ 2025-01-22 13:01:44 │ 108911 │ 41 │ Valve Open │ data-20250122-131500.csv │
│ 2025-01-22 13:02:18 │ 108921 │ 43 │ NA │ data-20250122-131500.csv │
│ 2025-01-22 13:02:29 │ 108922 │ 54 │ Valve Open │ data-20250122-131500.csv │
│ 2025-01-22 13:04:03 │ 108901 │ 19 │ Valve Closed │ data-20250122-131500.csv │
└─────────────────────┴───────────┴───────┴──────────────┴──────────────────────────┘
Now we see that DuckDB shows the first field as a timestamp, rather than the BIGINT (int64) we got in our earlier exploration. This saves you a lot of hassle going forward.
We used columns, which requires us to provide the data type for every column. We can also use types, which allows us to provide data types for only some of the columns, as shown below.
D select *
· from read_csv ('*.csv', header = true, filename = true,
· types = {'Timestamp': 'TIMESTAMP', 'Value': INT},
· timestampformat = '%Y%m%d%H%M%S'
· )
· LIMIT 5;
┌─────────────────────┬───────────┬───────┬──────────────┬──────────────────────────┐
│ Timestamp │ Sensor ID │ Value │ Description │ filename │
│ timestamp │ int64 │ int32 │ varchar │ varchar │
├─────────────────────┼───────────┼───────┼──────────────┼──────────────────────────┤
│ 2025-01-22 13:01:03 │ 108923 │ 80 │ Light OFF │ data-20250122-131500.csv │
│ 2025-01-22 13:01:44 │ 108911 │ 41 │ Valve Open │ data-20250122-131500.csv │
│ 2025-01-22 13:02:18 │ 108921 │ 43 │ NA │ data-20250122-131500.csv │
│ 2025-01-22 13:02:29 │ 108922 │ 54 │ Valve Open │ data-20250122-131500.csv │
│ 2025-01-22 13:04:03 │ 108901 │ 19 │ Valve Closed │ data-20250122-131500.csv │
└─────────────────────┴───────────┴───────┴──────────────┴──────────────────────────┘
Adding another Column on Import
You might remember that we created the original table using CTAS (Create Table As Select), which means that the schema matches the result set of the SELECT query. So far, we have stayed faithful to the schema of the files we are importing, but we can also select an additional field, computed by a function that parses the fields we are loading.
We already know that strptime(filename, 'data-%Y%m%d-%H%M%S.csv') AS FileDate gives us a column called FileDate, which is a timestamp. We can simply add that to our SELECT clause to include this new field. The query changes to:
D select *, strptime(filename, 'data-%Y%m%d-%H%M%S.csv') AS FileDate
· from read_csv ('*.csv', header = true, filename = true,
· types = {'Timestamp': 'TIMESTAMP', 'Value': INT},
· timestampformat = '%Y%m%d%H%M%S'
· )
· LIMIT 5;
┌─────────────────────┬───────────┬───────┬──────────────┬──────────────────────────┬─────────────────────┐
│ Timestamp │ Sensor ID │ Value │ Description │ filename │ FileDate │
│ timestamp │ int64 │ int32 │ varchar │ varchar │ timestamp │
├─────────────────────┼───────────┼───────┼──────────────┼──────────────────────────┼─────────────────────┤
│ 2025-01-22 13:01:03 │ 108923 │ 80 │ Light OFF │ data-20250122-131500.csv │ 2025-01-22 13:15:00 │
│ 2025-01-22 13:01:44 │ 108911 │ 41 │ Valve Open │ data-20250122-131500.csv │ 2025-01-22 13:15:00 │
│ 2025-01-22 13:02:18 │ 108921 │ 43 │ NA │ data-20250122-131500.csv │ 2025-01-22 13:15:00 │
│ 2025-01-22 13:02:29 │ 108922 │ 54 │ Valve Open │ data-20250122-131500.csv │ 2025-01-22 13:15:00 │
│ 2025-01-22 13:04:03 │ 108901 │ 19 │ Valve Closed │ data-20250122-131500.csv │ 2025-01-22 13:15:00 │
└─────────────────────┴───────────┴───────┴──────────────┴──────────────────────────┴─────────────────────┘
This creates an additional column called FileDate and stores the file’s timestamp in it, so we no longer need to do the conversion in each query we execute. Of course, this means that more data is stored in the database.
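If you want to keep this derived column around permanently, you can wrap the same SELECT in the CTAS pattern we used to create the table in the first place. A sketch, assuming you are happy to recreate the data table:
D CREATE OR REPLACE TABLE data AS
· SELECT *, strptime(filename, 'data-%Y%m%d-%H%M%S.csv') AS FileDate
· FROM read_csv ('*.csv', header = true, filename = true,
·     types = {'Timestamp': 'TIMESTAMP', 'Value': INT},
·     timestampformat = '%Y%m%d%H%M%S'
· );
After this, FileDate is available in every query without any further parsing.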
Command Line Options
You can just do duckdb -help to see the list of options, but I find the following most useful.
(1) -csv: sets the output mode to CSV. This is especially useful if you want to pipe CSV output to a file or to another process.
(2) -s COMMAND: runs the command that follows -s and then exits. This allows you to run a command and get its output without going into interactive mode; e.g., you can use -s together with -csv to send CSV output to YouPlot.
(3) -separator: sets a different separator (the default is |). If you use -csv, the separator is automatically set to a comma, but if you want something else, e.g. ^, you can specify it here.
(4) Other output mode specifiers such as -json, -html and -markdown.
(5) -readonly: prevents you from executing any queries that change the database.
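Putting a few of these together, here is a sketch of a non-interactive invocation (the query and the output file name are just placeholder examples; it queries the CSV files in the current directory rather than a database file):
duckdb -csv -s "SELECT Description, count(*) AS n FROM read_csv('*.csv', header = true) GROUP BY 1" > description_counts.csv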
See the CLI Arguments page for more information, especially about how you can pass a sequence of arguments (e.g. multiple queries for execution).
Command Line Client Dot Commands
As you explore the DuckDB command line client, you will find these commands useful.
(1) Change the output mode between different formats using .mode - common options are csv, duckbox, box, table and markdown (the full list is in the output formats documentation).
(2) Send the output of a query to a file with .once filename, which sends the result of the next query to that file; subsequent queries go back to stdout. If you use .output filename, the result of every query you execute after that is sent to the file called filename - to get output back to stdout, just run .output without a filename.
(3) Use .excel to open the results of the next query in a spreadsheet.
(4) Use .headers to switch headers on or off.
(5) To see the execution time of queries, use .timer on so that the execution time is displayed after each query.
(6) You can use .shell <command> to execute a command in the shell. It’s silly, but I often use it to run .shell cls to clear the screen; you can also use it to list the files in a directory and so on.
(7) .tables shows you the list of tables that exist in your current database.
(8) .schema TABLE_NAME will show you the schema of the table so that you know what fields it has.
(9) Finally, .cd is something we used in Part 1 to change the directory to the path that we wanted.
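Putting a few of these dot commands together, a short session might look like this sketch (result.csv is just an example file name, and the query output is omitted):
D .timer on
D .headers on
D .mode csv
D .once result.csv
D SELECT * FROM data LIMIT 5;
D .mode duckbox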
As always, you will find much more in the official documentation.
Conclusion
In this post, we have built further on what we saw in Part 1. We learnt a bit about using date and time functions and saw examples of solving some specific problems with them. We also looked at some of the dot commands and command line options for the CLI application. Finally, we revisited how we import our data so that we don’t need to do some of the data wrangling that we did earlier.
Links and References
The official documentation is fantastic and you really should read it, as and when you need it. Here are links to specific pages that I referred to while putting these posts together.
Most of the links are to the DuckDB documentation itself but to specific sections that we referred to:
- CLI Arguments: https://duckdb.org/docs/clients/cli/arguments
- CLI Dot Commands: https://duckdb.org/docs/clients/cli/dot_commands
- CLI Output Formats: https://duckdb.org/docs/clients/cli/output_formats
- Typecasting: https://duckdb.org/docs/sql/data_types/typecasting
- Date Formatting: https://duckdb.org/docs/sql/functions/dateformat.html
- Date Part Specifiers and Functions for Date/ Time truncation: https://duckdb.org/docs/sql/functions/datepart.html
- Timestamp functions: https://duckdb.org/docs/sql/functions/timestamp
- Importing CSV files: https://duckdb.org/docs/data/csv/overview.html
A couple of other posts:
- Top DuckDB CLI Commands That You Should Know: https://habedi.medium.com/top-duckdb-cli-commands-that-you-should-know-7783af9c1fb4
- A tutorial for DuckDB beginners: https://motherduck.com/blog/duckdb-tutorial-for-beginners/