How to cut a file in half with DuckDB, and also how to split by date?

By Admin June 28, 2026

One of the nicest features of DuckDB is that you don’t need to import a CSV before working with it.

For example, if your file is called large.csv:

View the first 10 rows

SELECT * FROM 'large.csv' LIMIT 10;

Keep only the first 100,000 rows and save them

COPY (
    SELECT *
    FROM 'large.csv'
    LIMIT 100000
) TO 'small.csv' (HEADER, DELIMITER ',');

Extract a range of rows (e.g. rows 100,001 to 200,000)

COPY (
    SELECT *
    FROM 'large.csv'
    LIMIT 100000 OFFSET 100000
) TO 'chunk2.csv' (HEADER, DELIMITER ',');

Filter by a column

COPY (
    SELECT *
    FROM 'large.csv'
    WHERE frequency > 1420
) TO 'filtered.csv' (HEADER);

If by “cut a CSV in half” you literally mean splitting it into two equal-sized CSV files, I can show you the simplest way to do that as well.

To split a CSV into two equal parts using DuckDB, you can do it in a clean SQL way by numbering rows first.

Step 1: Open DuckDB

In Command Prompt:

duckdb

Step 2: Split CSV in half

This works by assigning a row number, counting total rows, then splitting at the midpoint:

WITH numbered AS (
    SELECT
        *,
        row_number() OVER () AS rn,
        count(*) OVER () AS total_rows
    FROM 'large.csv'
)

-- First half
COPY (
    SELECT *
    FROM numbered
    WHERE rn <= total_rows / 2
) TO 'half1.csv' (HEADER, DELIMITER ',');

-- Second half
COPY (
    SELECT *
    FROM numbered
    WHERE rn > total_rows / 2
) TO 'half2.csv' (HEADER, DELIMITER ',');

Important notes

This is “equal rows”, not equal file size.
Row order is whatever order the CSV is read in (if order matters, we can stabilise it).
If your file is huge, DuckDB streams it efficiently so you don’t need to load it into memory.

If you want something more robust

If your CSV has a time column (like timestamps from SDR or astronomy logs), it’s often better to split by time instead of row count. I can show that version too if useful.

To split a CSV into one file per day using DuckDB, the key idea is:

> extract the date from your timestamp column → group/filter by it → export each group

—

1. Open DuckDB

duckdb

—

2. Check your date column

Assume your CSV has a column like:

timestamp (e.g. 2026-06-28 12:34:56)

We convert it to a date like this:

SELECT DISTINCT CAST(timestamp AS DATE) AS d

FROM ‘large.csv’

ORDER BY d;

—

3. Export ONE day (test first)

COPY (

SELECT *

FROM ‘large.csv’

WHERE CAST(timestamp AS DATE) = ‘2026-06-28’

) TO ‘2026-06-28.csv’ (HEADER, DELIMITER ‘,’);

—

4. Automate: one file per day (Windows PowerShell way)

Exit DuckDB, then run this in PowerShell:

$days = duckdb -c “SELECT DISTINCT CAST(timestamp AS DATE) FROM ‘large.csv’;”

foreach ($day in $days) {

duckdb -c “COPY (

SELECT *

FROM ‘large.csv’

WHERE CAST(timestamp AS DATE) = ‘$day’

) TO ‘$day.csv’ (HEADER, DELIMITER ‘,’);”

}

—

5. If your column is already a DATE (faster)

If you already have a date column:

WHERE date = ‘2026-06-28’

—

Important notes

This works even for very large CSVs because DuckDB streams the data.

Output files are created per unique date.

If your timestamp format is weird (e.g. ISO strings), we may need strptime() instead of CAST().

How to cut a file in half with DuckDB, and also how to split by date?

Step 1: Open DuckDB

Step 2: Split CSV in half

Important notes

If you want something more robust

By Admin

Leave a Reply Cancel reply

You missed

JAVA muon data processing program on Windows 10 causes “UnsupportedClassVersionError”.

What free spreadsheet software is available to process very large PicoMuon data CSV files, which are too large for Microsoft Excel?

How to open DuckDB (command line database editor) which works with very large PicoMuon data files that are too big for Excel

How to cut a file in half with DuckDB, and also how to split by date?

How to cut a file in half with DuckDB, and also how to split by date?

Step 1: Open DuckDB

Step 2: Split CSV in half

Important notes

If you want something more robust

By Admin

Related post

JAVA muon data processing program on Windows 10 causes “UnsupportedClassVersionError”.

What free spreadsheet software is available to process very large PicoMuon data CSV files, which are too large for Microsoft Excel?

How to open DuckDB (command line database editor) which works with very large PicoMuon data files that are too big for Excel

Leave a Reply Cancel reply

You missed

JAVA muon data processing program on Windows 10 causes “UnsupportedClassVersionError”.

What free spreadsheet software is available to process very large PicoMuon data CSV files, which are too large for Microsoft Excel?

How to open DuckDB (command line database editor) which works with very large PicoMuon data files that are too big for Excel

How to cut a file in half with DuckDB, and also how to split by date?