One of the nicest features of DuckDB is that you don’t need to import a CSV before working with it.
For example, if your file is called large.csv:
View the first 10 rows
SELECT * FROM 'large.csv' LIMIT 10;
Keep only the first 100,000 rows and save them
COPY (
SELECT *
FROM 'large.csv'
LIMIT 100000
) TO 'small.csv' (HEADER, DELIMITER ',');
Extract a range of rows (e.g. rows 100,001 to 200,000)
COPY (
SELECT *
FROM 'large.csv'
LIMIT 100000 OFFSET 100000
) TO 'chunk2.csv' (HEADER, DELIMITER ',');
Filter by a column
COPY (
SELECT *
FROM 'large.csv'
WHERE frequency > 1420
) TO 'filtered.csv' (HEADER);
If by “cut a CSV in half” you literally mean splitting it into two equal-sized CSV files, I can show you the simplest way to do that as well.
To split a CSV into two equal parts using DuckDB, you can do it in a clean SQL way by numbering rows first.
Step 1: Open DuckDB
In Command Prompt:
duckdb
Step 2: Split CSV in half
This works by assigning a row number, counting total rows, then splitting at the midpoint:
WITH numbered AS (
SELECT
*,
row_number() OVER () AS rn,
count(*) OVER () AS total_rows
FROM 'large.csv'
)
-- First half
COPY (
SELECT *
FROM numbered
WHERE rn <= total_rows / 2
) TO 'half1.csv' (HEADER, DELIMITER ',');
-- Second half
COPY (
SELECT *
FROM numbered
WHERE rn > total_rows / 2
) TO 'half2.csv' (HEADER, DELIMITER ',');
Important notes
- This is “equal rows”, not equal file size.
- Row order is whatever order the CSV is read in (if order matters, we can stabilise it).
- If your file is huge, DuckDB streams it efficiently so you don’t need to load it into memory.
If you want something more robust
If your CSV has a time column (like timestamps from SDR or astronomy logs), it’s often better to split by time instead of row count. I can show that version too if useful.
To split a CSV into one file per day using DuckDB, the key idea is:
> extract the date from your timestamp column → group/filter by it → export each group
—
1. Open DuckDB
duckdb
—
2. Check your date column
Assume your CSV has a column like:
timestamp (e.g. 2026-06-28 12:34:56)
We convert it to a date like this:
SELECT DISTINCT CAST(timestamp AS DATE) AS d
FROM ‘large.csv’
ORDER BY d;
—
3. Export ONE day (test first)
COPY (
SELECT *
FROM ‘large.csv’
WHERE CAST(timestamp AS DATE) = ‘2026-06-28’
) TO ‘2026-06-28.csv’ (HEADER, DELIMITER ‘,’);
—
4. Automate: one file per day (Windows PowerShell way)
Exit DuckDB, then run this in PowerShell:
$days = duckdb -c “SELECT DISTINCT CAST(timestamp AS DATE) FROM ‘large.csv’;”
foreach ($day in $days) {
duckdb -c “COPY (
SELECT *
FROM ‘large.csv’
WHERE CAST(timestamp AS DATE) = ‘$day’
) TO ‘$day.csv’ (HEADER, DELIMITER ‘,’);”
}
—
5. If your column is already a DATE (faster)
If you already have a date column:
WHERE date = ‘2026-06-28’
—
Important notes
This works even for very large CSVs because DuckDB streams the data.
Output files are created per unique date.
If your timestamp format is weird (e.g. ISO strings), we may need strptime() instead of CAST().