Key methods for extracting limited data sets from a single very large file in DuckDB

Keep only the first 100,000 rows and save them

COPY (
    SELECT *
    FROM 'large.csv'
    LIMIT 100000
) TO 'small.csv' (HEADER, DELIMITER ',');
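
A quick sanity check on the extract can confirm it came out as expected. The sketch below assumes the COPY above has already written small.csv to the current working directory. It counts the rows and lists the column names and types DuckDB infers.

```sql
-- Count the rows in the extract (should be at most 100000)
SELECT count(*) AS row_count
FROM 'small.csv';

-- Show the column names and types DuckDB infers for the extract
DESCRIBE SELECT * FROM 'small.csv';
```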

Extract a range of rows (e.g. rows 100,001 to 200,000)

COPY (
    SELECT *
    FROM 'large.csv'
    LIMIT 100000 OFFSET 100000
) TO 'chunk2.csv' (HEADER, DELIMITER ',');
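
The OFFSET for any chunk follows from the chunk size: chunk k skips (k - 1) × 100,000 rows. As an illustration, a third chunk (rows 200,001 to 300,000) might look like the sketch below. The output name chunk3.csv is only an assumed name following the same pattern. This relies on DuckDB returning rows in file order, which as far as I know is its default behavior (the preserve_insertion_order setting).

```sql
-- Chunk 3: skip the first 2 * 100000 rows, then take the next 100000
COPY (
    SELECT *
    FROM 'large.csv'
    LIMIT 100000 OFFSET 200000
) TO 'chunk3.csv' (HEADER, DELIMITER ',');  -- hypothetical output name
```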

Filter by a column

COPY (
    SELECT *
    FROM 'large.csv'
    WHERE frequency > 1420
) TO 'filtered.csv' (HEADER);
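
Before writing a filtered file, it can be worth checking how many rows match. The sketch below counts the matches for the same condition, then writes them out as Parquet instead of CSV. The output name filtered.parquet is an assumption, and Parquet is a swapped-in output format, not part of the example above.

```sql
-- How many rows would the filter keep?
SELECT count(*) AS matching_rows
FROM 'large.csv'
WHERE frequency > 1420;

-- Same filter, written as Parquet (hypothetical output name)
COPY (
    SELECT *
    FROM 'large.csv'
    WHERE frequency > 1420
) TO 'filtered.parquet' (FORMAT parquet);
```

Parquet output is usually smaller than CSV and keeps column types, which helps if the extract will be queried again in DuckDB.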

By Admin
