Automating Data Import from CSV to DuckDB

Learn how to efficiently import multiple CSV files into a DuckDB database using Python.

Overview

This guide will walk you through the process of writing a Python script to automatically loop through a list of CSV files in a directory and import them into a DuckDB database. This method is particularly useful for consolidating data storage and simplifying data management.

Step 1: List All CSV Files

Use Python's os module to list all files in the directory and filter to include only CSV files.


csv_files = [f for f in os.listdir(directory_path) if f.endswith('.csv')]

Step 2: Iterate and Load Each CSV

For each CSV file, use DuckDB's COPY command to load the file into a table.


for file in csv_files:
    table_name = os.path.splitext(file)[0]
    file_path = os.path.join(directory_path, file)
    conn.execute(f"COPY {table_name} FROM '{file_path}' (HEADER)")

Step 3: Complete Python Script

Here's the complete Python script for the process:


import os
import duckdb

def load_csv_to_duckdb(db_path, directory_path):
    conn = duckdb.connect(db_path)

    # List all CSV files in the directory
    csv_files = [f for f in os.listdir(directory_path) if f.endswith('.csv')]

    for file in csv_files:
        table_name = os.path.splitext(file)[0]
        file_path = os.path.join(directory_path, file)
        conn.execute(f"COPY {table_name} FROM '{file_path}' (HEADER)")

    conn.close()

# Usage
script_dir = os.path.dirname(os.path.abspath(__file__))  # Directory of the script
db_path = os.path.join(script_dir, 'my_data.duckdb')     # Path for the DuckDB database

load_csv_to_duckdb(db_path, script_dir)

This script will automatically create tables in DuckDB with names corresponding to the CSV filenames and import the data from those files. It's a convenient way to consolidate multiple data sources into a single, queryable database.

Comments

Popular posts from this blog

Blog Topics

Drawing Tables with ReportLab: A Comprehensive Example

DataFrame groupby agg style bar