Automating Data Import from CSV to DuckDB
Learn how to efficiently import multiple CSV files into a DuckDB database using Python.
Overview
This guide will walk you through the process of writing a Python script to automatically loop through a list of CSV files in a directory and import them into a DuckDB database. This method is particularly useful for consolidating data storage and simplifying data management.
Step 1: List All CSV Files
Use Python's os
module to list all files in the directory and filter to include only CSV files.
csv_files = [f for f in os.listdir(directory_path) if f.endswith('.csv')]
Step 2: Iterate and Load Each CSV
For each CSV file, use DuckDB's COPY
command to load the file into a table.
for file in csv_files:
table_name = os.path.splitext(file)[0]
file_path = os.path.join(directory_path, file)
conn.execute(f"COPY {table_name} FROM '{file_path}' (HEADER)")
Step 3: Complete Python Script
Here's the complete Python script for the process:
import os
import duckdb
def load_csv_to_duckdb(db_path, directory_path):
conn = duckdb.connect(db_path)
# List all CSV files in the directory
csv_files = [f for f in os.listdir(directory_path) if f.endswith('.csv')]
for file in csv_files:
table_name = os.path.splitext(file)[0]
file_path = os.path.join(directory_path, file)
conn.execute(f"COPY {table_name} FROM '{file_path}' (HEADER)")
conn.close()
# Usage
script_dir = os.path.dirname(os.path.abspath(__file__)) # Directory of the script
db_path = os.path.join(script_dir, 'my_data.duckdb') # Path for the DuckDB database
load_csv_to_duckdb(db_path, script_dir)
This script will automatically create tables in DuckDB with names corresponding to the CSV filenames and import the data from those files. It's a convenient way to consolidate multiple data sources into a single, queryable database.
Comments
Post a Comment