Posts

Showing posts from December, 2023
Image
Automating Data Import from CSV to DuckDB Learn how to efficiently import multiple CSV files into a DuckDB database using Python. Overview This guide will walk you through the process of writing a Python script to automatically loop through a list of CSV files in a directory and import them into a DuckDB database. This method is particularly useful for consolidating data storage and simplifying data management. Step 1: List All CSV Files Use Python's os module to list all files in the directory and filter to include only CSV files. csv_files = [f for f in os.listdir(directory_path) if f.endswith('.csv')] Step 2: Iterate and Load Each CSV For each CSV file, use DuckDB's COPY command to load the file into a table. for file in csv_files: table_name = os.path.splitext(file)[0] file_path = os.path.join(directory_path, file) conn.execute(f"COPY {table_name} FROM '{file_path}' (HEADER)") Step 3: Complete Python Script Here's

Using DuckDB as a Centralized Data Storage Solution

Image
Using DuckDB as a Centralized Data Storage Solution Using DuckDB as a Centralized Data Storage Solution Transitioning from multiple CSV files to DuckDB for data storage and management. Step 1: Import CSV Data into DuckDB Start by importing your existing CSV data into DuckDB. DuckDB can load data directly from CSV files and infer the schema automatically. import duckdb conn = duckdb.connect('my_data.duckdb') conn.execute("COPY my_table FROM 'path/to/your/csvfile.csv' (HEADER)") print(conn.execute("SELECT * FROM my_table LIMIT 10").fetchall()) Step 2: Insert New Data For new data, insert it directly into the DuckDB table: conn.execute("INSERT INTO my_table SELECT * FROM ?", (new_data,)) Step 3: Query and Analyze Use SQL queries for data analysis, which is more efficient compared to handling CSV files: