Getting Started with DuckDB
PostgreSQL is the default choice for most applications, and rightly so. It handles transactions, CRUD, and relational queries exceptionally well. But once you need analytical queries over large datasets (aggregations, rollups, pivots), it starts to struggle. PostgreSQL stores data in rows, optimized for reading and writing individual records. Analytical queries that scan millions of rows across a few columns pay the cost of reading every column in every row.
DuckDB is built for the opposite workload. Columnar storage, vectorized execution, and query optimizations designed for aggregation-heavy work. It can also query CSV and Parquet files directly, no import step required. Think of it as SQLite for analytics.
We'll build a working analytics pipeline over a sales dataset in one TypeScript file. SpinDB runs DuckDB behind a PostgreSQL-compatible proxy, so you connect with the standard pg package. Run everything locally, or use a managed instance on Layerbase Cloud to skip the install.
Contents
- Create a DuckDB Instance
- Set Up the Project
- The Sales Dataset
- Connect and Create the Table
- Insert the Data
- Basic Queries
- Analytical Queries
- What This Looks Like in PostgreSQL
- When to Reach for DuckDB
- Wrapping Up
Create a DuckDB Instance
Local with SpinDB
SpinDB is the quickest path to a local DuckDB instance. It's a CLI that runs databases without Docker. (What is SpinDB?)
Install SpinDB globally:
npm i -g spindb # npm
pnpm add -g spindb # pnpm

Or run it directly without installing:
npx spindb create duck1 -e duckdb --start # npm
pnpm dlx spindb create duck1 -e duckdb --start # pnpm

If you installed globally, create and start a DuckDB instance:
spindb create duck1 -e duckdb --start

SpinDB downloads the DuckDB binary for your platform, sets up a PostgreSQL-compatible proxy, and starts the server. Verify it's running:
spindb url duck1
postgresql://127.0.0.1:5432/duck1

Leave the server running. We'll connect to it from TypeScript in the next section.
Layerbase Cloud
Rather skip the local setup? Layerbase Cloud gives you a managed DuckDB instance in a few clicks. Select DuckDB, and your PostgreSQL-compatible connection string shows up in the Quick Connect panel.
Cloud instances use TLS, so the connection code is slightly different:
import pg from 'pg'
const client = new pg.Client({
connectionString: 'postgresql://user:pass@cloud.layerbase.dev:15432/duck1?sslmode=require',
})
await client.connect()

Everything else in this guide works identically whether you're running locally or on Layerbase Cloud. Just swap in your connection details.
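Rather than hardcoding credentials, you'd normally read the connection string from the environment. Here's a minimal sketch of a helper that does that, assuming a `DATABASE_URL` variable (our own convention, not something SpinDB or Layerbase mandates) and turning TLS on when the URL asks for it:

```typescript
// Sketch of a config helper. DATABASE_URL is our naming convention, not a
// SpinDB or Layerbase requirement. The returned shape matches what pg.Client accepts.
type ClientConfig = { connectionString: string; ssl?: boolean }

function clientConfig(url?: string): ClientConfig {
  const connectionString =
    url ?? process.env.DATABASE_URL ?? 'postgresql://127.0.0.1:5432/duck1' // local SpinDB default
  // Cloud URLs carry sslmode=require, so enable TLS for those
  return connectionString.includes('sslmode=require')
    ? { connectionString, ssl: true }
    : { connectionString }
}
```

With this in place, `new pg.Client(clientConfig())` works unchanged locally and against a cloud instance.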
Set Up the Project
mkdir duckdb-analytics && cd duckdb-analytics
pnpm init
pnpm add pg
pnpm add -D tsx typescript @types/pg

Create a file called analytics.ts. All the code in this post goes into that one file.
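One gotcha: the snippets use top-level await, which requires the file to run as an ES module. If tsx complains, adding "type": "module" to package.json usually sorts it out. A minimal sketch of the relevant fields:

```json
{
  "name": "duckdb-analytics",
  "type": "module",
  "scripts": {
    "dev": "tsx analytics.ts"
  }
}
```

Then `pnpm dev` (or `npx tsx analytics.ts`) runs the file as you build it up.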
The Sales Dataset
Here are 25 sales transactions across four product categories. In production this would come from your transactional database or a CSV export:
const sales = [
{ id: 1, product: 'Laptop', category: 'Electronics', region: 'North', quantity: 2, price: 999.99, sale_date: '2026-01-05' },
{ id: 2, product: 'T-Shirt', category: 'Clothing', region: 'South', quantity: 15, price: 24.99, sale_date: '2026-01-08' },
{ id: 3, product: 'Desk Lamp', category: 'Home', region: 'East', quantity: 5, price: 45.00, sale_date: '2026-01-10' },
{ id: 4, product: 'Running Shoes', category: 'Sports', region: 'West', quantity: 3, price: 129.99, sale_date: '2026-01-12' },
{ id: 5, product: 'Headphones', category: 'Electronics', region: 'North', quantity: 8, price: 79.99, sale_date: '2026-01-15' },
{ id: 6, product: 'Jeans', category: 'Clothing', region: 'East', quantity: 10, price: 59.99, sale_date: '2026-01-18' },
{ id: 7, product: 'Yoga Mat', category: 'Sports', region: 'South', quantity: 12, price: 34.99, sale_date: '2026-01-20' },
{ id: 8, product: 'Bookshelf', category: 'Home', region: 'North', quantity: 1, price: 189.99, sale_date: '2026-01-22' },
{ id: 9, product: 'Tablet', category: 'Electronics', region: 'West', quantity: 4, price: 499.99, sale_date: '2026-01-25' },
{ id: 10, product: 'Winter Jacket', category: 'Clothing', region: 'North', quantity: 6, price: 149.99, sale_date: '2026-01-28' },
{ id: 11, product: 'Basketball', category: 'Sports', region: 'East', quantity: 20, price: 29.99, sale_date: '2026-02-02' },
{ id: 12, product: 'Monitor', category: 'Electronics', region: 'South', quantity: 3, price: 349.99, sale_date: '2026-02-05' },
{ id: 13, product: 'Throw Pillow', category: 'Home', region: 'West', quantity: 8, price: 22.99, sale_date: '2026-02-08' },
{ id: 14, product: 'Sneakers', category: 'Clothing', region: 'East', quantity: 7, price: 89.99, sale_date: '2026-02-10' },
{ id: 15, product: 'Dumbbell Set', category: 'Sports', region: 'North', quantity: 2, price: 199.99, sale_date: '2026-02-12' },
{ id: 16, product: 'Keyboard', category: 'Electronics', region: 'East', quantity: 10, price: 69.99, sale_date: '2026-02-15' },
{ id: 17, product: 'Coffee Table', category: 'Home', region: 'South', quantity: 1, price: 259.99, sale_date: '2026-02-18' },
{ id: 18, product: 'Polo Shirt', category: 'Clothing', region: 'West', quantity: 9, price: 44.99, sale_date: '2026-02-20' },
{ id: 19, product: 'Tennis Racket', category: 'Sports', region: 'South', quantity: 4, price: 159.99, sale_date: '2026-02-22' },
{ id: 20, product: 'Webcam', category: 'Electronics', region: 'North', quantity: 6, price: 59.99, sale_date: '2026-02-25' },
{ id: 21, product: 'Curtains', category: 'Home', region: 'East', quantity: 3, price: 79.99, sale_date: '2026-03-01' },
{ id: 22, product: 'Hoodie', category: 'Clothing', region: 'North', quantity: 11, price: 64.99, sale_date: '2026-03-05' },
{ id: 23, product: 'Laptop', category: 'Electronics', region: 'South', quantity: 1, price: 999.99, sale_date: '2026-03-08' },
{ id: 24, product: 'Resistance Bands', category: 'Sports', region: 'West', quantity: 15, price: 19.99, sale_date: '2026-03-10' },
{ id: 25, product: 'Floor Lamp', category: 'Home', region: 'North', quantity: 2, price: 129.99, sale_date: '2026-03-12' },
]

Connect and Create the Table
Since DuckDB runs behind a PostgreSQL-compatible proxy in SpinDB, you connect with the standard pg package. Every SQL statement runs as a normal query:
import pg from 'pg'
const connectionString = 'postgresql://127.0.0.1:5432/duck1' // from `spindb url duck1`
const client = new pg.Client({ connectionString })
await client.connect()
await client.query(`
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
id INTEGER PRIMARY KEY,
product VARCHAR,
category VARCHAR,
region VARCHAR,
quantity INTEGER,
price DECIMAL(10, 2),
sale_date DATE
)
`)
console.log('Created sales table')

Insert the Data
Insert all 25 rows, one parameterized INSERT statement per row:
for (const s of sales) {
await client.query(
`INSERT INTO sales (id, product, category, region, quantity, price, sale_date)
VALUES ($1, $2, $3, $4, $5, $6, $7)`,
[s.id, s.product, s.category, s.region, s.quantity, s.price, s.sale_date],
)
}
console.log(`Inserted ${sales.length} sales records`)

Basic Queries
Start with a simple aggregation to make sure everything is working:
const totalRevenue = await client.query(`
SELECT
category,
SUM(quantity * price) AS revenue,
SUM(quantity) AS units_sold
FROM sales
GROUP BY category
ORDER BY revenue DESC
`)
console.log('\nRevenue by category:')
for (const row of totalRevenue.rows) {
  console.log(`  ${row.category}: $${Number(row.revenue).toLocaleString('en-US', { minimumFractionDigits: 2, maximumFractionDigits: 2 })} (${row.units_sold} units)`)
}

Revenue by category:
Electronics: $7,749.66 (34 units)
Clothing: $3,624.42 (58 units)
Sports: $2,749.44 (56 units)
Home: $1,358.85 (20 units)

This works fine in PostgreSQL too. The interesting part comes next.
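Since the dataset is tiny, you can cross-check the SQL against a plain TypeScript reduction. This is a sanity-check sketch, not part of the pipeline; run it over the full `sales` array above and it reproduces the query's numbers. Shown here on a small inline sample:

```typescript
// Plain-TypeScript cross-check of the GROUP BY above. Works in integer cents
// so floating-point drift can't creep into the totals.
type Sale = { category: string; quantity: number; price: number }

function revenueByCategory(rows: Sale[]): Map<string, { revenue: number; units: number }> {
  const totals = new Map<string, { cents: number; units: number }>()
  for (const r of rows) {
    const e = totals.get(r.category) ?? { cents: 0, units: 0 }
    e.cents += r.quantity * Math.round(r.price * 100) // accumulate in cents
    e.units += r.quantity
    totals.set(r.category, e)
  }
  const result = new Map<string, { revenue: number; units: number }>()
  for (const [category, e] of totals) result.set(category, { revenue: e.cents / 100, units: e.units })
  return result
}

// Tiny inline sample; the full `sales` array gives the same numbers as the SQL
const check = revenueByCategory([
  { category: 'Electronics', quantity: 2, price: 999.99 },
  { category: 'Electronics', quantity: 8, price: 79.99 },
  { category: 'Home', quantity: 5, price: 45.0 },
])
// check.get('Electronics') → { revenue: 2639.9, units: 10 }
```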
Analytical Queries
Revenue by Category with ROLLUP
ROLLUP generates subtotals and a grand total in one query. PostgreSQL supports ROLLUP too, but DuckDB executes it significantly faster over large datasets thanks to columnar storage and vectorized execution:
const rollup = await client.query(`
SELECT
COALESCE(category, 'GRAND TOTAL') AS category,
COALESCE(region, 'Subtotal') AS region,
SUM(quantity * price) AS revenue,
COUNT(*) AS transactions
FROM sales
GROUP BY ROLLUP(category, region)
ORDER BY category, region
`)
console.log('\nRevenue with ROLLUP (subtotals + grand total):')
for (const row of rollup.rows) {
const indent = row.region === 'Subtotal' || row.category === 'GRAND TOTAL' ? '' : ' '
console.log(
    `${indent}${row.category} / ${row.region}: $${Number(row.revenue).toLocaleString('en-US', { minimumFractionDigits: 2, maximumFractionDigits: 2 })} (${row.transactions} txns)`,
)
}

Revenue with ROLLUP (subtotals + grand total):
Clothing / East: $1,229.83 (2 txns)
Clothing / North: $1,614.83 (2 txns)
Clothing / South: $374.85 (1 txns)
Clothing / West: $404.91 (1 txns)
Clothing / Subtotal: $3,624.42 (6 txns)
Electronics / East: $699.90 (1 txns)
Electronics / North: $2,999.84 (3 txns)
Electronics / South: $2,049.96 (2 txns)
Electronics / West: $1,999.96 (1 txns)
Electronics / Subtotal: $7,749.66 (7 txns)
Home / East: $464.97 (2 txns)
Home / North: $449.97 (2 txns)
Home / South: $259.99 (1 txns)
Home / West: $183.92 (1 txns)
Home / Subtotal: $1,358.85 (6 txns)
Sports / East: $599.80 (1 txns)
Sports / North: $399.98 (1 txns)
Sports / South: $1,059.84 (2 txns)
Sports / West: $689.82 (2 txns)
Sports / Subtotal: $2,749.44 (6 txns)
GRAND TOTAL / Subtotal: $15,482.37 (25 txns)

One query, full breakdown with subtotals per category and a grand total at the bottom. Without ROLLUP, you'd need multiple queries or UNION ALL blocks to produce this.
Running Total by Date
Window functions calculate cumulative revenue over time. PostgreSQL handles these too, but DuckDB's vectorized engine processes them more efficiently on analytical workloads:
const running = await client.query(`
SELECT
sale_date,
SUM(quantity * price) AS daily_revenue,
SUM(SUM(quantity * price)) OVER (ORDER BY sale_date) AS running_total
FROM sales
GROUP BY sale_date
ORDER BY sale_date
`)
console.log('\nRunning revenue total by date:')
for (const row of running.rows) {
const date = new Date(row.sale_date).toISOString().split('T')[0]
console.log(
    `  ${date}: $${Number(row.daily_revenue).toLocaleString('en-US', { minimumFractionDigits: 2, maximumFractionDigits: 2 })} (cumulative: $${Number(row.running_total).toLocaleString('en-US', { minimumFractionDigits: 2, maximumFractionDigits: 2 })})`,
)
}

Running revenue total by date:
2026-01-05: $1,999.98 (cumulative: $1,999.98)
2026-01-08: $374.85 (cumulative: $2,374.83)
2026-01-10: $225.00 (cumulative: $2,599.83)
2026-01-12: $389.97 (cumulative: $2,989.80)
2026-01-15: $639.92 (cumulative: $3,629.72)
2026-01-18: $599.90 (cumulative: $4,229.62)
...

PIVOT Table
This is where DuckDB really pulls ahead. Native PIVOT syntax lets you transform rows into columns with a single, readable statement:
const pivot = await client.query(`
PIVOT sales
ON MONTH(sale_date)
USING SUM(quantity * price) AS revenue
GROUP BY category
ORDER BY category
`)
console.log('\nPIVOT: Revenue by category across months:')
console.log(' Category | January | February | March')
console.log(' -------------|------------|------------|----------')
for (const row of pivot.rows) {
  // Format with thousands separators, matching the table below
  const fmt = (v: unknown) =>
    Number(v || 0).toLocaleString('en-US', { minimumFractionDigits: 2, maximumFractionDigits: 2 }).padStart(10)
  const jan = fmt(row['1_revenue'])
  const feb = fmt(row['2_revenue'])
  const mar = fmt(row['3_revenue'])
  console.log(`  ${row.category.padEnd(13)}|${jan} |${feb} |${mar}`)
}

PIVOT: Revenue by category across months:
Category | January | February | March
-------------|------------|------------|----------
Clothing | 1,874.69 | 1,034.84 | 714.89
Electronics | 4,639.86 | 2,109.81 | 999.99
Home | 414.99 | 443.91 | 499.95
Sports | 809.85 | 1,639.74 | 299.85

One statement. PIVOT takes the distinct values from MONTH(sale_date), creates one column per value, and fills each cell with the aggregation result. No manual column definitions, no conditional logic.
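To make those mechanics concrete, here's a hedged plain-TypeScript sketch of the same idea (collect distinct key values, one column per value, aggregate into each cell). `pivotByMonth` is our illustration, not DuckDB's implementation:

```typescript
// What PIVOT does mechanically: collect the distinct key values, create one
// column per value, fill each cell with the aggregate. New key values become
// new columns without any query changes.
type Columns = Record<number, number>

function pivotByMonth(rows: { category: string; month: number; amount: number }[]): Map<string, Columns> {
  const months = [...new Set(rows.map((r) => r.month))].sort((a, b) => a - b)
  const out = new Map<string, Columns>()
  for (const r of rows) {
    // Zero-fill every month column so missing cells show 0, like the SQL output
    const cols = out.get(r.category) ?? (Object.fromEntries(months.map((m) => [m, 0] as const)) as Columns)
    cols[r.month] += r.amount
    out.set(r.category, cols)
  }
  return out
}

// Tiny sample using numbers from the output above
const table = pivotByMonth([
  { category: 'Home', month: 1, amount: 414.99 },
  { category: 'Home', month: 2, amount: 443.91 },
  { category: 'Sports', month: 1, amount: 809.85 },
])
// table.get('Sports') gets a zero-filled column for month 2 automatically
```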
What This Looks Like in PostgreSQL
PostgreSQL has no PIVOT keyword. To produce the same output, you need conditional aggregation with CASE WHEN for every column:
-- PostgreSQL equivalent of the PIVOT query above
SELECT
category,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN quantity * price ELSE 0 END) AS january,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN quantity * price ELSE 0 END) AS february,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN quantity * price ELSE 0 END) AS march
FROM sales
GROUP BY category
ORDER BY category;

This works, but you have to know the month values ahead of time and write a CASE WHEN per column. Add April data? Update the query. DuckDB's PIVOT handles new values automatically.
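If you're stuck on PostgreSQL, the usual workaround is generating the CASE WHEN columns in application code. A sketch of that approach (`monthCases` is our own helper, not a pg or PostgreSQL feature) — note you still maintain the month list by hand, which is exactly the burden PIVOT removes:

```typescript
// Generate the conditional-aggregation columns for a known set of months.
// The list must be extended manually when new months of data arrive.
const MONTHS = ['january', 'february', 'march']

function monthCases(months: string[]): string {
  return months
    .map(
      (name, i) =>
        `SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = ${i + 1} THEN quantity * price ELSE 0 END) AS ${name}`,
    )
    .join(',\n  ')
}

const pivotSql = `SELECT\n  category,\n  ${monthCases(MONTHS)}\nFROM sales\nGROUP BY category\nORDER BY category`
// await client.query(pivotSql) // run against PostgreSQL
```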
For more complex pivots (say, by region and month), the CASE WHEN approach gets deeply nested and hard to read. DuckDB's PIVOT stays flat:
PIVOT sales
ON MONTH(sale_date)
USING SUM(quantity * price) AS revenue
GROUP BY category, region
ORDER BY category, region

That difference compounds fast as reports grow.
DuckDB can also query external files directly, no import needed:
SELECT * FROM read_csv_auto('/path/to/sales.csv');
SELECT * FROM read_parquet('/path/to/sales.parquet');

Super handy for ad-hoc analysis. Point DuckDB at a CSV and start running SQL immediately, no schema definition or data loading required.
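To try this with the in-memory dataset, you could dump it to a CSV first. A hedged sketch, assuming the DuckDB server can see the file path you write to (`toCsv` is our own helper):

```typescript
import { writeFileSync } from 'node:fs'

// Hypothetical helper: dump rows to a CSV that read_csv_auto() can pick up.
// Quotes fields containing commas, quotes, or newlines per the usual CSV rules.
function toCsv(rows: Record<string, unknown>[]): string {
  const esc = (v: unknown) => {
    const s = String(v)
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
  }
  const header = Object.keys(rows[0])
  const lines = [header.join(','), ...rows.map((r) => header.map((h) => esc(r[h])).join(','))]
  return lines.join('\n') + '\n'
}

writeFileSync('sales.csv', toCsv([{ id: 1, product: 'Desk, small', price: 45 }]))
// Then query it over the same connection:
//   await client.query(`SELECT * FROM read_csv_auto('sales.csv')`)
```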
When to Reach for DuckDB
DuckDB fits a specific set of problems well:
- Ad-hoc analytics: explore datasets with SQL without setting up an ETL pipeline
- CSV and Parquet analysis: query files directly from disk with read_csv_auto() and read_parquet()
- Embedded analytics: add analytical queries to an application without standing up a data warehouse
- Data transformation pipelines: clean, reshape, and aggregate data with SQL instead of Python/pandas
- Aggregation-heavy workloads: any query pattern that touches many rows but few columns benefits from columnar storage
- Reports with pivots and rollups: native PIVOT and ROLLUP syntax keeps complex reports readable
If your queries spend most of their time scanning and aggregating rather than doing point lookups or single-row writes, DuckDB will be significantly faster than a row-oriented database. For transactional workloads, PostgreSQL is still the better choice.
Wrapping Up
The full script is under 80 lines of meaningful code. You created a sales table, inserted data, and ran analytical queries where DuckDB has real advantages over PostgreSQL: rollups for hierarchical totals, window functions for running aggregations, and native PIVOT for cross-tab reports. The same pg client you already know works unchanged because SpinDB provides a PostgreSQL-compatible proxy.
The DuckDB documentation covers window functions, nested types, JSON processing, spatial extensions, and direct Parquet/CSV integration.
To manage your local DuckDB instance:
spindb stop duck1 # Stop the server
spindb start duck1 # Start it again
spindb list # See all your database instances

SpinDB handles 20+ engines, so you can keep DuckDB running next to your PostgreSQL app database and a MongoDB instance for document storage, all from one CLI. Layerbase Desktop wraps the same thing in a GUI if you prefer that on macOS.