Getting Started with DuckDB

DuckDB · Databases · Analytics

PostgreSQL is the default choice for most applications, and rightly so. It handles transactions, CRUD, and relational queries exceptionally well. But once you need analytical queries over large datasets (aggregations, rollups, pivots), it starts to struggle. PostgreSQL stores data in rows, optimized for reading and writing individual records. Analytical queries that scan millions of rows across a few columns pay the cost of reading every column in every row.

DuckDB is built for the opposite workload. Columnar storage, vectorized execution, and query optimizations designed for aggregation-heavy work. It can also query CSV and Parquet files directly, no import step required. Think of it as SQLite for analytics.

We'll build a working analytics pipeline over a sales dataset in one TypeScript file. SpinDB runs DuckDB behind a PostgreSQL-compatible proxy, so you connect with the standard pg package. Run everything locally, or use a managed instance on Layerbase Cloud to skip the install.

Create a DuckDB Instance

Local with SpinDB

SpinDB is the quickest path to a local DuckDB instance. It's a CLI that runs databases without Docker. (What is SpinDB?)

Install SpinDB globally:

bash
npm i -g spindb    # npm
pnpm add -g spindb # pnpm

Or run it directly without installing:

bash
npx spindb create duck1 -e duckdb --start  # npm
pnpx spindb create duck1 -e duckdb --start # pnpm

If you installed globally, create and start a DuckDB instance:

bash
spindb create duck1 -e duckdb --start

SpinDB downloads the DuckDB binary for your platform, sets up a PostgreSQL-compatible proxy, and starts the server. Verify it's running:

bash
spindb url duck1
text
postgresql://127.0.0.1:5432/duck1

Leave the server running. We'll connect to it from TypeScript in the next section.

Layerbase Cloud

Rather skip the local setup? Layerbase Cloud gives you a managed DuckDB instance in a few clicks. Select DuckDB, and your PostgreSQL-compatible connection string shows up in the Quick Connect panel.

Cloud instances use TLS, so the connection code is slightly different:

typescript
import pg from 'pg'

const client = new pg.Client({
  connectionString: 'postgresql://user:pass@cloud.layerbase.dev:15432/duck1?sslmode=require',
})
await client.connect()

Everything else in this guide works identically whether you're running locally or on Layerbase Cloud. Just swap in your connection details.
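Switching between the two is easier if the connection string comes from the environment instead of being hard-coded. A small sketch (resolveConnectionString is our own helper, and DATABASE_URL is our naming convention, not something SpinDB or Layerbase requires):

```typescript
// Hypothetical helper: resolve the connection string from the environment,
// falling back to the local SpinDB instance (from `spindb url duck1`).
function resolveConnectionString(
  env: Record<string, string | undefined> = process.env,
): string {
  return env.DATABASE_URL ?? 'postgresql://127.0.0.1:5432/duck1'
}

// Usage: const connectionString = resolveConnectionString()
```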

Set Up the Project

bash
mkdir duckdb-analytics && cd duckdb-analytics
pnpm init
pnpm add pg
pnpm add -D tsx typescript @types/pg

Create a file called analytics.ts. All the code in this post goes into that one file.
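One setup note: the snippets use ESM imports and top-level await, so the project should be an ES module. If you're running the file with tsx, one way to ensure that (an assumption about your setup; merge this with whatever pnpm init generated) is to set the module type in package.json:

```json
{
  "type": "module"
}
```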

The Sales Dataset

Here are 25 sales transactions across four product categories. In production this would come from your transactional database or a CSV export:

typescript
const sales = [
  { id: 1, product: 'Laptop', category: 'Electronics', region: 'North', quantity: 2, price: 999.99, sale_date: '2026-01-05' },
  { id: 2, product: 'T-Shirt', category: 'Clothing', region: 'South', quantity: 15, price: 24.99, sale_date: '2026-01-08' },
  { id: 3, product: 'Desk Lamp', category: 'Home', region: 'East', quantity: 5, price: 45.00, sale_date: '2026-01-10' },
  { id: 4, product: 'Running Shoes', category: 'Sports', region: 'West', quantity: 3, price: 129.99, sale_date: '2026-01-12' },
  { id: 5, product: 'Headphones', category: 'Electronics', region: 'North', quantity: 8, price: 79.99, sale_date: '2026-01-15' },
  { id: 6, product: 'Jeans', category: 'Clothing', region: 'East', quantity: 10, price: 59.99, sale_date: '2026-01-18' },
  { id: 7, product: 'Yoga Mat', category: 'Sports', region: 'South', quantity: 12, price: 34.99, sale_date: '2026-01-20' },
  { id: 8, product: 'Bookshelf', category: 'Home', region: 'North', quantity: 1, price: 189.99, sale_date: '2026-01-22' },
  { id: 9, product: 'Tablet', category: 'Electronics', region: 'West', quantity: 4, price: 499.99, sale_date: '2026-01-25' },
  { id: 10, product: 'Winter Jacket', category: 'Clothing', region: 'North', quantity: 6, price: 149.99, sale_date: '2026-01-28' },
  { id: 11, product: 'Basketball', category: 'Sports', region: 'East', quantity: 20, price: 29.99, sale_date: '2026-02-02' },
  { id: 12, product: 'Monitor', category: 'Electronics', region: 'South', quantity: 3, price: 349.99, sale_date: '2026-02-05' },
  { id: 13, product: 'Throw Pillow', category: 'Home', region: 'West', quantity: 8, price: 22.99, sale_date: '2026-02-08' },
  { id: 14, product: 'Sneakers', category: 'Clothing', region: 'East', quantity: 7, price: 89.99, sale_date: '2026-02-10' },
  { id: 15, product: 'Dumbbell Set', category: 'Sports', region: 'North', quantity: 2, price: 199.99, sale_date: '2026-02-12' },
  { id: 16, product: 'Keyboard', category: 'Electronics', region: 'East', quantity: 10, price: 69.99, sale_date: '2026-02-15' },
  { id: 17, product: 'Coffee Table', category: 'Home', region: 'South', quantity: 1, price: 259.99, sale_date: '2026-02-18' },
  { id: 18, product: 'Polo Shirt', category: 'Clothing', region: 'West', quantity: 9, price: 44.99, sale_date: '2026-02-20' },
  { id: 19, product: 'Tennis Racket', category: 'Sports', region: 'South', quantity: 4, price: 159.99, sale_date: '2026-02-22' },
  { id: 20, product: 'Webcam', category: 'Electronics', region: 'North', quantity: 6, price: 59.99, sale_date: '2026-02-25' },
  { id: 21, product: 'Curtains', category: 'Home', region: 'East', quantity: 3, price: 79.99, sale_date: '2026-03-01' },
  { id: 22, product: 'Hoodie', category: 'Clothing', region: 'North', quantity: 11, price: 64.99, sale_date: '2026-03-05' },
  { id: 23, product: 'Laptop', category: 'Electronics', region: 'South', quantity: 1, price: 999.99, sale_date: '2026-03-08' },
  { id: 24, product: 'Resistance Bands', category: 'Sports', region: 'West', quantity: 15, price: 19.99, sale_date: '2026-03-10' },
  { id: 25, product: 'Floor Lamp', category: 'Home', region: 'North', quantity: 2, price: 129.99, sale_date: '2026-03-12' },
]
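Before loading anything, you can sanity-check the dataset in plain TypeScript. The helper below is our own (not part of any library); it mirrors the SUM(quantity * price) GROUP BY category aggregation you'll run in SQL later, so you can compare results:

```typescript
// Plain-TypeScript cross-check for the SQL aggregation later in the post:
// SELECT category, SUM(quantity * price) ... GROUP BY category
function revenueByCategory(
  rows: { category: string; quantity: number; price: number }[],
): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const r of rows) {
    totals[r.category] = (totals[r.category] ?? 0) + r.quantity * r.price
  }
  return totals
}

// e.g. console.log(revenueByCategory(sales)) // Electronics should come to 7749.66
```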

Connect and Create the Table

Since DuckDB runs behind a PostgreSQL-compatible proxy in SpinDB, you connect with the standard pg package. Every SQL statement runs as a normal query:

typescript
import pg from 'pg'

const connectionString = 'postgresql://127.0.0.1:5432/duck1' // from `spindb url duck1`

const client = new pg.Client({ connectionString })
await client.connect()

await client.query(`
  DROP TABLE IF EXISTS sales;
  CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    product VARCHAR,
    category VARCHAR,
    region VARCHAR,
    quantity INTEGER,
    price DECIMAL(10, 2),
    sale_date DATE
  )
`)

console.log('Created sales table')

Insert the Data

Insert all 25 rows, one parameterized INSERT per row:

typescript
for (const s of sales) {
  await client.query(
    `INSERT INTO sales (id, product, category, region, quantity, price, sale_date)
     VALUES ($1, $2, $3, $4, $5, $6, $7)`,
    [s.id, s.product, s.category, s.region, s.quantity, s.price, s.sale_date],
  )
}

console.log(`Inserted ${sales.length} sales records`)
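Twenty-five round trips are fine here. For larger loads, you could batch rows into one multi-row INSERT instead; a sketch (placeholders is our own helper, not part of pg):

```typescript
// Build "($1, $2, $3), ($4, $5, $6), ..." placeholder groups
// for a single multi-row INSERT statement.
function placeholders(rowCount: number, colCount: number): string {
  const groups: string[] = []
  for (let r = 0; r < rowCount; r++) {
    const cols = Array.from({ length: colCount }, (_, c) => `$${r * colCount + c + 1}`)
    groups.push(`(${cols.join(', ')})`)
  }
  return groups.join(', ')
}

// Usage sketch with the sales table's 7 columns:
// await client.query(
//   `INSERT INTO sales (id, product, category, region, quantity, price, sale_date)
//    VALUES ${placeholders(sales.length, 7)}`,
//   sales.flatMap((s) => [s.id, s.product, s.category, s.region, s.quantity, s.price, s.sale_date]),
// )
```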

Basic Queries

Start with a simple aggregation to make sure everything is working:

typescript
const totalRevenue = await client.query(`
  SELECT
    category,
    SUM(quantity * price) AS revenue,
    SUM(quantity) AS units_sold
  FROM sales
  GROUP BY category
  ORDER BY revenue DESC
`)

console.log('\nRevenue by category:')
for (const row of totalRevenue.rows) {
  console.log(`  ${row.category}: $${Number(row.revenue).toFixed(2)} (${row.units_sold} units)`)
}
text
Revenue by category:
  Electronics: $7749.66 (34 units)
  Clothing: $3624.42 (58 units)
  Sports: $2749.44 (56 units)
  Home: $1358.85 (20 units)

This works fine in PostgreSQL too. The interesting part comes next.

Analytical Queries

Revenue by Category with ROLLUP

ROLLUP generates subtotals and a grand total in one query. PostgreSQL supports ROLLUP too, but DuckDB executes it significantly faster over large datasets thanks to columnar storage and vectorized execution:

typescript
const rollup = await client.query(`
  SELECT
    COALESCE(category, 'GRAND TOTAL') AS category,
    COALESCE(region, 'Subtotal') AS region,
    SUM(quantity * price) AS revenue,
    COUNT(*) AS transactions
  FROM sales
  GROUP BY ROLLUP(category, region)
  ORDER BY GROUPING(category), category, GROUPING(region), region
`)

console.log('\nRevenue with ROLLUP (subtotals + grand total):')
for (const row of rollup.rows) {
  const indent = row.region === 'Subtotal' || row.category === 'GRAND TOTAL' ? '' : '  '
  console.log(
    `${indent}${row.category} / ${row.region}: $${Number(row.revenue).toFixed(2)} (${row.transactions} txns)`,
  )
}
text
Revenue with ROLLUP (subtotals + grand total):
  Clothing / East: $1229.83 (2 txns)
  Clothing / North: $1614.83 (2 txns)
  Clothing / South: $374.85 (1 txns)
  Clothing / West: $404.91 (1 txns)
Clothing / Subtotal: $3624.42 (6 txns)
  Electronics / East: $699.90 (1 txns)
  Electronics / North: $2999.84 (3 txns)
  Electronics / South: $2049.96 (2 txns)
  Electronics / West: $1999.96 (1 txns)
Electronics / Subtotal: $7749.66 (7 txns)
  Home / East: $464.97 (2 txns)
  Home / North: $449.97 (2 txns)
  Home / South: $259.99 (1 txns)
  Home / West: $183.92 (1 txns)
Home / Subtotal: $1358.85 (6 txns)
  Sports / East: $599.80 (1 txns)
  Sports / North: $399.98 (1 txns)
  Sports / South: $1059.84 (2 txns)
  Sports / West: $689.82 (2 txns)
Sports / Subtotal: $2749.44 (6 txns)
GRAND TOTAL / Subtotal: $15482.37 (25 txns)

One query, full breakdown with subtotals per category and a grand total at the bottom. Without ROLLUP, you'd need multiple queries or UNION ALL blocks to produce this.
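To make that concrete, here's a sketch of the hand-rolled UNION ALL equivalent (same columns as the ROLLUP query above):

```sql
-- Per-region rows, per-category subtotals, and a grand total, stitched by hand
SELECT category, region, SUM(quantity * price) AS revenue, COUNT(*) AS transactions
FROM sales
GROUP BY category, region
UNION ALL
SELECT category, 'Subtotal', SUM(quantity * price), COUNT(*)
FROM sales
GROUP BY category
UNION ALL
SELECT 'GRAND TOTAL', 'Subtotal', SUM(quantity * price), COUNT(*)
FROM sales
ORDER BY 1, 2;
```

Three query blocks to maintain instead of one, and every new grouping level means another block.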

Running Total by Date

Window functions calculate cumulative revenue over time. PostgreSQL handles these too, but DuckDB's vectorized engine processes them more efficiently on analytical workloads:

typescript
const running = await client.query(`
  SELECT
    sale_date,
    SUM(quantity * price) AS daily_revenue,
    SUM(SUM(quantity * price)) OVER (ORDER BY sale_date) AS running_total
  FROM sales
  GROUP BY sale_date
  ORDER BY sale_date
`)

console.log('\nRunning revenue total by date:')
for (const row of running.rows) {
  const date = new Date(row.sale_date).toISOString().split('T')[0]
  console.log(
    `  ${date}: $${Number(row.daily_revenue).toFixed(2)} (cumulative: $${Number(row.running_total).toFixed(2)})`,
  )
}
text
Running revenue total by date:
  2026-01-05: $1999.98 (cumulative: $1999.98)
  2026-01-08: $374.85 (cumulative: $2374.83)
  2026-01-10: $225.00 (cumulative: $2599.83)
  2026-01-12: $389.97 (cumulative: $2989.80)
  2026-01-15: $639.92 (cumulative: $3629.72)
  2026-01-18: $599.90 (cumulative: $4229.62)
  ...
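Window frames extend the same idea. As a sketch (standard window-frame syntax, which DuckDB supports), a moving average over the current and two previous sale dates:

```sql
SELECT
  sale_date,
  SUM(quantity * price) AS daily_revenue,
  AVG(SUM(quantity * price)) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- rows, not calendar days
  ) AS moving_avg_3
FROM sales
GROUP BY sale_date
ORDER BY sale_date;
```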

PIVOT Table

This is where DuckDB really pulls ahead. Native PIVOT syntax lets you transform rows into columns with a single, readable statement:

typescript
const pivot = await client.query(`
  PIVOT sales
  ON MONTH(sale_date)
  USING SUM(quantity * price) AS revenue
  GROUP BY category
  ORDER BY category
`)

console.log('\nPIVOT: Revenue by category across months:')
console.log('  Category     | January    | February   | March')
console.log('  -------------|------------|------------|----------')
for (const row of pivot.rows) {
  const jan = Number(row['1_revenue'] || 0).toFixed(2).padStart(10)
  const feb = Number(row['2_revenue'] || 0).toFixed(2).padStart(10)
  const mar = Number(row['3_revenue'] || 0).toFixed(2).padStart(10)
  console.log(`  ${row.category.padEnd(13)}|${jan}  |${feb}  |${mar}`)
}
text
PIVOT: Revenue by category across months:
  Category     | January    | February   | March
  -------------|------------|------------|----------
  Clothing     |   1874.69  |   1034.84  |    714.89
  Electronics  |   4639.86  |   2109.81  |    999.99
  Home         |    414.99  |    443.91  |    499.95
  Sports       |    809.85  |   1639.74  |    299.85

One readable statement: PIVOT takes the distinct values of MONTH(sale_date), creates one column per value, and fills each cell with the aggregation result. No manual column definitions, no conditional logic.

What This Looks Like in PostgreSQL

PostgreSQL has no PIVOT keyword. To produce the same output, you need conditional aggregation with CASE WHEN for every column:

sql
-- PostgreSQL equivalent of the PIVOT query above
SELECT
  category,
  SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN quantity * price ELSE 0 END) AS january,
  SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN quantity * price ELSE 0 END) AS february,
  SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN quantity * price ELSE 0 END) AS march
FROM sales
GROUP BY category
ORDER BY category;

This works, but you have to know the month values ahead of time and write a CASE WHEN per column. Add April data? Update the query. DuckDB's PIVOT handles new values automatically.

For more complex pivots (say, by region and month), the CASE WHEN approach gets deeply nested and hard to read. DuckDB's PIVOT stays flat:

sql
PIVOT sales
ON MONTH(sale_date)
USING SUM(quantity * price) AS revenue
GROUP BY category, region
ORDER BY category, region

That difference compounds fast as reports grow.

DuckDB can also query external files directly, no import needed:

sql
SELECT * FROM read_csv_auto('/path/to/sales.csv');
SELECT * FROM read_parquet('/path/to/sales.parquet');

Super handy for ad-hoc analysis. Point DuckDB at a CSV and start running SQL immediately, no schema definition or data loading required.
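The reverse direction works too: COPY writes tables or query results straight to Parquet or CSV (the paths here are placeholders):

```sql
COPY sales TO '/path/to/sales.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM sales WHERE category = 'Electronics')
  TO '/path/to/electronics.csv' (HEADER, DELIMITER ',');
```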

When to Reach for DuckDB

DuckDB fits a specific set of problems well:

  • Ad-hoc analytics: explore datasets with SQL without setting up an ETL pipeline
  • CSV and Parquet analysis: query files directly from disk with read_csv_auto() and read_parquet()
  • Embedded analytics: add analytical queries to an application without standing up a data warehouse
  • Data transformation pipelines: clean, reshape, and aggregate data with SQL instead of Python/pandas
  • Aggregation-heavy workloads: any query pattern that touches many rows but few columns benefits from columnar storage
  • Reports with pivots and rollups: native PIVOT and ROLLUP syntax keeps complex reports readable

If your queries spend most of their time scanning and aggregating rather than doing point lookups or single-row writes, DuckDB will be significantly faster than a row-oriented database. For transactional workloads, PostgreSQL is still the better choice.

Wrapping Up

The full script is under 80 lines of meaningful code. You created a sales table, inserted data, and ran analytical queries where DuckDB has real advantages over PostgreSQL: rollups for hierarchical totals, window functions for running aggregations, and native PIVOT for cross-tab reports. The same pg client you already know works unchanged because SpinDB provides a PostgreSQL-compatible proxy. One housekeeping note: finish analytics.ts with await client.end() so the Node.js process exits cleanly.

The DuckDB documentation covers window functions, nested types, JSON processing, spatial extensions, and direct Parquet/CSV integration.

To manage your local DuckDB instance:

bash
spindb stop duck1    # Stop the server
spindb start duck1   # Start it again
spindb list          # See all your database instances

SpinDB handles 20+ engines, so you can keep DuckDB running next to your PostgreSQL app database and a MongoDB instance for document storage, all from one CLI. Layerbase Desktop wraps the same thing in a GUI if you prefer that on macOS.
