ClickHouse vs DuckDB
Your dashboard queries are getting slow. You're running GROUP BY over millions of rows and PostgreSQL is taking seconds when your users expect milliseconds. Or maybe your PM just asked for a new analytics view and you know the current setup won't handle it. You need a columnar analytics engine, but should you reach for ClickHouse or DuckDB?
They solve the same fundamental problem (fast aggregations over large datasets) in completely different ways. ClickHouse is a production server: you deploy it, connect multiple clients, and it ingests millions of rows per second while serving queries. It's what powers the analytics behind Cloudflare, Uber, and eBay. DuckDB is basically SQLite for analytics: an embedded engine with no server, no configuration, and the ability to query CSV and Parquet files directly from disk. You pip install it or pnpm add it and start running SQL.
We'll run the same analytical query in both, then dig into what each one can do that the other can't.
Contents
- Quick Comparison
- Set Up Both Locally
- The Same Query in Both
- What ClickHouse Can Do That DuckDB Can't
- What DuckDB Can Do That ClickHouse Can't
- Key Architectural Differences
- When to Pick Which
- Wrapping Up
Quick Comparison
| | ClickHouse | DuckDB |
|---|---|---|
| Architecture | Client-server (standalone process) | Embedded (in-process, no server) |
| Deployment | Runs as a service, supports clustering | Single file or in-memory, no daemon |
| Scale | Horizontal (sharding, replication) | Single machine, single process |
| Concurrency | Many concurrent readers/writers | Single writer, multiple readers |
| Real-time ingestion | Yes, millions of rows/sec | Batch-oriented |
| File querying | Limited (external table functions) | Native CSV, Parquet, JSON from disk |
| SQL dialect | ClickHouse SQL (custom functions) | PostgreSQL-compatible with extensions |
| Table engines | MergeTree family (configurable) | Standard tables (no engine choice) |
| PIVOT syntax | No native PIVOT | Yes, native PIVOT/UNPIVOT |
| Funnel analysis | Built-in windowFunnel | Manual (window functions + CTEs) |
| Best for | Production dashboards at scale | Local/embedded analytics and ad-hoc exploration |
Set Up Both Locally
SpinDB runs both engines locally. No Docker. (What is SpinDB?)
Install it globally:
npm i -g spindb # npm
pnpm add -g spindb # pnpm
ClickHouse with SpinDB
spindb create click1 -e clickhouse --start
Check the URL:
spindb url click1
http://127.0.0.1:8123
ClickHouse exposes an HTTP interface. You connect with the @clickhouse/client package.
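If you want a quick smoke test before writing any real code, the client ships a ping() helper. A minimal sketch, assuming the URL spindb url printed above:
import { createClient } from '@clickhouse/client'

// Minimal connectivity check against the HTTP interface
const ch = createClient({ url: 'http://127.0.0.1:8123' })
const ping = await ch.ping()
console.log(ping.success) // true when the server is reachable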
DuckDB with SpinDB
spindb create duck1 -e duckdb --start
Check the URL:
spindb url duck1
postgresql://127.0.0.1:5432/duck1
DuckDB runs behind a PostgreSQL-compatible proxy in SpinDB, so you connect with the standard pg package. Same driver you'd use for PostgreSQL.
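Same idea here. A quick sketch to verify the proxy answers, using the pg package we install in the next section:
import pg from 'pg'

// Minimal connectivity check through the PostgreSQL-compatible proxy
const duck = new pg.Client({ connectionString: 'postgresql://127.0.0.1:5432/duck1' })
await duck.connect()
const { rows } = await duck.query(`SELECT 'connected' AS status`)
console.log(rows[0].status) // connected
await duck.end()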
Project Setup
mkdir analytics-compare && cd analytics-compare
pnpm init
pnpm add @clickhouse/client pg
pnpm add -D tsx typescript @types/pg
Create a file called compare.ts. All the code in this post goes into that one file.
The Same Query in Both
Best way to compare: run the exact same workload. Here are 20 sales records we'll load into both engines:
const sales = [
{ id: 1, product: 'Laptop', category: 'Electronics', region: 'North', quantity: 2, price: 999.99, sale_date: '2026-01-05' },
{ id: 2, product: 'T-Shirt', category: 'Clothing', region: 'South', quantity: 15, price: 24.99, sale_date: '2026-01-08' },
{ id: 3, product: 'Desk Lamp', category: 'Home', region: 'East', quantity: 5, price: 45.00, sale_date: '2026-01-10' },
{ id: 4, product: 'Running Shoes', category: 'Sports', region: 'West', quantity: 3, price: 129.99, sale_date: '2026-01-12' },
{ id: 5, product: 'Headphones', category: 'Electronics', region: 'North', quantity: 8, price: 79.99, sale_date: '2026-01-15' },
{ id: 6, product: 'Jeans', category: 'Clothing', region: 'East', quantity: 10, price: 59.99, sale_date: '2026-01-18' },
{ id: 7, product: 'Yoga Mat', category: 'Sports', region: 'South', quantity: 12, price: 34.99, sale_date: '2026-01-20' },
{ id: 8, product: 'Bookshelf', category: 'Home', region: 'North', quantity: 1, price: 189.99, sale_date: '2026-01-22' },
{ id: 9, product: 'Tablet', category: 'Electronics', region: 'West', quantity: 4, price: 499.99, sale_date: '2026-01-25' },
{ id: 10, product: 'Winter Jacket', category: 'Clothing', region: 'North', quantity: 6, price: 149.99, sale_date: '2026-01-28' },
{ id: 11, product: 'Basketball', category: 'Sports', region: 'East', quantity: 20, price: 29.99, sale_date: '2026-02-02' },
{ id: 12, product: 'Monitor', category: 'Electronics', region: 'South', quantity: 3, price: 349.99, sale_date: '2026-02-05' },
{ id: 13, product: 'Throw Pillow', category: 'Home', region: 'West', quantity: 8, price: 22.99, sale_date: '2026-02-08' },
{ id: 14, product: 'Sneakers', category: 'Clothing', region: 'East', quantity: 7, price: 89.99, sale_date: '2026-02-10' },
{ id: 15, product: 'Dumbbell Set', category: 'Sports', region: 'North', quantity: 2, price: 199.99, sale_date: '2026-02-12' },
{ id: 16, product: 'Keyboard', category: 'Electronics', region: 'East', quantity: 10, price: 69.99, sale_date: '2026-02-15' },
{ id: 17, product: 'Coffee Table', category: 'Home', region: 'South', quantity: 1, price: 259.99, sale_date: '2026-02-18' },
{ id: 18, product: 'Polo Shirt', category: 'Clothing', region: 'West', quantity: 9, price: 44.99, sale_date: '2026-02-20' },
{ id: 19, product: 'Tennis Racket', category: 'Sports', region: 'South', quantity: 4, price: 159.99, sale_date: '2026-02-22' },
{ id: 20, product: 'Webcam', category: 'Electronics', region: 'North', quantity: 6, price: 59.99, sale_date: '2026-02-25' },
]
The query: revenue by category with a rollup that includes subtotals and a grand total. Bread-and-butter analytics.
ClickHouse: Revenue by Category
ClickHouse uses its own client library and MergeTree table engine:
import { createClient } from '@clickhouse/client'
const ch = createClient({ url: 'http://localhost:8123' })
// Create a MergeTree table (ClickHouse requires an engine and ORDER BY)
await ch.command({
query: `
CREATE TABLE IF NOT EXISTS sales (
id UInt32,
product String,
category String,
region String,
quantity UInt32,
price Float64,
sale_date Date
) ENGINE = MergeTree()
ORDER BY (category, sale_date)
`,
})
// Insert the data
await ch.insert({ table: 'sales', values: sales, format: 'JSONEachRow' })
// Revenue by category with rollup
const result = await ch.query({
query: `
SELECT
COALESCE(category, 'GRAND TOTAL') AS category,
COALESCE(region, 'Subtotal') AS region,
SUM(quantity * price) AS revenue,
COUNT(*) AS transactions
FROM sales
GROUP BY ROLLUP(category, region)
ORDER BY category, region
`,
format: 'JSONEachRow',
})
const rows = await result.json()
console.log('ClickHouse results:')
for (const row of rows) {
console.log(` ${row.category} / ${row.region}: $${Number(row.revenue).toFixed(2)}`)
}
A few things to note. You must pick a table engine (MergeTree is the standard choice for almost everything). You must specify ORDER BY, which defines both the primary index and the physical sort order on disk. No separate index creation step. The sort order is the index.
DuckDB: Revenue by Category
DuckDB uses the standard pg client through SpinDB's PostgreSQL proxy:
import pg from 'pg'
const duck = new pg.Client({ connectionString: 'postgresql://127.0.0.1:5432/duck1' })
await duck.connect()
// Create a standard table (no engine, no ORDER BY required)
await duck.query(`
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
id INTEGER PRIMARY KEY,
product VARCHAR,
category VARCHAR,
region VARCHAR,
quantity INTEGER,
price DECIMAL(10, 2),
sale_date DATE
)
`)
// Insert the data
for (const s of sales) {
await duck.query(
`INSERT INTO sales (id, product, category, region, quantity, price, sale_date)
VALUES ($1, $2, $3, $4, $5, $6, $7)`,
[s.id, s.product, s.category, s.region, s.quantity, s.price, s.sale_date],
)
}
// Revenue by category with rollup (identical SQL)
const result = await duck.query(`
SELECT
COALESCE(category, 'GRAND TOTAL') AS category,
COALESCE(region, 'Subtotal') AS region,
SUM(quantity * price) AS revenue,
COUNT(*) AS transactions
FROM sales
GROUP BY ROLLUP(category, region)
ORDER BY category, region
`)
console.log('DuckDB results:')
for (const row of result.rows) {
console.log(` ${row.category} / ${row.region}: $${Number(row.revenue).toFixed(2)}`)
}
The SQL is identical. The results match. The difference is everything around the SQL: the client library, how you create the table, and what happens at scale.
DuckDB tables are simpler. No engine selection, no mandatory sort order. Standard relational DDL. The trade-off: you don't get to control the storage layout the way ClickHouse lets you.
What ClickHouse Can Do That DuckDB Can't
Funnel Analysis with windowFunnel
ClickHouse has a built-in windowFunnel function for tracking user journeys through a sequence of events. Given timestamped events, it tells you how far each user got through a defined funnel within a time window.
// Track how many users went from /home -> /pricing -> /docs in sequence
const funnel = await ch.query({
query: `
SELECT
step,
count() AS users
FROM (
SELECT
windowFunnel(7200)(timestamp, page = '/home', page = '/pricing', page = '/docs') AS step
FROM page_views
GROUP BY user_id
)
GROUP BY step
ORDER BY step
`,
format: 'JSONEachRow',
})
{ step: 0, users: 4 } -- didn't enter the funnel
{ step: 1, users: 3 } -- visited /home only
{ step: 2, users: 5 } -- reached /pricing
windowFunnel(7200) defines a 2-hour window. It checks whether events happen in the specified order within that window and returns the furthest step each user reached.
Building this in DuckDB (or PostgreSQL) requires window functions, self-joins, or multi-step CTEs. Doable, but a lot more code for a pattern ClickHouse treats as a first-class operation.
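For a sense of the gap, here's one way to hand-roll the same three-step funnel in DuckDB. This is a sketch, assuming a page_views table with user_id, page, and ts columns (the same hypothetical table the ClickHouse example queries):
// Hand-rolled funnel: each CTE finds the earliest qualifying event for the
// next step, constrained to 2 hours after the user's first '/home' visit
const funnelDuck = await duck.query(`
  WITH s1 AS (
    SELECT user_id, MIN(ts) AS t1
    FROM page_views
    WHERE page = '/home'
    GROUP BY user_id
  ),
  s2 AS (
    SELECT p.user_id, MIN(p.ts) AS t2
    FROM page_views p
    JOIN s1 USING (user_id)
    WHERE p.page = '/pricing' AND p.ts > s1.t1 AND p.ts <= s1.t1 + INTERVAL 2 HOUR
    GROUP BY p.user_id
  ),
  s3 AS (
    SELECT p.user_id, MIN(p.ts) AS t3
    FROM page_views p
    JOIN s1 USING (user_id)
    JOIN s2 USING (user_id)
    WHERE p.page = '/docs' AND p.ts > s2.t2 AND p.ts <= s1.t1 + INTERVAL 2 HOUR
    GROUP BY p.user_id
  )
  SELECT
    CASE
      WHEN t3 IS NOT NULL THEN 3
      WHEN t2 IS NOT NULL THEN 2
      WHEN t1 IS NOT NULL THEN 1
      ELSE 0
    END AS step,
    COUNT(*) AS users
  FROM (SELECT DISTINCT user_id FROM page_views) u
  LEFT JOIN s1 USING (user_id)
  LEFT JOIN s2 USING (user_id)
  LEFT JOIN s3 USING (user_id)
  GROUP BY step
  ORDER BY step
`)
console.log(funnelDuck.rows)
Roughly forty lines versus one function call, and every new funnel step means another CTE.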
Real-Time Ingestion
ClickHouse ingests millions of rows per second while simultaneously serving queries. Built for append-heavy workloads where data streams in continuously: click events, log lines, sensor readings. MergeTree handles concurrent writes by merging data parts in the background.
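If you're writing from Node, the client can lean on ClickHouse's async inserts so many small writes get batched server-side instead of each creating its own data part. A sketch (the event shape is made up; the settings are standard ClickHouse settings):
// Stream-style writes: async_insert buffers small inserts server-side
// and flushes them in batches
await ch.insert({
  table: 'page_views',
  values: [{ user_id: 42, page: '/home', ts: '2026-02-25 10:00:00' }],
  format: 'JSONEachRow',
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 1, // ack only once the batch is flushed
  },
})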
DuckDB is single-writer. Ingest data while queries are running and you'll hit contention. It's built for batch loads followed by analytical reads, not streaming pipelines.
Horizontal Scaling
ClickHouse clusters shard data across multiple nodes and replicate for fault tolerance. Production deployments can have dozens of nodes processing petabytes, with the query engine parallelizing across shards transparently.
DuckDB runs on one machine, in one process. Fast on that machine because it uses all available CPU cores, but it doesn't distribute work across a cluster.
What DuckDB Can Do That ClickHouse Can't
Native PIVOT
DuckDB has a native PIVOT keyword that transforms rows into columns in one statement:
const pivot = await duck.query(`
PIVOT sales
ON category
USING SUM(quantity * price) AS revenue
GROUP BY region
ORDER BY region
`)
console.log('DuckDB PIVOT results:')
for (const row of pivot.rows) {
console.log(` ${row.region}: Electronics=$${Number(row.Electronics_revenue).toFixed(2)}, Clothing=$${Number(row.Clothing_revenue).toFixed(2)}`)
}
DuckDB PIVOT results:
East: Electronics=$699.90, Clothing=$1229.83
North: Electronics=$2999.84, Clothing=$899.94
South: Electronics=$1049.97, Clothing=$374.85
West: Electronics=$1999.96, Clothing=$404.91
DuckDB discovers the distinct category values, creates one column per value, and fills in the aggregation. No manual column definitions, no conditional logic.
ClickHouse has no PIVOT keyword. You'd write conditional aggregation with if() for every column:
-- ClickHouse workaround for pivot
SELECT
region,
SUM(if(category = 'Electronics', quantity * price, 0)) AS electronics_revenue,
SUM(if(category = 'Clothing', quantity * price, 0)) AS clothing_revenue,
SUM(if(category = 'Sports', quantity * price, 0)) AS sports_revenue,
SUM(if(category = 'Home', quantity * price, 0)) AS home_revenue
FROM sales
GROUP BY region
ORDER BY region
Works, but you have to know every category value ahead of time. Add a new category and this query silently drops it. DuckDB's PIVOT handles new values automatically.
Direct File Querying
DuckDB can query CSV, Parquet, and JSON files directly from disk without importing anything:
-- Query a CSV file with no schema definition or import step
SELECT category, SUM(quantity * price) AS revenue
FROM read_csv_auto('/path/to/sales.csv')
GROUP BY category
ORDER BY revenue DESC;
-- Same with Parquet (columnar format, even faster)
SELECT * FROM read_parquet('/data/events/*.parquet')
WHERE event_date >= '2026-01-01';
This is where DuckDB really shines. Someone sends you a CSV, you point DuckDB at it, and you're running SQL immediately. No schema definition, no import step, no ETL pipeline.
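Through SpinDB's proxy, the same thing works from TypeScript. A sketch, assuming a sales.csv at a path readable by the SpinDB-managed DuckDB process:
// Query a CSV file directly; the path is hypothetical
const fromCsv = await duck.query(`
  SELECT category, SUM(quantity * price) AS revenue
  FROM read_csv_auto('/path/to/sales.csv')
  GROUP BY category
  ORDER BY revenue DESC
`)
console.log(fromCsv.rows)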
ClickHouse can read external files through table functions like file() and url(), but it's more limited. File querying is core to DuckDB's identity. In ClickHouse, it's a side feature.
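For comparison, here's what the ClickHouse side looks like with the url() table function. A sketch with a hypothetical URL; CSVWithNames tells ClickHouse to read the header row and infer column types:
// ClickHouse's closest equivalent: a table function over a remote file
const remote = await ch.query({
  query: `
    SELECT category, SUM(quantity * price) AS revenue
    FROM url('https://example.com/sales.csv', 'CSVWithNames')
    GROUP BY category
    ORDER BY revenue DESC
  `,
  format: 'JSONEachRow',
})
console.log(await remote.json())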
PostgreSQL Wire Protocol
DuckDB (through SpinDB's PG proxy) speaks the PostgreSQL wire protocol. Any tool that connects to PostgreSQL connects to DuckDB: psql, pgAdmin, DBeaver, TablePlus, the pg npm package. No specialized client library needed.
ClickHouse has its own HTTP and native TCP protocols. You need @clickhouse/client for TypeScript, and most PostgreSQL tools won't connect directly. ClickHouse does have an experimental PG compatibility layer, but it's limited.
Key Architectural Differences
MergeTree vs Standard Tables
ClickHouse's MergeTree engine is central to how it works. When you define ORDER BY (category, sale_date), ClickHouse physically sorts data on disk in that order. Queries that filter or group by those columns skip entire blocks without scanning them. You're designing a custom index at the storage layer.
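You can see the pruning directly: EXPLAIN indexes = 1 reports which primary-key granules survive a filter. A quick sketch against the sales table from earlier:
// The plan shows how many granules the primary key lets ClickHouse skip
// when filtering on the leading ORDER BY column
const plan = await ch.query({
  query: `
    EXPLAIN indexes = 1
    SELECT SUM(quantity * price)
    FROM sales
    WHERE category = 'Electronics'
  `,
  format: 'JSONEachRow',
})
console.log(await plan.json())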
DuckDB doesn't expose table engines. It manages storage internally with automatic zone maps, min/max indexes, and adaptive compression. Simpler, but you have less control over query performance for specific access patterns.
Write Patterns
ClickHouse is optimized for large batch inserts and append-only workloads. Individual row updates and deletes exist but are expensive (they rewrite entire data parts). The mental model: data flows in, gets merged in the background, and is queryable immediately.
DuckDB handles standard INSERT/UPDATE/DELETE with ACID transactions. Proper relational write semantics. But it doesn't handle high-throughput concurrent writes the way ClickHouse does.
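The difference shows up even in trivial statements. A contrast sketch, updating one row in each engine:
// DuckDB: an ordinary ACID update
await duck.query(`UPDATE sales SET price = 54.99 WHERE id = 20`)

// ClickHouse: the same change is a mutation that rewrites affected data parts
await ch.command({
  query: `ALTER TABLE sales UPDATE price = 54.99 WHERE id = 20`,
})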
Concurrency
ClickHouse serves many concurrent clients. Multiple dashboards, multiple users, multiple ingestion pipelines hitting the same cluster simultaneously. That's the whole point of a server architecture.
DuckDB is single-process. Fast for one user running queries, but not designed to serve hundreds of concurrent dashboard users. Embedding DuckDB in a backend service works for pre-computed reports or low-concurrency internal tools, not customer-facing analytics.
When to Pick Which
Pick ClickHouse when:
- You're building production analytics dashboards that serve multiple concurrent users
- You need to ingest data in real time from event streams, logs, or click tracking
- Your dataset is large and growing (hundreds of millions to billions of rows)
- You need funnel analysis, retention curves, or session replay with built-in functions
- You're running a multi-node cluster for fault tolerance and horizontal scaling
- Your team already operates server infrastructure and can manage a database service
Pick DuckDB when:
- You're doing ad-hoc analysis on local files (CSVs, Parquet exports, data dumps)
- You want embedded analytics inside an application with no external database to manage
- Your dataset fits on one machine (DuckDB handles hundreds of gigabytes comfortably)
- You need PIVOT tables and other advanced SQL syntax without workarounds
- You're building data transformation pipelines where SQL is cleaner than Python/pandas
- You want the simplest possible setup with no server, no configuration, no ops overhead
The short version
DuckDB is for when you're the one asking the questions. You have data, you want answers, and you want them fast without standing up infrastructure.
ClickHouse is for when you're building a system that answers questions for other people. It's the production engine behind the dashboard, the monitoring tool, the analytics API.
They solve different problems. Running both is completely reasonable: ClickHouse for your production analytics pipeline, DuckDB for local exploration and prototyping. Both are just spindb create commands away.
Wrapping Up
The choice comes down to where your workload lives.
Building a product that shows analytics to users? ClickHouse. Exploring datasets locally as a developer or data engineer? DuckDB.
To manage your local instances:
spindb stop click1 # Stop ClickHouse
spindb stop duck1 # Stop DuckDB
spindb start click1 # Start ClickHouse again
spindb start duck1 # Start DuckDB again
spindb list # See all your database instances
SpinDB runs 20+ database engines from one CLI, so you can keep ClickHouse, DuckDB, PostgreSQL, and whatever else you need running side by side. For managed instances, Layerbase Cloud handles provisioning, TLS, and backups.
For deeper dives into each engine individually, check out Getting Started with ClickHouse and Getting Started with DuckDB.