ClickHouse vs DuckDB


Your dashboard queries are getting slow. You're running GROUP BY over millions of rows and PostgreSQL is taking seconds when your users expect milliseconds. Or maybe your PM just asked for a new analytics view and you know the current setup won't handle it. You need a columnar analytics engine, but should you reach for ClickHouse or DuckDB?

They solve the same fundamental problem (fast aggregations over large datasets) in completely different ways. ClickHouse is a production server: you deploy it, connect multiple clients, and it ingests millions of rows per second while serving queries. It's what powers the analytics behind Cloudflare, Uber, and eBay. DuckDB is basically SQLite for analytics: an embedded engine with no server, no configuration, and the ability to query CSV and Parquet files directly from disk. You pip install it or pnpm add it and start running SQL.

We'll run the same analytical query in both, then dig into what each one can do that the other can't.

Quick Comparison

| | ClickHouse | DuckDB |
|---|---|---|
| Architecture | Client-server (standalone process) | Embedded (in-process, no server) |
| Deployment | Runs as a service, supports clustering | Single file or in-memory, no daemon |
| Scale | Horizontal (sharding, replication) | Single machine, single process |
| Concurrency | Many concurrent readers/writers | Single writer, multiple readers |
| Real-time ingestion | Yes, millions of rows/sec | Batch-oriented |
| File querying | Limited (external table functions) | Native CSV, Parquet, JSON from disk |
| SQL dialect | ClickHouse SQL (custom functions) | PostgreSQL-compatible with extensions |
| Table engines | MergeTree family (configurable) | Standard tables (no engine choice) |
| PIVOT syntax | No native PIVOT | Yes, native PIVOT/UNPIVOT |
| Funnel analysis | Built-in windowFunnel | Manual (window functions + CTEs) |
| Best for | Production dashboards at scale | Local/embedded analytics and ad-hoc exploration |

Set Up Both Locally

SpinDB runs both engines locally. No Docker. (What is SpinDB?)

Install it globally:

bash
npm i -g spindb    # npm
pnpm add -g spindb # pnpm

ClickHouse with SpinDB

bash
spindb create click1 -e clickhouse --start

Check the URL:

bash
spindb url click1
text
http://127.0.0.1:8123

ClickHouse exposes an HTTP interface. You connect with the @clickhouse/client package.

DuckDB with SpinDB

bash
spindb create duck1 -e duckdb --start

Check the URL:

bash
spindb url duck1
text
postgresql://127.0.0.1:5432/duck1

DuckDB runs behind a PostgreSQL-compatible proxy in SpinDB, so you connect with the standard pg package. Same driver you'd use for PostgreSQL.

Project Setup

bash
mkdir analytics-compare && cd analytics-compare
pnpm init
pnpm add @clickhouse/client pg
pnpm add -D tsx typescript @types/pg

Create a file called compare.ts. All the code in this post goes into that one file.

The Same Query in Both

Best way to compare: run the exact same workload. Here are 20 sales records we'll load into both engines:

typescript
const sales = [
  { id: 1, product: 'Laptop', category: 'Electronics', region: 'North', quantity: 2, price: 999.99, sale_date: '2026-01-05' },
  { id: 2, product: 'T-Shirt', category: 'Clothing', region: 'South', quantity: 15, price: 24.99, sale_date: '2026-01-08' },
  { id: 3, product: 'Desk Lamp', category: 'Home', region: 'East', quantity: 5, price: 45.00, sale_date: '2026-01-10' },
  { id: 4, product: 'Running Shoes', category: 'Sports', region: 'West', quantity: 3, price: 129.99, sale_date: '2026-01-12' },
  { id: 5, product: 'Headphones', category: 'Electronics', region: 'North', quantity: 8, price: 79.99, sale_date: '2026-01-15' },
  { id: 6, product: 'Jeans', category: 'Clothing', region: 'East', quantity: 10, price: 59.99, sale_date: '2026-01-18' },
  { id: 7, product: 'Yoga Mat', category: 'Sports', region: 'South', quantity: 12, price: 34.99, sale_date: '2026-01-20' },
  { id: 8, product: 'Bookshelf', category: 'Home', region: 'North', quantity: 1, price: 189.99, sale_date: '2026-01-22' },
  { id: 9, product: 'Tablet', category: 'Electronics', region: 'West', quantity: 4, price: 499.99, sale_date: '2026-01-25' },
  { id: 10, product: 'Winter Jacket', category: 'Clothing', region: 'North', quantity: 6, price: 149.99, sale_date: '2026-01-28' },
  { id: 11, product: 'Basketball', category: 'Sports', region: 'East', quantity: 20, price: 29.99, sale_date: '2026-02-02' },
  { id: 12, product: 'Monitor', category: 'Electronics', region: 'South', quantity: 3, price: 349.99, sale_date: '2026-02-05' },
  { id: 13, product: 'Throw Pillow', category: 'Home', region: 'West', quantity: 8, price: 22.99, sale_date: '2026-02-08' },
  { id: 14, product: 'Sneakers', category: 'Clothing', region: 'East', quantity: 7, price: 89.99, sale_date: '2026-02-10' },
  { id: 15, product: 'Dumbbell Set', category: 'Sports', region: 'North', quantity: 2, price: 199.99, sale_date: '2026-02-12' },
  { id: 16, product: 'Keyboard', category: 'Electronics', region: 'East', quantity: 10, price: 69.99, sale_date: '2026-02-15' },
  { id: 17, product: 'Coffee Table', category: 'Home', region: 'South', quantity: 1, price: 259.99, sale_date: '2026-02-18' },
  { id: 18, product: 'Polo Shirt', category: 'Clothing', region: 'West', quantity: 9, price: 44.99, sale_date: '2026-02-20' },
  { id: 19, product: 'Tennis Racket', category: 'Sports', region: 'South', quantity: 4, price: 159.99, sale_date: '2026-02-22' },
  { id: 20, product: 'Webcam', category: 'Electronics', region: 'North', quantity: 6, price: 59.99, sale_date: '2026-02-25' },
]

The query: revenue by category with a rollup that includes subtotals and a grand total. Bread-and-butter analytics.
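As a sanity check on what both engines should report, the per-category math is just `SUM(quantity * price) ... GROUP BY category`, which you can sketch in plain TypeScript (`revenueByCategory` is a hypothetical helper for checking results, not part of the setup):

```typescript
type Sale = { category: string; quantity: number; price: number }

// Group rows by category and sum quantity * price -- the same math the
// ROLLUP query performs before adding subtotals and the grand total.
function revenueByCategory(rows: Sale[]): Record<string, number> {
  const out: Record<string, number> = {}
  for (const r of rows) {
    out[r.category] = (out[r.category] ?? 0) + r.quantity * r.price
  }
  return out
}

// A few rows from the dataset above:
const sample: Sale[] = [
  { category: 'Electronics', quantity: 2, price: 999.99 },
  { category: 'Electronics', quantity: 8, price: 79.99 },
  { category: 'Clothing', quantity: 15, price: 24.99 },
]
console.log(revenueByCategory(sample))
// Electronics: 2*999.99 + 8*79.99 = ~2639.90, Clothing: ~374.85
```

If the engine output disagrees with this arithmetic, the problem is in your load step, not your query.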

ClickHouse: Revenue by Category

ClickHouse uses its own client library and MergeTree table engine:

typescript
import { createClient } from '@clickhouse/client'

const ch = createClient({ url: 'http://127.0.0.1:8123' })

// Create a MergeTree table (ClickHouse requires an engine and ORDER BY)
await ch.command({
  query: `
    CREATE TABLE IF NOT EXISTS sales (
      id         UInt32,
      product    String,
      category   String,
      region     String,
      quantity   UInt32,
      price      Float64,
      sale_date  Date
    ) ENGINE = MergeTree()
    ORDER BY (category, sale_date)
  `,
})

// Insert the data
await ch.insert({ table: 'sales', values: sales, format: 'JSONEachRow' })

// Revenue by category with rollup
const result = await ch.query({
  query: `
    SELECT
      COALESCE(category, 'GRAND TOTAL') AS category,
      COALESCE(region, 'Subtotal') AS region,
      SUM(quantity * price) AS revenue,
      COUNT(*) AS transactions
    FROM sales
    GROUP BY ROLLUP(category, region)
    ORDER BY category, region
  `,
  format: 'JSONEachRow',
})

const rows = await result.json<{ category: string; region: string; revenue: number; transactions: string }>()
console.log('ClickHouse results:')
for (const row of rows) {
  console.log(`  ${row.category} / ${row.region}: $${Number(row.revenue).toFixed(2)}`)
}

A few things to note. You must pick a table engine (MergeTree is the default for almost everything). You must specify ORDER BY, which defines both the primary index and the physical sort order on disk. No separate index creation step. The sort order is the index.

DuckDB: Revenue by Category

DuckDB uses the standard pg client through SpinDB's PostgreSQL proxy:

typescript
import pg from 'pg'

const duck = new pg.Client({ connectionString: 'postgresql://127.0.0.1:5432/duck1' })
await duck.connect()

// Create a standard table (no engine, no ORDER BY required)
await duck.query(`
  DROP TABLE IF EXISTS sales;
  CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    product VARCHAR,
    category VARCHAR,
    region VARCHAR,
    quantity INTEGER,
    price DECIMAL(10, 2),
    sale_date DATE
  )
`)

// Insert the data
for (const s of sales) {
  await duck.query(
    `INSERT INTO sales (id, product, category, region, quantity, price, sale_date)
     VALUES ($1, $2, $3, $4, $5, $6, $7)`,
    [s.id, s.product, s.category, s.region, s.quantity, s.price, s.sale_date],
  )
}

// Revenue by category with rollup (identical SQL)
const result = await duck.query(`
  SELECT
    COALESCE(category, 'GRAND TOTAL') AS category,
    COALESCE(region, 'Subtotal') AS region,
    SUM(quantity * price) AS revenue,
    COUNT(*) AS transactions
  FROM sales
  GROUP BY ROLLUP(category, region)
  ORDER BY category, region
`)

console.log('DuckDB results:')
for (const row of result.rows) {
  console.log(`  ${row.category} / ${row.region}: $${Number(row.revenue).toFixed(2)}`)
}

The SQL is identical. The results match. The difference is everything around the SQL: the client library, how you create the table, and what happens at scale.

DuckDB tables are simpler. No engine selection, no mandatory sort order. Standard relational DDL. The trade-off: you don't get to control the storage layout the way ClickHouse lets you.

What ClickHouse Can Do That DuckDB Can't

Funnel Analysis with windowFunnel

ClickHouse has a built-in windowFunnel function for tracking user journeys through a sequence of events. Given timestamped events, it tells you how far each user got through a defined funnel within a time window.

typescript
// Track how many users went from /home -> /pricing -> /docs in sequence
const funnel = await ch.query({
  query: `
    SELECT
      step,
      count() AS users
    FROM (
      SELECT
        windowFunnel(7200)(timestamp, page = '/home', page = '/pricing', page = '/docs') AS step
      FROM page_views
      GROUP BY user_id
    )
    GROUP BY step
    ORDER BY step
  `,
  format: 'JSONEachRow',
})
text
  { step: 0, users: 4 }   -- didn't enter the funnel
  { step: 1, users: 3 }   -- visited /home only
  { step: 2, users: 5 }   -- reached /pricing

windowFunnel(7200) defines a 2-hour window. It checks whether events happen in the specified order within that window and returns the furthest step each user reached.

Building this in DuckDB (or PostgreSQL) requires window functions, self-joins, or multi-step CTEs. Doable, but a lot more code for a pattern ClickHouse treats as a first-class operation.
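To make what windowFunnel computes concrete, here's the furthest-step logic sketched in plain TypeScript. This is a simplified model of the semantics, not ClickHouse's actual algorithm:

```typescript
type PageEvent = { t: number; page: string }

// Furthest funnel step a user reached: events must match the step pages
// in order, all within `windowSecs` of the first matched event.
function furthestStep(events: PageEvent[], steps: string[], windowSecs: number): number {
  const sorted = [...events].sort((a, b) => a.t - b.t)
  let best = 0
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].page !== steps[0]) continue // funnel must start at step 1
    let step = 1
    const t0 = sorted[i].t
    for (let j = i + 1; j < sorted.length && step < steps.length; j++) {
      if (sorted[j].t - t0 > windowSecs) break // outside the window
      if (sorted[j].page === steps[step]) step++
    }
    best = Math.max(best, step)
  }
  return best
}

const visits: PageEvent[] = [
  { t: 0, page: '/home' },
  { t: 600, page: '/pricing' },
  { t: 9000, page: '/docs' }, // outside the 7200s window -- doesn't count
]
console.log(furthestStep(visits, ['/home', '/pricing', '/docs'], 7200)) // 2
```

This is roughly the per-user inner query above; ClickHouse then groups the step values to produce the funnel counts.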

Real-Time Ingestion

ClickHouse ingests millions of rows per second while simultaneously serving queries. Built for append-heavy workloads where data streams in continuously: click events, log lines, sensor readings. MergeTree handles concurrent writes by merging data parts in the background.

DuckDB is single-writer. Ingest data while queries are running and you'll hit contention. It's built for batch loads followed by analytical reads, not streaming pipelines.

Horizontal Scaling

ClickHouse clusters shard data across multiple nodes and replicate for fault tolerance. Production deployments can have dozens of nodes processing petabytes, with the query engine parallelizing across shards transparently.

DuckDB runs on one machine, in one process. Fast on that machine because it uses all available CPU cores, but it doesn't distribute work across a cluster.

What DuckDB Can Do That ClickHouse Can't

Native PIVOT

DuckDB has a native PIVOT keyword that transforms rows into columns in one statement:

typescript
const pivot = await duck.query(`
  PIVOT sales
  ON category
  USING SUM(quantity * price) AS revenue
  GROUP BY region
  ORDER BY region
`)

console.log('DuckDB PIVOT results:')
for (const row of pivot.rows) {
  console.log(`  ${row.region}: Electronics=$${Number(row.Electronics_revenue).toFixed(2)}, Clothing=$${Number(row.Clothing_revenue).toFixed(2)}`)
}
text
DuckDB PIVOT results:
  East: Electronics=$699.90, Clothing=$1229.83
  North: Electronics=$2999.84, Clothing=$899.94
  South: Electronics=$1049.97, Clothing=$374.85
  West: Electronics=$1999.96, Clothing=$404.91

DuckDB discovers the distinct category values, creates one column per value, and fills in the aggregation. No manual column definitions, no conditional logic.

ClickHouse has no PIVOT keyword. You'd write conditional aggregation with if() for every column:

sql
-- ClickHouse workaround for pivot
SELECT
  region,
  SUM(if(category = 'Electronics', quantity * price, 0)) AS electronics_revenue,
  SUM(if(category = 'Clothing', quantity * price, 0)) AS clothing_revenue,
  SUM(if(category = 'Sports', quantity * price, 0)) AS sports_revenue,
  SUM(if(category = 'Home', quantity * price, 0)) AS home_revenue
FROM sales
GROUP BY region
ORDER BY region

Works, but you have to know every category value ahead of time. Add a new category and the query breaks. DuckDB's PIVOT handles new values automatically.
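If you need a ClickHouse pivot that survives new categories, the usual workaround is to generate the conditional columns at query-build time: fetch the distinct values first (`SELECT DISTINCT category FROM sales`), then build the SQL. A sketch, where `buildPivotQuery` is a hypothetical helper and the naive string interpolation assumes trusted category values:

```typescript
// Generate the conditional-aggregation pivot query from known category values.
// WARNING: naive string interpolation -- only safe for trusted, known values,
// never for user input.
function buildPivotQuery(categories: string[]): string {
  const cols = categories.map(
    (c) => `SUM(if(category = '${c}', quantity * price, 0)) AS ${c.toLowerCase()}_revenue`,
  )
  return [
    'SELECT',
    '  region,',
    '  ' + cols.join(',\n  '),
    'FROM sales',
    'GROUP BY region',
    'ORDER BY region',
  ].join('\n')
}

console.log(buildPivotQuery(['Electronics', 'Clothing', 'Sports', 'Home']))
```

It's two round trips instead of one statement, which is exactly the convenience DuckDB's PIVOT buys you.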

Direct File Querying

DuckDB can query CSV, Parquet, and JSON files directly from disk without importing anything:

sql
-- Query a CSV file with no schema definition or import step
SELECT category, SUM(quantity * price) AS revenue
FROM read_csv_auto('/path/to/sales.csv')
GROUP BY category
ORDER BY revenue DESC;

-- Same with Parquet (columnar format, even faster)
SELECT * FROM read_parquet('/data/events/*.parquet')
WHERE event_date >= '2026-01-01';

This is where DuckDB really shines. Someone sends you a CSV, you point DuckDB at it, and you're running SQL immediately. No schema definition, no import step, no ETL pipeline.

ClickHouse can read external files through table functions like file() and url(), but it's more limited. File querying is core to DuckDB's identity. In ClickHouse, it's a side feature.

PostgreSQL Wire Protocol

DuckDB (through SpinDB's PG proxy) speaks the PostgreSQL wire protocol. Any tool that connects to PostgreSQL connects to DuckDB: psql, pgAdmin, DBeaver, TablePlus, the pg npm package. No specialized client library needed.

ClickHouse has its own HTTP and native TCP protocols. You need @clickhouse/client for TypeScript, and most PostgreSQL tools won't connect directly. ClickHouse does have an experimental PG compatibility layer, but it's limited.

Key Architectural Differences

MergeTree vs Standard Tables

ClickHouse's MergeTree engine is central to how it works. When you define ORDER BY (category, sale_date), ClickHouse physically sorts data on disk in that order. Queries that filter or group by those columns skip entire blocks without scanning them. You're designing a custom index at the storage layer.

DuckDB doesn't expose table engines. It manages storage internally with automatic zone maps, min/max indexes, and adaptive compression. Simpler, but you have less control over query performance for specific access patterns.

Write Patterns

ClickHouse is optimized for large batch inserts and append-only workloads. Individual row updates and deletes exist but are expensive (they rewrite entire data parts). The mental model: data flows in, gets merged in the background, and is queryable immediately.
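The practical upshot: batch your ClickHouse inserts instead of writing row by row. A minimal sketch of the buffering pattern (`BatchBuffer` is a hypothetical helper; in real code the flush callback would call `ch.insert` with the accumulated rows):

```typescript
// Buffer rows and flush in fixed-size batches -- the insert pattern
// MergeTree is optimized for.
class BatchBuffer<T> {
  private buf: T[] = []
  constructor(
    private size: number,
    private flushFn: (rows: T[]) => void,
  ) {}
  push(row: T) {
    this.buf.push(row)
    if (this.buf.length >= this.size) this.flush()
  }
  flush() {
    if (this.buf.length === 0) return
    this.flushFn(this.buf)
    this.buf = []
  }
}

const batches: number[][] = []
const buffer = new BatchBuffer<number>(3, (rows) => batches.push(rows))
for (let i = 1; i <= 7; i++) buffer.push(i)
buffer.flush() // flush the remainder
console.log(batches) // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]
```

Production versions typically add a time-based flush as well, so a slow trickle of events doesn't sit in the buffer indefinitely.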

DuckDB handles standard INSERT/UPDATE/DELETE with ACID transactions. Proper relational write semantics. But it doesn't handle high-throughput concurrent writes the way ClickHouse does.

Concurrency

ClickHouse serves many concurrent clients. Multiple dashboards, multiple users, multiple ingestion pipelines hitting the same cluster simultaneously. That's the whole point of a server architecture.

DuckDB is single-process. Fast for one user running queries, but not designed to serve hundreds of concurrent dashboard users. Embedding DuckDB in a backend service works for pre-computed reports or low-concurrency internal tools, not customer-facing analytics.

When to Pick Which

Pick ClickHouse when:

  • You're building production analytics dashboards that serve multiple concurrent users
  • You need to ingest data in real time from event streams, logs, or click tracking
  • Your dataset is large and growing (hundreds of millions to billions of rows)
  • You need funnel analysis, retention curves, or session replay with built-in functions
  • You're running a multi-node cluster for fault tolerance and horizontal scaling
  • Your team already operates server infrastructure and can manage a database service

Pick DuckDB when:

  • You're doing ad-hoc analysis on local files (CSVs, Parquet exports, data dumps)
  • You want embedded analytics inside an application with no external database to manage
  • Your dataset fits on one machine (DuckDB handles hundreds of gigabytes comfortably)
  • You need PIVOT tables and other advanced SQL syntax without workarounds
  • You're building data transformation pipelines where SQL is cleaner than Python/pandas
  • You want the simplest possible setup with no server, no configuration, no ops overhead

The Short Version

DuckDB is for when you're the one asking the questions. You have data, you want answers, and you want them fast without standing up infrastructure.

ClickHouse is for when you're building a system that answers questions for other people. It's the production engine behind the dashboard, the monitoring tool, the analytics API.

They solve different problems. Running both is completely reasonable: ClickHouse for your production analytics pipeline, DuckDB for local exploration and prototyping. Both are just spindb create commands away.

Wrapping Up

The choice comes down to where your workload lives.

Building a product that shows analytics to users? ClickHouse. Exploring datasets locally as a developer or data engineer? DuckDB.

To manage your local instances:

bash
spindb stop click1   # Stop ClickHouse
spindb stop duck1    # Stop DuckDB
spindb start click1  # Start ClickHouse again
spindb start duck1   # Start DuckDB again
spindb list          # See all your database instances

SpinDB runs 20+ database engines from one CLI, so you can keep ClickHouse, DuckDB, PostgreSQL, and whatever else you need running side by side. For managed instances, Layerbase Cloud handles provisioning, TLS, and backups.

For deeper dives into each engine individually, check out Getting Started with ClickHouse and Getting Started with DuckDB.
