Getting Started with ClickHouse


PostgreSQL can run aggregation queries, but they get slow as tables grow. Once you're scanning millions of rows to build a dashboard, queries that should feel instant start taking seconds. Add a few GROUP BYs and a JOIN, and you're looking at timeouts.

ClickHouse is built for exactly this. Columnar storage reads only the columns a query touches instead of entire rows. Vectorized execution processes data in batches using CPU SIMD instructions. The result: aggregating massive datasets in milliseconds. It's the database you reach for when your PostgreSQL dashboard queries start timing out.
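
The columnar idea is easy to sketch in TypeScript. This is a conceptual illustration only, not how ClickHouse actually lays out bytes on disk:

```typescript
// The same three rows stored row-oriented vs. column-oriented.
// Summing `views` in the columnar layout touches one contiguous array;
// the row layout forces a walk over every field of every row.
const rowStore = [
  { page: '/home', country: 'US', views: 3 },
  { page: '/docs', country: 'UK', views: 1 },
  { page: '/home', country: 'DE', views: 2 },
]

const columnStore = {
  page: ['/home', '/docs', '/home'],
  country: ['US', 'UK', 'DE'],
  views: [3, 1, 2],
}

// Row-oriented: every object (all fields) is visited to sum one field.
const rowTotal = rowStore.reduce((sum, r) => sum + r.views, 0)

// Column-oriented: only the `views` array is read; `page` and `country`
// are never touched.
const colTotal = columnStore.views.reduce((sum, v) => sum + v, 0)

console.log(rowTotal, colTotal) // → 6 6
```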

We'll wire up a web analytics pipeline in one TypeScript file: page view tracking, visitor breakdowns, and funnel analysis. Run everything locally, or grab a managed instance from Layerbase Cloud.


Create a ClickHouse Instance

Local with SpinDB

SpinDB gets you a local ClickHouse server in about 30 seconds. It grabs the binary for your platform and runs it directly, no Docker required. (What is SpinDB?)

Install SpinDB globally:

bash
npm i -g spindb    # npm
pnpm add -g spindb # pnpm

Or run it directly without installing:

bash
npx spindb create click1 -e clickhouse --start  # npm
pnpm dlx spindb create click1 -e clickhouse --start # pnpm

If you installed globally, create and start a ClickHouse instance:

bash
spindb create click1 -e clickhouse --start

SpinDB downloads the ClickHouse binary for your platform, configures it, and starts the server. Verify it's running:

bash
spindb url click1
text
http://127.0.0.1:8123

Leave the server running.

Layerbase Cloud

Rather not run it locally? Layerbase Cloud spins up a managed ClickHouse instance for you. Select ClickHouse on the create page, and the Quick Connect panel will show your connection URL, username, and password.

Cloud instances use TLS, so the connection code is slightly different:

typescript
const client = createClient({
  url: 'https://cloud.layerbase.dev:PORT',
  username: 'default',
  password: 'YOUR_PASSWORD',
})

For the ClickHouse CLI, add the --secure flag when connecting to cloud instances.

Everything else in this guide works identically whether you're running locally or on Layerbase Cloud. Just swap in your connection details.

Set Up the Project

bash
mkdir clickhouse-analytics && cd clickhouse-analytics
pnpm init
pnpm add @clickhouse/client
pnpm add -D tsx typescript

Create a file called analytics.ts. All the code in this post goes into that one file.

The Page View Dataset

Here are 25 page view events representing a day of website traffic. In production this would stream in from your tracking pixel or server-side logger:

typescript
type PageView = {
  event_id: number
  user_id: number
  page: string
  referrer: string
  country: string
  browser: string
  timestamp: string
}

const pageViews: PageView[] = [
  { event_id: 1, user_id: 101, page: '/home', referrer: 'google', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 08:00:00' },
  { event_id: 2, user_id: 102, page: '/home', referrer: 'twitter', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 08:05:00' },
  { event_id: 3, user_id: 101, page: '/pricing', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 08:10:00' },
  { event_id: 4, user_id: 103, page: '/docs', referrer: 'google', country: 'DE', browser: 'Safari', timestamp: '2026-03-14 08:15:00' },
  { event_id: 5, user_id: 104, page: '/home', referrer: 'github', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 08:20:00' },
  { event_id: 6, user_id: 102, page: '/pricing', referrer: '', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 08:25:00' },
  { event_id: 7, user_id: 105, page: '/blog/intro-to-clickhouse', referrer: 'google', country: 'FR', browser: 'Chrome', timestamp: '2026-03-14 08:30:00' },
  { event_id: 8, user_id: 106, page: '/home', referrer: 'linkedin', country: 'CA', browser: 'Safari', timestamp: '2026-03-14 09:00:00' },
  { event_id: 9, user_id: 101, page: '/docs', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 09:05:00' },
  { event_id: 10, user_id: 107, page: '/blog/getting-started', referrer: 'google', country: 'US', browser: 'Firefox', timestamp: '2026-03-14 09:10:00' },
  { event_id: 11, user_id: 103, page: '/pricing', referrer: '', country: 'DE', browser: 'Safari', timestamp: '2026-03-14 09:15:00' },
  { event_id: 12, user_id: 108, page: '/home', referrer: 'twitter', country: 'JP', browser: 'Chrome', timestamp: '2026-03-14 09:20:00' },
  { event_id: 13, user_id: 104, page: '/pricing', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 09:25:00' },
  { event_id: 14, user_id: 109, page: '/docs', referrer: 'github', country: 'BR', browser: 'Firefox', timestamp: '2026-03-14 09:30:00' },
  { event_id: 15, user_id: 105, page: '/pricing', referrer: '', country: 'FR', browser: 'Chrome', timestamp: '2026-03-14 09:35:00' },
  { event_id: 16, user_id: 110, page: '/blog/analytics-tips', referrer: 'google', country: 'US', browser: 'Safari', timestamp: '2026-03-14 10:00:00' },
  { event_id: 17, user_id: 106, page: '/pricing', referrer: '', country: 'CA', browser: 'Safari', timestamp: '2026-03-14 10:05:00' },
  { event_id: 18, user_id: 111, page: '/home', referrer: 'google', country: 'IN', browser: 'Chrome', timestamp: '2026-03-14 10:10:00' },
  { event_id: 19, user_id: 108, page: '/docs', referrer: '', country: 'JP', browser: 'Chrome', timestamp: '2026-03-14 10:15:00' },
  { event_id: 20, user_id: 112, page: '/home', referrer: 'twitter', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 10:20:00' },
  { event_id: 21, user_id: 103, page: '/home', referrer: '', country: 'DE', browser: 'Safari', timestamp: '2026-03-14 10:25:00' },
  { event_id: 22, user_id: 107, page: '/pricing', referrer: '', country: 'US', browser: 'Firefox', timestamp: '2026-03-14 10:30:00' },
  { event_id: 23, user_id: 111, page: '/pricing', referrer: '', country: 'IN', browser: 'Chrome', timestamp: '2026-03-14 10:35:00' },
  { event_id: 24, user_id: 112, page: '/docs', referrer: '', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 10:40:00' },
  { event_id: 25, user_id: 104, page: '/docs', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 10:45:00' },
]

The data models realistic browsing: users land on /home from a referrer, some navigate to /pricing, some explore /docs or read blog posts. Enough variety for meaningful analytics.
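
The DateTime column accepts the 'YYYY-MM-DD HH:MM:SS' text format used in these timestamps. If your tracking code produces JS Date objects, a small helper (our own convention, not part of the client) keeps the format consistent:

```typescript
// Hypothetical helper: format a JS Date as ClickHouse's default DateTime
// text representation ('YYYY-MM-DD HH:MM:SS'), using UTC fields.
function toClickHouseDateTime(date: Date): string {
  const pad = (n: number) => String(n).padStart(2, '0')
  return (
    `${date.getUTCFullYear()}-${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())} ` +
    `${pad(date.getUTCHours())}:${pad(date.getUTCMinutes())}:${pad(date.getUTCSeconds())}`
  )
}

console.log(toClickHouseDateTime(new Date(Date.UTC(2026, 2, 14, 8, 0, 0))))
// → 2026-03-14 08:00:00
```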

Create the Table

ClickHouse tables use engines that control storage and query behavior. The most common is MergeTree, which stores data sorted by a primary key for fast range scans and aggregations.

typescript
import { createClient } from '@clickhouse/client'

const client = createClient({ url: 'http://localhost:8123' })

await client.command({
  query: `
    CREATE TABLE IF NOT EXISTS page_views (
      event_id   UInt32,
      user_id    UInt32,
      page       String,
      referrer   String,
      country    String,
      browser    String,
      timestamp  DateTime
    ) ENGINE = MergeTree()
    ORDER BY (timestamp, user_id)
  `,
})

console.log('Created table "page_views"')

ORDER BY defines the primary index. ClickHouse physically sorts data on disk in this order, so queries filtering or grouping by timestamp or user_id can skip large chunks without scanning every row. Unlike PostgreSQL, where you add indexes after the fact, ClickHouse bakes the sort order into the storage layer itself.
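
The payoff of a baked-in sort order can be sketched locally. With rows sorted by timestamp, a range filter can jump straight to the first matching position instead of scanning everything, which is the same idea (much simplified) behind MergeTree's sparse primary index:

```typescript
// Binary search for the first element >= target in a sorted array.
function lowerBound(sorted: string[], target: string): number {
  let lo = 0
  let hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >> 1
    if (sorted[mid] < target) lo = mid + 1
    else hi = mid
  }
  return lo
}

const timestamps = ['08:00', '08:05', '09:00', '09:20', '10:15', '10:40']
// Rows with timestamp >= '09:00' begin at index 2; everything before is skipped.
console.log(lowerBound(timestamps, '09:00')) // → 2
```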

Insert the Data

The ClickHouse client's insert method accepts an array of objects and handles serialization for you:

typescript
await client.insert({
  table: 'page_views',
  values: pageViews,
  format: 'JSONEachRow',
})

console.log(`Inserted ${pageViews.length} page view events`)

In production, ClickHouse ingests millions of rows per second. Our 25-row dataset is obviously trivial, but the same code works at any scale.
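
At real ingestion volumes you'd buffer events and insert them in batches rather than row by row, since ClickHouse strongly favors fewer, larger inserts. A minimal batching helper (a sketch, names our own) could look like:

```typescript
// Split an array into fixed-size batches; each batch becomes one
// client.insert() call instead of one call per row.
function chunk<T>(rows: T[], size: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size))
  }
  return batches
}

const batches = chunk([1, 2, 3, 4, 5, 6, 7], 3)
console.log(batches.length) // → 3 (batch sizes 3, 3, and 1)
```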

Run Analytical Queries

Here's where ClickHouse earns its keep. Each of these queries would need careful indexing in PostgreSQL and would still slow down at scale. ClickHouse handles them in sub-millisecond time.

A helper to run queries and print the results:

typescript
async function query<T>(sql: string, label: string): Promise<T[]> {
  const result = await client.query({ query: sql, format: 'JSONEachRow' })
  const rows = await result.json<T>()
  console.log(`\n${label}`)
  for (const row of rows) {
    console.log(' ', row)
  }
  return rows
}

Top Pages by Unique Visitors

typescript
await query(
  `SELECT
     page,
     uniq(user_id) AS unique_visitors
   FROM page_views
   GROUP BY page
   ORDER BY unique_visitors DESC
   LIMIT 5`,
  'Top 5 pages by unique visitors:',
)
text
Top 5 pages by unique visitors:
  { page: '/home', unique_visitors: '8' }
  { page: '/pricing', unique_visitors: '8' }
  { page: '/docs', unique_visitors: '6' }
  { page: '/blog/intro-to-clickhouse', unique_visitors: '1' }
  { page: '/blog/getting-started', unique_visitors: '1' }

Notice uniq. In PostgreSQL, you'd write COUNT(DISTINCT user_id), which scans every value for an exact count. ClickHouse's uniq uses an adaptive approximation algorithm: exact for small cardinalities, still performant across billions of rows. uniqExact is there if you need guaranteed exact counts, but uniq is the standard choice for analytics dashboards.
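
What uniqExact computes is easy to reproduce in plain TypeScript: group by page, then count distinct user IDs with a Set. A standalone sketch with a tiny inline sample (separate from the dataset above):

```typescript
// Exact distinct-visitor counts per page: the uniqExact semantics.
function uniqueVisitors(rows: { page: string; user_id: number }[]): Map<string, number> {
  const sets = new Map<string, Set<number>>()
  for (const r of rows) {
    if (!sets.has(r.page)) sets.set(r.page, new Set())
    sets.get(r.page)!.add(r.user_id)
  }
  return new Map([...sets].map(([page, users]) => [page, users.size]))
}

const sample = [
  { page: '/home', user_id: 101 },
  { page: '/home', user_id: 102 },
  { page: '/home', user_id: 101 }, // repeat visit, counted once
  { page: '/pricing', user_id: 101 },
]
console.log(uniqueVisitors(sample).get('/home')) // → 2
```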

Visitors by Country per Page

Which countries drive traffic to which pages?

typescript
await query(
  `SELECT
     page,
     country,
     uniq(user_id) AS visitors
   FROM page_views
   GROUP BY page, country
   ORDER BY page, visitors DESC`,
  'Visitors by country per page:',
)
text
Visitors by country per page:
  { page: '/blog/analytics-tips', country: 'US', visitors: '1' }
  { page: '/blog/getting-started', country: 'US', visitors: '1' }
  { page: '/blog/intro-to-clickhouse', country: 'FR', visitors: '1' }
  { page: '/docs', country: 'US', visitors: '2' }
  { page: '/docs', country: 'UK', visitors: '1' }
  { page: '/docs', country: 'DE', visitors: '1' }
  { page: '/docs', country: 'JP', visitors: '1' }
  { page: '/docs', country: 'BR', visitors: '1' }
  { page: '/home', country: 'US', visitors: '2' }
  { page: '/home', country: 'UK', visitors: '2' }
  { page: '/home', country: 'CA', visitors: '1' }
  { page: '/home', country: 'DE', visitors: '1' }
  { page: '/home', country: 'IN', visitors: '1' }
  { page: '/home', country: 'JP', visitors: '1' }
  { page: '/pricing', country: 'US', visitors: '3' }
  { page: '/pricing', country: 'UK', visitors: '1' }
  { page: '/pricing', country: 'DE', visitors: '1' }
  { page: '/pricing', country: 'FR', visitors: '1' }
  { page: '/pricing', country: 'CA', visitors: '1' }
  { page: '/pricing', country: 'IN', visitors: '1' }

In PostgreSQL, this query is easy enough to write, but it scans every row and sorts in memory. ClickHouse's columnar storage reads only three columns (page, country, user_id) and skips the rest. At scale, that difference is orders of magnitude.

Funnel Analysis: Home to Pricing

The big analytics question: how many users who landed on /home later visited /pricing? ClickHouse has a built-in windowFunnel function for exactly this:

typescript
await query(
  `SELECT
     user_id,
     windowFunnel(7200)(timestamp, page = '/home', page = '/pricing') AS funnel_step
   FROM page_views
   GROUP BY user_id
   ORDER BY funnel_step DESC, user_id`,
  'Funnel: /home → /pricing (steps reached per user):',
)
text
Funnel: /home → /pricing (steps reached per user):
  { user_id: '101', funnel_step: 2 }
  { user_id: '102', funnel_step: 2 }
  { user_id: '104', funnel_step: 2 }
  { user_id: '106', funnel_step: 2 }
  { user_id: '111', funnel_step: 2 }
  { user_id: '112', funnel_step: 1 }
  { user_id: '108', funnel_step: 1 }
  { user_id: '103', funnel_step: 1 }
  { user_id: '105', funnel_step: 0 }
  { user_id: '107', funnel_step: 0 }
  { user_id: '109', funnel_step: 0 }
  { user_id: '110', funnel_step: 0 }

windowFunnel(7200) defines a 2-hour window and checks whether the events happen in order within it. A funnel_step of 2 means the user completed both steps: they visited /home, then /pricing within two hours. A 1 means they hit /home but never reached /pricing inside the window. A 0 means they never hit /home at all, so the funnel never started, even if they visited /pricing on its own.

Five of twelve users completed the funnel. Building this in PostgreSQL would need window functions, self-joins, or a CTE chain. In ClickHouse, it's one function call.
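
The semantics of windowFunnel can be approximated in plain TypeScript. This is a simplified sketch (it ignores windowFunnel's strict and dedup modes):

```typescript
type Ev = { time: number; page: string }

// For each event matching the first step, try to match the remaining steps
// in order within `windowSec` seconds; return the deepest step reached.
function funnelStep(events: Ev[], steps: string[], windowSec: number): number {
  const sorted = [...events].sort((a, b) => a.time - b.time)
  let best = 0
  for (const start of sorted) {
    if (start.page !== steps[0]) continue
    let step = 1
    let lastTime = start.time
    for (const ev of sorted) {
      if (step === steps.length) break
      if (ev.time > lastTime && ev.time <= start.time + windowSec && ev.page === steps[step]) {
        step++
        lastTime = ev.time
      }
    }
    best = Math.max(best, step)
  }
  return best
}

const events: Ev[] = [
  { time: 0, page: '/home' },
  { time: 600, page: '/pricing' }, // 10 minutes later, inside the 2-hour window
]
console.log(funnelStep(events, ['/home', '/pricing'], 7200)) // → 2
```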

To get a summary of the funnel conversion rate:

typescript
await query(
  `SELECT
     step,
     count() AS users
   FROM (
     SELECT
       windowFunnel(7200)(timestamp, page = '/home', page = '/pricing') AS step
     FROM page_views
     GROUP BY user_id
   )
   GROUP BY step
   ORDER BY step`,
  'Funnel summary:',
)
text
Funnel summary:
  { step: '0', users: '4' }
  { step: '1', users: '3' }
  { step: '2', users: '5' }

Performance at Scale

With 25 rows, every database is fast. The difference shows at scale. ClickHouse has structural advantages that compound as data grows:

  • Columnar storage: a query touching 3 of 7 columns reads ~43% of the data. PostgreSQL reads 100%
  • Vectorized execution: operations batch across thousands of rows using CPU SIMD instructions, not row by row
  • No index maintenance: ORDER BY is the index. No separate B-tree to build, update, or vacuum
  • Compression: columns with repeated values (like country and browser) compress extremely well, often 10-20x
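
The compression point is easy to see with a toy run-length encoding. ClickHouse's real compression (LZ4 and ZSTD codecs, the LowCardinality type) is far more sophisticated, but the intuition is the same: repeated values collapse.

```typescript
// Collapse consecutive repeated values into (value, count) runs.
function runLengthEncode(values: string[]): [string, number][] {
  const runs: [string, number][] = []
  for (const v of values) {
    const last = runs[runs.length - 1]
    if (last && last[0] === v) last[1]++
    else runs.push([v, 1])
  }
  return runs
}

const countries = ['US', 'US', 'US', 'US', 'UK', 'UK', 'DE']
console.log(runLengthEncode(countries).length) // → 3 runs: US×4, UK×2, DE×1
```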

These queries return in well under a millisecond here. On 100 million rows, ClickHouse would typically still answer them in tens of milliseconds, while PostgreSQL, even with carefully tuned indexes, would take seconds for the same queries.

When to Reach for ClickHouse

ClickHouse tends to be the right call for:

  • Web and product analytics: page views, events, sessions, funnels. This is ClickHouse's home turf
  • Log analysis: application logs, access logs, error tracking. Billions of append-only rows queried by time range
  • Real-time dashboards: any visualization that aggregates recent data and needs sub-second refresh
  • Ad tech: click tracking, impression counting, bid analysis. High write throughput with fast aggregation reads
  • Time-series metrics: server monitoring, IoT sensor data, financial tick data. Anything ordered by time and queried in ranges

All of these share a shape: lots of rows written, few columns read per query. If your aggregation queries touch millions of rows but only need a handful of columns, ClickHouse will beat a row-oriented database by a wide margin.

ClickHouse is not the right fit for transactional workloads (frequent single-row updates, foreign key constraints, ACID transactions across tables). Keep PostgreSQL for your application data and use ClickHouse for the analytical layer beside it.

Wrapping Up

The full script is under 80 lines of meaningful code. You created a MergeTree table, inserted page view events, and ran analytical queries that would strain a traditional database at scale: unique visitor counts, country breakdowns, and funnel analysis with windowFunnel. Same pattern works from 25 rows to 25 billion.

The ClickHouse documentation covers materialized views, projections, dictionaries, and the full catalog of aggregate functions.

To manage your local ClickHouse instance:

bash
spindb stop click1    # Stop the server
spindb start click1   # Start it again
spindb list           # See all your database instances

SpinDB works with 20+ engines, so you can keep ClickHouse running alongside PostgreSQL for transactional data and Qdrant for vector search, all from one CLI.
