Getting Started with ClickHouse
PostgreSQL can run aggregation queries, but they get slow as tables grow. Once you're scanning millions of rows to build a dashboard, queries that should feel instant start taking seconds. Add a few GROUP BYs and a JOIN, and you're looking at timeouts.
ClickHouse is built for exactly this. Columnar storage reads only the columns a query touches instead of entire rows. Vectorized execution processes data in batches using CPU SIMD instructions. The result: aggregating massive datasets in milliseconds. It's the database you reach for when your PostgreSQL dashboard queries start timing out.
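The layout difference is easy to picture in plain TypeScript. This toy sketch has nothing to do with ClickHouse internals; it just shows why a column store touches less data when a query needs one field out of many:

```typescript
// Row layout: each record carries every field, so summing one field
// still materializes userId and country for every row.
type Row = { userId: number; country: string; durationMs: number }
const rows: Row[] = [
  { userId: 1, country: 'US', durationMs: 120 },
  { userId: 2, country: 'UK', durationMs: 340 },
  { userId: 3, country: 'US', durationMs: 95 },
]
const rowSum = rows.reduce((acc, r) => acc + r.durationMs, 0)

// Column layout: the same data as parallel arrays. Summing durationMs
// touches only that one array; userId and country are never read.
const durationMs = [120, 340, 95]
const colSum = durationMs.reduce((acc, d) => acc + d, 0)

console.log(rowSum, colSum) // 555 555
```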
We'll wire up a web analytics pipeline in one TypeScript file: page view tracking, visitor breakdowns, and funnel analysis. Run everything locally, or grab a managed instance from Layerbase Cloud.
Contents
- Create a ClickHouse Instance
- Set Up the Project
- The Page View Dataset
- Create the Table
- Insert the Data
- Run Analytical Queries
- When to Reach for ClickHouse
- Wrapping Up
Create a ClickHouse Instance
Local with SpinDB
SpinDB gets you a local ClickHouse server in about 30 seconds. It grabs the binary for your platform and runs it directly, no Docker required.
Install SpinDB globally:
npm i -g spindb # npm
pnpm add -g spindb # pnpm

Or run it directly without installing:

npx spindb create click1 -e clickhouse --start # npm
pnpx spindb create click1 -e clickhouse --start # pnpm

If you installed globally, create and start a ClickHouse instance:

spindb create click1 -e clickhouse --start

SpinDB downloads the ClickHouse binary for your platform, configures it, and starts the server. Verify it's running:

spindb url click1

http://127.0.0.1:8123

Leave the server running.
Layerbase Cloud
Rather not run it locally? Layerbase Cloud spins up a managed ClickHouse instance for you. Select ClickHouse on the create page, and the Quick Connect panel will show your connection URL, username, and password.
Cloud instances use TLS, so the connection code is slightly different:
const client = createClient({
url: 'https://cloud.layerbase.dev:PORT',
username: 'default',
password: 'YOUR_PASSWORD',
})

For the ClickHouse CLI, add the --secure flag when connecting to cloud instances.
Everything else in this guide works identically whether you're running locally or on Layerbase Cloud. Just swap in your connection details.
Set Up the Project
mkdir clickhouse-analytics && cd clickhouse-analytics
pnpm init
pnpm add @clickhouse/client
pnpm add -D tsx typescript

Create a file called analytics.ts. All the code in this post goes into that one file. Run it at any point with pnpm tsx analytics.ts.
The Page View Dataset
Here are 25 page view events representing a day of website traffic. In production this would stream in from your tracking pixel or server-side logger:
type PageView = {
event_id: number
user_id: number
page: string
referrer: string
country: string
browser: string
timestamp: string
}
const pageViews: PageView[] = [
{ event_id: 1, user_id: 101, page: '/home', referrer: 'google', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 08:00:00' },
{ event_id: 2, user_id: 102, page: '/home', referrer: 'twitter', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 08:05:00' },
{ event_id: 3, user_id: 101, page: '/pricing', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 08:10:00' },
{ event_id: 4, user_id: 103, page: '/docs', referrer: 'google', country: 'DE', browser: 'Safari', timestamp: '2026-03-14 08:15:00' },
{ event_id: 5, user_id: 104, page: '/home', referrer: 'github', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 08:20:00' },
{ event_id: 6, user_id: 102, page: '/pricing', referrer: '', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 08:25:00' },
{ event_id: 7, user_id: 105, page: '/blog/intro-to-clickhouse', referrer: 'google', country: 'FR', browser: 'Chrome', timestamp: '2026-03-14 08:30:00' },
{ event_id: 8, user_id: 106, page: '/home', referrer: 'linkedin', country: 'CA', browser: 'Safari', timestamp: '2026-03-14 09:00:00' },
{ event_id: 9, user_id: 101, page: '/docs', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 09:05:00' },
{ event_id: 10, user_id: 107, page: '/blog/getting-started', referrer: 'google', country: 'US', browser: 'Firefox', timestamp: '2026-03-14 09:10:00' },
{ event_id: 11, user_id: 103, page: '/pricing', referrer: '', country: 'DE', browser: 'Safari', timestamp: '2026-03-14 09:15:00' },
{ event_id: 12, user_id: 108, page: '/home', referrer: 'twitter', country: 'JP', browser: 'Chrome', timestamp: '2026-03-14 09:20:00' },
{ event_id: 13, user_id: 104, page: '/pricing', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 09:25:00' },
{ event_id: 14, user_id: 109, page: '/docs', referrer: 'github', country: 'BR', browser: 'Firefox', timestamp: '2026-03-14 09:30:00' },
{ event_id: 15, user_id: 105, page: '/pricing', referrer: '', country: 'FR', browser: 'Chrome', timestamp: '2026-03-14 09:35:00' },
{ event_id: 16, user_id: 110, page: '/blog/analytics-tips', referrer: 'google', country: 'US', browser: 'Safari', timestamp: '2026-03-14 10:00:00' },
{ event_id: 17, user_id: 106, page: '/pricing', referrer: '', country: 'CA', browser: 'Safari', timestamp: '2026-03-14 10:05:00' },
{ event_id: 18, user_id: 111, page: '/home', referrer: 'google', country: 'IN', browser: 'Chrome', timestamp: '2026-03-14 10:10:00' },
{ event_id: 19, user_id: 108, page: '/docs', referrer: '', country: 'JP', browser: 'Chrome', timestamp: '2026-03-14 10:15:00' },
{ event_id: 20, user_id: 112, page: '/home', referrer: 'twitter', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 10:20:00' },
{ event_id: 21, user_id: 103, page: '/home', referrer: '', country: 'DE', browser: 'Safari', timestamp: '2026-03-14 10:25:00' },
{ event_id: 22, user_id: 107, page: '/pricing', referrer: '', country: 'US', browser: 'Firefox', timestamp: '2026-03-14 10:30:00' },
{ event_id: 23, user_id: 111, page: '/pricing', referrer: '', country: 'IN', browser: 'Chrome', timestamp: '2026-03-14 10:35:00' },
{ event_id: 24, user_id: 112, page: '/docs', referrer: '', country: 'UK', browser: 'Firefox', timestamp: '2026-03-14 10:40:00' },
{ event_id: 25, user_id: 104, page: '/docs', referrer: '', country: 'US', browser: 'Chrome', timestamp: '2026-03-14 10:45:00' },
]

The data models realistic browsing: users land on /home from a referrer, some navigate to /pricing, some explore /docs or read blog posts. Enough variety for meaningful analytics.
Create the Table
ClickHouse tables use engines that control storage and query behavior. The most common is MergeTree, which stores data sorted by a primary key for fast range scans and aggregations.
import { createClient } from '@clickhouse/client'
const client = createClient({ url: 'http://localhost:8123' })
await client.command({
query: `
CREATE TABLE IF NOT EXISTS page_views (
event_id UInt32,
user_id UInt32,
page String,
referrer String,
country String,
browser String,
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
`,
})
console.log('Created table "page_views"')

ORDER BY defines the primary index. ClickHouse physically sorts data on disk in this order, so queries filtering or grouping by timestamp or user_id can skip large chunks without scanning every row. Unlike PostgreSQL, where you add indexes after the fact, ClickHouse bakes the sort order into the storage layer itself.
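If you want to see the sort key at work, ClickHouse can report how a query will use the primary index. A quick check you can run in clickhouse-client (the exact output shape varies by server version):

```sql
-- Ask ClickHouse how it will use the primary key for a time-range filter.
EXPLAIN indexes = 1
SELECT count()
FROM page_views
WHERE timestamp >= '2026-03-14 09:00:00'
```

On a large table, the index section of the output shows far fewer granules selected than exist in the part, which is the skipping described above.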
Insert the Data
The ClickHouse client's insert method accepts an array of objects and handles serialization for you:
await client.insert({
table: 'page_views',
values: pageViews,
format: 'JSONEachRow',
})
console.log(`Inserted ${pageViews.length} page view events`)

In production, ClickHouse ingests millions of rows per second. Our 25-row dataset is obviously trivial, but the same code works at any scale.
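One production note: ClickHouse strongly prefers large batches over many small inserts. If events arrive one at a time, async inserts let the server do the batching for you. A sketch using ClickHouse's async insert settings (the row values here are a made-up event 26):

```sql
-- The server buffers small inserts and flushes them as larger parts.
INSERT INTO page_views
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (26, 113, '/home', 'google', 'US', 'Chrome', '2026-03-14 11:00:00')
```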
Run Analytical Queries
Here's where ClickHouse earns its keep. Each of these queries would need careful indexing in PostgreSQL and would still slow down at scale. ClickHouse handles them in sub-millisecond time.
A helper to run queries and print the results:
async function query<T>(sql: string, label: string): Promise<T[]> {
const result = await client.query({ query: sql, format: 'JSONEachRow' })
const rows = await result.json<T>()
console.log(`\n${label}`)
for (const row of rows) {
console.log(' ', row)
}
return rows
}

Top Pages by Unique Visitors
await query(
`SELECT
page,
uniq(user_id) AS unique_visitors
FROM page_views
GROUP BY page
ORDER BY unique_visitors DESC
LIMIT 5`,
'Top 5 pages by unique visitors:',
)

Top 5 pages by unique visitors:
{ page: '/home', unique_visitors: '8' }
{ page: '/pricing', unique_visitors: '8' }
{ page: '/docs', unique_visitors: '6' }
{ page: '/blog/intro-to-clickhouse', unique_visitors: '1' }
{ page: '/blog/getting-started', unique_visitors: '1' }

Notice uniq. In PostgreSQL, you'd write COUNT(DISTINCT user_id), which scans every value for an exact count. ClickHouse's uniq uses an adaptive approximation algorithm: exact for small cardinalities, still performant across billions of rows. uniqExact is there if you need guaranteed exact counts, but uniq is the standard choice for analytics dashboards.
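You can compare the two side by side. On this dataset both return 12, since uniq stays exact at small cardinalities:

```sql
SELECT
    uniq(user_id) AS approx_visitors,
    uniqExact(user_id) AS exact_visitors
FROM page_views
```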
Visitors by Country per Page
Which countries drive traffic to which pages?
await query(
`SELECT
page,
country,
uniq(user_id) AS visitors
FROM page_views
GROUP BY page, country
ORDER BY page, visitors DESC`,
'Visitors by country per page:',
)

Visitors by country per page:
{ page: '/blog/analytics-tips', country: 'US', visitors: '1' }
{ page: '/blog/getting-started', country: 'US', visitors: '1' }
{ page: '/blog/intro-to-clickhouse', country: 'FR', visitors: '1' }
{ page: '/docs', country: 'US', visitors: '2' }
{ page: '/docs', country: 'UK', visitors: '1' }
{ page: '/docs', country: 'DE', visitors: '1' }
{ page: '/docs', country: 'JP', visitors: '1' }
{ page: '/docs', country: 'BR', visitors: '1' }
{ page: '/home', country: 'US', visitors: '2' }
{ page: '/home', country: 'UK', visitors: '2' }
{ page: '/home', country: 'CA', visitors: '1' }
{ page: '/home', country: 'DE', visitors: '1' }
{ page: '/home', country: 'IN', visitors: '1' }
{ page: '/home', country: 'JP', visitors: '1' }
{ page: '/pricing', country: 'US', visitors: '3' }
{ page: '/pricing', country: 'UK', visitors: '1' }
{ page: '/pricing', country: 'DE', visitors: '1' }
{ page: '/pricing', country: 'FR', visitors: '1' }
{ page: '/pricing', country: 'CA', visitors: '1' }
{ page: '/pricing', country: 'IN', visitors: '1' }

In PostgreSQL, this query is easy enough to write, but it scans every row and sorts in memory. ClickHouse's columnar storage reads only three columns (page, country, user_id) and skips the rest. At scale, that difference is orders of magnitude.
Funnel Analysis: Home to Pricing
The big analytics question: how many users who landed on /home later visited /pricing? ClickHouse has a built-in windowFunnel function for exactly this:
await query(
`SELECT
user_id,
windowFunnel(7200)(timestamp, page = '/home', page = '/pricing') AS funnel_step
FROM page_views
GROUP BY user_id
ORDER BY funnel_step DESC, user_id`,
'Funnel: /home → /pricing (steps reached per user):',
)

Funnel: /home → /pricing (steps reached per user):
{ user_id: '101', funnel_step: 2 }
{ user_id: '102', funnel_step: 2 }
{ user_id: '104', funnel_step: 2 }
{ user_id: '106', funnel_step: 2 }
{ user_id: '111', funnel_step: 2 }
{ user_id: '103', funnel_step: 1 }
{ user_id: '108', funnel_step: 1 }
{ user_id: '112', funnel_step: 1 }
{ user_id: '105', funnel_step: 0 }
{ user_id: '107', funnel_step: 0 }
{ user_id: '109', funnel_step: 0 }
{ user_id: '110', funnel_step: 0 }

windowFunnel(7200) defines a 2-hour window and checks whether events happen in order within it. funnel_step of 2 means the user completed both steps (visited /home, then /pricing). A 1 means they hit /home but never reached /pricing in time. A 0 means they never hit the first step, /home, at all.
Five of twelve users completed the funnel. Building this in PostgreSQL would need window functions, self-joins, or a CTE chain. In ClickHouse, it's one function call.
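The per-user rows are also enough to compute a conversion rate in application code. A small sketch, hard-coding the funnel_step values from the output above:

```typescript
// funnel_step per user, copied from the query output above.
const steps = [2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0]

// Step >= 1 means the user entered the funnel (visited /home);
// step 2 means they went on to /pricing within the window.
const entered = steps.filter((s) => s >= 1).length
const converted = steps.filter((s) => s === 2).length

const rate = (converted / entered) * 100
console.log(`${converted}/${entered} = ${rate.toFixed(1)}% conversion`) // 5/8 = 62.5% conversion
```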
To get a summary of the funnel conversion rate:
await query(
`SELECT
step,
count() AS users
FROM (
SELECT
windowFunnel(7200)(timestamp, page = '/home', page = '/pricing') AS step
FROM page_views
GROUP BY user_id
)
GROUP BY step
ORDER BY step`,
'Funnel summary:',
)

Funnel summary:
{ step: '0', users: '4' }
{ step: '1', users: '3' }
{ step: '2', users: '5' }

Performance at Scale
With 25 rows, every database is fast. The difference shows at scale. ClickHouse has structural advantages that compound as data grows:
- Columnar storage: a query touching 3 of 7 columns reads ~43% of the data. PostgreSQL reads 100%
- Vectorized execution: operations batch across thousands of rows using CPU SIMD instructions, not row by row
- No index maintenance: ORDER BY is the index. No separate B-tree to build, update, or vacuum
- Compression: columns with repeated values (like country and browser) compress extremely well, often 10-20x
These queries return in under 1ms here. On 100 million rows, ClickHouse would still return them in tens of milliseconds. PostgreSQL, even with tuned indexes, would measure the same queries in seconds.
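Compression can be pushed further with ClickHouse's LowCardinality type, which dictionary-encodes a String column so each distinct value is stored once. A variant of the earlier table (the name page_views_lc is just for illustration):

```sql
CREATE TABLE page_views_lc (
    event_id UInt32,
    user_id UInt32,
    page String,
    referrer LowCardinality(String),
    country LowCardinality(String),
    browser LowCardinality(String),
    timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
```

It is a good default for string columns with up to roughly 10,000 distinct values, like the country and browser columns here.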
When to Reach for ClickHouse
ClickHouse tends to be the right call for:
- Web and product analytics: page views, events, sessions, funnels. This is ClickHouse's home turf
- Log analysis: application logs, access logs, error tracking. Billions of append-only rows queried by time range
- Real-time dashboards: any visualization that aggregates recent data and needs sub-second refresh
- Ad tech: click tracking, impression counting, bid analysis. High write throughput with fast aggregation reads
- Time-series metrics: server monitoring, IoT sensor data, financial tick data. Anything ordered by time and queried in ranges
All of these share a shape: lots of rows written, few columns read per query. If your aggregation queries touch millions of rows but only need a handful of columns, ClickHouse will beat a row-oriented database by a wide margin.
ClickHouse is not the right fit for transactional workloads (frequent single-row updates, foreign key constraints, ACID transactions across tables). Keep PostgreSQL for your application data and use ClickHouse for the analytical layer beside it.
Wrapping Up
The full script is under 80 lines of meaningful code. You created a MergeTree table, inserted page view events, and ran analytical queries that would strain a traditional database at scale: unique visitor counts, country breakdowns, and funnel analysis with windowFunnel. Same pattern works from 25 rows to 25 billion.
The ClickHouse documentation covers materialized views, projections, dictionaries, and the full catalog of aggregate functions.
To manage your local ClickHouse instance:
spindb stop click1 # Stop the server
spindb start click1 # Start it again
spindb list # See all your database instances

SpinDB works with 20+ engines, so you can keep ClickHouse running alongside PostgreSQL for transactional data and Qdrant for vector search, all from one CLI.