Getting Started with Weaviate


Vector databases like Qdrant let you search by meaning instead of keywords. You generate embeddings, insert them, query with vectors. Weaviate takes this further: it can generate embeddings for you using built-in vectorizer modules (text2vec-openai, text2vec-transformers, and others). Insert plain text, Weaviate handles the rest.

But that's not the only trick. Weaviate also supports hybrid search, combining vector similarity with BM25 keyword scoring in a single query. Semantic understanding for "what does this mean?" plus exact keyword matching for "does this contain the word 'PostgreSQL'?" You control the balance with a single parameter.

We'll build semantic and hybrid search over an article dataset in one TypeScript file. Everything runs against a local Weaviate instance, or you can point it at Layerbase Cloud for a hosted setup.

Create a Weaviate Instance

Local with SpinDB

SpinDB gets you from zero to a running Weaviate in one command. No Docker. (What is SpinDB?)

Install SpinDB globally:

bash
npm i -g spindb    # npm
pnpm add -g spindb # pnpm

Or run it directly without installing:

bash
npx spindb create weav1 -e weaviate --start  # npm
pnpx spindb create weav1 -e weaviate --start # pnpm

If you installed globally, create and start a Weaviate instance:

bash
spindb create weav1 -e weaviate --start

SpinDB downloads the Weaviate binary for your platform, configures it, and starts the server. Verify it's running:

bash
spindb url weav1
text
http://127.0.0.1:8080

Leave the server running. We'll connect to it from TypeScript in the next section.

Layerbase Cloud

Layerbase Cloud provisions a managed Weaviate instance in seconds if you'd rather skip local setup. Grab your connection details from the dashboard's Quick Connect panel.

Cloud instances use TLS, so the connection code is slightly different:

typescript
const client = await weaviate.connectToCustom({
  httpHost: 'cloud.layerbase.dev',
  httpPort: 443,
  httpSecure: true,
  grpcHost: 'cloud.layerbase.dev',
  grpcPort: 50051,
  grpcSecure: true,
})

Everything else in this guide works identically whether you're running locally or on Layerbase Cloud. Just swap in your connection details.

Set Up the Project

bash
mkdir weaviate-article-search && cd weaviate-article-search
pnpm init
pnpm add weaviate-client @xenova/transformers
pnpm add -D tsx typescript

Create a file called search.ts. All the code in this post goes into that one file.

The Article Dataset

Here are 15 articles with titles, summaries, categories, authors, and publication dates. In production this would come from your database or API. The summaries are what we'll convert into embeddings:

typescript
const articles = [
  {
    title: 'The Rise of Edge Computing',
    content:
      'Edge computing moves processing closer to the data source, reducing latency for real-time applications. Companies are deploying micro data centers at cell towers and retail locations to handle IoT workloads locally.',
    category: 'tech',
    author: 'Sarah Chen',
    publishedAt: '2026-01-15',
  },
  {
    title: 'CRISPR Advances in Crop Engineering',
    content:
      'Researchers used CRISPR gene editing to develop drought-resistant wheat varieties that maintain yield in arid conditions. The modified crops require 40% less water while producing comparable harvests.',
    category: 'science',
    author: 'James Okafor',
    publishedAt: '2026-01-22',
  },
  {
    title: 'Remote Work and Commercial Real Estate',
    content:
      'Office vacancy rates hit record highs as companies downsize their physical footprint. Landlords are converting empty office towers into residential units and mixed-use spaces to adapt.',
    category: 'business',
    author: 'Maria Lopez',
    publishedAt: '2026-02-03',
  },
  {
    title: 'Gut Microbiome and Mental Health',
    content:
      'New studies link specific gut bacteria populations to anxiety and depression symptoms. Targeted probiotic treatments showed measurable improvements in patient mood and cognitive function over 12-week trials.',
    category: 'health',
    author: 'David Kim',
    publishedAt: '2026-02-10',
  },
  {
    title: 'The Vinyl Revival Hits Streaming Numbers',
    content:
      'Vinyl record sales surpassed streaming revenue for independent artists for the first time. Collectors value the tactile experience and album artwork, driving a resurgence in small pressing plants.',
    category: 'culture',
    author: 'Amara Johnson',
    publishedAt: '2026-02-14',
  },
  {
    title: 'Kubernetes at Scale: Lessons from Production',
    content:
      'A post-mortem of running 10,000 Kubernetes pods across three regions reveals hard-won lessons about resource limits, pod scheduling, and the hidden costs of over-provisioning.',
    category: 'tech',
    author: 'Sarah Chen',
    publishedAt: '2026-02-18',
  },
  {
    title: 'Deep Ocean Mining Controversy',
    content:
      'Plans to mine polymetallic nodules from the Pacific seafloor face opposition from marine biologists who warn of irreversible damage to deep-sea ecosystems that remain poorly understood.',
    category: 'science',
    author: 'Elena Vasquez',
    publishedAt: '2026-02-20',
  },
  {
    title: 'Central Bank Digital Currencies in Practice',
    content:
      'Three countries launched retail CBDCs this year, giving citizens direct digital accounts at the central bank. Early adoption data shows strong uptake for government disbursements but slow merchant acceptance.',
    category: 'business',
    author: 'Raj Patel',
    publishedAt: '2026-02-25',
  },
  {
    title: 'Sleep Architecture and Athletic Performance',
    content:
      'Professional sports teams now monitor sleep stages with wearable sensors. Athletes who consistently achieve 90+ minutes of deep sleep show 15% faster reaction times and fewer soft-tissue injuries.',
    category: 'health',
    author: 'David Kim',
    publishedAt: '2026-03-01',
  },
  {
    title: 'AI-Generated Art in Major Museums',
    content:
      'The Tate Modern opened its first exhibition of AI-generated artwork, sparking debate about authorship and originality. Critics question whether algorithmic output qualifies as creative expression.',
    category: 'culture',
    author: 'Amara Johnson',
    publishedAt: '2026-03-03',
  },
  {
    title: 'WebAssembly Beyond the Browser',
    content:
      'WebAssembly is gaining traction as a server-side runtime. Its sandboxed execution model and near-native performance make it attractive for plugin systems, edge functions, and portable microservices.',
    category: 'tech',
    author: 'Sarah Chen',
    publishedAt: '2026-03-05',
  },
  {
    title: 'Fusion Energy Milestone at Oxford',
    content:
      'The JET reactor in Oxford sustained a plasma burn for 11 seconds, generating more energy than any previous fusion experiment. Researchers say commercial fusion power could be viable within 15 years.',
    category: 'science',
    author: 'James Okafor',
    publishedAt: '2026-03-07',
  },
  {
    title: 'The Four-Day Work Week Experiment',
    content:
      'A two-year study across 200 companies found that a four-day work week maintained or improved productivity in 88% of participants. Employee burnout dropped by a third and retention rates climbed.',
    category: 'business',
    author: 'Maria Lopez',
    publishedAt: '2026-03-09',
  },
  {
    title: 'Antibiotic Resistance: The Silent Pandemic',
    content:
      'Drug-resistant infections now cause more deaths annually than HIV or malaria. Researchers are turning to bacteriophage therapy and AI-driven drug discovery to find new treatments.',
    category: 'health',
    author: 'Elena Vasquez',
    publishedAt: '2026-03-11',
  },
  {
    title: 'Indie Games Outsell AAA Titles on Steam',
    content:
      'Small studios captured the majority of Steam revenue this quarter, driven by creative gameplay mechanics and community-driven development. Players are gravitating toward novel experiences over high-budget sequels.',
    category: 'culture',
    author: 'Raj Patel',
    publishedAt: '2026-03-13',
  },
]

Choose Your Embedding Provider

An embedding model converts text into vectors. This is the only part of the code that changes depending on your provider. Everything after this section works identically with either option.

A default SpinDB Weaviate install runs without vectorizer modules, so we'll use the bring-your-own-vector approach as the primary path (same as the Qdrant guide). If you have vectorizer modules configured, see the alternative approach at the end.
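Whichever option you pick, "closeness" between the resulting vectors is measured with cosine similarity (or its complement, cosine distance). For intuition, here's the underlying math as a standalone sketch — illustrative only, the script itself never needs this function:

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// Ranges from -1 (opposite direction) to 1 (identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

console.log(cosineSimilarity([1, 0], [1, 0])) // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])) // 0 (orthogonal)
```

Real embeddings have hundreds of dimensions rather than two, but the comparison works the same way.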

Option A: Local Embeddings (No API Key)

This uses @xenova/transformers to run the all-MiniLM-L6-v2 model directly on your machine. The model downloads automatically on first run (~80MB). No accounts, no API keys.

typescript
import { pipeline } from '@xenova/transformers'

const extractor = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2',
)

const VECTOR_SIZE = 384

async function getEmbeddings(texts: string[]): Promise<number[][]> {
  const embeddings: number[][] = []
  for (const text of texts) {
    const output = await extractor(text, { pooling: 'mean', normalize: true })
    embeddings.push(Array.from(output.data as Float32Array))
  }
  return embeddings
}
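Two options doing the heavy lifting here: `pooling: 'mean'` averages the per-token embeddings into one vector, and `normalize: true` scales that vector to unit length so distance comparisons behave consistently across texts of different lengths. Normalization itself is simple — this is a standalone sketch of the math, not part of the extractor API:

```typescript
// L2-normalize a vector: divide each component by the vector's magnitude.
function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return v.map((x) => x / magnitude)
}

const unit = normalize([3, 4]) // magnitude 5, so this becomes [0.6, 0.8]
console.log(unit)
```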

Option B: OpenAI Embeddings

If you have an OpenAI API key, you can use text-embedding-3-small instead. Install the SDK and swap in this code:

bash
pnpm add openai
typescript
import OpenAI from 'openai'

const openai = new OpenAI() // uses OPENAI_API_KEY env var

const VECTOR_SIZE = 1536

async function getEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  })
  return response.data.map((d) => d.embedding)
}

Pick one and add it to search.ts. The rest of the code calls getEmbeddings() and uses VECTOR_SIZE. It doesn't know or care which provider generated the vectors.
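If you want to make that contract explicit, a type alias captures it — `EmbeddingProvider` and `stubEmbeddings` are conventions of this guide, not Weaviate APIs. A stub provider is also handy for dry-running the script's plumbing without downloading a model or burning API credits:

```typescript
// Any embedding provider just has to satisfy this signature.
type EmbeddingProvider = (texts: string[]) => Promise<number[][]>

// Hypothetical stub: random vectors matching Option A's 384 dimensions.
// Useless for real search, useful for testing inserts and queries end to end.
const stubEmbeddings: EmbeddingProvider = async (texts) =>
  texts.map(() => Array.from({ length: 384 }, () => Math.random()))

const vecs = await stubEmbeddings(['hello', 'world'])
console.log(vecs.length, vecs[0].length) // 2 384
```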

Connect and Create a Collection

A collection in Weaviate is the equivalent of a table. Unlike Qdrant's schemaless payloads, Weaviate collections have typed property definitions. You declare each property's name and data type upfront, which gives you type-safe queries and more efficient storage. I appreciate the strictness here.

typescript
import weaviate, { type WeaviateClient } from 'weaviate-client'

const client: WeaviateClient = await weaviate.connectToLocal()

const COLLECTION = 'Article'

// Clean up from previous runs
try {
  await client.collections.delete(COLLECTION)
} catch {
  // Collection doesn't exist yet, that's fine
}

await client.collections.create({
  name: COLLECTION,
  properties: [
    { name: 'title', dataType: 'text' },
    { name: 'content', dataType: 'text' },
    { name: 'category', dataType: 'text' },
    { name: 'author', dataType: 'text' },
    { name: 'publishedAt', dataType: 'date' },
  ],
})

console.log(`Created collection "${COLLECTION}"`)

A few details:

  • Collection names are PascalCase in Weaviate (convention, not a hard requirement). Property names are camelCase.
  • dataType: 'text' is for strings you want tokenized and searchable with BM25. Weaviate also supports int, number, boolean, date, blob, and others.
  • We're not specifying a vectorizer because we're bringing our own embeddings. Weaviate will accept raw vectors on insert.
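One practical note on the date property: Weaviate expects RFC 3339 timestamps, which is exactly what Date.prototype.toISOString() produces. That's why the insert code converts the dataset's plain YYYY-MM-DD strings before sending them:

```typescript
// A date-only string parses as UTC midnight and serializes to RFC 3339.
const rfc3339 = new Date('2026-01-15').toISOString()
console.log(rfc3339) // 2026-01-15T00:00:00.000Z
```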

Generate Embeddings and Insert

Convert every article summary into a vector and insert it into Weaviate:

typescript
console.log('Generating embeddings...')

const contents = articles.map((a) => a.content)
const vectors = await getEmbeddings(contents)

const collection = client.collections.get(COLLECTION)

for (let i = 0; i < articles.length; i++) {
  const article = articles[i]
  await collection.data.insert({
    properties: {
      title: article.title,
      content: article.content,
      category: article.category,
      author: article.author,
      publishedAt: new Date(article.publishedAt).toISOString(),
    },
    vectors: vectors[i],
  })
}

console.log(`Inserted ${articles.length} articles`)

Each object gets a UUID assigned automatically. The vectors field accepts our pre-computed embedding. With a vectorizer module configured, you could skip the vectors field entirely and let Weaviate generate embeddings from the text properties.

Vector Search

Since we're providing our own vectors, we use nearVector to pass the query embedding directly:

typescript
async function vectorSearch(query: string, limit = 5) {
  const [queryVector] = await getEmbeddings([query])
  const collection = client.collections.get(COLLECTION)
  return collection.query.nearVector(queryVector, {
    limit,
    returnMetadata: ['distance'],
  })
}

const queries = [
  'latest breakthroughs in renewable energy',
  'how technology is changing the workplace',
  'health discoveries related to the brain',
  'creative industries fighting back against automation',
]

for (const query of queries) {
  console.log(`\n"${query}"`)
  const result = await vectorSearch(query)
  for (const obj of result.objects) {
    const distance = obj.metadata?.distance?.toFixed(3) ?? 'n/a'
    console.log(`  ${distance}  ${obj.properties.title}`)
  }
}

Run it:

bash
npx tsx search.ts

Expected output (exact distances vary by embedding model):

text
"latest breakthroughs in renewable energy"
  0.782  Fusion Energy Milestone at Oxford
  0.943  Deep Ocean Mining Controversy
  0.987  CRISPR Advances in Crop Engineering
  1.028  The Rise of Edge Computing
  1.041  WebAssembly Beyond the Browser

"how technology is changing the workplace"
  0.815  The Four-Day Work Week Experiment
  0.867  Remote Work and Commercial Real Estate
  0.931  Kubernetes at Scale: Lessons from Production
  0.985  The Rise of Edge Computing
  1.034  WebAssembly Beyond the Browser

"health discoveries related to the brain"
  0.741  Gut Microbiome and Mental Health
  0.923  Sleep Architecture and Athletic Performance
  0.987  Antibiotic Resistance: The Silent Pandemic
  1.048  CRISPR Advances in Crop Engineering
  1.102  Fusion Energy Milestone at Oxford

"creative industries fighting back against automation"
  0.822  AI-Generated Art in Major Museums
  0.876  Indie Games Outsell AAA Titles on Steam
  0.944  The Vinyl Revival Hits Streaming Numbers
  1.003  The Four-Day Work Week Experiment
  1.056  Remote Work and Commercial Real Estate

"Health discoveries related to the brain" matches the gut microbiome article. None of those exact words appear in the content, but "gut bacteria populations linked to anxiety and depression" is semantically close to brain health. Keyword matching would miss this entirely.

Weaviate returns distance rather than similarity score (lower is better, with 0 being identical). This is the inverse of Qdrant's score metric, where higher is better.
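For cosine, the distance is 1 minus the cosine similarity, so converting to a "higher is better" score is a one-liner — a convenience helper for this guide, not part of the client:

```typescript
// Cosine distance = 1 - cosine similarity, so the complement recovers
// a Qdrant-style score where higher means more similar.
const toSimilarity = (distance: number): number => 1 - distance

console.log(toSimilarity(0)) // 1 (identical)
console.log(toSimilarity(0.741)) // ~0.259, the gut-microbiome hit above
```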

Hybrid Search

This is where Weaviate really shines. Hybrid search combines BM25 keyword scoring with vector similarity in a single query. The alpha parameter controls the balance:

  • alpha: 0 = pure keyword (BM25 only)
  • alpha: 1 = pure vector (semantic only)
  • alpha: 0.5 = equal weight to both

Keywords are precise ("find articles that literally mention PostgreSQL"). Vectors understand meaning ("find articles about database technology"). Hybrid search gives you both at once, which is the main reason to consider Weaviate for content-heavy search.

typescript
async function hybridSearch(query: string, alpha = 0.5, limit = 5) {
  const [queryVector] = await getEmbeddings([query])
  const collection = client.collections.get(COLLECTION)
  return collection.query.hybrid(query, {
    vector: queryVector,
    alpha,
    limit,
    returnMetadata: ['score'],
  })
}

console.log('\n--- Hybrid Search ---')

const hybridQueries = [
  'Kubernetes production',
  'energy research breakthroughs',
  'four-day work week productivity',
]

for (const query of hybridQueries) {
  console.log(`\n"${query}" (alpha: 0.5)`)
  const result = await hybridSearch(query, 0.5)
  for (const obj of result.objects) {
    const score = obj.metadata?.score?.toFixed(3) ?? 'n/a'
    console.log(`  ${score}  ${obj.properties.title}`)
  }
}
text
--- Hybrid Search ---

"Kubernetes production" (alpha: 0.5)
  0.850  Kubernetes at Scale: Lessons from Production
  0.432  WebAssembly Beyond the Browser
  0.389  The Rise of Edge Computing
  0.201  The Four-Day Work Week Experiment
  0.178  Remote Work and Commercial Real Estate

"energy research breakthroughs" (alpha: 0.5)
  0.812  Fusion Energy Milestone at Oxford
  0.445  Deep Ocean Mining Controversy
  0.398  CRISPR Advances in Crop Engineering
  0.234  Antibiotic Resistance: The Silent Pandemic
  0.187  The Rise of Edge Computing

"four-day work week productivity" (alpha: 0.5)
  0.891  The Four-Day Work Week Experiment
  0.423  Remote Work and Commercial Real Estate
  0.312  Sleep Architecture and Athletic Performance
  0.198  Kubernetes at Scale: Lessons from Production
  0.145  Central Bank Digital Currencies in Practice

"Kubernetes production" scores the Kubernetes article highest because it matches both semantically and on the exact keywords. Pure vector search might rank other tech articles almost as high, but the BM25 component boosts articles containing the literal words "Kubernetes" and "production."
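Under the hood, Weaviate's default fusion method (relative score fusion) normalizes each result list's scores to a common range and then blends them by alpha. Here's a simplified sketch of the idea — illustrative only, not Weaviate's exact implementation:

```typescript
// Min-max normalize a score list into [0, 1].
function minMaxNormalize(scores: number[]): number[] {
  const min = Math.min(...scores)
  const max = Math.max(...scores)
  if (max === min) return scores.map(() => 1)
  return scores.map((s) => (s - min) / (max - min))
}

// Blend per-document vector and BM25 scores with alpha.
function fuse(vectorScores: number[], bm25Scores: number[], alpha: number): number[] {
  const v = minMaxNormalize(vectorScores)
  const k = minMaxNormalize(bm25Scores)
  return v.map((score, i) => alpha * score + (1 - alpha) * k[i])
}

// Doc 0 wins on keywords, doc 1 wins on vectors; alpha 0.5 balances them.
console.log(fuse([0.2, 0.9], [8.1, 1.3], 0.5)) // [0.5, 0.5]
```

Normalization is what makes the blend meaningful: raw BM25 scores and cosine distances live on completely different scales, so they can't be averaged directly.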

Tuning Alpha

Try running the same query at different alpha values to see how the balance shifts:

typescript
const tuningQuery = 'Kubernetes production'

for (const alpha of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(`\nalpha: ${alpha}`)
  const result = await hybridSearch(tuningQuery, alpha, 3)
  for (const obj of result.objects) {
    const score = obj.metadata?.score?.toFixed(3) ?? 'n/a'
    console.log(`  ${score}  ${obj.properties.title}`)
  }
}

At alpha: 0 (pure keyword), only articles containing those exact words rank well. At alpha: 1 (pure vector), broader tech articles surface. The sweet spot depends on your use case, but 0.5 is a solid default.

Filtered Search

You can combine vector or hybrid search with property filters. Find the best health articles about physical performance:

typescript
const [queryVector] = await getEmbeddings([
  'improving physical performance and recovery',
])
const collection = client.collections.get(COLLECTION)

const filtered = await collection.query.nearVector(queryVector, {
  limit: 3,
  returnMetadata: ['distance'],
  filters: collection.filter.byProperty('category').equal('health'),
})

console.log('\n"improving physical performance" (health only)')
for (const obj of filtered.objects) {
  const distance = obj.metadata?.distance?.toFixed(3) ?? 'n/a'
  console.log(
    `  ${distance}  ${obj.properties.title} (${obj.properties.category})`,
  )
}
text
"improving physical performance" (health only)
  0.834  Sleep Architecture and Athletic Performance (health)
  0.956  Gut Microbiome and Mental Health (health)
  1.089  Antibiotic Resistance: The Silent Pandemic (health)

Filters work on any typed property. You can combine multiple conditions:

typescript
// Filters (for combining conditions) is a named export:
// import weaviate, { Filters } from 'weaviate-client'
const techRecent = await collection.query.hybrid('infrastructure and scale', {
  alpha: 0.5,
  limit: 3,
  returnMetadata: ['score'],
  filters: Filters.and(
    collection.filter.byProperty('category').equal('tech'),
    collection.filter
      .byProperty('publishedAt')
      .greaterThan(new Date('2026-02-01').toISOString()),
  ),
})

console.log('\n"infrastructure and scale" (tech, after Feb 2026)')
for (const obj of techRecent.objects) {
  const score = obj.metadata?.score?.toFixed(3) ?? 'n/a'
  console.log(`  ${score}  ${obj.properties.title}`)
}
text
"infrastructure and scale" (tech, after Feb 2026)
  0.789  Kubernetes at Scale: Lessons from Production
  0.534  WebAssembly Beyond the Browser
  0.312  The Rise of Edge Computing

Filters apply before the vector comparison, so performance stays consistent even with large datasets. Available operators: equal, notEqual, greaterThan, lessThan, like (wildcard text matching), and containsAny/containsAll for arrays.

Using a Vectorizer Module (Alternative Approach)

If your Weaviate instance has a vectorizer module configured (text2vec-openai, text2vec-transformers, etc.), you can skip the entire embedding step. Weaviate generates vectors automatically on insert and query.

Here's how the collection creation and querying would look with a vectorizer:

typescript
// Create collection with a vectorizer (requires the module to be enabled)
await client.collections.create({
  name: 'Article',
  vectorizers: [
    weaviate.configure.vectorizer.text2VecOpenAI({
      model: 'text-embedding-3-small',
      sourceProperties: ['title', 'content'],
    }),
  ],
  properties: [
    { name: 'title', dataType: 'text' },
    { name: 'content', dataType: 'text' },
    { name: 'category', dataType: 'text' },
    { name: 'author', dataType: 'text' },
    { name: 'publishedAt', dataType: 'date' },
  ],
})

// Insert without vectors (Weaviate generates them from title + content)
const collection = client.collections.get('Article')
await collection.data.insert({
  properties: {
    title: 'The Rise of Edge Computing',
    content: 'Edge computing moves processing closer...',
    category: 'tech',
    author: 'Sarah Chen',
    publishedAt: new Date('2026-01-15').toISOString(),
  },
  // No vectors field needed!
})

// Query with plain text (no embedding code)
const result = await collection.query.nearText(
  'latest breakthroughs in renewable energy',
  { limit: 5 },
)

// Hybrid search also works with just text
const hybrid = await collection.query.hybrid(
  'Kubernetes production',
  { alpha: 0.5, limit: 5 },
)

With a vectorizer, nearText replaces nearVector. Pass a string, Weaviate embeds it, runs the search. Hybrid search gets even simpler because you don't need to pass a vector at all.

This is what differentiates Weaviate most from Qdrant. With Qdrant, you always generate embeddings yourself. With a Weaviate vectorizer module, the database owns the entire embedding pipeline. Less code on your side, and your insert-time and query-time embeddings always use the same model. That consistency eliminates a common source of bugs.

When to Reach for Weaviate

Weaviate is a good pick when pure vector similarity isn't enough:

  • Integrated vectorization: let the database handle embedding so you don't maintain a separate pipeline. Configure a vectorizer module and insert plain text
  • Hybrid search: you need both semantic understanding and exact keyword matching. The alpha parameter lets you tune the balance per query
  • Schema-aware data: your objects have clear structure with typed properties. Weaviate catches data issues at insert time, not query time
  • RAG pipelines: retrieve relevant context from a knowledge base to feed into an LLM. The combination of hybrid search and filters makes it easy to find precisely relevant documents
  • Multi-modal search: with the right vectorizer modules, Weaviate can handle text, images, and other media types in the same collection

If you want explicit control over dense and sparse retrieval with minimal schema, Qdrant's model might be a better fit. If you want the database to own more of the search pipeline, Weaviate's integrated approach pays off.

Wrapping Up

The full script is under 120 lines of real code. You defined a typed schema, stored embeddings in Weaviate, ran semantic searches, and combined vector similarity with keyword matching via hybrid search. The same pattern scales from 15 articles to millions of documents.

The Weaviate documentation covers advanced features like multi-tenancy, generative search (RAG built into queries), reranking, named vectors, and integrations with embedding providers.

To manage your local Weaviate instance:

bash
spindb stop weav1    # Stop the server
spindb start weav1   # Start it again
spindb list          # See all your database instances

With 20+ supported engines, SpinDB lets you run Weaviate alongside Redis, MariaDB, Meilisearch, or whatever else your project calls for. Layerbase Desktop wraps the same thing in a GUI on macOS.
