Full-Text Search Without Elasticsearch: BM25 in Pure JavaScript

You need search in your application. Not "filter a list by substring" search — real, ranked, relevant search. The kind where typing "javscript tutrial" still finds "JavaScript Tutorial" and the most relevant results appear first. So you look at your options.

The Infrastructure Tax

Elasticsearch: $500+/mo. 3-node cluster minimum for production, plus DevOps overhead, JVM tuning, index management, and monitoring.

forge/search: $0/mo. Runs in-process. No cluster, no JVM, no additional infrastructure. Ships with your application.

Elasticsearch is extraordinary software. It powers search at Netflix, GitHub, and Wikipedia. But it is a distributed Java application that requires dedicated infrastructure, cluster management, and ongoing operational expertise. For the vast majority of applications — those with under 10 million documents — it is like renting a cargo ship to carry a backpack.

Algolia is simpler to operate, but you pay per search operation. At scale, costs climb quickly: $1.50 per 1,000 search requests on their standard plan. An application handling 100,000 searches per day hits $4,500/month in Algolia fees alone.
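
The arithmetic behind that monthly figure can be checked in a few lines. The sketch uses only the numbers quoted above and assumes a 30-day month; the per-request price is the article's figure and is not verified here.

```javascript
// Figures from the paragraph above; the 30-day month is an assumption.
const pricePerThousandUsd = 1.5; // USD per 1,000 search requests
const searchesPerDay = 100_000;
const daysPerMonth = 30;

const monthlySearches = searchesPerDay * daysPerMonth; // 3,000,000
const monthlyCostUsd = (monthlySearches / 1000) * pricePerThousandUsd;

console.log(monthlyCostUsd); // 4500
```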

MeiliSearch and Typesense are lighter alternatives, but they are still separate services. You deploy them, connect to them over the network, and maintain them as part of your infrastructure. Another server to patch. Another service to monitor. Another failure point in your architecture.

What If Search Was Just a Function Call?

forge/search implements the BM25 ranking algorithm — the same algorithm that Elasticsearch uses by default — in pure JavaScript. It runs in your application process. No network calls. No separate service. No infrastructure.

forge/search — create an index

import { createIndex } from '@hyperbridge/forge/search';

const index = createIndex({
  fields: {
    title:   { weight: 3.0, stored: true },
    body:    { weight: 1.0, stored: true },
    tags:    { weight: 2.0, facet: true },
    author:  { weight: 1.5, facet: true },
  },
  language: 'en',
});

// Index your documents
await index.addDocuments(articles);
// Indexes 50,000 documents in ~800ms
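
For readers wondering what "indexing" amounts to, the usual data structure behind this kind of search is an inverted index: a map from each term to the documents that contain it. The sketch below is an illustration under assumptions, not forge/search's internals. The function name `buildInvertedIndex`, the regex tokenizer, and the sample documents are all hypothetical.

```javascript
// Build term -> (docId -> term frequency). Hypothetical helper for illustration.
function buildInvertedIndex(docs, fields) {
  const postings = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      // Naive tokenizer: lowercase, keep runs of ASCII letters and digits.
      const terms = String(doc[field] ?? '').toLowerCase().match(/[a-z0-9]+/g) ?? [];
      for (const term of terms) {
        if (!postings.has(term)) postings.set(term, new Map());
        const perDoc = postings.get(term);
        perDoc.set(doc.id, (perDoc.get(doc.id) ?? 0) + 1);
      }
    }
  }
  return postings;
}

const sampleDocs = [
  { id: 1, title: 'Async JavaScript', body: 'JavaScript promises' },
  { id: 2, title: 'CSS grid', body: 'layout' },
];
const sampleIndex = buildInvertedIndex(sampleDocs, ['title', 'body']);

console.log(sampleIndex.get('javascript')); // Map(1) { 1 => 2 }
```

A query then only touches the postings for its own terms, which is why lookup cost depends far more on how common the query terms are than on the total size of the collection.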

BM25 Ranking That Actually Works

BM25 (Best Matching 25) is a probabilistic ranking function that considers term frequency, inverse document frequency, and document length normalization. In simpler terms: it ranks documents higher when they contain your search terms more often, penalizes common words that appear everywhere, and does not unfairly favor long documents over short ones.
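
To make that description concrete, here is a minimal sketch of BM25 scoring over plain strings. It is not forge/search's implementation. The function names are hypothetical, the tokenizer is a bare regex, `k1 = 1.2` and `b = 0.75` are the commonly used defaults, and the IDF form is the Lucene-style variant that never goes negative.

```javascript
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(docs, query, { k1 = 1.2, b = 0.75 } = {}) {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / N;

  // Document frequency: number of documents containing each term at least once.
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  const queryTerms = tokenize(query);
  return tokenized.map((tokens) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term);
      // Rare terms get a larger IDF; terms found everywhere approach zero.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Longer-than-average documents are normalized down, shorter ones up.
      const lengthNorm = 1 - b + b * (tokens.length / avgLen);
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const bm25Docs = [
  'async patterns in javascript',
  'a long article that mentions javascript once among many other unrelated words about cooking',
  'gardening tips',
];
const bm25Result = bm25Scores(bm25Docs, 'javascript async');
// The short, on-topic document scores highest; the unrelated one scores 0.
```

The three ingredients from the paragraph map directly onto the code: `tf` is term frequency, `idf` is the inverse document frequency, and `lengthNorm` is the document length normalization.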

forge/search — querying

// Simple search with BM25 ranking
const results = index.search('javascript async patterns');
// → [{ id, title, body, score: 12.847, highlights }]

// Search with field boosting (new variable name: `results` is already declared above)
const boosted = index.search('authentication', {
  boostFields: { title: 5.0 },
  limit: 20,
  offset: 0,
});

Fuzzy Matching for Real-World Typos

Users misspell words. They type "authetication" instead of "authentication", "recieve" instead of "receive". A search engine that only does exact matching is a search engine that frustrates users.

forge/search — fuzzy search

// Fuzzy search with configurable edit distance
const results = index.search('javscript tutrial', {
  fuzzy: true,
  maxDistance: 2,
});
// → Finds "JavaScript Tutorial" (distance 1 + distance 1)

// Prefix matching for autocomplete
const suggestions = index.search('reac', {
  prefix: true,
  limit: 5,
});
// → "React", "Reactive", "React Hooks", ...
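
The "edit distance" behind `maxDistance` is normally the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below shows the classic dynamic-programming version. Whether forge/search uses exactly this metric is an assumption, and `fuzzyCandidates` is a hypothetical helper that scans a small vocabulary linearly, which a real engine would avoid.

```javascript
// Levenshtein distance, keeping only the previous row of the DP table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: vocabulary terms within maxDistance of the typed term.
function fuzzyCandidates(term, vocabulary, maxDistance) {
  return vocabulary.filter((word) => editDistance(term, word) <= maxDistance);
}

const vocabulary = ['javascript', 'java', 'tutorial', 'tutor'];
console.log(fuzzyCandidates('javscript', vocabulary, 2)); // [ 'javascript' ]
console.log(fuzzyCandidates('tutrial', vocabulary, 2));   // [ 'tutorial' ]
```

Each misspelled word in "javscript tutrial" is one insertion away from its intended term, which is why a maximum distance of 2 per term comfortably covers it.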

Faceted Search for Filtering

E-commerce sites, documentation browsers, and content platforms need faceted search: filtering by category, tag, date range, or any other attribute while showing aggregated counts. Elasticsearch excels at this — but forge/search handles it too.

forge/search — facets and filters

const results = index.search('state management', {
  facets: ['tags', 'author'],
  filters: {
    tags: ['react', 'javascript'],
  },
});

// results.facets:
// {
//   tags: { react: 45, javascript: 38, vue: 12, svelte: 8 },
//   author: { "Dan Abramov": 6, "Ryan Carniato": 4, ... }
// }
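
Conceptually, facet counts are a tally of attribute values over the documents that matched the query, and filters are a predicate applied to those same documents. The sketch below illustrates both. The helper names and sample data are hypothetical, and the filter semantics (any listed value within a field, all fields must pass) are an assumption rather than documented forge/search behavior.

```javascript
// Normalize a field that may hold one value or an array of values.
function valuesOf(doc, field) {
  const raw = doc[field];
  if (raw === undefined) return [];
  return Array.isArray(raw) ? raw : [raw];
}

// Tally how many matched documents carry each value of each facet field.
function countFacets(docs, facetFields) {
  const facets = {};
  for (const field of facetFields) {
    facets[field] = {};
    for (const doc of docs) {
      for (const value of valuesOf(doc, field)) {
        facets[field][value] = (facets[field][value] ?? 0) + 1;
      }
    }
  }
  return facets;
}

// Assumed semantics: OR within a field, AND across fields.
function applyFilters(docs, filters) {
  return docs.filter((doc) =>
    Object.entries(filters).every(([field, allowed]) =>
      valuesOf(doc, field).some((value) => allowed.includes(value))
    )
  );
}

const matched = [
  { id: 1, tags: ['react', 'javascript'], author: 'A' },
  { id: 2, tags: ['vue'], author: 'B' },
  { id: 3, tags: ['react'], author: 'A' },
];

console.log(countFacets(matched, ['tags', 'author']));
// { tags: { react: 2, javascript: 1, vue: 1 }, author: { A: 2, B: 1 } }
console.log(applyFilters(matched, { tags: ['react', 'javascript'] }).map((d) => d.id));
// [ 1, 3 ]
```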

Geosearch for Location-Aware Apps

forge/search — geospatial queries

// Index documents with coordinates
const geoIndex = createIndex({
  fields: {
    name: { weight: 2.0 },
    cuisine: { facet: true },
  },
  geo: { field: 'location' },
});

// Find restaurants within 2km
const nearby = geoIndex.search('biryani', {
  geoWithin: {
    center: { lat: 13.0827, lng: 80.2707 }, // Chennai
    radiusKm: 2,
  },
  sortBy: 'distance',
});
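
A radius query like this comes down to computing great-circle distance between two coordinates. One common way to do that is the haversine formula, sketched below. How forge/search computes distance internally is not stated in this article, so treat the helper names, the sample restaurants, and the brute-force scan as illustrative assumptions.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two { lat, lng } points, in kilometers.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Keep documents inside the radius and sort nearest first.
function withinRadius(docs, center, radiusKm) {
  return docs
    .map((doc) => ({ ...doc, distanceKm: haversineKm(center, doc.location) }))
    .filter((doc) => doc.distanceKm <= radiusKm)
    .sort((x, y) => x.distanceKm - y.distanceKm);
}

const chennai = { lat: 13.0827, lng: 80.2707 };
const restaurants = [
  { name: 'Near', location: { lat: 13.09, lng: 80.2707 } }, // under 1 km north
  { name: 'Far', location: { lat: 13.2, lng: 80.2707 } },   // roughly 13 km north
];

console.log(withinRadius(restaurants, chennai, 2).map((r) => r.name)); // [ 'Near' ]
```

A production index would typically narrow candidates with a spatial structure (a grid or geohash, for example) before computing exact distances, rather than scanning every document.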

Multilingual Support

forge/search — multilingual

// Built-in stemmers and stop words for 15+ languages
const frIndex = createIndex({
  fields: { titre: { weight: 2 }, contenu: {} },
  language: 'fr',
});
// French stemmer: "mangeons" → "mang", "mangez" → "mang"

// Multi-language index
const multiIndex = createIndex({
  fields: { title: {}, body: {} },
  language: 'auto', // detect per-document
});
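
Stemming matters because it lets different inflections of a word land on the same index term. The toy below shows the idea with a hand-picked suffix list and stop-word list. It is deliberately naive, it is not the Snowball French stemmer, and it is not what forge/search ships; every name and list in it is an assumption chosen so the two example words from the comment above collapse to the same stem.

```javascript
// Toy lists for demonstration only.
const STOP_WORDS = new Set(['nous', 'vous', 'le', 'la', 'les', 'des']);
const SUFFIXES = ['eons', 'ez', 'er', 'e']; // checked in order, longest first

function toyStem(word) {
  for (const suffix of SUFFIXES) {
    // Keep at least three characters of stem so short words survive intact.
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lowercase, split on whitespace, drop stop words, then stem.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word && !STOP_WORDS.has(word))
    .map(toyStem);
}

console.log(analyze('Nous mangeons')); // [ 'mang' ]
console.log(analyze('Vous mangez'));   // [ 'mang' ]
```

Because both phrases reduce to the same term, a query for one form can match documents containing the other.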

Performance at Scale

The natural question: can an in-process JavaScript search engine actually perform? We benchmarked forge/search against a 3-node Elasticsearch cluster using the English Wikipedia dataset (6.7 million articles).

For datasets under 1 million documents, forge/search matches or beats Elasticsearch on query latency — because there is zero network overhead.

At 100,000 documents (a typical SaaS application), forge/search returns BM25-ranked results in 2-8ms. That is faster than the network round-trip to Elasticsearch alone, which typically adds 5-15ms even on the same VPC. Indexing 100,000 documents takes approximately 1.6 seconds, and the resulting index consumes around 45MB of memory.

At 1 million documents, query times increase to 15-40ms — still well within acceptable latency for user-facing search. The index occupies roughly 400MB of memory. For most applications, this is entirely reasonable.

At 10 million documents and beyond, you probably do need a dedicated search service. forge/search is not trying to replace Elasticsearch for petabyte-scale log analytics or billion-document search. It is replacing it for the 95% of applications where Elasticsearch is overkill.

Persistence and Serialization

forge/search — persistence

import { readFile, writeFile } from 'node:fs/promises';

// Serialize index to disk or storage
const snapshot = index.serialize();
await writeFile('search-index.bin', snapshot);

// Restore in ~50ms (no re-indexing)
const data = await readFile('search-index.bin');
const restored = createIndex.fromSnapshot(data);

The serialized format is a compact binary representation that loads orders of magnitude faster than re-indexing from source documents. Ship a pre-built index with your application, or store it in S3 and load it on server start.
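
The reason a snapshot restores so much faster than re-indexing is that loading skips tokenizing and counting entirely: the term-to-document structure is written out once and read back as-is. The sketch below shows that round trip for a small postings map. It uses JSON encoded to bytes purely for readability, which is an assumption for illustration and not the compact binary format described above; the function names are hypothetical.

```javascript
// postings: Map<term, Map<docId, termFrequency>>
function serializeIndex(postings) {
  const plain = [...postings].map(([term, docs]) => [term, [...docs]]);
  return new TextEncoder().encode(JSON.stringify(plain));
}

function deserializeIndex(bytes) {
  const plain = JSON.parse(new TextDecoder().decode(bytes));
  return new Map(plain.map(([term, docs]) => [term, new Map(docs)]));
}

const originalPostings = new Map([
  ['javascript', new Map([[1, 2], [7, 1]])],
  ['async', new Map([[1, 1]])],
]);

const snapshotBytes = serializeIndex(originalPostings);
const restoredPostings = deserializeIndex(snapshotBytes);

console.log(restoredPostings.get('javascript').get(1)); // 2
```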

When to Reach for Elasticsearch

If you need distributed search across multiple nodes, real-time log aggregation, or complex analytics queries across billions of documents — use Elasticsearch. It is purpose-built for that scale.

But if you are building a blog with search, search inside a SaaS product, documentation search, an e-commerce catalog under a million products, or any application where search is a feature rather than the product itself, you do not need a $500/month cluster. You need a function call.

HBForge is currently available exclusively for enterprise clients. It opens to the developer community on June 25, 2026.

Contact kr@hyperbridge.in for early access.