Hybrid Search in OpenSearch: Combining BM25 and Vector Search with RRF

FoundryDB Team
Engineering @ FoundryDB

BM25 is precise on exact terms but blind to meaning. Vector search is rich in semantics but can return spurious matches when query and document happen to point in similar directions by coincidence. Hybrid search combines both signals. This post shows how to configure a Reciprocal Rank Fusion (RRF) pipeline in OpenSearch 2.19.1 and documents the difference in results across three approaches, using real scores from a FoundryDB-managed cluster.

All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders.

Prerequisites

  • A running FoundryDB OpenSearch cluster.
  • An ML model registered and deployed, and a k-NN index populated with articles.

Step 1: Create the Hybrid Search Pipeline

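OpenSearch search pipelines apply post-processing to search results. The normalization-processor first normalizes the scores returned by each sub-query (here with min_max) and then combines them into a single score per document using the configured combination technique (here rrf).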

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/_search/pipeline/hybrid-rrf-pipeline" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Hybrid BM25 + k-NN with RRF",
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "min_max"
          },
          "combination": {
            "technique": "rrf",
            "parameters": {
              "rank_constant": 60
            }
          }
        }
      }
    ]
  }'

RRF assigns each result a score of 1 / (rank + rank_constant) from each sub-query, then sums them. A rank_constant of 60 is the standard value used in the original RRF paper. It prevents very high-ranked results from dominating too aggressively.
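To make the formula concrete, the following Python sketch applies it to the BM25 and k-NN orderings reported in Step 3. The function name rrf_fuse is hypothetical, and the code illustrates the formula only; it is not OpenSearch's internal implementation.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, rank_constant=60):
    """Sum 1 / (rank + rank_constant) over every list a document appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (rank + rank_constant)
    # Highest fused score first; ties keep their first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Orderings taken from the pure BM25 and pure k-NN results in Step 3.
bm25_ranking = [
    "Gradient Descent and Backpropagation",
    "Deep Learning for Image Recognition",
    "Introduction to Neural Networks",
]
knn_ranking = [
    "Gradient Descent and Backpropagation",
    "Introduction to Neural Networks",
    "Deep Learning for Image Recognition",
    "Transformer Architecture Explained",
    "SQL Query Optimization",
]

fused = rrf_fuse([bm25_ranking, knn_ranking])
for title, score in fused:
    print(f"{score:.5f}  {title}")
```

Raw reciprocal-rank sums are small numbers: with two sub-queries and rank_constant=60, the highest possible value is 2/61 ≈ 0.0328, which is what the document ranked first in both lists receives here. A document that appears in only one list, such as "SQL Query Optimization" at k-NN rank 5, gets a single term, 1/65.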

Step 2: Run a Hybrid Query

The hybrid query type wraps a match query and a knn query as siblings. The pipeline handles merging.

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search?search_pipeline=hybrid-rrf-pipeline" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 5,
    "_source": ["title", "category"],
    "query": {
      "hybrid": {
        "queries": [
          {"match": {"body": "training neural networks optimization"}},
          {"knn": {"body_embedding": {"vector": [/* embed query */], "k": 5}}}
        ]
      }
    }
  }'
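Because the query vector has to be produced at request time, it is often easier to assemble the request body in code. The Python sketch below builds the same body as the curl command above; build_hybrid_query is a hypothetical helper, and the four-element placeholder vector is a stand-in only. A real vector must come from the same embedding model used at index time and match the dimension of the body_embedding field.

```python
import json


def build_hybrid_query(text, query_vector, k=5, size=5):
    """Return a hybrid query body pairing a match clause with a knn clause."""
    return {
        "size": size,
        "_source": ["title", "category"],
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"body": text}},
                    {"knn": {"body_embedding": {"vector": list(query_vector), "k": k}}},
                ]
            }
        },
    }


# Placeholder only: replace with the embedding of the query text.
placeholder_vector = [0.0, 0.0, 0.0, 0.0]

body = build_hybrid_query("training neural networks optimization", placeholder_vector)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body to the _search endpoint with the search_pipeline parameter, exactly as in the curl command.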

Step 3: Compare Results for "training neural networks optimization"

Here are the results side by side for the same query across all three approaches.

Hybrid (RRF pipeline):

{"results": [
{"score": 2.0, "title": "Gradient Descent and Backpropagation", "category": "machine-learning"},
{"score": 0.6495297, "title": "Introduction to Neural Networks", "category": "machine-learning"},
{"score": 0.47368562, "title": "Deep Learning for Image Recognition", "category": "machine-learning"},
{"score": 0.09604495, "title": "Transformer Architecture Explained", "category": "machine-learning"},
{"score": 0.001, "title": "SQL Query Optimization", "category": "databases"}
]}

Pure BM25 (3 results only, "Transformer Architecture Explained" is absent):

{"results": [
{"score": 3.9399207, "title": "Gradient Descent and Backpropagation"},
{"score": 1.8835347, "title": "Deep Learning for Image Recognition"},
{"score": 1.8013194, "title": "Introduction to Neural Networks"}
]}

Pure k-NN (5 results, including a spurious SQL result):

{"results": [
{"score": 0.8156017, "title": "Gradient Descent and Backpropagation"},
{"score": 0.7332682, "title": "Introduction to Neural Networks"},
{"score": 0.6833046, "title": "Deep Learning for Image Recognition"},
{"score": 0.60384613, "title": "Transformer Architecture Explained"},
{"score": 0.58134717, "title": "SQL Query Optimization"}
]}

What happened: BM25 missed "Transformer Architecture Explained" because the document contains none of the query terms ("training", "neural", "networks", "optimization"), so it never entered the BM25 result list. k-NN found it (score 0.60) but also ranked "SQL Query Optimization" at 0.58, nearly as high. Hybrid kept "Transformer Architecture Explained" in the results and collapsed "SQL Query Optimization" to a score of 0.001, pushing it to the bottom of the list where a score threshold can drop it.

Step 4: Compare Results for "what is attention in transformers"

Hybrid:

{"results": [
{"score": 2.0, "title": "Transformer Architecture Explained"},
{"score": 0.4155833, "title": "Deep Learning for Image Recognition"},
{"score": 0.1590425, "title": "PostgreSQL vs MySQL"},
{"score": 0.09368464, "title": "Gradient Descent and Backpropagation"},
{"score": 0.0631755, "title": "Introduction to Neural Networks"}
]}

Pure BM25 (3 results with false positives):

{"total": 3, "results": [
{"score": 4.7423515, "title": "Transformer Architecture Explained"},
{"score": 1.8283734, "title": "PostgreSQL vs MySQL"},
{"score": 1.2772797, "title": "Time Series Databases"}
]}

BM25 surfaced "PostgreSQL vs MySQL" and "Time Series Databases" because short, common words like "in" and "is" appear in many documents and inflate term overlap scores. Hybrid still shows "PostgreSQL vs MySQL" (score 0.159), but it is ranked well below the correct result and its low score makes it easy to cut with a threshold.
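A client-side cut-off of this kind can be sketched in Python as follows, using the hybrid results from this step. The threshold of 0.2 and the helper name filter_by_score are assumptions for illustration; a suitable value depends on your data and should be tuned against queries with known relevant results.

```python
def filter_by_score(hits, min_score):
    """Keep only hits whose hybrid score is at or above min_score."""
    return [hit for hit in hits if hit["score"] >= min_score]


# Hybrid results for "what is attention in transformers" (Step 4).
hybrid_hits = [
    {"score": 2.0, "title": "Transformer Architecture Explained"},
    {"score": 0.4155833, "title": "Deep Learning for Image Recognition"},
    {"score": 0.1590425, "title": "PostgreSQL vs MySQL"},
    {"score": 0.09368464, "title": "Gradient Descent and Backpropagation"},
    {"score": 0.0631755, "title": "Introduction to Neural Networks"},
]

kept = filter_by_score(hybrid_hits, min_score=0.2)  # 0.2 is an assumed cut-off
for hit in kept:
    print(hit["score"], hit["title"])
```

With this cut-off, "PostgreSQL vs MySQL" is dropped while the two highest-scoring documents remain. A threshold trims the tail of the list; it does not by itself guarantee that everything above it is relevant.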

Understanding RRF Scores

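A hybrid score of 2.0 means the document ranked first in both sub-queries, and it is the maximum possible score with two sub-queries. These are not raw reciprocal-rank values: with rank_constant=60, a document ranked first in both lists would receive only 2 × 1 / (1 + 60) ≈ 0.0328. The reported scores are instead consistent with min_max normalization rescaling each sub-query's scores to the [0, 1] range and the combination step summing the results, so the top hit of each list contributes 1.0. For example, "Introduction to Neural Networks" in Step 3 scores 0.6495, which is 0.001 (it is the lowest of the three BM25 hits) plus 0.6485 (its min_max-normalized k-NN score). A score of 0.001 means the document was the lowest-scoring hit in one list and absent from the other, typically a weak semantic match in k-NN with no corresponding BM25 hit. If you need purely rank-based scores, OpenSearch 2.19 also introduced a dedicated score-ranker-processor for RRF; check the documentation for your version.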
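The arithmetic can be checked against the Step 3 results. The Python sketch below min_max-normalizes the pure BM25 and pure k-NN scores and adds them per document. The 0.001 floor and the summation are assumptions inferred from the reported numbers rather than taken from OpenSearch documentation, and the function names are hypothetical.

```python
def min_max(scores, floor=0.001):
    """Rescale scores to [0, 1]; the lowest hit gets `floor` instead of 0 (assumed)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: max((s - lo) / (hi - lo), floor) for doc, s in scores.items()}


def combine_by_sum(*normalized_lists):
    """Add the normalized scores of each document across all sub-queries."""
    combined = {}
    for normalized in normalized_lists:
        for doc, score in normalized.items():
            combined[doc] = combined.get(doc, 0.0) + score
    return combined


# Pure BM25 and pure k-NN scores from Step 3.
bm25 = {
    "Gradient Descent and Backpropagation": 3.9399207,
    "Deep Learning for Image Recognition": 1.8835347,
    "Introduction to Neural Networks": 1.8013194,
}
knn = {
    "Gradient Descent and Backpropagation": 0.8156017,
    "Introduction to Neural Networks": 0.7332682,
    "Deep Learning for Image Recognition": 0.6833046,
    "Transformer Architecture Explained": 0.60384613,
    "SQL Query Optimization": 0.58134717,
}
# Hybrid scores reported in Step 3, for comparison.
reported = {
    "Gradient Descent and Backpropagation": 2.0,
    "Introduction to Neural Networks": 0.6495297,
    "Deep Learning for Image Recognition": 0.47368562,
    "Transformer Architecture Explained": 0.09604495,
    "SQL Query Optimization": 0.001,
}

combined = combine_by_sum(min_max(bm25), min_max(knn))
for doc, score in sorted(combined.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.4f} (reported {reported[doc]:.4f})  {doc}")
```

Under these assumptions the recomputed values agree with the reported hybrid scores to about four decimal places, which is why the scores range up to 2.0 rather than staying near 0.03.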

When to Use Each Approach

| Situation | Recommended approach |
| --- | --- |
| Exact product codes, IDs, structured fields | BM25 (match or term query) |
| Long-form content, conversational queries | k-NN vector search |
| General search box, mixed content | Hybrid with RRF |
| You need explainable scores | BM25 (scores are interpretable) |
| You need recall on paraphrased queries | k-NN or hybrid |

What's Next

Try hybrid search on your own data at foundrydb.com. Full documentation at docs.foundrydb.com.