
Vector Search in OpenSearch: Embeddings, k-NN Indexes, and HNSW

FoundryDB Team
Engineering @ FoundryDB

Keyword search breaks when users phrase queries differently from the words in your documents. Vector search fixes this by comparing meaning rather than tokens. This post walks through registering a sentence embedding model, building a k-NN index with an ingest pipeline, and running semantic queries against a FoundryDB-managed OpenSearch 2.19.1 cluster. All scores and outputs are from a real test run.

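All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders. Replace them with your cluster domain and app_user password from the FoundryDB dashboard. The -k flag tells curl to skip TLS certificate verification; drop it if your client already trusts the cluster certificate.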

Prerequisites

  • A running FoundryDB OpenSearch cluster. See the OpenSearch provisioning guide in the documentation.
  • curl and jq installed locally.

Step 1: Configure ML Commons for a Single-Node Cluster

By default, OpenSearch ML Commons restricts model execution to dedicated ML nodes. On a single-node cluster, you need to relax three settings before you can register or deploy any model.

curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/_cluster/settings" \
-H "Content-Type: application/json" \
-d '{
"persistent": {
"plugins.ml_commons.only_run_on_ml_node": false,
"plugins.ml_commons.native_memory_threshold": 99,
"plugins.ml_commons.jvm_heap_memory_threshold": 99
}
}'

only_run_on_ml_node: false allows the data node to run inference. The two threshold settings are percentages: raising both to 99 keeps the ML Commons memory checks from refusing to load the model when native memory or JVM heap is already in use.

Step 2: Register and Deploy the Embedding Model

The model used in these tests is huggingface/sentence-transformers/all-MiniLM-L6-v2. It produces 384-dimensional embeddings and runs entirely within OpenSearch via ML Commons, so no external embedding API is required.

# Register the model (completes in ~4 seconds)
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/_register" \
-H "Content-Type: application/json" \
-d '{
"name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT"
}'

The response contains a task_id. Poll GET /_plugins/_ml/tasks/YOUR_TASK_ID until the task's state field is COMPLETED, then note the model_id it returns.
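The polling step can be sketched in Python as below. This is a minimal sketch rather than an official client: wait_for_task and its parameters are hypothetical names, and the state, model_id, and error fields are assumed to match what the ML Commons tasks endpoint returns. The HTTP call is passed in as a callable, so the loop can be exercised without a live cluster.

```python
import time
from typing import Callable


def wait_for_task(get_task: Callable[[], dict], timeout_s: float = 60.0,
                  interval_s: float = 1.0, sleep=time.sleep) -> str:
    """Poll a task until it completes and return its model_id.

    get_task is any callable that returns the parsed JSON body of
    GET /_plugins/_ml/tasks/<task_id> (assumed response shape).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_task()
        state = task.get("state")
        if state == "COMPLETED":
            return task["model_id"]
        if state == "FAILED":
            raise RuntimeError(f"task failed: {task.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still in state {state!r}")
        sleep(interval_s)
```

In practice get_task would wrap an authenticated HTTP GET against the cluster. Stopping on FAILED and on a deadline keeps a broken registration from polling forever.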

# Deploy the model (completes in ~10 seconds)
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_deploy"

Verify with a test inference call:

curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_predict" \
-H "Content-Type: application/json" \
-d '{"text_docs": ["hello world"], "return_number": true, "target_response": ["sentence_embedding"]}'

The response confirms the model is live. The first three values of the 384-dimensional vector from the test run:

{"output": [{"name": "sentence_embedding", "shape": [384], "first_3_values": [-0.019954572, 0.009878026, 0.010249609]}]}

Step 3: Create an Ingest Pipeline

Rather than embedding documents in your application code before indexing, define an ingest pipeline with a text_embedding processor. OpenSearch calls the model at index time and writes the vector into a separate field automatically.

curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/_ingest/pipeline/text-embedding-pipeline" \
-H "Content-Type: application/json" \
-d '{
"description": "Embed body field into body_embedding",
"processors": [
{
"text_embedding": {
"model_id": "YOUR_MODEL_ID",
"field_map": {
"body": "body_embedding"
}
}
}
]
}'
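To make the field_map behavior concrete, here is a conceptual Python model of what the processor does to each document. It is an illustration only, not the plugin's code: apply_text_embedding and fake_embed are hypothetical names, and fake_embed returns a made-up 3-value vector in place of the real 384-dimensional embedding.

```python
from typing import Callable, Dict, List


def apply_text_embedding(doc: dict, field_map: Dict[str, str],
                         embed: Callable[[str], List[float]]) -> dict:
    """Return a copy of doc with each mapped text field embedded."""
    out = dict(doc)
    for source_field, target_field in field_map.items():
        text = doc.get(source_field)
        if isinstance(text, str) and text:
            out[target_field] = embed(text)
    return out


def fake_embed(text: str) -> List[float]:
    # Stand-in for the deployed model: NOT a real embedding.
    return [float(len(text)), float(text.count(" ")), 0.0]


doc = {"title": "Intro", "body": "Neural networks learn patterns"}
indexed = apply_text_embedding(doc, {"body": "body_embedding"}, fake_embed)
print(sorted(indexed))  # ['body', 'body_embedding', 'title']
```

The original body text stays in the document next to the new vector field, which is why the same index can later serve both k-NN and match queries.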

Step 4: Create the k-NN Index

The index needs two things: index.knn: true in settings, and a knn_vector field with the HNSW parameters.

curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/articles-knn" \
-H "Content-Type: application/json" \
-d '{
"settings": {
"index.knn": true,
"default_pipeline": "text-embedding-pipeline",
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {"type": "text"},
"body": {"type": "text"},
"category": {"type": "keyword"},
"body_embedding": {
"type": "knn_vector",
"dimension": 384,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "lucene",
"parameters": {"ef_construction": 128, "m": 16}
}
}
}
}
}'

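cosinesimil compares the angle between vectors and ignores their magnitude, which suits sentence embeddings. m=16 caps the number of graph links per node, and ef_construction=128 sets the size of the candidate list explored while the graph is built. Both are common starting values for HNSW; raising either trades indexing time and memory for recall. The Lucene engine runs inside the JVM with no native library to load. It is set explicitly here because the k-NN plugin's default engine has varied across 2.x releases.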
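The sketch below shows why cosine similarity ignores magnitude and how a reported score relates to the underlying cosine. The score mapping is an assumption based on the k-NN documentation for the Lucene engine (score = (2 - d) / 2 with d = 1 - cos); the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def lucene_cosine_score(cos: float) -> float:
    # Assumed Lucene-engine mapping: score = (2 - d) / 2 with d = 1 - cos.
    return (1.0 + cos) / 2.0


def cosine_from_score(score: float) -> float:
    # Inverse of the assumed mapping above.
    return 2.0 * score - 1.0


# Scaling a vector changes its length but not its direction.
print(cosine_similarity([1.0, 2.0], [10.0, 20.0]))
```

If the assumed mapping holds, a score of 0.7610047 corresponds to a cosine of roughly 0.522 between the query and document embeddings.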

Step 5: Ingest Documents

With default_pipeline set on the index, a normal index request triggers embedding automatically. All 8 articles in the test were ingested without errors.

curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_doc" \
-H "Content-Type: application/json" \
-d '{
"title": "Introduction to Neural Networks",
"body": "Neural networks are computational models inspired by the human brain. They learn patterns from data by adjusting weights through backpropagation.",
"category": "machine-learning"
}'

Repeat for each document. The pipeline runs synchronously during ingestion, so by the time the 201 Created response arrives, the embedding is already stored.
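For more than a handful of documents, the repeated requests can be collapsed into one call to the _bulk endpoint, which should also honor the index's default_pipeline. The sketch below only builds the newline-delimited request body; build_bulk_body is a hypothetical helper, and sending the body (POST /_bulk with Content-Type: application/x-ndjson) is left to your HTTP client.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build an NDJSON body for the _bulk API: one action line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


article = {
    "title": "Introduction to Neural Networks",
    "body": "Neural networks are computational models inspired by the human brain. "
            "They learn patterns from data by adjusting weights through backpropagation.",
    "category": "machine-learning",
}
body = build_bulk_body("articles-knn", [article])
print(body.count("\n"))  # prints 2: one action line and one document line
```

Embedding still happens per document inside the pipeline, so a large bulk request takes correspondingly longer on a single node.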

Step 6: Run a Semantic Query

Semantic search uses a knn query. The query text must be embedded with the same model that embedded the documents: send it to the _predict endpoint from Step 2, then paste the resulting 384-value array in place of YOUR_QUERY_VECTOR.

curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search" \
-H "Content-Type: application/json" \
-d '{
"size": 4,
"_source": ["title", "category"],
"query": {
"knn": {
"body_embedding": {
"vector": YOUR_QUERY_VECTOR,
"k": 4
}
}
}
}'

Results for the query "how do machines learn from data":

{"total": 4, "results": [
{"score": 0.7610047, "title": "Introduction to Neural Networks", "category": "machine-learning"},
{"score": 0.7246155, "title": "Gradient Descent and Backpropagation", "category": "machine-learning"},
{"score": 0.7074836, "title": "Deep Learning for Image Recognition", "category": "machine-learning"},
{"score": 0.6563286, "title": "Transformer Architecture Explained", "category": "machine-learning"}
]}

Results for the query "speeding up slow database queries":

{"results": [
{"score": 0.8420365, "title": "SQL Query Optimization", "category": "databases"},
{"score": 0.78991985, "title": "Database Indexing Strategies", "category": "databases"},
{"score": 0.666918, "title": "PostgreSQL vs MySQL", "category": "databases"},
{"score": 0.6576359, "title": "Time Series Databases", "category": "databases"}
]}
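Turning query text into a knn request takes two steps: pull the vector out of the _predict response, then place it in the query body. The sketch below covers both. The response shape (inference_results, output, name, data) is an assumption about what ML Commons returns for a text embedding model, and both function names are hypothetical.

```python
from typing import List


def extract_embedding(predict_response: dict) -> List[float]:
    """Pull the sentence embedding out of a _predict response (assumed shape)."""
    for output in predict_response["inference_results"][0]["output"]:
        if output.get("name") == "sentence_embedding":
            return list(output["data"])
    raise KeyError("no sentence_embedding output in response")


def build_knn_query(vector: List[float], k: int = 4, dimension: int = 384) -> dict:
    """Build the search body used in this step for a pre-computed query vector."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    return {
        "size": k,
        "_source": ["title", "category"],
        "query": {"knn": {"body_embedding": {"vector": vector, "k": k}}},
    }
```

Checking the dimension before sending catches a vector from the wrong model early, instead of letting it surface as a search-time error from the cluster.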

Step 7: Compare with BM25

Run the same "speeding up slow database queries" query as a standard match search:

curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search" \
-H "Content-Type: application/json" \
-d '{
"size": 5,
"_source": ["title"],
"query": {"match": {"body": "speeding up slow database queries"}}
}'

BM25 returned only 3 results:

{"results": [
{"score": 3.6567469, "title": "Database Indexing Strategies"},
{"score": 2.9955664, "title": "SQL Query Optimization"},
{"score": 1.3071092, "title": "PostgreSQL vs MySQL"}
]}

BM25 missed "Time Series Databases" entirely. The document does not contain the words "speeding", "slow", or "queries", so the match query never scores it and it is absent from the results. The k-NN search returned it with a score of 0.6576 because its embedding lies close to the query's, reflecting the semantic connection between time-series workloads and database query performance.

This is the practical difference: k-NN retrieves by meaning, BM25 retrieves by word overlap.
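The gap between the two result lists can be checked mechanically. This small sketch uses the titles from the test run above; missed_by is a hypothetical helper name.

```python
from typing import List

# Titles in ranked order, copied from the two result sets above.
knn_titles = [
    "SQL Query Optimization",
    "Database Indexing Strategies",
    "PostgreSQL vs MySQL",
    "Time Series Databases",
]
bm25_titles = [
    "Database Indexing Strategies",
    "SQL Query Optimization",
    "PostgreSQL vs MySQL",
]


def missed_by(baseline: List[str], other: List[str]) -> List[str]:
    """Titles that appear in other but not in baseline, in other's order."""
    seen = set(baseline)
    return [title for title in other if title not in seen]


print(missed_by(bm25_titles, knn_titles))  # ['Time Series Databases']
print(knn_titles[0] == bm25_titles[0])     # False: the two methods disagree on the top hit
```

Beyond the missing document, the two methods also rank the shared results differently: k-NN puts "SQL Query Optimization" first, while BM25 favors "Database Indexing Strategies".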

What's Next