Vector Search in OpenSearch: Embeddings, k-NN Indexes, and HNSW
Keyword search breaks when users phrase queries differently from the words in your documents. Vector search fixes this by comparing meaning rather than tokens. This post walks through registering a sentence embedding model, building a k-NN index with an ingest pipeline, and running semantic queries against a FoundryDB-managed OpenSearch 2.19.1 cluster. All scores and outputs are from a real test run.
All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders. Replace them with your cluster domain and app_user password from the FoundryDB dashboard.
Prerequisites
- A running FoundryDB OpenSearch cluster. See the OpenSearch provisioning guide in the documentation.
- curl and jq installed locally.
Step 1: Configure ML Commons for a Single-Node Cluster
By default, OpenSearch ML Commons restricts model execution to dedicated ML nodes. On a single-node cluster, you need to relax three settings before you can register or deploy any model.
curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/_cluster/settings" \
-H "Content-Type: application/json" \
-d '{
"persistent": {
"plugins.ml_commons.only_run_on_ml_node": false,
"plugins.ml_commons.native_memory_threshold": 99,
"plugins.ml_commons.jvm_heap_memory_threshold": 99
}
}'
only_run_on_ml_node: false allows the data node to run inference. The two threshold settings raise the memory-usage limits to 99 percent, so ML Commons does not refuse to load the model when memory is already in use.
Step 2: Register and Deploy the Embedding Model
The model used in these tests is huggingface/sentence-transformers/all-MiniLM-L6-v2. It produces 384-dimensional embeddings and runs entirely within OpenSearch via ML Commons, so no external embedding API is required.
# Register the model (completes in ~4 seconds)
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/_register" \
-H "Content-Type: application/json" \
-d '{
"name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT"
}'
The response contains a task_id. Poll GET /_plugins/_ml/tasks/YOUR_TASK_ID until the task's state is COMPLETED, then note the model_id in the task response.
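The polling step can be scripted. The following is a minimal Python sketch, not part of the original test run. It assumes the task document reports progress in a state field and carries model_id once complete; fetch_task and wait_for_model_id are hypothetical helper names, and certificate verification is disabled only to mirror curl -k.

```python
import base64
import json
import ssl
import time
import urllib.request


def fetch_task(host, password, task_id, user="app_user"):
    """GET /_plugins/_ml/tasks/<task_id> and return the parsed JSON body."""
    url = f"https://{host}:9200/_plugins/_ml/tasks/{task_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    # Mirrors curl -k: certificate checks are skipped. Avoid this in production.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def wait_for_model_id(get_task, interval=1.0, max_attempts=60, sleep=time.sleep):
    """Call get_task() until the task completes, then return its model_id."""
    for _ in range(max_attempts):
        task = get_task()
        state = task.get("state")  # assumed field name in the task document
        if state == "COMPLETED":
            return task["model_id"]
        if state == "FAILED":
            raise RuntimeError(f"task failed: {task.get('error')}")
        sleep(interval)
    raise TimeoutError("task did not complete in time")
```

A call might look like wait_for_model_id(lambda: fetch_task("YOUR_OPENSEARCH_HOST", "YOUR_PASSWORD", "YOUR_TASK_ID")). Passing the fetch step in as a callable keeps the loop testable without a cluster.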
# Deploy the model (completes in ~10 seconds)
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_deploy"
Verify with a test inference call:
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_predict" \
-H "Content-Type: application/json" \
-d '{"text_docs": ["hello world"], "return_number": true, "target_response": ["sentence_embedding"]}'
The response confirms the model is live. The first three values of the 384-dimensional vector from the test run:
{"output": [{"name": "sentence_embedding", "shape": [384], "first_3_values": [-0.019954572, 0.009878026, 0.010249609]}]}
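To reuse the vector in later steps, a small helper can pull it out of the _predict response. This Python sketch assumes the raw response nests the values under inference_results[0].output[0].data (the output shown above is a condensed summary); extract_embedding is a hypothetical helper name.

```python
def extract_embedding(response, expected_dim=384):
    """Return the sentence embedding from a parsed _predict response."""
    # Assumed layout:
    # {"inference_results": [{"output": [{"name": ..., "data": [...]}]}]}
    outputs = response["inference_results"][0]["output"]
    for output in outputs:
        if output.get("name") == "sentence_embedding":
            vector = output["data"]
            if len(vector) != expected_dim:
                raise ValueError(
                    f"expected {expected_dim} values, got {len(vector)}"
                )
            return vector
    raise KeyError("no sentence_embedding output in response")
```

Checking the length early catches a mismatch between the model's output and the dimension declared in the index mapping.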
Step 3: Create an Ingest Pipeline
Rather than embedding documents in your application code before indexing, define an ingest pipeline with a text_embedding processor. OpenSearch calls the model at index time and writes the vector into a separate field automatically.
curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/_ingest/pipeline/text-embedding-pipeline" \
-H "Content-Type: application/json" \
-d '{
"description": "Embed body field into body_embedding",
"processors": [
{
"text_embedding": {
"model_id": "YOUR_MODEL_ID",
"field_map": {
"body": "body_embedding"
}
}
}
]
}'
Step 4: Create the k-NN Index
The index needs two things: index.knn: true in settings, and a knn_vector field with the HNSW parameters.
curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/articles-knn" \
-H "Content-Type: application/json" \
-d '{
"settings": {
"index.knn": true,
"default_pipeline": "text-embedding-pipeline",
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {"type": "text"},
"body": {"type": "text"},
"category": {"type": "keyword"},
"body_embedding": {
"type": "knn_vector",
"dimension": 384,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "lucene",
"parameters": {"ef_construction": 128, "m": 16}
}
}
}
}
}'
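cosinesimil measures the angle between vectors rather than their magnitude, which is appropriate for sentence embeddings. m=16 sets the maximum number of links each graph node keeps, and ef_construction=128 sets the size of the candidate list used while building the graph; both are common starting values for HNSW. The engine is set to lucene explicitly: it uses Lucene's built-in HNSW implementation and requires no additional plugins, but do not rely on it being the default, because the default k-NN engine differs between 2.x releases.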
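To make the "angle, not magnitude" point concrete, here is a small Python sketch of cosine similarity. It is an illustration of the metric only, not the code OpenSearch runs, and the example vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]  # same direction, ten times the magnitude

print(cosine_similarity(v, scaled))                # very close to 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
```

Scaling a vector leaves the result essentially unchanged, which is why two texts of very different length can still score as similar.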
Step 5: Ingest Documents
With default_pipeline set on the index, a normal index request triggers embedding automatically. All 8 articles in the test were ingested without errors.
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_doc" \
-H "Content-Type: application/json" \
-d '{
"title": "Introduction to Neural Networks",
"body": "Neural networks are computational models inspired by the human brain. They learn patterns from data by adjusting weights through backpropagation.",
"category": "machine-learning"
}'
Repeat for each document. The pipeline runs synchronously during ingestion, so by the time the 201 Created response arrives, the embedding is already stored.
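Instead of one request per document, the documents could be sent in a single _bulk request. The Python sketch below only builds the newline-delimited body; it assumes the body is then POSTed to /articles-knn/_bulk with Content-Type: application/x-ndjson, where the index's default_pipeline should apply as it does for single-document requests. build_bulk_body is a hypothetical helper name, and the second document is a placeholder rather than an article from the test run.

```python
import json


def build_bulk_body(documents):
    """Build an NDJSON _bulk body that indexes each document with an auto ID."""
    lines = []
    for doc in documents:
        # Action line: empty metadata, because the index name is in the URL.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


documents = [
    {
        "title": "Introduction to Neural Networks",
        "body": "Neural networks are computational models inspired by the human brain. "
                "They learn patterns from data by adjusting weights through backpropagation.",
        "category": "machine-learning",
    },
    # Placeholder entry for illustration only.
    {"title": "PLACEHOLDER TITLE", "body": "PLACEHOLDER BODY", "category": "placeholder"},
]

print(build_bulk_body(documents))
```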
Step 6: Run Semantic Search
Semantic search uses a knn query. The knn query takes a raw vector rather than text, so first embed the query text with the same model (the _predict call from Step 2), then substitute the resulting 384-value array for YOUR_QUERY_VECTOR.
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search" \
-H "Content-Type: application/json" \
-d '{
"size": 4,
"_source": ["title", "category"],
"query": {
"knn": {
"body_embedding": {
"vector": YOUR_QUERY_VECTOR,
"k": 4
}
}
}
}'
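The request body above can also be assembled programmatically once the query vector is available. This Python sketch shows two hypothetical builders: build_knn_query mirrors the knn request in this step, while build_neural_query assumes the cluster supports the neural query type, which embeds the query text server-side with the given model. Neither was part of the original test run.

```python
import json


def build_knn_query(vector, k=4, field="body_embedding",
                    source_fields=("title", "category")):
    """Search body for a knn query against a precomputed query vector."""
    return {
        "size": k,
        "_source": list(source_fields),
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


def build_neural_query(query_text, model_id, k=4, field="body_embedding",
                       source_fields=("title", "category")):
    """Search body for a neural query (assumed available on the cluster)."""
    return {
        "size": k,
        "_source": list(source_fields),
        "query": {
            "neural": {
                field: {"query_text": query_text, "model_id": model_id, "k": k}
            }
        },
    }


# Three-value vector for display only; the real field expects 384 values.
print(json.dumps(build_knn_query([0.1, 0.2, 0.3]), indent=2))
```

With the knn form the client controls the embedding step; with the neural form the query text is sent as-is and the model ID travels with the request.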
Results for the query "how do machines learn from data":
{"total": 4, "results": [
{"score": 0.7610047, "title": "Introduction to Neural Networks", "category": "machine-learning"},
{"score": 0.7246155, "title": "Gradient Descent and Backpropagation", "category": "machine-learning"},
{"score": 0.7074836, "title": "Deep Learning for Image Recognition", "category": "machine-learning"},
{"score": 0.6563286, "title": "Transformer Architecture Explained", "category": "machine-learning"}
]}
Results for the query "speeding up slow database queries":
{"results": [
{"score": 0.8420365, "title": "SQL Query Optimization", "category": "databases"},
{"score": 0.78991985, "title": "Database Indexing Strategies", "category": "databases"},
{"score": 0.666918, "title": "PostgreSQL vs MySQL", "category": "databases"},
{"score": 0.6576359, "title": "Time Series Databases", "category": "databases"}
]}
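The scores above are not raw cosine values. As a rough guide, and assuming the Lucene engine maps cosine similarity to a score of (1 + cos) / 2 (worth verifying against the k-NN documentation for your version), the underlying cosine can be recovered as in this Python sketch. lucene_score_to_cosine is a hypothetical helper name.

```python
def lucene_score_to_cosine(score):
    """Invert score = (1 + cos) / 2, the assumed Lucene cosinesimil mapping."""
    return 2.0 * score - 1.0


# Scores taken from the "speeding up slow database queries" results above.
results = [
    ("SQL Query Optimization", 0.8420365),
    ("Database Indexing Strategies", 0.78991985),
    ("PostgreSQL vs MySQL", 0.666918),
    ("Time Series Databases", 0.6576359),
]

for title, score in results:
    print(f"{title}: score={score:.4f} cosine={lucene_score_to_cosine(score):.4f}")
```

Under that assumption a score of 0.5 corresponds to orthogonal vectors, so scores in the 0.65 to 0.85 range indicate a clearly positive similarity.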
Step 7: Compare with BM25
Run the same "speeding up slow database queries" query as a standard match search:
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search" \
-H "Content-Type: application/json" \
-d '{
"size": 5,
"_source": ["title"],
"query": {"match": {"body": "speeding up slow database queries"}}
}'
BM25 returned only 3 results:
{"results": [
{"score": 3.6567469, "title": "Database Indexing Strategies"},
{"score": 2.9955664, "title": "SQL Query Optimization"},
{"score": 1.3071092, "title": "PostgreSQL vs MySQL"}
]}
BM25 missed "Time Series Databases" entirely. The document does not contain the words "speeding", "slow", or "queries", so the match query finds no overlapping terms and excludes it from the results. The k-NN search returned it with a score of 0.6576 because the embedding captures the semantic connection between time-series workloads and database query performance.
This is the practical difference: k-NN retrieves by meaning, BM25 retrieves by word overlap.
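The word-overlap side of that comparison can be illustrated with a toy Python sketch. It lowercases and splits on non-alphanumeric characters, which only roughly approximates a default text analyzer, and the body text is hypothetical rather than the article from the test run.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_terms(query, body):
    """Terms that appear in both the query and the document body."""
    return tokens(query) & tokens(body)


query = "speeding up slow database queries"
# Hypothetical body text, used only to illustrate the mechanism.
body = "Time series databases store timestamped metrics and compress them efficiently."

print(shared_terms(query, body))  # prints set()
```

Without stemming, "database" and "databases" are different terms, so a document can be on topic and still share no terms with the query.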
What's Next
- Hybrid Search: Combining BM25 and Vector Search with RRF
- To provision your own OpenSearch cluster, visit foundrydb.com. Documentation is at docs.foundrydb.com.