Vector Search in OpenSearch: Embeddings, k-NN Indexes, and HNSW

April 14, 2026 · 6 min read

Engineering @ FoundryDB

Keyword search breaks when users phrase queries differently from the words in your documents. Vector search fixes this by comparing meaning rather than tokens. This post walks through registering a sentence embedding model, building a k-NN index with an ingest pipeline, and running semantic queries against a FoundryDB-managed OpenSearch 2.19.1 cluster. All scores and outputs are from a real test run.

All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders. Replace them with your cluster domain and app_user password from the FoundryDB dashboard.

[Figure: pgvector similarity search. A query vector (or server-side embedded text) runs an ANN search over an HNSW index using cosine distance, applies an equality filter, and returns the top-k rows by distance.]

Prerequisites

A running FoundryDB OpenSearch cluster. See the OpenSearch provisioning guide in the documentation.
curl and jq installed locally.

Step 1: Configure ML Commons for a Single-Node Cluster

By default, OpenSearch ML Commons restricts model execution to dedicated ML nodes. On a single-node cluster, you need to relax three settings before you can register or deploy any model.

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{
    "persistent": {
      "plugins.ml_commons.only_run_on_ml_node": false,
      "plugins.ml_commons.native_memory_threshold": 99,
      "plugins.ml_commons.jvm_heap_memory_threshold": 99
    }
  }'

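only_run_on_ml_node: false allows the data node to run inference. The two threshold settings are percentages: raising them to 99 stops the ML Commons memory circuit breakers from refusing to load the model when most of the node's native memory or JVM heap is already in use.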

Step 2: Register and Deploy the Embedding Model

The model used in these tests is huggingface/sentence-transformers/all-MiniLM-L6-v2. It produces 384-dimensional embeddings and runs entirely within OpenSearch via ML Commons, so no external embedding API is required.

# Register the model (completes in ~4 seconds)
curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/_register" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
    "version": "1.0.1",
    "model_format": "TORCH_SCRIPT"
  }'

The response contains a task_id. Poll GET /_plugins/_ml/tasks/YOUR_TASK_ID until the task's state field is COMPLETED, then note the model_id in the task response.
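The polling step can be scripted. The following is a minimal Python sketch, not part of the original test run: make_task_fetcher and wait_for_task are hypothetical helper names, and the code assumes the task response carries a state field (COMPLETED or FAILED among its values) and a model_id field.

```python
import base64
import json
import ssl
import time
import urllib.request
from typing import Callable, Dict


def make_task_fetcher(host: str, user: str, password: str) -> Callable[[str], Dict]:
    """Return a function that GETs /_plugins/_ml/tasks/<task_id> and parses the JSON."""
    context = ssl.create_default_context()
    context.check_hostname = False       # mirrors curl -k: skips certificate checks
    context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(task_id: str) -> Dict:
        request = urllib.request.Request(
            f"https://{host}:9200/_plugins/_ml/tasks/{task_id}",
            headers={"Authorization": f"Basic {token}"},
        )
        with urllib.request.urlopen(request, context=context) as response:
            return json.load(response)

    return fetch


def wait_for_task(fetch_task: Callable[[str], Dict], task_id: str,
                  interval_s: float = 1.0, timeout_s: float = 120.0) -> str:
    """Poll a task until it completes, then return its model_id."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        state = task.get("state")          # assumed field name in the task response
        if state == "COMPLETED":
            return task["model_id"]
        if state == "FAILED":
            raise RuntimeError(f"task {task_id} failed: {task.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {state} after {timeout_s}s")
        time.sleep(interval_s)
```

A call would look like wait_for_task(make_task_fetcher("YOUR_OPENSEARCH_HOST", "app_user", "YOUR_PASSWORD"), "YOUR_TASK_ID"). Passing the fetcher in as an argument keeps the polling loop separate from the HTTP details, so it can be exercised without a cluster.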

# Deploy the model (completes in ~10 seconds)
curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_deploy"

Verify with a test inference call:

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_predict" \
  -H "Content-Type: application/json" \
  -d '{"text_docs": ["hello world"], "return_number": true, "target_response": ["sentence_embedding"]}'

The response confirms the model is live. The first three values of the 384-dimensional vector from the test run:

{"output": [{"name": "sentence_embedding", "shape": [384], "first_3_values": [-0.019954572, 0.009878026, 0.010249609]}]}
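Before building the index, it can help to confirm that the vector length matches the dimension you plan to put in the mapping. The sketch below is illustrative only: embedding_from_predict is a hypothetical helper, and it assumes the full _predict response nests the vector under inference_results[0].output[n].data, which is not shown in the summarized output above.

```python
from typing import Any, Dict, List


def embedding_from_predict(response: Dict[str, Any], expected_dim: int = 384) -> List[float]:
    """Pull the sentence embedding out of a parsed _predict response.

    Assumed layout (not taken from the post):
    {"inference_results": [{"output": [{"name": "sentence_embedding", "data": [...]}]}]}
    """
    outputs = response["inference_results"][0]["output"]
    for output in outputs:
        if output.get("name") == "sentence_embedding":
            vector = output["data"]
            break
    else:
        raise KeyError("no sentence_embedding output in response")

    # A mismatch here would later surface as an indexing error on the knn_vector field.
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return vector
```

The expected_dim default of 384 matches all-MiniLM-L6-v2; a different model would need a different value here and in the index mapping.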

Step 3: Create an Ingest Pipeline

Rather than embedding documents in your application code before indexing, define an ingest pipeline with a text_embedding processor. OpenSearch calls the model at index time and writes the vector into a separate field automatically.

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/_ingest/pipeline/text-embedding-pipeline" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Embed body field into body_embedding",
    "processors": [
      {
        "text_embedding": {
          "model_id": "YOUR_MODEL_ID",
          "field_map": {
            "body": "body_embedding"
          }
        }
      }
    ]
  }'

Step 4: Create the k-NN Index

The index needs two things: index.knn: true in settings, and a knn_vector field with the HNSW parameters.

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/articles-knn" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "index.knn": true,
      "default_pipeline": "text-embedding-pipeline",
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "title":          {"type": "text"},
        "body":           {"type": "text"},
        "category":       {"type": "keyword"},
        "body_embedding": {
          "type": "knn_vector",
          "dimension": 384,
          "method": {
            "name": "hnsw",
            "space_type": "cosinesimil",
            "engine": "lucene",
            "parameters": {"ef_construction": 128, "m": 16}
          }
        }
      }
    }
  }'

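cosinesimil measures the angle between vectors rather than their magnitude, which is appropriate for sentence embeddings. ef_construction=128 and m=16 are standard starting values for HNSW: m caps the number of graph links per node, and ef_construction sets how many candidates are examined when a node is inserted. The engine is set to lucene explicitly. It uses Lucene's built-in HNSW implementation rather than a native library such as faiss or nmslib, and naming it in the mapping avoids relying on the default engine, which differs between OpenSearch versions.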
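To make the scores in the later steps easier to read, here is a small Python sketch of cosine similarity and of the score mapping. It assumes the Lucene engine reports cosine scores as (1 + cos) / 2, which keeps them in the range 0 to 1; verify this against the k-NN documentation for your version. The function names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; vector length does not affect it."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def lucene_cosine_score(cos: float) -> float:
    """Map a cosine in [-1, 1] to a score in [0, 1] (assumed Lucene mapping)."""
    return (1.0 + cos) / 2.0


def cosine_from_score(score: float) -> float:
    """Invert the mapping above to recover the raw cosine from a search score."""
    return 2.0 * score - 1.0
```

Under this assumption, two vectors pointing the same way score 1.0, orthogonal vectors score 0.5, and a search score of 0.7610047 corresponds to a raw cosine of roughly 0.522.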

Step 5: Ingest Documents

With default_pipeline set on the index, a normal index request triggers embedding automatically. All 8 articles in the test were ingested without errors.

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_doc" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Introduction to Neural Networks",
    "body": "Neural networks are computational models inspired by the human brain. They learn patterns from data by adjusting weights through backpropagation.",
    "category": "machine-learning"
  }'

Repeat for each document. The pipeline runs synchronously during ingestion, so by the time the 201 Created response arrives, the embedding is already stored.
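For more than a handful of documents, one _bulk request can replace the repeated calls. The sketch below only builds the newline-delimited request body; build_bulk_body is a hypothetical helper, and the assumption is that the body is posted to /articles-knn/_bulk so that the index's default_pipeline applies to every document.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(docs: Iterable[Dict]) -> str:
    """Build an NDJSON body for POST /articles-knn/_bulk.

    Each document becomes two lines: an action line and the document source.
    The index name comes from the URL, so the action line stays empty.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Write the result to a file and send it with curl --data-binary @bulk.ndjson and the header Content-Type: application/x-ndjson. Using --data-binary instead of -d matters here because -d strips the newlines that separate the lines.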

Step 6: Run Semantic Search

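Semantic search uses a knn query. The knn query takes a vector rather than text, so the query text is first embedded with the same model that embedded the documents, and the resulting vector is passed to the search. The jq path below assumes the ML Commons response layout inference_results[0].output[0].data.

# Embed the query text and keep the vector as a compact JSON array
QUERY_VECTOR=$(curl -s -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_MODEL_ID/_predict" \
  -H "Content-Type: application/json" \
  -d '{"text_docs": ["how do machines learn from data"], "return_number": true, "target_response": ["sentence_embedding"]}' \
  | jq -c '.inference_results[0].output[0].data')

# Run the k-NN search with the embedded query
curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 4,
    "_source": ["title", "category"],
    "query": {
      "knn": {
        "body_embedding": {
          "vector": '"$QUERY_VECTOR"',
          "k": 4
        }
      }
    }
  }'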

Results for the query "how do machines learn from data":

{"total": 4, "results": [
  {"score": 0.7610047, "title": "Introduction to Neural Networks",           "category": "machine-learning"},
  {"score": 0.7246155, "title": "Gradient Descent and Backpropagation",      "category": "machine-learning"},
  {"score": 0.7074836, "title": "Deep Learning for Image Recognition",       "category": "machine-learning"},
  {"score": 0.6563286, "title": "Transformer Architecture Explained",        "category": "machine-learning"}
]}

Results for the query "speeding up slow database queries":

{"results": [
  {"score": 0.8420365,  "title": "SQL Query Optimization",         "category": "databases"},
  {"score": 0.78991985, "title": "Database Indexing Strategies",   "category": "databases"},
  {"score": 0.666918,   "title": "PostgreSQL vs MySQL",            "category": "databases"},
  {"score": 0.6576359,  "title": "Time Series Databases",          "category": "databases"}
]}

Step 7: Compare with BM25

Run the same "speeding up slow database queries" query as a standard match search:

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-knn/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 5,
    "_source": ["title"],
    "query": {"match": {"body": "speeding up slow database queries"}}
  }'

BM25 returned only 3 results:

{"results": [
  {"score": 3.6567469, "title": "Database Indexing Strategies"},
  {"score": 2.9955664, "title": "SQL Query Optimization"},
  {"score": 1.3071092, "title": "PostgreSQL vs MySQL"}
]}

BM25 missed "Time Series Databases" entirely. The document does not contain "speeding", "slow", "queries", or any other query term, and a match query only returns documents that share at least one analyzed term with the query, so BM25 never scores it. The k-NN search returned it with a score of 0.6576 because its embedding lies close to the query's: the model places time-series workloads near database query performance.

This is the practical difference: k-NN retrieves by meaning, BM25 retrieves by word overlap.
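The word-overlap behavior can be reproduced with a toy BM25 scorer. This is a simplified Python sketch, not the scorer OpenSearch runs: it splits on whitespace instead of using an analyzer, it uses the common defaults k1=1.2 and b=0.75, and the two document bodies are made up for illustration rather than taken from the test data.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; no stemming, so 'databases' != 'database'."""
    return text.lower().split()


def bm25_scores(query: str, docs: Dict[str, str],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query with a simplified BM25."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    query_terms = set(tokenize(query))
    # Document frequency: in how many documents each query term appears.
    df = {term: sum(1 for tokens in tokenized.values() if term in tokens)
          for term in query_terms}

    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + length_norm)
        scores[name] = score
    return scores


# Hypothetical bodies, written only to illustrate term overlap.
DOCS = {
    "indexing": "indexes speed up slow database queries by avoiding full table scans",
    "timeseries": "time series databases store timestamped metrics efficiently",
}
QUERY = "speeding up slow database queries"
SCORES = bm25_scores(QUERY, DOCS)
```

With these toy bodies, the "timeseries" entry shares no token with the query and scores exactly zero, while "indexing" scores above zero. An embedding model has no such cutoff, which is why the k-NN results can include documents with no shared words.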

What's Next

Hybrid Search: Combining BM25 and Vector Search with RRF
To provision your own OpenSearch cluster, visit foundrydb.com. Documentation is at docs.foundrydb.com.
