Neural Sparse Search in OpenSearch: Semantic Matching Without a GPU

April 14, 2026 · 5 min read

Engineering @ FoundryDB

Dense vector search (k-NN) is powerful, but it needs a neural embedding model for both documents (at index time) and queries (at query time). Neural sparse search takes a different approach: a sparse encoder expands each document into weighted tokens at index time, OpenSearch stores them in a rank_features field, and scoring at query time is a weighted lookup in the inverted index rather than a nearest-neighbour search over dense vectors. The result is semantic search with no GPU requirement at query time. This post shows the full setup on a live OpenSearch 2.19.1 cluster managed by FoundryDB.

All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders.

[Figure: pgvector similarity search. A query vector (or text embedded server-side) runs an ANN search over an HNSW index using cosine distance (<=>), an equality filter (WHERE) is applied, and the top-k rows are returned by distance.]

Prerequisites

A running FoundryDB OpenSearch cluster.
ML Commons settings configured for a single-node cluster (see the vector search k-NN guide, Step 1).

Step 1: Register and Deploy the Sparse Encoding Model

The model used is amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill, version 1.0.0. This is Amazon's distilled sparse encoder, designed specifically for doc-only mode.

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/_register" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill",
    "version": "1.0.0",
    "model_format": "TORCH_SCRIPT"
  }'

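The register call is asynchronous: it returns a task ID, and the model ID (YOUR_SPARSE_MODEL_ID below) becomes available once that task completes. On this cluster, registration took approximately 15 seconds and the deployment that follows took approximately 10 seconds.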
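
Waiting for registration can be scripted by polling GET /_plugins/_ml/tasks/<task_id>. The following is a minimal Python sketch using only the standard library, not a definitive client. The helper names make_fetch and wait_for_task are made up for this post, and the failure state names checked here are an assumption about what ML Commons reports.

```python
import base64
import json
import ssl
import time
import urllib.request


def make_fetch(host, user, password):
    """Return a function that GETs a path from the cluster and parses the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Mirrors curl -k: certificate checks are skipped. Avoid this outside of testing.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    def fetch(path):
        req = urllib.request.Request(
            f"https://{host}:9200{path}",
            headers={"Authorization": f"Basic {token}"},
        )
        with urllib.request.urlopen(req, context=ctx) as resp:
            return json.load(resp)

    return fetch


def wait_for_task(fetch, task_id, timeout_s=120.0, interval_s=2.0, sleep=time.sleep):
    """Poll an ML Commons task until it finishes and return its model_id."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch(f"/_plugins/_ml/tasks/{task_id}")
        state = task.get("state")
        if state == "COMPLETED":
            return task["model_id"]
        # Assumed failure states; anything else is treated as still in progress.
        if state in ("FAILED", "COMPLETED_WITH_ERROR"):
            raise RuntimeError(f"task {task_id} ended in state {state}: {task.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still in state {state} after {timeout_s}s")
        sleep(interval_s)
```

With the placeholders from this post, usage would look like wait_for_task(make_fetch("YOUR_OPENSEARCH_HOST", "app_user", "YOUR_PASSWORD"), task_id), where task_id comes from the register response. The fetch function is passed in so the polling logic can be exercised without a cluster.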

# Deploy after registration completes
curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/_plugins/_ml/models/YOUR_SPARSE_MODEL_ID/_deploy"

Step 2: Create the Ingest Pipeline

The processor type is sparse_encoding, not text_embedding. The output field must be of type rank_features.

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/_ingest/pipeline/sparse-encoding-pipeline" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Sparse encoding for neural sparse search",
    "processors": [
      {
        "sparse_encoding": {
          "model_id": "YOUR_SPARSE_MODEL_ID",
          "field_map": {
            "body": "body_sparse"
          }
        }
      }
    ]
  }'
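
Before attaching the pipeline to an index, it can help to dry-run it against a sample document. The Python sketch below only builds a request body for the ingest simulate endpoint (POST /_ingest/pipeline/sparse-encoding-pipeline/_simulate); the helper name simulate_body and the sample sentence are invented for illustration, and sending the request is left to whichever HTTP client you use.

```python
import json


def simulate_body(texts, field="body"):
    """Build a _simulate request body with one test document per input text."""
    return {"docs": [{"_source": {field: text}} for text in texts]}


body = simulate_body(["Neural networks learn patterns from data."])
print(json.dumps(body, indent=2))
```

If the model is deployed and the pipeline is wired correctly, each simulated document in the response should carry a body_sparse map next to the original body text.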

Step 3: Create the Index with rank_features Field

The critical difference from k-NN: the sparse field uses rank_features, not knn_vector. There is no dimension to specify. OpenSearch stores a map of token strings to float weights.

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/articles-sparse" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "default_pipeline": "sparse-encoding-pipeline",
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "title":       {"type": "text"},
        "body":        {"type": "text"},
        "category":    {"type": "keyword"},
        "body_sparse": {"type": "rank_features"}
      }
    }
  }'
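
Because the index sets default_pipeline, documents sent to it are encoded without any extra request parameter. As a rough sketch, the Python below serialises documents into the newline-delimited format that POST /_bulk expects (sent with Content-Type: application/x-ndjson). The helper name bulk_body is hypothetical, the titles are taken from the results later in this post, and the body text is a placeholder rather than the real article content.

```python
import json


def bulk_body(index, docs):
    """Serialise (id, source) pairs as NDJSON: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    ("1", {"title": "Introduction to Neural Networks",
           "body": "PLACEHOLDER article text goes here.",
           "category": "machine-learning"}),
    ("2", {"title": "SQL Query Optimization",
           "body": "PLACEHOLDER article text goes here.",
           "category": "databases"}),
]
payload = bulk_body("articles-sparse", docs)
print(payload, end="")
```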

Step 4: Inspect the Stored Token Weights

After ingesting a document, you can retrieve the raw sparse representation to understand what the model learned. Here are the top 10 tokens by weight for the "Introduction to Neural Networks" article:

[
  {"token": "network",  "weight": 1.3393087},
  {"token": "neural",   "weight": 1.3120507},
  {"token": "brain",    "weight": 1.2210519},
  {"token": "networks", "weight": 1.1138264},
  {"token": "model",    "weight": 0.89889926},
  {"token": "learning", "weight": 0.8906413},
  {"token": "learn",    "weight": 0.8564229},
  {"token": "patterns", "weight": 0.8360702},
  {"token": "models",   "weight": 0.82590413},
  {"token": "neurons",  "weight": 0.8258773}
]

The model expanded the vocabulary with related forms: "neurons" appears alongside "neural", "networks" alongside "network", and both "learn" and "learning" are present. This is term expansion through learned associations, not stemming. The weights represent how strongly each token characterises the document in the model's vocabulary space.
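
A list like the one above can be produced client-side. Assuming the document is fetched with body_sparse present in _source as a flat token-to-weight map, a small Python helper (top_tokens is a name chosen for this sketch) sorts it by weight:

```python
def top_tokens(sparse, n=10):
    """Return the n highest-weighted tokens as {"token", "weight"} dicts."""
    ranked = sorted(sparse.items(), key=lambda kv: kv[1], reverse=True)
    return [{"token": token, "weight": weight} for token, weight in ranked[:n]]


# A few weights copied from the list above; a real document has many more entries.
body_sparse = {
    "learning": 0.8906413,
    "network": 1.3393087,
    "brain": 1.2210519,
    "neural": 1.3120507,
}
for row in top_tokens(body_sparse, n=3):
    print(row)
```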

Step 5: Run Neural Sparse Search Queries

The query type is neural_sparse. Pass the raw text along with the model ID; OpenSearch converts the text into sparse token weights before scoring, and in doc-only mode this is the only per-query step that involves the model.

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/articles-sparse/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 5,
    "_source": ["title", "category"],
    "query": {
      "neural_sparse": {
        "body_sparse": {
          "query_text": "how machines learn",
          "model_id": "YOUR_SPARSE_MODEL_ID"
        }
      }
    }
  }'

Results for "how machines learn":

{"results": [
  {"score": 4.638024,  "title": "Deep Learning for Image Recognition",  "category": "machine-learning"},
  {"score": 3.7028239, "title": "Introduction to Neural Networks",      "category": "machine-learning"},
  {"score": 3.3865764, "title": "Gradient Descent and Backpropagation", "category": "machine-learning"},
  {"score": 0.5032142, "title": "Transformer Architecture Explained",   "category": "machine-learning"}
]}

Results for "slow database query fix":

{"results": [
  {"score": 19.170017, "title": "SQL Query Optimization",       "category": "databases"},
  {"score": 9.8290825, "title": "Database Indexing Strategies", "category": "databases"},
  {"score": 6.0629683, "title": "Time Series Databases",        "category": "databases"},
  {"score": 5.0210533, "title": "PostgreSQL vs MySQL",          "category": "databases"}
]}

The scores are not cosine similarities (0 to 1) as in dense vector search. They are dot products between the query sparse vector and the document sparse vector, so they can be larger than 1. What matters is the relative ranking and the gap between results.
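
The result listings above are condensed. A raw _search response nests each hit under hits.hits, with the score in _score and the requested fields in _source. The Python sketch below flattens that structure and adds the gap to the next hit, since relative gaps are what matter. The helper name summarize and the response fragment are illustrative; the two scores are copied from the "slow database query fix" results.

```python
def summarize(response):
    """Flatten a _search response into rows with score, title, category and gap to the next hit."""
    hits = response["hits"]["hits"]
    rows = []
    for i, hit in enumerate(hits):
        next_score = hits[i + 1]["_score"] if i + 1 < len(hits) else None
        rows.append({
            "score": hit["_score"],
            "title": hit["_source"].get("title"),
            "category": hit["_source"].get("category"),
            "gap_to_next": None if next_score is None else hit["_score"] - next_score,
        })
    return rows


# Hand-written fragment in the shape of a _search response.
response = {"hits": {"hits": [
    {"_score": 19.170017, "_source": {"title": "SQL Query Optimization", "category": "databases"}},
    {"_score": 9.8290825, "_source": {"title": "Database Indexing Strategies", "category": "databases"}},
]}}
rows = summarize(response)
for row in rows:
    print(row)
```

A large gap after the first hit, as in this query, suggests a clear best match; closely spaced scores suggest several documents are about equally relevant.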

How Doc-Only Mode Works

In doc-only mode, the model runs once per document at index time and stores the expanded token weights in rank_features. At query time, the raw query text is also encoded into token weights, and OpenSearch computes the dot product between the query weights and each document's stored weights. This is equivalent to a weighted term match, which is fast and can be accelerated by standard inverted index infrastructure.
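
The scoring step described above can be sketched in a few lines of Python. This illustrates the arithmetic only, not OpenSearch's implementation: the document weights are copied from Step 4, while the query weights are invented for the example.

```python
def sparse_dot(query, doc):
    """Dot product over the tokens that appear in both sparse vectors."""
    # Iterate over the smaller map; tokens missing from the other side contribute 0.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(weight * large[token] for token, weight in small.items() if token in large)


# Document weights from Step 4; query weights are made up for illustration.
doc = {"network": 1.3393087, "neural": 1.3120507, "learning": 0.8906413, "learn": 0.8564229}
query = {"machines": 1.1, "learn": 1.0, "learning": 0.5}

score = sparse_dot(query, doc)
print(round(score, 4))
```

Only "learn" and "learning" overlap, so the score is 1.0 x 0.8564229 + 0.5 x 0.8906413. Tokens such as "machines" that the document lacks add nothing, which is why the computation maps naturally onto an inverted index.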

The alternative is bi-encoder mode, where a full sparse encoder also expands every query through neural inference rather than only weighting the query's own tokens. Doc-only mode trades some relevance quality for much lower query latency and zero GPU requirement at query time.

Comparing Sparse vs Dense Vector Search

Property	Neural sparse (doc-only)	Dense k-NN
GPU at query time	Not required	Not required (Lucene engine)
GPU at index time	Not required	Not required
Score range	Unbounded dot product	0 to 1 (cosine)
Field type	rank_features	knn_vector
Representation	Expanded tokens with weights	384-dim float vector
Explainability	Token weights visible	Black box
Index size	Varies with token count, typically larger	Fixed per vector (dimensions x 4 bytes)

What's Next

Hybrid search combining BM25 and vector search with RRF
Vector search in OpenSearch: embeddings, k-NN indexes, and HNSW
Building a RAG pipeline with OpenSearch as the vector store

Provision a FoundryDB OpenSearch cluster at foundrydb.com. Documentation at docs.foundrydb.com.
