Building a RAG Pipeline with OpenSearch as the Vector Store
Retrieval-Augmented Generation (RAG) improves a language model's response by first retrieving relevant context from a data store, then passing that context into the prompt. OpenSearch is a natural fit for the retrieval step: it can host the embedding model inside the cluster, store the vectors, and return ranked results in a single query. This post shows the retrieval step with real scores from a live OpenSearch 2.19.1 cluster managed by FoundryDB, and explains how to wire the retrieved chunks into a prompt and call an LLM.
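As a rough sketch of the two steps described above, the Python below builds a `neural` query body (the OpenSearch query type that embeds the query text with a deployed model and runs a k-NN search) and then assembles the returned chunks into a prompt. The field names `embedding` and `text`, the model ID placeholder, and the prompt wording are assumptions for illustration, not this post's actual configuration, and the sample hits are placeholders rather than real cluster output.

```python
import json


def build_neural_query(question: str, model_id: str, k: int = 3,
                       vector_field: str = "embedding") -> dict:
    """Build a search body that embeds `question` server-side and runs k-NN.

    `vector_field` and the `text` source field are assumed names.
    """
    return {
        "size": k,
        "_source": ["text"],
        "query": {
            "neural": {
                vector_field: {
                    "query_text": question,
                    "model_id": model_id,
                    "k": k,
                }
            }
        },
    }


def build_prompt(question: str, hits: list[dict]) -> str:
    """Join retrieved chunks into a numbered context block ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {hit['_source']['text']}" for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "How are backups scheduled?"
    body = build_neural_query(question, model_id="<your-model-id>")
    print(json.dumps(body, indent=2))

    # Placeholder hits shaped like the `hits.hits` array of a search response.
    placeholder_hits = [
        {"_source": {"text": "First retrieved chunk."}},
        {"_source": {"text": "Second retrieved chunk."}},
    ]
    print(build_prompt(question, placeholder_hits))
```

The query body would be sent to the index's `_search` endpoint, and the resulting prompt string is what gets passed to whichever LLM client you use.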
The examples use a dedicated knowledge base index holding six chunks of database documentation, each embedded with all-MiniLM-L6-v2 (384 dimensions). The retrieval query, the prompt assembly, and the complete prompt were all tested on a live FoundryDB cluster.
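For orientation, an index like the one described here could be set up along the following lines. This is a minimal sketch, not the exact configuration used on the cluster: the pipeline name `kb-embed`, the field names `text` and `embedding`, and the model ID placeholder are all hypothetical. The one value taken from the setup above is the vector dimension, 384, which has to match the output size of all-MiniLM-L6-v2.

```python
import json

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def build_ingest_pipeline(model_id: str, text_field: str = "text",
                          vector_field: str = "embedding") -> dict:
    """Pipeline body that embeds `text_field` into `vector_field` at index time."""
    return {
        "description": "Embed knowledge base chunks (illustrative)",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    "field_map": {text_field: vector_field},
                }
            }
        ],
    }


def build_index_body(pipeline_name: str = "kb-embed", text_field: str = "text",
                     vector_field: str = "embedding",
                     dim: int = EMBEDDING_DIM) -> dict:
    """Index body with k-NN enabled and a vector field of the model's dimension."""
    return {
        "settings": {
            "index.knn": True,
            # Run the embedding pipeline on every indexed document.
            "index.default_pipeline": pipeline_name,
        },
        "mappings": {
            "properties": {
                text_field: {"type": "text"},
                vector_field: {"type": "knn_vector", "dimension": dim},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ingest_pipeline("<your-model-id>"), indent=2))
    print(json.dumps(build_index_body(), indent=2))
```

With a setup of this shape, documents are indexed as plain text and the cluster fills in the vector field, so the client never handles embeddings directly.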