OpenSearch Performance Tuning: Refresh Intervals, Bulk Sizing, and Shard Analysis

April 14, 2026 · 6 min read

Engineering @ FoundryDB

Getting good write throughput from OpenSearch requires understanding three things: the refresh cycle, translog durability, and shard sizing. This post benchmarks bulk indexing with default and tuned settings on a live OpenSearch 2.19.1 cluster managed by FoundryDB. The cluster is a single-node tier-2 (2 CPU, 4 GB RAM), which is the smallest configuration available. Numbers from larger nodes will be better, but the ratios between default and tuned settings hold.

All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders.

Figure: OpenSearch cluster, query fan-out and gather. The coordinator fans a search out to one copy of each shard (primaries P0 to P2 or replicas R0 to R2) on the data nodes, then gathers the hits. The cluster-manager handles shard allocation.

Understanding the Defaults

OpenSearch refreshes each index every 1 second by default. A refresh makes newly indexed documents searchable, but it has a cost: the in-memory indexing buffer is written out as a new searchable segment, which takes CPU and I/O and leaves more small segments for background merges to clean up. During high-ingest windows, a 1-second refresh interval means the cluster spends significant time managing segments instead of writing data.

With the default translog.durability: request, the translog is fsynced to disk before each index, delete, update, or bulk request is acknowledged. This guarantees durability but adds latency to every write request. For bulk loads where you accept that a machine failure might lose the last few seconds of data, async translog sync (an fsync every translog.sync_interval, 5s by default) is a reasonable trade.
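To make the refresh cost concrete, here is a rough sketch of the arithmetic. The 10-minute ingest window is a hypothetical figure, and the result is an upper bound because a refresh only does work when there are new documents to expose.

```shell
# refreshes WINDOW_SECONDS INTERVAL_SECONDS
# Prints the upper bound on refreshes in the window (integer division).
refreshes() {
  echo $(( $1 / $2 ))
}

# Hypothetical 10-minute (600 s) bulk load.
echo "refresh_interval 1s:  up to $(refreshes 600 1) refreshes"
echo "refresh_interval 30s: up to $(refreshes 600 30) refreshes"
```

Each refresh that finds new data creates a segment, so the longer interval also means far fewer small segments for the merge policy to combine later.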

Baseline: Default Settings

Create an index with default settings and bulk-ingest 500 documents:

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/perf-default" \
  -H "Content-Type: application/json" \
  -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'

# POST a pre-generated 500-document bulk payload (bulk-500.ndjson)
curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/perf-default/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @bulk-500.ndjson | jq '{took, errors}'

Result: {"took": 218, "errors": false}

The 500-document bulk ingest took 218 ms with default settings. The took value is the server-side processing time in milliseconds, so it excludes network transfer.
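The post does not show how bulk-500.ndjson was produced. A minimal sketch of one way to generate such a payload follows. The document fields (id, level, message) are hypothetical, not the ones used in the benchmark. Each document takes two lines: an action line and a source line. Because the index name is already in the _bulk URL, the action line can be an empty index action.

```shell
# gen_bulk COUNT
# Writes COUNT documents in bulk NDJSON format to stdout.
# Field names below are illustrative assumptions, not the benchmark's schema.
gen_bulk() {
  gen_bulk_i=1
  while [ "$gen_bulk_i" -le "$1" ]; do
    # Action line: index into the index named in the request URL, auto-generated ID.
    printf '{"index":{}}\n'
    # Source line: one compact JSON document per line.
    printf '{"id":%d,"level":"info","message":"synthetic event %d"}\n' \
      "$gen_bulk_i" "$gen_bulk_i"
    gen_bulk_i=$((gen_bulk_i + 1))
  done
}

gen_bulk 500 > bulk-500.ndjson
wc -l < bulk-500.ndjson   # two lines per document
```

The file ends with a newline, which the bulk API requires. Use --data-binary rather than -d when posting it, as the command above does, so curl preserves the line breaks.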

Tuned Settings

Apply three changes to a new index:

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/perf-tuned" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "refresh_interval": "30s",
      "translog.durability": "async",
      "translog.sync_interval": "10s"
    }
  }'

Run the same 500-document bulk ingest against the tuned index:

Result: {"took": 54, "errors": false}

Tuned settings: 54 ms. That is a 75% reduction in request time (218 ms to 54 ms), or roughly 4x faster, on the same hardware for the same data.

2000-Document Comparison

Documents	Default	Tuned	Improvement
500	218 ms	54 ms	75% faster
2000	116 ms	70 ms	40% faster

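The 2000-document result is interesting: the default took 116 ms, which is lower than the 500-document default. Larger bulk requests amortise per-request overhead, but amortisation alone cannot make a request with four times the documents faster in absolute terms. A plausible explanation is that the first 500-document request also paid one-time costs on a freshly created index, so treat single-run timings like these as indicative rather than precise. Tuned settings still win at 70 ms, and the gap narrows because at 2000 documents the fixed refresh and translog sync costs are smaller per document. The lesson: batch size matters, and the tuned settings were faster at both batch sizes tested.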
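The percentages in the table can be reproduced from the took values. The sketch below also derives documents per second, which is not in the table but makes the batch-size effect easier to see. It assumes only awk.

```shell
# report DOCS DEFAULT_MS TUNED_MS
# Prints time saved, speedup factor, and docs/s for default vs tuned.
report() {
  awk -v n="$1" -v d="$2" -v t="$3" 'BEGIN {
    printf "%d docs: %.1f%% less time, %.2fx faster, %.0f -> %.0f docs/s\n",
      n, (d - t) / d * 100, d / t, n / d * 1000, n / t * 1000
  }'
}

report 500 218 54
report 2000 116 70
```

Both derived figures depend on single-run timings, so they carry the same uncertainty as the table itself.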

Node Resource Usage During Tests

curl -u app_user:YOUR_PASSWORD -k \
  "https://YOUR_OPENSEARCH_HOST:9200/_nodes/stats/jvm,os" | \
  jq '.nodes | to_entries[0].value | {heap_used_percent: .jvm.mem.heap_used_percent, heap_used_mb: (.jvm.mem.heap_used_in_bytes / 1048576 | round), heap_max_mb: (.jvm.mem.heap_max_in_bytes / 1048576 | round), cpu_percent: .os.cpu.percent}'

During the bulk load tests:

{
  "heap_used_percent": 67,
  "heap_used_mb": 696,
  "heap_max_mb": 1024,
  "cpu_percent": 1
}

Heap was at 67% (696 MB of 1024 MB). CPU was at 1%. The bottleneck on this tier-2 node during small bulk tests was not CPU; it was segment management and translog sync latency, which the tuned settings address directly.

Shard State and Sizing

Inspect the current shard layout:

curl -u app_user:YOUR_PASSWORD -k \
  "https://YOUR_OPENSEARCH_HOST:9200/_cat/shards?v&h=index,shard,docs,store"

Output from the test cluster:

index            shard  docs   store
perf-tuned       0      2000   703.3kb
perf-default     0       500   130.9kb
logs-app-000001  0         5     5.7kb

The h= list above omits the prirep and state columns. If you add them, any index with number_of_replicas greater than 0 shows its replica shards as UNASSIGNED on a single-node cluster. This is expected and correct: a replica is never allocated to the same node as its primary, so there is nowhere to assign it. This is not an error.

From the shard data: 500 documents consumed 130.9 KB (about 0.26 KB per document), and 2000 documents consumed 703.3 KB (about 0.35 KB per document). That gives a rough density of 0.26 to 0.35 KB per document for this test data. For production planning, measure your actual document size. The standard shard sizing guidance is 10 to 50 GB per shard for balanced search and indexing performance.
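The density arithmetic comes straight from the _cat/shards output above. As a sketch, the same numbers can be turned into a rough documents-per-shard estimate. The 30 GB target is an assumption picked from within the 10 to 50 GB guidance, and the 0.35 KB figure only applies to this test data.

```shell
# density STORE_KB DOCS
# Prints on-disk KB per document.
density() {
  awk -v kb="$1" -v n="$2" 'BEGIN { printf "%.2f\n", kb / n }'
}

# docs_per_shard SHARD_GB KB_PER_DOC
# Prints roughly how many million documents fit in one shard of that size.
docs_per_shard() {
  awk -v gb="$1" -v kb="$2" 'BEGIN {
    printf "%.0f million\n", gb * 1024 * 1024 / kb / 1000000
  }'
}

density 130.9 500      # perf-default
density 703.3 2000     # perf-tuned
docs_per_shard 30 0.35 # assumed 30 GB shard target
```

Densities measured on indices this small are only a starting point. Repeat the measurement on a representative sample of production documents before sizing shards.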

When to Use Tuned Settings

The tuned settings are appropriate during:

Initial bulk loads before the index goes live.
Nightly reindex operations.
High-throughput log ingestion windows where you accept the small durability tradeoff.

Restore default settings after the bulk operation:

curl -u app_user:YOUR_PASSWORD -k \
  -X PUT "https://YOUR_OPENSEARCH_HOST:9200/perf-tuned/_settings" \
  -H "Content-Type: application/json" \
  -d '{
    "index": {
      "refresh_interval": "1s",
      "translog.durability": "request"
    }
  }'

Both settings are dynamic, so the change takes effect without a restart. For log analytics indices that are append-only and where near-real-time search is acceptable, keeping refresh_interval at 5s or 10s permanently is a reasonable production configuration.
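Since the settings are switched before and after every bulk job, it can help to wrap the two request bodies in a small helper. This is a sketch only: the function names and the OPENSEARCH_HOST and OPENSEARCH_PASSWORD environment variables are hypothetical, and the bodies mirror the values used in this post.

```shell
# settings_body MODE
# Prints the index settings JSON for MODE: "bulk" or "default".
settings_body() {
  case "$1" in
    bulk)
      printf '%s\n' '{"index":{"refresh_interval":"30s","translog.durability":"async"}}' ;;
    default)
      printf '%s\n' '{"index":{"refresh_interval":"1s","translog.durability":"request"}}' ;;
    *)
      echo "usage: settings_body bulk|default" >&2
      return 1 ;;
  esac
}

# apply_settings INDEX MODE
# Sends the body to the dynamic _settings endpoint (needs a live cluster).
apply_settings() {
  curl -sS -u "app_user:${OPENSEARCH_PASSWORD}" -k \
    -X PUT "https://${OPENSEARCH_HOST}:9200/$1/_settings" \
    -H "Content-Type: application/json" \
    -d "$(settings_body "$2")"
}

# Typical sequence, left commented out because it talks to the cluster:
#   apply_settings perf-tuned bulk
#   trap 'apply_settings perf-tuned default' EXIT   # restore even if the load fails
#   ... run the bulk load ...
settings_body bulk
```

Registering the restore step with trap before the load starts means the index is not left with async durability if the job aborts partway through.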

Force Merge for Read-Only Indices

After an index stops receiving writes (for example, after rollover in an ISM policy), reduce its segment count to improve read performance and reduce memory usage.

We tested this by creating an index and ingesting 300 documents in 3 separate batches (with a refresh between each) to produce multiple segments:

curl -u app_user:YOUR_PASSWORD -k \
  "https://YOUR_OPENSEARCH_HOST:9200/_cat/segments/merge-test?v&h=index,shard,segment,docs.count,size"

Before force merge (3 segments):

index      shard segment docs.count   size
merge-test 0     _0             100 25.7kb
merge-test 0     _1             100 25.7kb
merge-test 0     _2             100 25.7kb

Run force merge:

curl -u app_user:YOUR_PASSWORD -k \
  -X POST "https://YOUR_OPENSEARCH_HOST:9200/merge-test/_forcemerge?max_num_segments=1"

Response: {"_shards": {"total": 1, "successful": 1, "failed": 0}}

After force merge and refresh (1 segment):

index      shard segment docs.count   size
merge-test 0     _3             300 70.4kb

Three segments (25.7 KB each, 77.1 KB total) merged into one (70.4 KB), saving roughly 9% in storage. More importantly, search now scans one segment instead of three, reducing per-query overhead. Do not run force merge on active write indices because it fights the natural merge policy and can cause significant I/O load. Reserve it for read-only historical indices.
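Before force merging, it is worth checking how many segments each shard actually holds. The helper below is a sketch that summarises _cat/segments output read from stdin. It assumes the column order used in this post (index, shard, segment, ...) and a header row from the v flag. The sample input is the "before" table from above.

```shell
# count_segments
# Reads `_cat/segments?v&h=index,shard,segment,...` output on stdin and
# prints one line per shard with its segment count.
count_segments() {
  awk 'NR > 1 { n[$1 " shard " $2]++ }
       END { for (k in n) print k ": " n[k] " segment(s)" }'
}

count_segments <<'EOF'
index      shard segment docs.count   size
merge-test 0     _0             100 25.7kb
merge-test 0     _1             100 25.7kb
merge-test 0     _2             100 25.7kb
EOF
```

Against a live cluster, pipe the curl command shown earlier into count_segments instead of the here-document. A shard already at one segment gains nothing from a force merge.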

Summary

Setting	Default	Tuned	When to use tuned
`refresh_interval`	1s	30s	Bulk loads, log ingestion
`translog.durability`	request	async	Bulk loads (accept small data loss risk)
`translog.sync_interval`	5s	10s	Paired with async durability

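The refresh interval is usually the most impactful single setting for write throughput, although these tests changed all three settings together and do not isolate its contribution. Raising it 30x (1s to 30s) alongside async translog durability cut the 500-document bulk time by 75%.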

Provision a FoundryDB OpenSearch cluster at foundrydb.com. Documentation at docs.foundrydb.com.
