OpenSearch Performance Tuning: Refresh Intervals, Bulk Sizing, and Shard Analysis
Getting good write throughput from OpenSearch requires understanding three things: the refresh cycle, translog durability, and shard sizing. This post benchmarks bulk indexing with default and tuned settings on a live OpenSearch 2.19.1 cluster managed by FoundryDB. The cluster is a single-node tier-2 (2 CPU, 4 GB RAM), which is the smallest configuration available. Larger nodes will produce better absolute numbers, but the relative gap between default and tuned settings should be similar.
All commands use YOUR_OPENSEARCH_HOST and YOUR_PASSWORD as placeholders.
Understanding the Defaults
OpenSearch refreshes each index every 1 second by default. A refresh makes new documents searchable but has a cost: it writes the in-memory indexing buffer out as a new searchable segment, which takes CPU and I/O. During high-ingest windows, a 1-second refresh interval means the cluster spends significant time creating and merging small segments instead of indexing data.
With the default translog.durability: request, the translog is fsynced to disk after every write request (index, delete, update, or bulk) before the client receives a response. This guarantees durability but adds latency on every write. For bulk loads where you accept that a machine failure might lose the writes made since the last sync (up to translog.sync_interval), async translog sync is a reasonable trade.
Baseline: Default Settings
Create an index with the default refresh and translog settings and bulk-ingest 500 documents:
curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/perf-default" \
-H "Content-Type: application/json" \
-d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'
# Generate a 500-document bulk payload and POST it
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/perf-default/_bulk" \
-H "Content-Type: application/x-ndjson" \
--data-binary @bulk-500.ndjson | jq '{took, errors}'
Result: {"took": 218, "errors": false}
With default settings, the 500-document bulk ingest took 218 ms.
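The post does not show how bulk-500.ndjson was produced. As a minimal sketch, a payload in the _bulk NDJSON format (one action line followed by one source line per document) could be generated like this. The function name gen_bulk and the document fields (id, message, level) are hypothetical placeholders, not the data used in the benchmark.

```shell
# Sketch: emit N action/source line pairs in _bulk NDJSON format.
# The document fields below are hypothetical, not the benchmark's real data.
gen_bulk() {
  awk -v n="$1" 'BEGIN {
    for (i = 1; i <= n; i++) {
      # Action line: the target index comes from the request URL,
      # and OpenSearch assigns the document ID.
      print "{\"index\":{}}"
      # Source line: one small JSON document.
      printf "{\"id\":%d,\"message\":\"test event %d\",\"level\":\"info\"}\n", i, i
    }
  }'
}

# Example usage:
#   gen_bulk 500 > bulk-500.ndjson
```

Every line, including the last, ends with a newline, which the _bulk endpoint requires.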
Tuned Settings
Apply three changes to a new index:
curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/perf-tuned" \
-H "Content-Type: application/json" \
-d '{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"refresh_interval": "30s",
"translog.durability": "async",
"translog.sync_interval": "10s"
}
}'
Run the same 500-document bulk ingest against the tuned index:
Result: {"took": 54, "errors": false}
With tuned settings, the same ingest took 54 ms. That is a 75% reduction in bulk latency (218 ms to 54 ms, roughly 4x faster) on the same hardware for the same data.
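For reference, the improvement figure is the reduction in the took value relative to the baseline. A small sketch of that arithmetic, using a hypothetical helper named pct_faster:

```shell
# Percentage reduction in bulk "took" time, rounded to a whole percent.
# $1 = baseline took (ms), $2 = tuned took (ms)
pct_faster() {
  awk -v base="$1" -v tuned="$2" \
    'BEGIN { printf "%.0f\n", (base - tuned) / base * 100 }'
}

pct_faster 218 54   # prints 75
pct_faster 116 70   # prints 40
```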
2000-Document Comparison
| Documents | Default | Tuned | Improvement |
|---|---|---|---|
| 500 | 218 ms | 54 ms | 75% faster |
| 2000 | 116 ms | 70 ms | 40% faster |
The 2000-document result is interesting: the default run took 116 ms, which is lower than the 500-document default run. A likely explanation is that OpenSearch batches writes internally and a larger bulk request amortises per-request overhead across more documents. Single-run timings at this scale are also sensitive to warm-up effects, so treat the exact figures as indicative. Tuned settings still win at 70 ms, but the gap narrows because at 2000 documents the segment flushing cost is relatively smaller per document. The lesson: batch size matters, and tuned settings improved throughput at both batch sizes tested.
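Since batch size matters, it helps to be able to control it. The following is a sketch of one way to split a large NDJSON file into fixed-size bulk requests. It assumes the file contains only index or create actions, so every document occupies exactly two lines; split_bulk and the chunk prefix are hypothetical names.

```shell
# Split a bulk NDJSON file into chunks of DOCS documents each.
# Each document is two lines (action + source), so every chunk gets
# an even number of lines and no pair is broken across files.
# $1 = input file, $2 = documents per chunk, $3 = output file prefix
split_bulk() {
  split -l "$(( $2 * 2 ))" "$1" "$3"
}

# Example usage (posting each chunk is left commented out):
#   split_bulk bulk-all.ndjson 2000 chunk-
#   for f in chunk-*; do
#     curl -u app_user:YOUR_PASSWORD -k \
#       -X POST "https://YOUR_OPENSEARCH_HOST:9200/perf-tuned/_bulk" \
#       -H "Content-Type: application/x-ndjson" \
#       --data-binary "@$f" | jq '{took, errors}'
#   done
```

Comparing the took values across a few chunk sizes on your own data is a more reliable guide than any fixed number.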
Node Resource Usage During Tests
curl -u app_user:YOUR_PASSWORD -k \
"https://YOUR_OPENSEARCH_HOST:9200/_nodes/stats/jvm,os" | \
jq '.nodes | to_entries[0].value | {heap_used_percent: .jvm.mem.heap_used_percent, heap_used_mb: (.jvm.mem.heap_used_in_bytes / 1048576 | round), heap_max_mb: (.jvm.mem.heap_max_in_bytes / 1048576 | round), cpu_percent: .os.cpu.percent}'
During the bulk load tests:
{
"heap_used_percent": 67,
"heap_used_mb": 696,
"heap_max_mb": 1024,
"cpu_percent": 1
}
Heap was at 67% (696 MB of 1024 MB). CPU was at 1%. The bottleneck on this tier-2 node during small bulk tests was not CPU; it was segment management and translog sync latency, which the tuned settings address directly.
Shard State and Sizing
Inspect the current shard layout:
curl -u app_user:YOUR_PASSWORD -k \
"https://YOUR_OPENSEARCH_HOST:9200/_cat/shards?v&h=index,shard,docs,store"
Output from the test cluster:
index shard docs store
perf-tuned 0 2000 703.3kb
perf-default 0 500 130.9kb
logs-app-000001 0 5 5.7kb
The test indices were created with number_of_replicas: 0, so they have no replica shards. For any index that keeps the default of one replica, the replica shard shows as UNASSIGNED on a single-node cluster (add prirep and state to the h= column list to see it). This is expected and correct: there is nowhere to assign replicas, so they stay unassigned. It is not an error.
From the shard data: 500 documents consumed 130.9 KB, and 2000 documents consumed 703.3 KB. That gives a rough density of 0.26 to 0.35 KB per document for this test data. For production planning, measure your actual document size. The standard shard sizing guidance is 10 to 50 GB per shard for balanced search and indexing performance.
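The density figures and a first-pass shard count can be worked out with a few lines of arithmetic. This is a sketch only: kb_per_doc and shards_needed are hypothetical helpers, and the 30 GB target in the usage comment is simply a value inside the 10 to 50 GB guidance.

```shell
# Average store size per document in KB.
# $1 = store size in KB, $2 = document count
kb_per_doc() {
  awk -v kb="$1" -v docs="$2" 'BEGIN { printf "%.2f\n", kb / docs }'
}

# Primary shards needed to keep each shard at or under a target size.
# $1 = expected document count, $2 = KB per document, $3 = target GB per shard
shards_needed() {
  awk -v docs="$1" -v kb="$2" -v gb="$3" 'BEGIN {
    total_gb = docs * kb / 1024 / 1024
    n = int(total_gb / gb)
    if (n * gb < total_gb) n++   # round up to a whole shard
    if (n < 1) n = 1             # an index always has at least one shard
    print n
  }'
}

kb_per_doc 130.9 500    # prints 0.26
kb_per_doc 703.3 2000   # prints 0.35

# Example: one billion documents at 0.35 KB each, 30 GB per shard
#   shards_needed 1000000000 0.35 30
```

Real indices rarely scale this linearly, because mappings, compression, and merges all affect on-disk size, so treat the result as a starting point.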
When to Use Tuned Settings
The tuned settings are appropriate during:
- Initial bulk loads before the index goes live.
- Nightly reindex operations.
- High-throughput log ingestion windows where you accept the small durability tradeoff.
Restore default settings after the bulk operation:
curl -u app_user:YOUR_PASSWORD -k \
-X PUT "https://YOUR_OPENSEARCH_HOST:9200/perf-tuned/_settings" \
-H "Content-Type: application/json" \
-d '{
"index": {
"refresh_interval": "1s",
"translog.durability": "request"
}
}'
These are dynamic index settings, so the change takes effect immediately with no restart required. For log analytics indices that are append-only and where near-real-time search is acceptable, keeping refresh_interval at 5s or 10s permanently is a reasonable production configuration.
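To avoid leaving an index in its tuned state after a failed load, the tune, load, and restore steps can be wrapped together. The following is a sketch under stated assumptions: OS_HOST and OS_PASSWORD are hypothetical environment variables standing in for the placeholders used elsewhere in this post, and the function names are illustrative.

```shell
# Build the JSON body for the two dynamic settings changed in this post.
# $1 = refresh_interval, $2 = translog durability
settings_body() {
  printf '{"index":{"refresh_interval":"%s","translog.durability":"%s"}}\n' "$1" "$2"
}

# Apply a settings body to an index. $1 = index name, $2 = JSON body
# OS_HOST and OS_PASSWORD are assumed environment variables.
put_settings() {
  curl -sS -u "app_user:${OS_PASSWORD}" -k \
    -X PUT "https://${OS_HOST}:9200/$1/_settings" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Tune, bulk load, then restore defaults even if the load fails.
# $1 = index name, $2 = NDJSON file
bulk_load_tuned() {
  put_settings "$1" "$(settings_body 30s async)" || return 1
  curl -sS -u "app_user:${OS_PASSWORD}" -k \
    -X POST "https://${OS_HOST}:9200/$1/_bulk" \
    -H "Content-Type: application/x-ndjson" \
    --data-binary "@$2"
  status=$?
  # Always put the durable defaults back before returning.
  put_settings "$1" "$(settings_body 1s request)"
  return "$status"
}
```

A call such as bulk_load_tuned perf-tuned bulk-500.ndjson would then run the whole cycle. Note that curl reports transport failures only; a bulk response with "errors": true still needs to be checked separately.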
Force Merge for Read-Only Indices
After an index stops receiving writes (for example, after rollover in an ISM policy), reduce its segment count to improve read performance and reduce memory usage.
We tested this by creating an index and ingesting 300 documents in 3 separate batches (with a refresh between each) to produce multiple segments:
curl -u app_user:YOUR_PASSWORD -k \
"https://YOUR_OPENSEARCH_HOST:9200/_cat/segments/merge-test?v&h=index,shard,segment,docs.count,size"
Before force merge (3 segments):
index shard segment docs.count size
merge-test 0 _0 100 25.7kb
merge-test 0 _1 100 25.7kb
merge-test 0 _2 100 25.7kb
Run force merge:
curl -u app_user:YOUR_PASSWORD -k \
-X POST "https://YOUR_OPENSEARCH_HOST:9200/merge-test/_forcemerge?max_num_segments=1"
Response: {"_shards": {"total": 1, "successful": 1, "failed": 0}}
After force merge and refresh (1 segment):
index shard segment docs.count size
merge-test 0 _3 300 70.4kb
Three segments (25.7 KB each, 77.1 KB total) merged into one (70.4 KB), saving roughly 9% in storage. More importantly, search now scans one segment instead of three, reducing per-query overhead. Do not run force merge on active write indices because it fights the natural merge policy and can cause significant I/O load. Reserve it for read-only historical indices.
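Before deciding whether a force merge is worthwhile, it can help to summarise the segment listing. This sketch parses the text output of the _cat/segments command shown above, assuming the same column order (index, shard, segment, docs.count, size) and sizes reported with a kb suffix as in this test. segment_summary is a hypothetical helper name.

```shell
# Summarise _cat/segments output read from stdin.
# Assumes the header row is present, size is the fifth column,
# and sizes carry a "kb" suffix as in the listing above.
segment_summary() {
  awk 'NR > 1 && NF >= 5 {
    size = $5
    sub(/kb$/, "", size)   # strip the unit so the value is numeric
    count++
    total += size
  }
  END { printf "%d segments, %.1f KB total\n", count, total }'
}

# Example usage:
#   curl -s -u app_user:YOUR_PASSWORD -k \
#     "https://YOUR_OPENSEARCH_HOST:9200/_cat/segments/merge-test?v&h=index,shard,segment,docs.count,size" \
#     | segment_summary
```

Run against the pre-merge listing above, this reports 3 segments and 77.1 KB, matching the figures used in the comparison.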
Summary
| Setting | Default | Tuned | When to use tuned |
|---|---|---|---|
| refresh_interval | 1s | 30s | Bulk loads, log ingestion |
| translog.durability | request | async | Bulk loads (accept small data loss risk) |
| translog.sync_interval | 5s | 10s | Paired with async durability |
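The refresh interval is usually the single most impactful setting for write throughput. In these tests, a 30x longer refresh interval (1s to 30s) combined with async translog durability cut bulk latency by 75% on 500 documents. The three settings were changed together, so the tests do not isolate the contribution of each one.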
Provision a FoundryDB OpenSearch cluster at foundrydb.com. Documentation at docs.foundrydb.com.