
Monitoring & Alerts

Metrics API

Query metrics for any service:

curl -u admin:password \
  "https://api.foundrydb.com/managed-services/{id}/metrics?metric=cpu&period=1h"

Parameters:

Parameter  | Description             | Example
-----------|-------------------------|--------------------------
metric     | Metric name (see below) | cpu, memory, connections
period     | Time range              | 15m, 1h, 6h, 24h, 7d
resolution | Data point interval     | 1m, 5m, 1h
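As a sketch, the query above can be wrapped in a small Python helper. The `metrics_url` function is hypothetical (not part of any official client); the service ID and credentials are placeholders:

```python
from urllib.parse import urlencode

BASE = "https://api.foundrydb.com"

def metrics_url(service_id, metric, period="1h", resolution=None):
    """Build a metrics URL from the parameters in the table above."""
    params = {"metric": metric, "period": period}
    if resolution:
        params["resolution"] = resolution
    return f"{BASE}/managed-services/{service_id}/metrics?{urlencode(params)}"

# Fetching (requires the `requests` package); credentials are placeholders:
# import requests
# resp = requests.get(metrics_url("svc_123", "cpu", "6h", "5m"),
#                     auth=("admin", "password"))
# resp.raise_for_status()
# print(resp.json())
```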

Common Metrics

All engines

Metric      | Description
------------|-------------------------
cpu         | CPU utilisation (%)
memory      | Memory used (%)
disk        | Disk used (%)
disk_iops   | Disk IOPS
connections | Active connections
network_in  | Network bytes received
network_out | Network bytes sent

PostgreSQL

Metric                     | Description
---------------------------|-----------------------------------------------
pg_connections             | Active / idle / waiting connections
pg_transactions_per_second | Commits + rollbacks per second
pg_cache_hit_rate          | Buffer cache hit ratio (target >99%)
pg_replication_lag_seconds | Replica lag in seconds
pg_locks                   | Active lock count
pg_deadlocks               | Deadlocks per minute
pg_slow_queries            | Queries exceeding log_min_duration_statement

MySQL

Metric                            | Description
----------------------------------|---------------------------------------
mysql_queries_per_second          | Total QPS
mysql_innodb_buffer_pool_hit_rate | Buffer pool efficiency (target >99%)
mysql_replication_lag_seconds     | Replica lag
mysql_open_files                  | Open file handles

MongoDB

Metric                          | Description
--------------------------------|--------------------------------
mongodb_ops_per_second          | Operations per second by type
mongodb_replication_lag_seconds | Replica set lag
mongodb_wiredtiger_cache_used   | WiredTiger cache utilisation
mongodb_connections             | Active connections

Valkey

Metric                   | Description
-------------------------|--------------------------------
valkey_used_memory       | Memory used (bytes)
valkey_keyspace_hits     | Successful key lookups
valkey_keyspace_misses   | Cache misses
valkey_evicted_keys      | Keys evicted due to maxmemory
valkey_connected_clients | Connected clients
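The hits and misses counters are most useful combined into a hit ratio. A minimal Python sketch (the helper name is ours, not part of the API):

```python
def keyspace_hit_ratio(hits, misses):
    """Cache hit ratio from valkey_keyspace_hits and valkey_keyspace_misses.

    Returns None when there has been no traffic yet, to avoid
    reporting a misleading 0% or 100%.
    """
    total = hits + misses
    if total == 0:
        return None
    return hits / total

# Example: 9,800 hits and 200 misses -> 0.98 (a 98% hit ratio)
```

A ratio that drops while valkey_evicted_keys climbs usually means the working set no longer fits in maxmemory.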

Kafka

Metric                            | Description
----------------------------------|------------------------------------------------
kafka_messages_in_per_sec         | Inbound message rate
kafka_bytes_in_per_sec            | Inbound throughput
kafka_bytes_out_per_sec           | Outbound throughput
kafka_under_replicated_partitions | Partitions not fully replicated (should be 0)
kafka_consumer_lag                | Messages behind for a consumer group
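For consumer lag, the trend matters more than the absolute value: a steady non-zero lag is normal, while monotonic growth means consumers are falling behind. A hypothetical Python heuristic over consecutive kafka_consumer_lag samples:

```python
def lag_is_growing(samples, min_points=3):
    """True when consumer lag grows strictly across every consecutive
    pair of samples -- i.e. consumers are not keeping up.

    With fewer than min_points samples there is not enough evidence,
    so the function stays quiet and returns False.
    """
    if len(samples) < min_points:
        return False
    return all(b > a for a, b in zip(samples, samples[1:]))
```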

Alerts

Create an alert rule

curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "cpu",
    "condition": "gt",
    "threshold": 80,
    "duration_minutes": 5,
    "severity": "warning",
    "notification_channel_id": "channel_abc"
  }'

Field            | Values
-----------------|---------------------------------------------------
condition        | gt (above), lt (below)
severity         | info, warning, critical
duration_minutes | How long the condition must persist before firing
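The duration_minutes window means a rule does not fire on a single spike. A minimal Python sketch of this kind of evaluation, under our own assumptions (the actual server-side logic is not documented here; sample spacing is assumed to match the rule's resolution):

```python
def rule_fires(values, condition, threshold, duration_minutes,
               resolution_minutes=1):
    """Hypothetical evaluation of an alert rule: the condition must hold
    for every sample in the trailing duration_minutes window."""
    needed = max(1, duration_minutes // resolution_minutes)
    window = values[-needed:]
    if len(window) < needed:
        return False               # not enough history yet
    if condition == "gt":
        return all(v > threshold for v in window)
    if condition == "lt":
        return all(v < threshold for v in window)
    raise ValueError(f"unknown condition: {condition}")

# A 5-minute "cpu > 80" rule: one sample at 79 keeps it from firing.
```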

List rules

curl -u admin:password \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules

Delete a rule

curl -u admin:password -X DELETE \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules/{rule_id}

Notification Channels

Alerts can be sent to multiple channels.

Create a webhook channel

curl -u admin:password -X POST \
  https://api.foundrydb.com/alerts/channels \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Slack Production",
    "type": "webhook",
    "config": {"url": "https://hooks.slack.com/services/..."}
  }'

Create an email channel

curl -u admin:password -X POST \
  https://api.foundrydb.com/alerts/channels \
  -H "Content-Type: application/json" \
  -d '{
    "name": "On-call",
    "type": "email",
    "config": {"address": "oncall@example.com"}
  }'

Supported channel types

Type    | Description
--------|-----------------------------------------------
email   | Email notification
webhook | HTTP POST to any URL (Slack, PagerDuty, etc.)

Query Statistics

For PostgreSQL, real-time query stats are available:

curl -u admin:password \
  "https://api.foundrydb.com/managed-services/{id}/metrics/query-stats?limit=20&order=total_time"

Returns the top queries by total execution time, including calls, mean time, rows, and cache hit rate.

Use this to identify slow queries before they become a problem.

Query Statistics (Full Guide)

Query statistics are available for PostgreSQL and MySQL services. For PostgreSQL the data comes from the pg_stat_statements extension. For MySQL it is collected from the slow query log and the performance_schema digest tables on the primary node.

How it works

Collection is asynchronous: first, POST to request a collection task; then poll the GET endpoint with the returned task_id until the task completes.

Step 1: Request collection

# Collect the top 20 queries sorted by total execution time (the default)
curl -u admin:password -X POST \
  "https://api.foundrydb.com/managed-services/{id}/query-stats?limit=20&sort_by=total_time"
# Returns: {"task_id": "b2c3d4e5-..."}

Step 2: Poll for results

curl -u admin:password \
  "https://api.foundrydb.com/managed-services/{id}/query-stats?task_id=b2c3d4e5-..."
# Returns 202 while in progress, 200 when complete
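The two steps above amount to a poll loop. A Python sketch, assuming a caller-supplied `get` function (e.g. a thin wrapper around requests.get returning the status code and decoded body); `poll_query_stats` is a hypothetical helper, not part of any official client:

```python
import time

def poll_query_stats(get, task_id, interval=2.0, timeout=60.0):
    """Poll the query-stats endpoint until collection completes.

    `get` is any callable taking a task_id and returning
    (status_code, body). Raises on unexpected statuses or timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, body = get(task_id)
        if status == 200:
            return body            # collection finished
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval)       # 202: still collecting
    raise TimeoutError("query-stats collection did not finish in time")
```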

Fields returned

Each entry in the queries array contains:

Field           | Type        | Description
----------------|-------------|----------------------------------------------------------------
query           | string      | Normalised query text (parameters replaced with $1, ?, etc.)
calls           | integer     | Total number of executions since the last reset
total_time      | float (ms)  | Total cumulative execution time across all calls
mean_time       | float (ms)  | Average execution time per call
rows            | integer     | Total rows returned or affected across all calls
cache_hit_ratio | float (0-1) | Shared block cache hit ratio (PostgreSQL only; null for MySQL)

The response envelope also includes total_count (number of queries returned), collected_at (UTC timestamp of collection), and database_type.
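Two ratios worth deriving from these fields, sketched as hypothetical Python helpers over a single entry from the queries array:

```python
def rows_per_call(entry):
    """Average rows returned or affected per execution."""
    return entry["rows"] / entry["calls"] if entry["calls"] else 0.0

def time_share(entry, total_time_all):
    """Fraction of all measured database time spent in this one query --
    useful for deciding which query to optimise first."""
    return entry["total_time"] / total_time_all if total_time_all else 0.0
```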

Sorting options

Pass sort_by as a query parameter when requesting collection:

Value      | Use case
-----------|------------------------------------------------------------------------
total_time | Queries consuming the most cumulative database time (default)
calls      | Most frequently executed queries, regardless of speed
mean_time  | Slowest queries on average (catches infrequent but expensive queries)

Resetting statistics

There is no dedicated API endpoint to reset query statistics. To reset pg_stat_statements on a PostgreSQL service, connect as a superuser and run:

SELECT pg_stat_statements_reset();

On MySQL, the performance_schema digest tables reset automatically at server restart. You can also reset them manually:

TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;

After a reset, all counters start from zero. This is useful after a schema change or deployment so that you are measuring only the new workload.

Identifying N+1 queries

N+1 patterns show up as a query with a very high calls count relative to the expected request volume, a low or moderate mean_time, but a very large total_time. Look for queries of the form SELECT ... WHERE id = $1 that are executed thousands of times per minute. The fix is usually to add a batch-loading step (e.g. WHERE id = ANY($1)) or an ORM eager-load option.
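That shape (many calls, cheap individually) can be screened for mechanically. A hypothetical Python filter over the queries array; the thresholds are illustrative and should be tuned to your request volume:

```python
def n_plus_one_candidates(queries, min_calls=10_000, max_mean_ms=5.0):
    """Flag entries matching the N+1 shape: executed very often, fast
    per call, so their cost hides in total_time rather than mean_time."""
    return [
        q for q in queries
        if q["calls"] >= min_calls and q["mean_time"] <= max_mean_ms
    ]
```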

Identifying missing indexes

Sort by mean_time and look for queries with high mean execution time but low row counts. A sequential scan on a large table with a low selectivity predicate will appear here. Confirm with EXPLAIN ANALYZE and add an appropriate index. On PostgreSQL you can also query pg_stat_user_tables for tables with high seq_scan counts alongside your query stats to correlate the two.
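The missing-index shape (slow on average, few rows per call) can likewise be screened for before reaching for EXPLAIN ANALYZE. A hypothetical Python filter with illustrative thresholds:

```python
def missing_index_candidates(queries, min_mean_ms=100.0, max_rows_per_call=10):
    """Flag slow-on-average queries that return few rows per call --
    the shape a sequential scan with a selective predicate produces."""
    out = []
    for q in queries:
        rpc = q["rows"] / q["calls"] if q["calls"] else 0
        if q["mean_time"] >= min_mean_ms and rpc <= max_rows_per_call:
            out.append(q)
    return out
```

Anything flagged here is only a candidate: confirm the scan with EXPLAIN ANALYZE before adding an index.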

Exporting Metrics and Logs

Metrics and logs collected by FoundryDB can be pushed continuously to external observability platforms. This lets you consolidate database telemetry alongside your application infrastructure in the tools your team already uses.

Supported destinations are: Datadog, Prometheus Remote Write (Grafana Cloud, Thanos, Cortex, VictoriaMetrics), Generic OTLP (Grafana Cloud, Honeycomb, any OpenTelemetry collector), AWS CloudWatch, Elasticsearch / OpenSearch, BetterStack, and Grafana Loki. Each integration can export metrics, logs, or both, and runs on a configurable interval (default 60 seconds).

To set up an export, go to the Integrations page in the dashboard or use the API. You can create one integration per destination per service, or a single global integration that covers all services. The example below creates a Datadog export via the API:

curl -u admin:password -X POST \
  https://api.foundrydb.com/api/v1/metrics-exports \
  -H "Content-Type: application/json" \
  -d '{
    "service_id": "{service-id}",
    "name": "Datadog Production",
    "destination_type": "datadog",
    "data_type": "both",
    "export_interval_seconds": 60,
    "configuration": {
      "api_key": "YOUR_DATADOG_API_KEY",
      "site": "datadoghq.com"
    }
  }'

For Grafana Loki, Prometheus Remote Write, and OTLP destinations, see the full configuration reference on the Integrations page.