# Monitoring & Alerts

## Metrics API

Query metrics for any service:

```bash
curl -u admin:password \
  "https://api.foundrydb.com/managed-services/{id}/metrics?metric=cpu&period=1h"
```

Parameters:

| Parameter | Description | Example |
|---|---|---|
| metric | Metric name (see below) | cpu, memory, connections |
| period | Time range | 15m, 1h, 6h, 24h, 7d |
| resolution | Data point interval | 1m, 5m, 1h |

## Common Metrics

### All engines

| Metric | Description |
|---|---|
| cpu | CPU utilisation (%) |
| memory | Memory used (%) |
| disk | Disk used (%) |
| disk_iops | Disk IOPS |
| connections | Active connections |
| network_in | Network bytes received |
| network_out | Network bytes sent |

### PostgreSQL

| Metric | Description |
|---|---|
| pg_connections | Active / idle / waiting connections |
| pg_transactions_per_second | Commits + rollbacks per second |
| pg_cache_hit_rate | Buffer cache hit ratio (target >99%) |
| pg_replication_lag_seconds | Replica lag in seconds |
| pg_locks | Active lock count |
| pg_deadlocks | Deadlocks per minute |
| pg_slow_queries | Queries exceeding log_min_duration_statement |

### MySQL

| Metric | Description |
|---|---|
| mysql_queries_per_second | Total QPS |
| mysql_innodb_buffer_pool_hit_rate | Buffer pool efficiency (target >99%) |
| mysql_replication_lag_seconds | Replica lag |
| mysql_open_files | Open file handles |

### MongoDB

| Metric | Description |
|---|---|
| mongodb_ops_per_second | Operations per second by type |
| mongodb_replication_lag_seconds | Replica set lag |
| mongodb_wiredtiger_cache_used | WiredTiger cache utilisation |
| mongodb_connections | Active connections |

### Valkey

| Metric | Description |
|---|---|
| valkey_used_memory | Memory used (bytes) |
| valkey_keyspace_hits | Successful key lookups |
| valkey_keyspace_misses | Cache misses |
| valkey_evicted_keys | Keys evicted due to maxmemory |
| valkey_connected_clients | Connected clients |

### Kafka

| Metric | Description |
|---|---|
| kafka_messages_in_per_sec | Inbound message rate |
| kafka_bytes_in_per_sec | Inbound throughput |
| kafka_bytes_out_per_sec | Outbound throughput |
| kafka_under_replicated_partitions | Partitions not fully replicated (should be 0) |
| kafka_consumer_lag | Messages behind for a consumer group |

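
Several of the metrics above carry implicit health targets (cache hit ratios above 99%, zero under-replicated partitions, no evictions). A hedged Python sketch that flags those signals from a fetched metrics snapshot; the function name is illustrative and the thresholds simply mirror the targets noted in the tables, so tune them for your workload:

```python
def health_warnings(metrics: dict) -> list[str]:
    """Flag problem signals using the targets noted in the metric tables.

    `metrics` maps metric names (as listed above) to their latest values.
    Only metrics present in the snapshot are checked.
    """
    warnings = []
    if metrics.get("pg_cache_hit_rate", 100.0) < 99.0:
        warnings.append("PostgreSQL buffer cache hit ratio below 99%")
    if metrics.get("mysql_innodb_buffer_pool_hit_rate", 100.0) < 99.0:
        warnings.append("MySQL buffer pool hit rate below 99%")
    if metrics.get("kafka_under_replicated_partitions", 0) > 0:
        warnings.append("Kafka has under-replicated partitions")
    if metrics.get("valkey_evicted_keys", 0) > 0:
        warnings.append("Valkey is evicting keys (maxmemory pressure)")
    return warnings
```

This kind of check duplicates what the alert rules below do server-side; it is mainly useful in ad-hoc scripts and dashboards.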
## Alerts

### Create an alert rule

```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "cpu",
    "condition": "gt",
    "threshold": 80,
    "duration_minutes": 5,
    "severity": "warning",
    "notification_channel_id": "channel_abc"
  }'
```

| Field | Values |
|---|---|
| condition | gt (above), lt (below) |
| severity | info, warning, critical |
| duration_minutes | Minutes the condition must persist before the alert fires (integer) |

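
When creating rules from code, it helps to validate the payload before POSTing it. A sketch assuming only the fields and allowed values shown above; the builder function itself is hypothetical, not part of the API:

```python
VALID_CONDITIONS = {"gt", "lt"}
VALID_SEVERITIES = {"info", "warning", "critical"}

def alert_rule_payload(metric: str, condition: str, threshold: float,
                       duration_minutes: int, severity: str,
                       notification_channel_id: str) -> dict:
    """Build and validate the JSON body for the create-rule endpoint."""
    if condition not in VALID_CONDITIONS:
        raise ValueError(f"condition must be one of {sorted(VALID_CONDITIONS)}")
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(VALID_SEVERITIES)}")
    if duration_minutes < 1:
        raise ValueError("duration_minutes must be a positive integer")
    return {
        "metric": metric,
        "condition": condition,
        "threshold": threshold,
        "duration_minutes": duration_minutes,
        "severity": severity,
        "notification_channel_id": notification_channel_id,
    }
```

Serialise the returned dict with `json.dumps` and POST it exactly as in the curl example.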
### List rules

```bash
curl -u admin:password \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules
```

### Delete a rule

```bash
curl -u admin:password -X DELETE \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules/{rule_id}
```

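
To clean up rules in bulk you can combine the list and delete endpoints. A Python sketch that plans the DELETE calls for every rule on a given metric; it assumes the list response carries an "id" per rule, which you should verify against your API version:

```python
def plan_rule_deletions(service_id: str, rules: list[dict], metric: str) -> list[str]:
    """Return the DELETE URLs for every alert rule targeting `metric`.

    `rules` is the parsed rule list from the list endpoint; each entry is
    assumed to include an "id" field (check the actual response shape).
    """
    base = f"https://api.foundrydb.com/managed-services/{service_id}/alerts/rules"
    return [f"{base}/{rule['id']}" for rule in rules if rule.get("metric") == metric]
```

Issue an authenticated DELETE for each returned URL, as in the curl example above.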
## Notification Channels

Alerts can be sent to multiple channels.

### Create a webhook channel

```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/alerts/channels \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Slack Production",
    "type": "webhook",
    "config": {"url": "https://hooks.slack.com/services/..."}
  }'
```

### Create an email channel

```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/alerts/channels \
  -H "Content-Type: application/json" \
  -d '{
    "name": "On-call",
    "type": "email",
    "config": {"address": "oncall@example.com"}
  }'
```

### Supported channel types

| Type | Description |
|---|---|
| email | Email notification |
| webhook | HTTP POST to any URL (Slack, PagerDuty, etc.) |

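
The two channel types take different config keys (address for email, url for webhook, per the examples above). A hedged sketch of a payload builder that enforces this; the helper is illustrative, not part of the API:

```python
REQUIRED_CONFIG_KEYS = {"email": {"address"}, "webhook": {"url"}}

def channel_payload(name: str, channel_type: str, **config) -> dict:
    """Build the JSON body for the create-channel endpoint, checking
    that the required config key for the chosen type is present."""
    try:
        required = REQUIRED_CONFIG_KEYS[channel_type]
    except KeyError:
        raise ValueError("type must be 'email' or 'webhook'") from None
    missing = required - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return {"name": name, "type": channel_type, "config": config}
```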
## Query Statistics

For PostgreSQL, real-time query stats are available:

```bash
curl -u admin:password \
  "https://api.foundrydb.com/managed-services/{id}/metrics/query-stats?limit=20&order=total_time"
```

Returns the top queries by total execution time, including calls, mean time, rows, and cache hit rate. Use this to identify slow queries before they become a problem.

## Query Statistics (Full Guide)

Query statistics are available for PostgreSQL and MySQL services. For PostgreSQL the data comes from the pg_stat_statements extension. For MySQL it is collected from the slow query log and the performance_schema digest tables on the primary node.

### How it works

Collection is asynchronous. First, POST to request a collection task. Then poll the GET endpoint until the task completes.

### Step 1: Request collection

```bash
# Collect top 20 queries sorted by total execution time (default)
curl -u admin:password -X POST \
  "https://api.foundrydb.com/managed-services/{id}/query-stats?limit=20&sort_by=total_time"
# Returns: {"task_id": "b2c3d4e5-..."}
```

### Step 2: Poll for results

```bash
curl -u admin:password \
  "https://api.foundrydb.com/managed-services/{id}/query-stats?task_id=b2c3d4e5-..."
# Returns 202 while in progress, 200 when complete
```

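
The two-step flow is easy to wrap in a polling helper. A Python sketch, assuming only the documented status codes (202 while running, 200 when done); `fetch` is any callable you supply that returns `(status_code, body)`, for example a thin wrapper around your HTTP client calling the GET endpoint above:

```python
import time

def poll_query_stats(fetch, task_id: str, timeout: float = 60.0,
                     interval: float = 2.0) -> dict:
    """Poll a query-stats collection task until it completes.

    Returns the parsed body once the endpoint answers 200. Raises on
    timeout or on any status other than 200/202.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, body = fetch(task_id)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete within {timeout}s")
```

Injecting `fetch` keeps the retry logic independent of the HTTP library and makes it trivial to test with canned responses.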
### Fields returned

Each entry in the queries array contains:

| Field | Type | Description |
|---|---|---|
| query | string | Normalized query text (parameters replaced with $1, ?, etc.) |
| calls | integer | Total number of executions since last reset |
| total_time | float (ms) | Total cumulative execution time across all calls |
| mean_time | float (ms) | Average execution time per call |
| rows | integer | Total rows returned or affected across all calls |
| cache_hit_ratio | float (0-1) | Shared block cache hit ratio (PostgreSQL only; null for MySQL) |

The response envelope also includes total_count (number of queries returned), collected_at (UTC timestamp of collection), and database_type.

### Sorting options

Pass sort_by as a query parameter when requesting collection:

| Value | Use case |
|---|---|
| total_time | Queries consuming the most cumulative database time (default) |
| calls | Most frequently executed queries, regardless of speed |
| mean_time | Slowest queries on average (catches infrequent but expensive queries) |

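
Because one collected snapshot contains all three fields, you can also re-rank it client-side rather than requesting a fresh collection per sort. A sketch (the helper name is illustrative):

```python
def top_queries(queries: list[dict], sort_by: str = "total_time",
                limit: int = 20) -> list[dict]:
    """Re-rank an already-collected result set, mirroring the server-side
    sort_by options, so one snapshot can be inspected from several angles."""
    if sort_by not in {"total_time", "calls", "mean_time"}:
        raise ValueError("sort_by must be total_time, calls, or mean_time")
    return sorted(queries, key=lambda q: q[sort_by], reverse=True)[:limit]
```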
### Resetting statistics

There is no dedicated API endpoint to reset query statistics. To reset pg_stat_statements on a PostgreSQL service, connect as a superuser and run:

```sql
SELECT pg_stat_statements_reset();
```

On MySQL, the performance_schema digest tables reset automatically at server restart. You can also reset them manually:

```sql
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;
```

After a reset, all counters start from zero. This is useful after a schema change or deployment so that you are measuring only the new workload.

### Identifying N+1 queries
N+1 patterns show up as a query with a very high calls count relative to the expected request volume, a low or moderate mean_time, but a very large total_time. Look for queries of the form SELECT ... WHERE id = $1 that are executed thousands of times per minute. The fix is usually to add a batch-loading step (e.g. WHERE id = ANY($1)) or an ORM eager-load option.
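
The heuristic above can be sketched as a filter over the collected stats. The function name and default thresholds are illustrative; set min_calls relative to your actual request volume over the collection window:

```python
def n_plus_one_candidates(queries: list[dict], min_calls: int = 10_000,
                          max_mean_ms: float = 5.0) -> list[dict]:
    """Flag queries that are cheap per call but executed a huge number of
    times: the signature of an N+1 pattern worth batch-loading."""
    return [q for q in queries
            if q["calls"] >= min_calls and q["mean_time"] <= max_mean_ms]
```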
### Identifying missing indexes
Sort by mean_time and look for queries with high mean execution time but low row counts. A sequential scan on a large table with a low selectivity predicate will appear here. Confirm with EXPLAIN ANALYZE and add an appropriate index. On PostgreSQL you can also query pg_stat_user_tables for tables with high seq_scan counts alongside your query stats to correlate the two.
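
As a companion to the N+1 filter, the same idea works for missing-index candidates: slow on average, yet touching few rows per call. Thresholds are illustrative; confirm every hit with EXPLAIN ANALYZE before adding an index:

```python
def missing_index_candidates(queries: list[dict], min_mean_ms: float = 100.0,
                             max_rows_per_call: float = 10.0) -> list[dict]:
    """Flag queries with high mean execution time but few rows per call,
    which is typical of sequential scans an index could cut short."""
    candidates = []
    for q in queries:
        rows_per_call = q["rows"] / max(q["calls"], 1)
        if q["mean_time"] >= min_mean_ms and rows_per_call <= max_rows_per_call:
            candidates.append(q)
    return candidates
```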
## Exporting Metrics and Logs
Metrics and logs collected by FoundryDB can be pushed continuously to external observability platforms. This lets you consolidate database telemetry alongside your application infrastructure in the tools your team already uses.
Supported destinations are: Datadog, Prometheus Remote Write (Grafana Cloud, Thanos, Cortex, VictoriaMetrics), Generic OTLP (Grafana Cloud, Honeycomb, any OpenTelemetry collector), AWS CloudWatch, Elasticsearch / OpenSearch, BetterStack, and Grafana Loki. Each integration can export metrics, logs, or both, and runs on a configurable interval (default 60 seconds).
To set up an export, go to the Integrations page in the dashboard or use the API. You can create one integration per destination per service, or a single global integration that covers all services. The example below creates a Datadog export via the API:
```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/api/v1/metrics-exports \
  -H "Content-Type: application/json" \
  -d '{
    "service_id": "{service-id}",
    "name": "Datadog Production",
    "destination_type": "datadog",
    "data_type": "both",
    "export_interval_seconds": 60,
    "configuration": {
      "api_key": "YOUR_DATADOG_API_KEY",
      "site": "datadoghq.com"
    }
  }'
```

For Grafana Loki, Prometheus Remote Write, and OTLP destinations, see the full configuration reference on the Integrations page.