Indexes and Query Performance in Babelfish on FoundryDB
Babelfish translates T-SQL to PostgreSQL internally. For most index operations and query tuning work, this is transparent: you write the same CREATE INDEX syntax you would use on SQL Server, and Babelfish creates the corresponding PostgreSQL index. Some SQL Server performance tooling does not map cleanly to the PostgreSQL layer, and it is useful to know where the boundaries are before you start profiling.
This post documents index behaviour and query performance tooling in Babelfish 4.8.0, based on confirmed results from live testing on a FoundryDB staging instance. All results use a 10,000-row table tested against PostgreSQL 16.11.
Test Setup
10,000 rows were inserted into a perf_orders table using a T-SQL WHILE loop:
CREATE TABLE perf_orders (
order_id INT IDENTITY(1,1) PRIMARY KEY,
customer_id INT NOT NULL,
status NVARCHAR(20) NOT NULL,
amount DECIMAL(10,2) NOT NULL,
region NVARCHAR(50) NOT NULL,
order_date DATETIME2 NOT NULL DEFAULT GETDATE()
);
Row distribution confirmed via SELECT:
status cnt avg_amount
------------ ------ ----------
completed 5500 4996.42
pending 2500 4988.71
shipped 1000 4954.44
cancelled 1000 5071.51
Creating Indexes
All five of the following index types were created without error:
-- Single column
CREATE INDEX idx_status ON perf_orders(status);
CREATE INDEX idx_customer ON perf_orders(customer_id);
-- Composite
CREATE INDEX idx_region_status ON perf_orders(region, status);
-- Descending
CREATE INDEX idx_amount ON perf_orders(amount DESC);
-- Covering index with INCLUDE
CREATE INDEX idx_status_amount_region ON perf_orders(status, region)
INCLUDE (amount, order_date);
Inspecting Indexes via sys.indexes
SELECT i.name, i.type_desc
FROM sys.indexes i
WHERE OBJECT_NAME(i.object_id) = 'perf_orders' AND i.name IS NOT NULL
ORDER BY i.name;
Output:
idx_amount NONCLUSTERED
idx_customer NONCLUSTERED
idx_region_status NONCLUSTERED
idx_status NONCLUSTERED
idx_status_amount_region NONCLUSTERED
perf_orders_pkey NONCLUSTERED
All indexes, including the primary key, are reported as NONCLUSTERED. This is expected behaviour. Babelfish maps SQL Server's heap-plus-clustered-index model to PostgreSQL heap tables, where all indexes are non-clustered by definition. There is no CLUSTERED index type in Babelfish. The primary key is enforced by a unique index on the heap, not by physical row ordering.
If your SQL Server schema relies on clustered index scan order for correctness (not just for a performance hint), verify your queries include an explicit ORDER BY clause before migrating.
Covering Index Query
SELECT region, COUNT(*) AS completed, SUM(amount) AS revenue
FROM perf_orders
WHERE status = 'completed'
GROUP BY region
ORDER BY revenue DESC;
Result:
region completed revenue
-------- ---------- -----------
West 1500 7590187.29
East 1500 7482057.95
North 1500 7461841.02
South 1000 4946242.24
The idx_status_amount_region index covers both the filter column (status) and the projected columns (amount, region), so PostgreSQL can satisfy this query from the index alone without a heap fetch.
Query Hints
WITH (NOLOCK)
SELECT COUNT(*) FROM perf_orders WITH (NOLOCK) WHERE status = 'completed';
-- Returns: 5500
NOLOCK is accepted and maps to READ UNCOMMITTED isolation in PostgreSQL. On PostgreSQL, READ UNCOMMITTED behaves identically to READ COMMITTED (PostgreSQL does not implement dirty reads), so WITH (NOLOCK) in Babelfish provides no performance benefit beyond avoiding explicit lock waits. The hint is accepted without error and the query returns correct results.
OPTION (MAXDOP 1)
SELECT status, AVG(amount) FROM perf_orders GROUP BY status OPTION (MAXDOP 1);
Accepted and executes correctly. PostgreSQL does not have an equivalent parallelism degree hint at the query level, so the hint is parsed but does not alter the query plan.
WITH (INDEX(...))
SELECT COUNT(*) FROM perf_orders WITH (INDEX(idx_status)) WHERE status = 'pending';
-- Returns: 2500
Index hints are accepted and the query executes correctly. Whether PostgreSQL honours the specific index choice in the plan depends on the planner's cost estimates.
SET STATISTICS IO and SET STATISTICS TIME
Both statements are accepted without error:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
Neither produces output in the TDS result stream. On SQL Server these produce messages in the Messages tab. In Babelfish, the statements are parsed but no I/O or timing data is appended to the response. Your query executes normally; you simply do not get the statistics output.
For I/O and timing analysis, connect on port 5432 using a PostgreSQL client and use EXPLAIN (ANALYZE, BUFFERS) on the translated query. This operates on the actual PostgreSQL query plan.
Available DMVs
The following system views were confirmed present by querying sys.all_objects:
sys.dm_exec_connectionssys.dm_exec_sessionssys.dm_hadr_clustersys.dm_hadr_database_replica_statessys.dm_os_host_infosys.dm_os_sys_info
The following were confirmed absent (relation does not exist error when queried):
sys.dm_exec_query_statssys.dm_os_wait_statssys.dm_exec_cached_planssys.dm_db_partition_stats
If your monitoring or DBA tooling relies on dm_exec_query_stats for top-query analysis or dm_os_wait_stats for wait analysis, those queries will not work on the TDS port.
Using PostgreSQL Tooling for Performance Analysis
Because Babelfish is PostgreSQL underneath, the full PostgreSQL performance toolkit is available when you connect on port 5432. The most useful views for index and query performance work are:
pg_stat_user_indexes: index scan counts, showing which indexes are actually being usedpg_statio_user_tables: heap and index I/O statistics per tablepg_stat_statements: aggregated query statistics including total time and row counts (requires the extension, which is typically enabled by default on managed instances)EXPLAIN ANALYZE: per-operator timing and row estimates for any query
Connect with psql or any PostgreSQL driver, using the PostgreSQL port (5432) and your standard database credentials. The babelfish_db database contains the translated schema under the dbo schema for the default SQL Server database.
Index Design Guidance
Do: Use CREATE INDEX with the same syntax as SQL Server. Single-column, composite, descending, and covering indexes with INCLUDE all work.
Do: Use sys.indexes to verify indexes were created. It is the reliable inventory of what exists.
Do: Add explicit ORDER BY to queries that depend on row ordering, since there is no clustered index row order to rely on.
Do: Use EXPLAIN ANALYZE via port 5432 for detailed query plan analysis when SET STATISTICS IO output is needed.
Do not: Expect sys.dm_exec_query_stats, sys.dm_os_wait_stats, or sys.dm_exec_cached_plans to be available over TDS.
Do not: Rely on SET STATISTICS IO ON producing output in your TDS client.
Get Started
Provision a Babelfish instance on FoundryDB. The service is production-ready in under five minutes, and your existing SQL Server tooling connects without driver changes.