Skip to main content

Bulk Data Loading in Babelfish on FoundryDB

· 5 min read
FoundryDB Team
Engineering @ FoundryDB

Loading data efficiently is one of the first things you need to sort out when migrating from SQL Server. Babelfish supports several bulk loading patterns natively, but the BULK INSERT statement is not one of them. This post documents what works, what does not, and the best alternatives, all tested live against Babelfish 4.8.0 on PostgreSQL 16.11.

For provisioning instructions, see the Babelfish getting started guide.

Test Table

All examples use the same target table:

CREATE TABLE bulk_target (
id INT IDENTITY(1,1) PRIMARY KEY,
name NVARCHAR(100) NOT NULL,
value DECIMAL(10,2) NOT NULL
);

Multi-Row VALUES Insert

Insert multiple rows in a single INSERT statement using the VALUES clause. This is the simplest way to batch writes.

INSERT INTO bulk_target (name, value) VALUES
('item_1', 10.50),
('item_2', 20.75),
('item_3', 30.00),
('item_4', 40.25),
('item_5', 50.50),
('item_6', 60.75),
('item_7', 70.00),
('item_8', 80.25),
('item_9', 90.50),
('item_10', 100.00);

Result: 10 rows inserted in a single round trip. This is the recommended approach for small to medium batches (up to a few hundred rows per statement).


INSERT...SELECT from a Source Table

For larger loads, populate a staging table first, then move data with INSERT...SELECT:

-- Create and populate a source table
CREATE TABLE bulk_source (
id INT IDENTITY(1,1) PRIMARY KEY,
name NVARCHAR(100),
value DECIMAL(10,2)
);

-- Populate 1000 rows using a WHILE loop
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO bulk_source (name, value)
VALUES ('source_' + CAST(@i AS NVARCHAR(10)), @i * 1.5);
SET @i = @i + 1;
END;

-- Bulk copy into target
INSERT INTO bulk_target (name, value)
SELECT name, value FROM bulk_source;

Result: 1000 rows inserted. The INSERT...SELECT executes server-side, so there is no per-row network round trip.


SELECT INTO (Create Table from Query)

SELECT INTO creates a new table and populates it in one operation. Useful for creating materialized snapshots or transforming data:

SELECT TOP 100 name, value
INTO bulk_snapshot
FROM bulk_source
WHERE value > 50.0;

Result: 100 rows copied into the new bulk_snapshot table. The table schema is inferred from the query columns.


WHILE Loops for Procedural Generation

T-SQL WHILE loops work for generating test data or running procedural inserts:

DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO bulk_target (name, value)
VALUES ('generated_' + CAST(@i AS NVARCHAR(10)), RAND() * 1000);
SET @i = @i + 1;
END;

Result: 1000 rows inserted. This is the slowest approach since each iteration is a separate insert, but it works reliably for test data and one-time loads.


Performance Comparison

Batching rows into multi-row VALUES statements provides a measurable speedup over individual inserts:

MethodRowsTimeThroughput
100 individual INSERT statements10010ms~10,000 rows/sec
10 batches of 10 rows via VALUES1004ms~25,000 rows/sec

Multi-row VALUES is 2.5x faster than individual inserts for the same row count. The improvement comes from reduced round trips and statement parsing overhead. For larger loads, combine multi-row VALUES with transactions for the best throughput.


What Does Not Work: BULK INSERT

The BULK INSERT T-SQL statement is not supported in Babelfish 4.8.0:

BULK INSERT bulk_target
FROM '/tmp/data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

Error:

invalid insert bulk option FIELDTERMINATOR

This is a known limitation. BULK INSERT relies on SQL Server internals for reading files from the server filesystem, which Babelfish does not implement.

Workarounds

BCP (Bulk Copy Program). The bcp command-line tool works with Babelfish because it uses the TDS protocol directly, bypassing the BULK INSERT SQL statement entirely:

bcp bulk_target in data.csv -S your-service.foundrydb.com,1433 \
-U your_user -P your_password \
-d master -c -t "," -r "\n"

BCP streams rows over TDS using the bulk insert protocol, which Babelfish handles at the wire level.

pymssql executemany. For programmatic loads from Python, executemany batches parameterized inserts over a single TDS connection:

import pymssql

conn = pymssql.connect(
server='your-service.foundrydb.com',
port=1433,
user='your_user',
password='your_password',
database='master'
)

cursor = conn.cursor()

rows = [(f'item_{i}', i * 1.5) for i in range(1, 1001)]

cursor.executemany(
"INSERT INTO bulk_target (name, value) VALUES (%s, %d)",
rows
)
conn.commit()

Tested result: 1000 rows in approximately 17 seconds (~60 rows/sec). This is slower than server-side approaches like INSERT...SELECT, but it works well for loading data from external sources where the data originates in application code.

Other options. Any SQL Server driver that supports parameterized batch execution works the same way. The Node.js mssql package, Java JDBC addBatch/executeBatch, and .NET SqlBulkCopy all communicate over TDS and are compatible with Babelfish. See the language drivers guide for driver setup details.


Summary

MethodSupportedBest For
Multi-row VALUESYesSmall to medium batches (10-500 rows)
INSERT...SELECTYesServer-side data movement, staging tables
SELECT INTOYesCreating new tables from queries
WHILE loopYesTest data generation, procedural inserts
BULK INSERT (SQL)NoUse BCP or driver-level batching instead
BCP (command line)YesLarge file imports via TDS protocol
pymssql executemanyYesProgrammatic loads from Python

For most migration scenarios, the combination of multi-row VALUES for small batches and INSERT...SELECT for large server-side operations covers the majority of use cases. When you need to load data from external files, BCP is the most direct replacement for BULK INSERT.


Ready to try Babelfish? Create a FoundryDB instance and connect on port 1433 in under five minutes.