Bulk Data Loading in Babelfish on FoundryDB
Loading data efficiently is one of the first things you need to sort out when migrating from SQL Server. Babelfish supports several bulk loading patterns natively, but the BULK INSERT statement is not one of them. This post documents what works, what does not, and the best alternatives, all tested live against Babelfish 4.8.0 on PostgreSQL 16.11.
For provisioning instructions, see the Babelfish getting started guide.
Test Table
All examples use the same target table:
CREATE TABLE bulk_target (
id INT IDENTITY(1,1) PRIMARY KEY,
name NVARCHAR(100) NOT NULL,
value DECIMAL(10,2) NOT NULL
);
Multi-Row VALUES Insert
Insert multiple rows in a single INSERT statement using the VALUES clause. This is the simplest way to batch writes.
INSERT INTO bulk_target (name, value) VALUES
('item_1', 10.50),
('item_2', 20.75),
('item_3', 30.00),
('item_4', 40.25),
('item_5', 50.50),
('item_6', 60.75),
('item_7', 70.00),
('item_8', 80.25),
('item_9', 90.50),
('item_10', 100.00);
Result: 10 rows inserted in a single round trip. This is the recommended approach for small to medium batches (up to a few hundred rows per statement).
INSERT...SELECT from a Source Table
For larger loads, populate a staging table first, then move data with INSERT...SELECT:
-- Create and populate a source table
CREATE TABLE bulk_source (
id INT IDENTITY(1,1) PRIMARY KEY,
name NVARCHAR(100),
value DECIMAL(10,2)
);
-- Populate 1000 rows using a WHILE loop
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO bulk_source (name, value)
VALUES ('source_' + CAST(@i AS NVARCHAR(10)), @i * 1.5);
SET @i = @i + 1;
END;
-- Bulk copy into target
INSERT INTO bulk_target (name, value)
SELECT name, value FROM bulk_source;
Result: 1000 rows inserted. The INSERT...SELECT executes server-side, so there is no per-row network round trip.
SELECT INTO (Create Table from Query)
SELECT INTO creates a new table and populates it in one operation. Useful for creating materialized snapshots or transforming data:
SELECT TOP 100 name, value
INTO bulk_snapshot
FROM bulk_source
WHERE value > 50.0;
Result: 100 rows copied into the new bulk_snapshot table. The table schema is inferred from the query columns.
WHILE Loops for Procedural Generation
T-SQL WHILE loops work for generating test data or running procedural inserts:
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO bulk_target (name, value)
VALUES ('generated_' + CAST(@i AS NVARCHAR(10)), RAND() * 1000);
SET @i = @i + 1;
END;
Result: 1000 rows inserted. This is the slowest approach since each iteration is a separate insert, but it works reliably for test data and one-time loads.
Performance Comparison
Batching rows into multi-row VALUES statements provides a measurable speedup over individual inserts:
| Method | Rows | Time | Throughput |
|---|---|---|---|
| 100 individual INSERT statements | 100 | 10ms | ~10,000 rows/sec |
| 10 batches of 10 rows via VALUES | 100 | 4ms | ~25,000 rows/sec |
Multi-row VALUES is 2.5x faster than individual inserts for the same row count. The improvement comes from reduced round trips and statement parsing overhead. For larger loads, combine multi-row VALUES with transactions for the best throughput.
What Does Not Work: BULK INSERT
The BULK INSERT T-SQL statement is not supported in Babelfish 4.8.0:
BULK INSERT bulk_target
FROM '/tmp/data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
Error:
invalid insert bulk option FIELDTERMINATOR
This is a known limitation. BULK INSERT relies on SQL Server internals for reading files from the server filesystem, which Babelfish does not implement.
Workarounds
BCP (Bulk Copy Program). The bcp command-line tool works with Babelfish because it uses the TDS protocol directly, bypassing the BULK INSERT SQL statement entirely:
bcp bulk_target in data.csv -S your-service.foundrydb.com,1433 \
-U your_user -P your_password \
-d master -c -t "," -r "\n"
BCP streams rows over TDS using the bulk insert protocol, which Babelfish handles at the wire level.
pymssql executemany. For programmatic loads from Python, executemany batches parameterized inserts over a single TDS connection:
import pymssql
conn = pymssql.connect(
server='your-service.foundrydb.com',
port=1433,
user='your_user',
password='your_password',
database='master'
)
cursor = conn.cursor()
rows = [(f'item_{i}', i * 1.5) for i in range(1, 1001)]
cursor.executemany(
"INSERT INTO bulk_target (name, value) VALUES (%s, %d)",
rows
)
conn.commit()
Tested result: 1000 rows in approximately 17 seconds (~60 rows/sec). This is slower than server-side approaches like INSERT...SELECT, but it works well for loading data from external sources where the data originates in application code.
Other options. Any SQL Server driver that supports parameterized batch execution works the same way. The Node.js mssql package, Java JDBC addBatch/executeBatch, and .NET SqlBulkCopy all communicate over TDS and are compatible with Babelfish. See the language drivers guide for driver setup details.
Summary
| Method | Supported | Best For |
|---|---|---|
| Multi-row VALUES | Yes | Small to medium batches (10-500 rows) |
| INSERT...SELECT | Yes | Server-side data movement, staging tables |
| SELECT INTO | Yes | Creating new tables from queries |
| WHILE loop | Yes | Test data generation, procedural inserts |
| BULK INSERT (SQL) | No | Use BCP or driver-level batching instead |
| BCP (command line) | Yes | Large file imports via TDS protocol |
| pymssql executemany | Yes | Programmatic loads from Python |
For most migration scenarios, the combination of multi-row VALUES for small batches and INSERT...SELECT for large server-side operations covers the majority of use cases. When you need to load data from external files, BCP is the most direct replacement for BULK INSERT.
Ready to try Babelfish? Create a FoundryDB instance and connect on port 1433 in under five minutes.