Description
Environment
- DuckDB version: v1.4.2
- Client: @duckdb/node-api (Node Neo)
- Platform: Node.js (TypeScript)
Summary
When running COPY ... TO ... (FORMAT PARQUET, RETURN_FILES) from Node Neo:
- A COPY to a local Parquet file returns a row with Count and Files as expected.
- A COPY to S3 (even without PARTITION_BY) returns no rows at all (runAndReadAll yields an empty result set), despite RETURN_FILES being specified and the files being successfully written.
This makes it impossible to discover the created S3 Parquet file paths from Node without doing additional S3 I/O.
Minimal reproduction
1. Local COPY – works as expected
```ts
import duckdb from '@duckdb/node-api';

async function testLocalCopy(connection: duckdb.DuckDBConnection) {
  const reader = await connection.runAndReadAll(`
    COPY (SELECT 42 AS value)
    TO 'local_out.parquet'
    (FORMAT PARQUET, RETURN_FILES)
  `);
  console.log('names+types:', reader.columnNamesAndTypesJson());
  console.log('rows:', reader.getRowObjectsJson());
}
```
Actual output (simplified):
```
names+types: {
  "columnNames": ["Count", "Files"],
  "columnTypes": [
    { "typeId": 5 },
    { "typeId": 24, "valueType": { "typeId": 1 } }
  ]
}
rows: [
  { "Count": "1", "Files": ["local_out.parquet"] }
]
```
So RETURN_FILES clearly works and is surfaced to Node Neo for local file targets: we get the documented Count and Files columns, which can be consumed programmatically as in the sketch below.
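For completeness, this is roughly how I pick the file list out of that result in application code; a minimal sketch, assuming the Count/Files shape shown above (the helper name is mine, not part of the repro):

```ts
import { DuckDBConnection } from '@duckdb/node-api';

// Hypothetical helper: pulls the Files list out of a COPY ... RETURN_FILES
// result, assuming the Count/Files shape shown in the local output above.
async function copyAndListFiles(
  connection: DuckDBConnection,
  copySql: string
): Promise<string[]> {
  const reader = await connection.runAndReadAll(copySql);
  const rows = reader.getRowObjectsJson();
  if (rows.length === 0) {
    // This is the branch the S3 case below falls into.
    return [];
  }
  return rows[0]['Files'] as string[];
}
```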
2. S3 COPY – no rows (with or without partitioning)
Now switch to S3 (same Node code pattern, different SQL). Example without PARTITION_BY:
```ts
async function testS3Copy(connection: duckdb.DuckDBConnection) {
  const reader = await connection.runAndReadAll(`
    COPY some_table TO 's3://my-bucket/demo-prefix/out.parquet' (
      FORMAT PARQUET,
      COMPRESSION 'SNAPPY',
      RETURN_FILES
    )
  `);
  console.log('names+types:', reader.columnNamesAndTypesJson());
  console.log('rows:', reader.getRowObjectsJson());
}
```
some_table is a regular table with data, and S3 + HTTPFS configuration is working correctly (the Parquet file is created in the bucket).
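For context, the setup behind that statement looks roughly like the following; a minimal sketch with placeholder bucket, region, and credentials, and a throwaway table (the real setup differs only in names and values):

```ts
import { DuckDBConnection } from '@duckdb/node-api';

// Rough equivalent of the S3 / httpfs setup assumed by the repro.
// Bucket, region, and credentials are placeholders.
async function setupS3(connection: DuckDBConnection): Promise<void> {
  // httpfs is usually autoloaded; shown explicitly for completeness.
  await connection.run(`INSTALL httpfs`);
  await connection.run(`LOAD httpfs`);
  await connection.run(`
    CREATE OR REPLACE SECRET demo_s3 (
      TYPE s3,
      KEY_ID 'placeholder-key-id',
      SECRET 'placeholder-secret',
      REGION 'us-east-1'
    )
  `);
  // A small table so the COPY statements actually write data.
  await connection.run(`
    CREATE OR REPLACE TABLE some_table AS
    SELECT range AS id, 'row_' || range::VARCHAR AS label
    FROM range(10)
  `);
}
```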
Actual output:
```
names+types: {
  "columnNames": [],
  "columnTypes": []
}
rows: []
```
I see the same behavior if I write multiple files to an S3 prefix (e.g. using FILENAME_PATTERN or PARTITION_BY): the data lands in S3, but runAndReadAll still returns an empty result set, so there is no Count/Files row available.
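For completeness, a sketch of the partitioned variant I'm referring to; the partition column here (label) is a stand-in for the real schema:

```ts
import { DuckDBConnection } from '@duckdb/node-api';

// Partitioned S3 COPY showing the same symptom. `label` is a stand-in for
// whatever column the real table is partitioned on.
async function testS3PartitionedCopy(connection: DuckDBConnection): Promise<void> {
  const reader = await connection.runAndReadAll(`
    COPY some_table TO 's3://my-bucket/demo-prefix/partitioned' (
      FORMAT PARQUET,
      PARTITION_BY (label),
      RETURN_FILES
    )
  `);
  // As described above: files land under the prefix, but this prints rows: []
  console.log('rows:', reader.getRowObjectsJson());
}
```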
Expected behavior
Given that:
- The docs for RETURN_FILES state it “includes the created filepath(s) (as a files VARCHAR[] column) in the query result”, and
- The local COPY example shows a Count and Files row being returned via Node Neo,
I would expect the S3 COPY to behave the same way, i.e.:
runAndReadAll should return a result with at least one row containing something like:
```
{ "Count": "<n>", "Files": ["s3://my-bucket/demo-prefix/out.parquet", ...] }
```
instead of an empty result set.
Actual behavior
For the S3 COPY (with or without PARTITION_BY):
runAndReadAll returns a result reader where:
- columnNamesAndTypesJson() has no Count/Files columns, and
- getRowObjectsJson() is [].
However, the files are successfully written to S3.
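As a stopgap, the paths can only be recovered with the extra S3 I/O mentioned in the summary, i.e. listing the target prefix after the COPY. A minimal sketch using DuckDB's glob() table function over httpfs (the prefix is a placeholder):

```ts
import { DuckDBConnection } from '@duckdb/node-api';

// Workaround sketch: recover the written paths by globbing the target prefix,
// i.e. the extra S3 round trip that RETURN_FILES is meant to avoid.
async function listWrittenFiles(
  connection: DuckDBConnection,
  prefix: string // e.g. 's3://my-bucket/demo-prefix'
): Promise<string[]> {
  const reader = await connection.runAndReadAll(`
    SELECT file FROM glob('${prefix}/**/*.parquet')
  `);
  return reader.getRowObjectsJson().map((row) => row['file'] as string);
}
```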
Questions
1. Is the absence of RETURN_FILES output for S3 targets a known limitation in DuckDB 1.4.x?
2. Is this specific to S3/remote destinations, or to certain COPY code paths?
3. Should Node Neo (and other clients) always receive the Count/Files row for a successful COPY ... RETURN_FILES, regardless of whether the target is local or S3?