
COPY ... TO S3 ... (FORMAT PARQUET, PARTITION_BY, RETURN_FILES) returns no rows in Node Neo, while local COPY returns Count/Files #345

@evgenybro

Description

Environment

  • DuckDB version: v1.4.2

  • Client: @duckdb/node-api (Node Neo)

  • Platform: Node.js (TypeScript)

Summary

When running COPY ... TO ... (FORMAT PARQUET, RETURN_FILES) from Node Neo:

  • A COPY to a local Parquet file returns a row with Count and Files as expected.

  • A COPY to S3 (even without PARTITION_BY) returns no rows at all (runAndReadAll yields an empty result set), despite RETURN_FILES being specified and the files being successfully written.

This makes it impossible to discover the created S3 Parquet file paths from Node without doing additional S3 I/O.
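Today the only workaround I have is a second round-trip to S3 after the COPY, e.g. by asking DuckDB itself to list the prefix. A minimal sketch (the bucket/prefix are placeholders, and I'm assuming glob() can list s3:// paths through httpfs):

async function listWrittenFiles(connection: DuckDBConnection): Promise<string[]> {
  // Extra S3 listing round-trip that RETURN_FILES is supposed to make unnecessary.
  // 's3://my-bucket/demo-prefix' is a placeholder used throughout this issue.
  const reader = await connection.runAndReadAll(`
    SELECT file FROM glob('s3://my-bucket/demo-prefix/*.parquet')
  `);
  return reader.getRowObjectsJson().map((row: any) => String(row.file));
}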

Minimal reproduction

1. Local COPY – works as expected

import { DuckDBConnection } from '@duckdb/node-api';

async function testLocalCopy(connection: DuckDBConnection) {
  const reader = await connection.runAndReadAll(`
    COPY (SELECT 42 AS value)
      TO 'local_out.parquet'
      (FORMAT PARQUET, RETURN_FILES)
  `);

  console.log('names+types:', reader.columnNamesAndTypesJson());
  console.log('rows:', reader.getRowObjectsJson());
}

Actual output (simplified):

names+types: {
  "columnNames": ["Count", "Files"],
  "columnTypes": [
    { "typeId": 5 },
    { "typeId": 24, "valueType": { "typeId": 1 } }
  ]
}
rows: [
  { "Count": "1", "Files": ["local_out.parquet"] }
]

So RETURN_FILES clearly works and is exposed to Node Neo for local file targets: the result contains the Count and Files columns as documented.

2. S3 COPY – no rows (with or without partitioning)

Now switch to S3 (same Node code pattern, different SQL). Example without PARTITION_BY:

async function testS3Copy(connection: DuckDBConnection) {
  const reader = await connection.runAndReadAll(`
    COPY some_table TO 's3://my-bucket/demo-prefix/out.parquet' (
      FORMAT PARQUET,
      COMPRESSION 'SNAPPY',
      RETURN_FILES
    )
  `);

  console.log('names+types:', reader.columnNamesAndTypesJson());
  console.log('rows:', reader.getRowObjectsJson());
}

Here, some_table is a regular table containing data, and the S3/HTTPFS configuration is working correctly (the Parquet file is created in the bucket).
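For reference, the S3 access in this repro is configured before testS3Copy runs along these lines (a sketch with placeholder credentials; the exact values are not relevant to the bug):

async function configureS3(connection: DuckDBConnection) {
  // httpfs provides the s3:// filesystem.
  await connection.run(`INSTALL httpfs`);
  await connection.run(`LOAD httpfs`);
  // Placeholder credentials/region for illustration only.
  await connection.run(`
    CREATE SECRET demo_s3_secret (
      TYPE S3,
      KEY_ID 'placeholder-key-id',
      SECRET 'placeholder-secret',
      REGION 'us-east-1'
    )
  `);
}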

Actual output of testS3Copy:

names+types: {
  "columnNames": [],
  "columnTypes": []
}
rows: []

I see the same behavior if I write multiple files to an S3 prefix (e.g. using FILENAME_PATTERN or PARTITION_BY): the data lands in S3, but runAndReadAll still returns an empty result set, so there is no Count/Files row available.
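For example, a partitioned variant like the following (the partition column and prefix are placeholders) also writes the files but returns an empty result:

async function testS3PartitionedCopy(connection: DuckDBConnection) {
  // Writes multiple Parquet files under the prefix, one per partition value.
  // 'event_date' stands in for whatever column some_table is partitioned by.
  const reader = await connection.runAndReadAll(`
    COPY some_table TO 's3://my-bucket/demo-prefix/partitioned' (
      FORMAT PARQUET,
      PARTITION_BY (event_date),
      RETURN_FILES
    )
  `);

  // Same symptom: no Count/Files row, even though the objects exist in S3.
  console.log('rows:', reader.getRowObjectsJson());
}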

Expected behavior

Given that:

  • The docs for RETURN_FILES state it “includes the created filepath(s) (as a files VARCHAR[] column) in the query result”, and

  • the local COPY example above shows the Count and Files row being returned via Node Neo,

I would expect the S3 COPY to behave the same way, i.e.:

runAndReadAll should return a result with at least one row containing something like:

{ "Count": "<n>", "Files": ["s3://my-bucket/demo-prefix/out.parquet", ...] }

instead of an empty result set.

Actual behavior

For the S3 COPY (with or without PARTITION_BY):

runAndReadAll returns a ResultReader where:

  • columnNamesAndTypesJson() has no Count/Files columns, and

  • getRowObjectsJson() is [].

However, the files are successfully written to S3.

Questions

1. Is the absence of RETURN_FILES output for S3 targets a known limitation in DuckDB 1.4.x?

2. Is this specific to S3/remote destinations, or to certain COPY code paths?

3. Should Node Neo (and other clients) always receive the Count/Files row for a successful COPY ... RETURN_FILES, regardless of whether the target is local or S3?
