Progress not reporting correct number of rows #336

@mhkeller

What happens?

I first reported this on the Node Neo client but was told it would be better to file here.


When logging the progress of an import query, the reported number of rows doesn't correspond to the number of rows in the file. If the row count stays the same but the file size changes, total_rows_to_process changes, which makes me think this number is related to the bytes being processed, not rows. However, I can't figure out how this number relates to the file size. It seems to be 1/60th of the actual file size.

Is the key name here incorrect and it should be something like total_bytes_to_process?

To Reproduce

  1. Clone the reproduction repo https://github.com/mhkeller/duckdb-import-issue
  2. Install dependencies with npm i or pnpm i
  3. Run the import with npm start or pnpm start

The file being imported has 100,000 rows, but the logs come out as:

{ percentage: 0, rows_processed: 0n, total_rows_to_process: 283628n }
{
  percentage: 21.97371101680997,
  rows_processed: 62323n,
  total_rows_to_process: 283628n
}
{
  percentage: 48.238472192600014,
  rows_processed: 136817n,
  total_rows_to_process: 283628n
}
{
  percentage: 75.35031028518631,
  rows_processed: 213714n,
  total_rows_to_process: 283628n
}
{
  percentage: 96.18581219394662,
  rows_processed: 272809n,
  total_rows_to_process: 283628n
}
{
  percentage: 99.99929485100202,
  rows_processed: 283626n,
  total_rows_to_process: 283628n
}
{
  percentage: 99.99929485100202,
  rows_processed: 283626n,
  total_rows_to_process: 283628n
}
{
  percentage: 99.99929485100202,
  rows_processed: 283626n,
  total_rows_to_process: 283628n
}
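
For reference, here is a rough sketch of the arithmetic behind the numbers above. It is plain JavaScript with no DuckDB calls. The names ACTUAL_ROWS, unitsPerRow, and recomputedPercentage are illustrative only and not part of any API. The progress object is copied from the last log entry.

```javascript
// Known from the input file itself, not reported by DuckDB.
const ACTUAL_ROWS = 100_000;

// Final progress object, copied verbatim from the logs above.
const last = {
  percentage: 99.99929485100202,
  rows_processed: 283626n,
  total_rows_to_process: 283628n,
};

// How many reported "rows" correspond to one real row in the file.
// (Hypothetical helper, for illustration only.)
function unitsPerRow(progress, actualRows) {
  return Number(progress.total_rows_to_process) / actualRows;
}

// Percentage recomputed from the two counters, for comparison with
// the reported `percentage` field. (Hypothetical helper.)
function recomputedPercentage(progress) {
  return (
    (Number(progress.rows_processed) /
      Number(progress.total_rows_to_process)) *
    100
  );
}

console.log("reported units per actual row:", unitsPerRow(last, ACTUAL_ROWS));
console.log("recomputed percentage:", recomputedPercentage(last));
console.log("reported percentage:  ", last.percentage);
```

The reported total is roughly 2.84 times the real row count, so whatever unit total_rows_to_process is counting, it is not one unit per row of the file. The percentage itself is consistent with the two counters, which suggests only the unit (or the key name) is off.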

OS:

macOS 15.7.1 arm64

DuckDB Version:

"@duckdb/node-api": "1.4.1-r.4"

DuckDB Client:

NodeJS

Hardware:

No response

Full Name:

Michael Keller

Affiliation:

Self

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • Yes, I have

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant data sets for reproducing the issue?

Yes
