Skip to content

Conversation

@OussamaSaoudi
Copy link
Collaborator

@OussamaSaoudi OussamaSaoudi commented Nov 19, 2025

Comment on lines 21 to 25
/// # Distributability
///
/// This phase is designed to be distributable. To distribute:
/// 1. Partition `files` across N executors
/// 2. Create N `LeafCheckpointReader` instances, one per executor with its file partition
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an odd place to bring up distributability? The log replay phase that consumes this reader is the thing getting distributed, and which needs the careful choreography? Seems like this reader is just a detail in that bigger picture.

let actions = engine
.parquet_handler()
.read_parquet_files(&files, schema, None)?
.map(|batch| batch.map(|b| ActionsBatch::new(b, false)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're splitting up the phases instead of chaining everything into one big iterator, we should probably reconsider whether we still need these (batch, bool) pairs -- I suspect that whoever invokes the dedup visitor will now know, structurally, whether it's a log or checkpoint batch, and can just pass true or false accordingly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep agreed. This will likely be a followup of the Deduplicator split. Then we'll just apply the correct deduplicator to the right stream.

Comment on lines 104 to 107
if log_segment.checkpoint_parts.is_empty() {
println!("Test table has no checkpoint parts, skipping");
return Ok(());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very strange to have in a test?? Either the test case has a checkpoint or it doesn't.

And in this case it should have a checkpoint, and panicking at L110 below is a perfectly reasonable test failure mode if the checkpoint somehow went missing.

ManifestPhase::new(manifest_file, log_segment.log_root.clone(), engine.clone())?;

// Drain manifest phase and apply processor
for batch in manifest_phase.by_ref() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: TIL that Iterator::by_ref is a thing!

Comment on lines 118 to 119
let batch = batch?;
processor.process_actions_batch(batch)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
let batch = batch?;
processor.process_actions_batch(batch)?;
processor.process_actions_batch(batch?)?;

(not sure which way is more readable?)

let mut sidecar_file_paths = Vec::new();
let mut batch_count = 0;

while let Some(result) = sidecar_phase.next() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just

Suggested change
while let Some(result) = sidecar_phase.next() {
for result in sidecar_phase {

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 84.03990% with 128 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.77%. Comparing base (72cb27c) to head (8452f9e).

Files with missing lines Patch % Lines
kernel/src/log_replay/deduplicator.rs 31.11% 28 Missing and 3 partials ⚠️
kernel/src/scan/log_replay.rs 90.32% 23 Missing and 4 partials ⚠️
kernel/src/log_reader/checkpoint_manifest.rs 85.25% 8 Missing and 15 partials ⚠️
kernel/src/parallel/sequential_phase.rs 87.79% 10 Missing and 11 partials ⚠️
kernel/src/log_reader/checkpoint_leaf.rs 79.41% 5 Missing and 9 partials ⚠️
kernel/src/log_reader/commit.rs 87.75% 1 Missing and 5 partials ⚠️
kernel/src/utils.rs 85.71% 0 Missing and 3 partials ⚠️
kernel/src/log_replay.rs 60.00% 0 Missing and 2 partials ⚠️
kernel/src/log_segment.rs 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1501    +/-   ##
========================================
  Coverage   84.77%   84.77%            
========================================
  Files         126      131     +5     
  Lines       35679    36411   +732     
  Branches    35679    36411   +732     
========================================
+ Hits        30246    30867   +621     
- Misses       3965     4032    +67     
- Partials     1468     1512    +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@OussamaSaoudi OussamaSaoudi changed the title feat: Leaf Checkpoint Reader feat: CheckpointLeaf file Reader Dec 4, 2025
@OussamaSaoudi OussamaSaoudi mentioned this pull request Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change Change that require a major version bump

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants