@steFaiz steFaiz commented Dec 11, 2025

Purpose

The linked issue: #6734

This PR introduces SST file format VERSION 1.

NOTE: The current SST file format has no VERSION field, so this version is not compatible with the current format (at least, compatibility would be hard to achieve).

Core improvements

  1. Introduce a leveled data index: users can specify a MaxIndexBlockSize. If the in-memory index block exceeds this threshold, it is spilled to the SST file as a B-tree-like structure. The reader loads only the root index block on opening.
  2. Introduce a new FileInfo block that contains some stats; users are free to add their own key-value pairs.
  3. Modify the footer structure:
    1. Add stats such as uncompressed data size, uncompressed index size, row count, and more.
    2. Move the compression type here from BlockTrailer. This follows HBase's design, so we no longer have to create a compressionFactory and a decompressor for each block.
    3. Add a VERSION number.
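
As a rough illustration of the leveled index in item 1, here is a minimal sketch of how an oversized index level could be split into blocks, spilled, and replaced by a smaller parent level until the root fits. The class `LeveledIndexSketch`, its `Entry` type, the size estimate, and the use of a block counter in place of real file offsets are all hypothetical and are not the classes added by this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a leveled index builder; names are illustrative only. */
final class LeveledIndexSketch {

    /** An index entry: the last key covered by a child block and that block's location. */
    static final class Entry {
        final String lastKey;
        final long offset;

        Entry(String lastKey, long offset) {
            this.lastKey = lastKey;
            this.offset = offset;
        }

        int sizeInBytes() {
            return lastKey.length() + Long.BYTES; // rough estimate, for the sketch only
        }
    }

    /** Index blocks spilled to the file, in write order. */
    final List<List<Entry>> spilledBlocks = new ArrayList<>();

    private static int sizeOf(List<Entry> entries) {
        int total = 0;
        for (Entry e : entries) {
            total += e.sizeInBytes();
        }
        return total;
    }

    /** Spills one block and returns the parent entry that points at it. */
    private Entry spill(List<Entry> block) {
        spilledBlocks.add(block);
        long blockId = spilledBlocks.size() - 1; // stand-in for a real file offset
        return new Entry(block.get(block.size() - 1).lastKey, blockId);
    }

    /** Returns the root block, the only index block a reader would load on open. */
    List<Entry> build(List<Entry> leafEntries, int maxIndexBlockSize) {
        List<Entry> level = leafEntries;
        while (level.size() > 1 && sizeOf(level) > maxIndexBlockSize) {
            List<Entry> parent = new ArrayList<>();
            List<Entry> block = new ArrayList<>();
            for (Entry e : level) {
                // Keep at least two entries per block so that every level shrinks.
                if (block.size() >= 2 && sizeOf(block) + e.sizeInBytes() > maxIndexBlockSize) {
                    parent.add(spill(block));
                    block = new ArrayList<>();
                }
                block.add(e);
            }
            parent.add(spill(block));
            level = parent;
        }
        return level;
    }

    public static void main(String[] args) {
        List<Entry> leaves = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            leaves.add(new Entry("key" + i, i * 4096L));
        }
        LeveledIndexSketch index = new LeveledIndexSketch();
        List<Entry> root = index.build(leaves, 24);
        System.out.println("root entries: " + root.size()
                + ", spilled blocks: " + index.spilledBlocks.size());
    }
}
```

In this sketch, a lookup would start from the returned root, pick the first entry whose `lastKey` is greater than or equal to the search key, and follow `offset` down one level at a time, so only the blocks on that path need to be read.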

Tests

Please see

  • org.apache.paimon.sst.IndexTest for index test
  • org.apache.paimon.sst.SstFileTest for file test
  • org.apache.paimon.lookup.sort.SortLookupStoreFactoryTest for lookup store test

API and Format

This PR does not change any public API.

Documentation

TODO

private static final Logger LOG = LoggerFactory.getLogger(SstFileWriter.class.getName());

public static final int MAGIC_NUMBER = 1481571681;
public static final int VERSION = 1;
Contributor

Do not change this, MAGIC_NUMBER is the version.

Contributor Author

OK, thanks! I will change this soon. My thinking was that a version number is easier to understand when we want to evolve the file format. For example, if (version == 1) is clearer than if (magic_number == 187762739).

Contributor

SST is already a file format. If there is an incompatibility, we should modify magic_number.
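
One way to reconcile the two positions above is to treat each magic number as a named constant, so that reader code stays legible while the magic number remains the only version marker. The sketch below is illustrative only: `MAGIC_NUMBER` is the value from the diff, while `MAGIC_NUMBER_LEVELED_INDEX`, the `Layout` enum, and the assumption that the magic number occupies the last four bytes of the footer in big-endian order are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: the magic number itself acts as the format version. */
final class SstMagicSketch {

    /** Value shown in the diff above. */
    static final int MAGIC_NUMBER = 1481571681;

    /** Hypothetical value for an incompatible layout; not a real Paimon constant. */
    static final int MAGIC_NUMBER_LEVELED_INDEX = 1481571682;

    enum Layout { CURRENT, LEVELED_INDEX }

    /** Maps the trailing four footer bytes (assumed big-endian) to a layout. */
    static Layout detectLayout(byte[] footer) {
        if (footer.length < Integer.BYTES) {
            throw new IllegalArgumentException("Footer too short: " + footer.length);
        }
        int magic = ByteBuffer.wrap(footer, footer.length - Integer.BYTES, Integer.BYTES).getInt();
        switch (magic) {
            case MAGIC_NUMBER:
                return Layout.CURRENT;
            case MAGIC_NUMBER_LEVELED_INDEX:
                return Layout.LEVELED_INDEX;
            default:
                throw new IllegalStateException("Unknown SST magic number: " + magic);
        }
    }

    public static void main(String[] args) {
        // Build a fake 8-byte footer whose last four bytes hold the magic number.
        byte[] footer = ByteBuffer.allocate(8).putInt(0).putInt(MAGIC_NUMBER).array();
        System.out.println(detectLayout(footer));
    }
}
```

With named constants, a check such as `magic == MAGIC_NUMBER_LEVELED_INDEX` reads about as clearly as `version == 1`, and files with an unrecognized magic number are rejected outright.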

Contributor Author

steFaiz commented Dec 11, 2025

Converting to draft because of compatibility issues. We can move this work forward in the future!

@steFaiz steFaiz marked this pull request as draft December 11, 2025 07:23
@steFaiz steFaiz changed the title from "[core] introduce SST File Version 1" to "[core] improve SST File format" on Dec 11, 2025
@JingsongLi JingsongLi changed the title from "[core] improve SST File format" to "[wip][core] improve SST File format" on Dec 11, 2025