fix(server): support dedicated backend data structures and serialization logic for vector index. #2913

hahahahbenny · 2025-11-23T11:46:57Z

Purpose of the PR

To support vector indexing in HugeGraph, dedicated backend data structures and serialization logic need to be added.

Main Changes

In-Memory Data Structure
```
HugeVectorIndexMap
```
- Represents a single vector-index entry at runtime.
- Carries sequence (offset) and IndexVectorState (dirty flag, metadata).

Type-System Extensions

| Enum | New Constant | Purpose |
| --- | --- | --- |
| `IndexType` | `VECTOR` | Top-level kind for vector indices. |
| `HugeType` | `VECTOR_INDEX_MAP` | Identifies the per-entry record. |
| `HugeType` | `VECTOR_SEQUENCE` | Identifies the global sequence counter. |

2.1 Column family design

VECTOR_INDEX_MAP

elemId: ID of the vertex being indexed.
sequence: long.
vectorStateCode = IndexVectorState.code(): state of the vector index (e.g., BUILDING / FLUSHED / DELETING).

| Item | Design |
| --- | --- |
| Key | `[1B type][4B indexId][4B vectorId][4B elemId][(optional) VLong expiredTime]` |
| Value | Fixed 9 bytes: `[8B sequence][1B vectorStateCode]` |
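
As a rough illustration of the VECTOR_INDEX_MAP layout, the sketch below packs a key and the fixed 9-byte value with `java.nio.ByteBuffer` (big-endian by default) and reads the value back. It is not the PR's `BinarySerializer` code: the class name, the `TYPE_CODE` constant, and the helper methods are hypothetical, all three ids are assumed to fit in 4-byte ints as the table states, and the optional `expiredTime` suffix is omitted.

```java
import java.nio.ByteBuffer;

public class VectorIndexMapLayoutSketch {

    // Hypothetical placeholder: the real code of HugeType.VECTOR_INDEX_MAP is not stated here.
    public static final byte TYPE_CODE = (byte) 0x7F;

    /** Key: [1B type][4B indexId][4B vectorId][4B elemId] (no expiredTime suffix). */
    public static byte[] encodeKey(int indexId, int vectorId, int elemId) {
        return ByteBuffer.allocate(1 + 4 + 4 + 4)
                         .put(TYPE_CODE)
                         .putInt(indexId)
                         .putInt(vectorId)
                         .putInt(elemId)
                         .array();
    }

    /** Value: fixed 9 bytes, [8B sequence][1B vectorStateCode]. */
    public static byte[] encodeValue(long sequence, byte stateCode) {
        return ByteBuffer.allocate(8 + 1)
                         .putLong(sequence)
                         .put(stateCode)
                         .array();
    }

    /** Reads the 8-byte sequence from the start of the value. */
    public static long decodeSequence(byte[] value) {
        return ByteBuffer.wrap(value).getLong(0);
    }

    /** Reads the trailing 1-byte state code of the value. */
    public static byte decodeStateCode(byte[] value) {
        return value[8];
    }

    public static void main(String[] args) {
        byte[] key = encodeKey(1, 42, 1001);
        byte[] value = encodeValue(7L, (byte) 1);
        System.out.println("key bytes: " + key.length + ", value bytes: " + value.length);
        System.out.println("sequence: " + decodeSequence(value)
                           + ", state code: " + decodeStateCode(value));
    }
}
```

Because the value has a fixed width, a reader can reject any value whose length is not 9 before decoding it.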

VECTOR_SEQUENCE

vectorId encoded as int.

| Item | Design |
| --- | --- |
| Key | `[1B dirty_prefix][4B indexId][8B sequence][4B vectorId]` |
| Value | `sequence (8B long) + state (1B IndexVectorState.code)` |
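
One likely reason for placing the sequence right after `indexId` in the key is ordering: with big-endian encoding, a bytewise (unsigned lexicographic) comparator such as RocksDB's default sorts non-negative sequences within one index in ascending order. The sketch below illustrates this under stated assumptions. The class name, the `DIRTY_PREFIX` value, and the helpers are hypothetical, and whether the column family actually uses the default comparator is not stated in this PR description.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class VectorSequenceKeySketch {

    // Hypothetical placeholder for the 1-byte dirty prefix.
    public static final byte DIRTY_PREFIX = (byte) 0x01;

    /** Key: [1B dirty_prefix][4B indexId][8B sequence][4B vectorId], big-endian. */
    public static byte[] encodeKey(int indexId, long sequence, int vectorId) {
        return ByteBuffer.allocate(1 + 4 + 8 + 4)
                         .put(DIRTY_PREFIX)
                         .putInt(indexId)
                         .putLong(sequence)
                         .putInt(vectorId)
                         .array();
    }

    /** The sequence starts at offset 5, after the prefix byte and the 4-byte indexId. */
    public static long decodeSequence(byte[] key) {
        return ByteBuffer.wrap(key).getLong(5);
    }

    /** Unsigned lexicographic comparison, mirroring a bytewise key comparator (Java 9+). */
    public static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // 127 and 128 differ in the last sequence byte (0x7F vs 0x80),
        // which only sorts correctly when bytes are compared as unsigned.
        byte[] earlier = encodeKey(1, 127L, 9);
        byte[] later = encodeKey(1, 128L, 3);
        System.out.println("earlier sorts first: " + (compareKeys(earlier, later) < 0));
    }
}
```

Under these assumptions a consumer can scan one index's entries in sequence order with a prefix scan over `[dirty_prefix][indexId]`; the trailing `vectorId` only breaks ties.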

2.2 vector index state machine

stateDiagram-v2
    [*] --> BUILDING: user writes vector<br/>GraphTransaction.commit()

    BUILDING --> FLUSHED: VectorIndexManager consumes<br/>and flushes to snapshot

    FLUSHED --> BUILDING: user modifies vector<br/>update operation

    FLUSHED --> DELETING: user deletes vector<br/>delete operation

    BUILDING --> DELETING: vector under construction deleted<br/>delete operation

    DELETING --> BUILDING: deleted vector re-written<br/>write operation

    DELETING --> [*]: VectorIndexManager consumes deletion<br/>physically purged from RocksDB
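
The transitions in the diagram can be checked mechanically. The following is a minimal sketch, not the PR's `IndexVectorState`: the class, the nested `State` enum, and `canTransition` are hypothetical names, and the final `DELETING --> [*]` purge is modeled as removal of the record rather than as a state.

```java
import java.util.EnumSet;
import java.util.Set;

public class VectorStateMachineSketch {

    /** The three persisted states from the diagram. */
    public enum State {
        BUILDING, FLUSHED, DELETING;

        /** States reachable in one step, as drawn in the diagram. */
        public Set<State> next() {
            switch (this) {
                case BUILDING:
                    // flushed by the manager, or deleted while still building
                    return EnumSet.of(FLUSHED, DELETING);
                case FLUSHED:
                    // modified again, or deleted
                    return EnumSet.of(BUILDING, DELETING);
                case DELETING:
                    // re-written before the purge; the purge itself removes the record
                    return EnumSet.of(BUILDING);
                default:
                    throw new AssertionError(this);
            }
        }
    }

    /** Returns true if the diagram has an edge from one state to the other. */
    public static boolean canTransition(State from, State to) {
        return from.next().contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.BUILDING, State.FLUSHED));
        System.out.println(canTransition(State.DELETING, State.FLUSHED));
    }
}
```

Note that FLUSHED is only reachable from BUILDING, so a re-written vector always passes through BUILDING again before it can be flushed.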

On-Disk Binary Layout
Serializer entry points:
```
BinarySerializer#writeIndex(HugeVectorIndexMap)
BinarySerializer#writeVectorSequence(...)
BinarySerializer#readVectorSequence(...)
```
Target column families:
- cf_vector_index_map – stores the state of each vector index.
- cf_vector_seq_index – stores monotonically increasing sequence IDs.

Test Coverage
```
VectorIndexSerializerTest
```
Locks down the byte-level format for:
- sequence id
- dirty marker id
- vector index value
- sequence entry record

Pinning these bytes guards against accidental format changes later and so protects backward compatibility.

Verifying these changes

Trivial rework / code cleanup without any test coverage. (No Need)
Already covered by existing tests, such as (please modify tests here).
Need tests and can be verified as follows:
- xxx

Does this PR potentially affect the following parts?

Dependencies (add/update license info & regenerate_known_dependencies.sh)
Modify configurations
The public API
Other effects (new HugeType constants, new data structure)
Nope

Documentation Status

Doc - TODO
Doc - Done
Doc - No Need

codecov · 2025-11-23T12:53:40Z

Codecov Report

❌ Patch coverage is 0.22624% with 441 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (vector-index@c92710c). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...java/org/apache/hugegraph/api/auth/ManagerAPI.java | 0.00% | 105 Missing ⚠️ |
| ...apache/hugegraph/structure/HugeVectorIndexMap.java | 0.00% | 46 Missing ⚠️ |
| ...hugegraph/backend/serializer/BinarySerializer.java | 0.00% | 39 Missing ⚠️ |
| ...n/java/org/apache/hugegraph/core/GraphManager.java | 0.00% | 33 Missing ⚠️ |
| ...va/org/apache/hugegraph/api/filter/PathFilter.java | 0.00% | 22 Missing ⚠️ |
| ...apache/hugegraph/type/define/IndexVectorState.java | 0.00% | 20 Missing ⚠️ |
| ...he/hugegraph/store/client/query/QueryExecutor.java | 0.00% | 15 Missing ⚠️ |
| ...he/hugegraph/backend/tx/GraphIndexTransaction.java | 0.00% | 14 Missing ⚠️ |
| .../apache/hugegraph/store/util/CopyOnWriteCache.java | 0.00% | 14 Missing ⚠️ |
| .../java/org/apache/hugegraph/api/auth/TargetAPI.java | 0.00% | 10 Missing ⚠️ |

... and 34 more

Additional details and impacted files

@@              Coverage Diff               @@
##             vector-index   #2913   +/-   ##
==============================================
  Coverage                ?   1.49%           
  Complexity              ?      21           
==============================================
  Files                   ?     782           
  Lines                   ?   65240           
  Branches                ?    8353           
==============================================
  Hits                    ?     975           
  Misses                  ?   64181           
  Partials                ?      84

JisoLya and others added 18 commits October 31, 2025 19:03

docs(store): update guidance for store module (apache#2894)

126885d

docs(pd): update test commands and improve documentation clarity (apa…

f92c5a4

…che#2893) * docs(pd): update test commands and improve documentation clarity * Update README.md --------- Co-authored-by: imbajin <jin@apache.org>

chore(server): bump rocksdb version from 7.2.2 to 8.10.2 (apache#2896)

d7697f4

fix(store): handle NPE in getVersion for file (apache#2897)

00e040b

* fix(store): fix duplicated definition log root

feat(server): add path filter for graphspace (apache#2898)

2e0cffe

fix(server): support GraphAPI for rocksdb & add tests (apache#2900)

ca5fc0c

refactor(server): remove graph param in auth api path (apache#2899)

b7998c1

fix: migrate to LTS jdk11 in all Dockerfile (apache#2901)

de0360b

feat: init serena memory system & add memories (apache#2902)

496b150

fix(server): fix reflect bug in init-store.sh (apache#2905)

41d0dbc

fix: add missing license and remove binary license.txt & fix tinkerpo…

b12425c

…p ci & remove duplicate module (apache#2910) * add missing license and remove binary license.txt * remove dist in commons * fix tinkerpop test open graph panic and other bugs * empty commit to trigger ci

feat(server): Add the vector index type and the detection of related …

d788d76

…fields to the index label.

fix code format

26eaa28

add annsearch API

cc7342e

add doc to explain the plan

fd578dc

feat: add RocksDB CF for vector index with serialize/deserialize support

1686e03

fix master merge conflict

d9d595f

dosubot bot added the labels size:XXL (This PR changes 1000+ lines, ignoring generated files), store (Store module), and tests (Add or improve test cases) on Nov 23, 2025

github-project-automation bot added this to HugeGraph PD-Store Tasks Nov 23, 2025

github-project-automation bot moved this to In progress in HugeGraph PD-Store Tasks Nov 23, 2025

hahahahbenny added 3 commits November 23, 2025 19:48

Merge branch 'vector-index' into vector-index

9fa6abd

delete redundant method

29c2406
