@hahahahbenny
Contributor

Purpose of the PR

To support vector indexing in HugeGraph, dedicated backend data structures and serialization logic need to be added.

Main Changes

  1. In-Memory Data Structure

    HugeVectorIndexMap
    • Represents a single vector-index entry at runtime.
    • Carries sequence (offset) and IndexVectorState (dirty flag, metadata).
  2. Type-System Extensions

    | Enum | New Constant | Purpose |
    | --- | --- | --- |
    | IndexType | VECTOR | Top-level kind for vector indices. |
    | HugeType | VECTOR_INDEX_MAP | Identifies the per-entry record. |
    | HugeType | VECTOR_SEQUENCE | Identifies the global sequence counter. |

2.1 Column family design

VECTOR_INDEX_MAP

  • elemId: ID of the vertex being indexed.
  • sequence: the entry's sequence number (offset), encoded as a long.
  • vectorStateCode = IndexVectorState.code(): state of the vector index (e.g., BUILDING / FLUSHED / DELETING).

| Item | Design |
| --- | --- |
| Key | [1B type][4B indexId][4B vectorId][4B elemId][(optional) VLong expiredTime] |
| Value | Fixed 9 bytes: [8B sequence][1B vectorStateCode] |
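
The VECTOR_INDEX_MAP layout above can be sketched with plain `java.nio.ByteBuffer`. This is only an illustration of the byte positions, not the PR's serializer: the class `VectorIndexMapLayout` and its method names are hypothetical, the IDs are assumed to be fixed-width big-endian ints as the table suggests, and the optional VLong `expiredTime` suffix is left out because its encoding is not described here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of HugeGraph): shows the byte layout only.
final class VectorIndexMapLayout {

    // Value is fixed at 9 bytes: 8B sequence + 1B vectorStateCode.
    static final int VALUE_LENGTH = 8 + 1;

    // Key: [1B type][4B indexId][4B vectorId][4B elemId]
    // The optional VLong expiredTime suffix is intentionally omitted.
    static byte[] encodeKey(byte typeCode, int indexId, int vectorId, int elemId) {
        return ByteBuffer.allocate(1 + 4 + 4 + 4) // big-endian by default
                         .put(typeCode)
                         .putInt(indexId)
                         .putInt(vectorId)
                         .putInt(elemId)
                         .array();
    }

    // Value: [8B sequence][1B vectorStateCode]
    static byte[] encodeValue(long sequence, byte vectorStateCode) {
        return ByteBuffer.allocate(VALUE_LENGTH)
                         .putLong(sequence)
                         .put(vectorStateCode)
                         .array();
    }
}
```

Under these assumptions a key without `expiredTime` is always 13 bytes and a value is always 9 bytes, which makes truncated or mis-sized records easy to detect on read.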

VECTOR_SEQUENCE

  • vectorId encoded as int.

| Item | Design |
| --- | --- |
| Key | [1B dirty_prefix][4B indexId][8B sequence][4B vectorId] |
| Value | sequence (8B long) + state (1B IndexVectorState.code) |
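
One consequence of placing `sequence` right after `indexId` in the VECTOR_SEQUENCE key is that, with big-endian encoding and a bytewise key comparator (RocksDB's default), entries of one index sort by sequence. The sketch below illustrates that property. `VectorSequenceLayout` and its methods are hypothetical names, and non-negative sequence values are assumed.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper (not part of HugeGraph): illustrates key ordering only.
final class VectorSequenceLayout {

    // Key: [1B dirty_prefix][4B indexId][8B sequence][4B vectorId] = 17 bytes
    static byte[] encodeKey(byte dirtyPrefix, int indexId, long sequence, int vectorId) {
        return ByteBuffer.allocate(1 + 4 + 8 + 4) // big-endian by default
                         .put(dirtyPrefix)
                         .putInt(indexId)
                         .putLong(sequence)
                         .putInt(vectorId)
                         .array();
    }

    // The sequence starts at offset 5, after the prefix byte and the indexId.
    static long decodeSequence(byte[] key) {
        return ByteBuffer.wrap(key).getLong(1 + 4);
    }

    // Unsigned lexicographic comparison, the same order a bytewise comparator uses.
    static boolean sortsBefore(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) < 0;
    }
}
```

For example, the keys for sequences 255 and 256 under the same prefix and `indexId` compare in numeric order regardless of `vectorId`, so a prefix scan over `[dirty_prefix][indexId]` would visit pending entries oldest first.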

2.2 Vector index state machine

stateDiagram-v2
    [*] --> BUILDING: user writes vector<br/>GraphTransaction.commit()

    BUILDING --> FLUSHED: VectorIndexManager consumes<br/>and flushes to snapshot

    FLUSHED --> BUILDING: user modifies vector<br/>update operation

    FLUSHED --> DELETING: user deletes vector<br/>delete operation

    BUILDING --> DELETING: vector under construction deleted<br/>delete operation

    DELETING --> BUILDING: deleted vector re-written<br/>write operation

    DELETING --> [*]: VectorIndexManager consumes deletion<br/>physically purged from RocksDB
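
The transitions in the diagram can be written down as a small lookup table. The enum below is a hypothetical mirror of the three states for illustration. The PR's actual type is `IndexVectorState`, whose fields and codes are not shown here. The final `DELETING --> [*]` step is a physical purge of the record rather than a move to another state, so it is not modeled as a transition.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical mirror of the diagram; not the PR's IndexVectorState class.
enum VectorState {
    BUILDING, FLUSHED, DELETING;

    // Allowed target states for each source state, taken from the diagram.
    private static final Map<VectorState, Set<VectorState>> NEXT =
            new EnumMap<>(VectorState.class);

    static {
        NEXT.put(BUILDING, EnumSet.of(FLUSHED, DELETING)); // flush, or delete while building
        NEXT.put(FLUSHED, EnumSet.of(BUILDING, DELETING)); // update, or delete
        NEXT.put(DELETING, EnumSet.of(BUILDING));          // re-write before the purge
    }

    boolean canTransitionTo(VectorState target) {
        return NEXT.get(this).contains(target);
    }
}
```

A check such as `VectorState.DELETING.canTransitionTo(VectorState.FLUSHED)` returns `false`, which matches the diagram: a deleted vector has to be re-written (back to BUILDING) before it can be flushed again.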
  3. On-Disk Binary Layout

    Serializer entry points:

    BinarySerializer#writeIndex(HugeVectorIndexMap)
    BinarySerializer#writeVectorSequence(...)
    BinarySerializer#readVectorSequence(...)

    Target column families:

    • cf_vector_index_map – stores the state of each vector index.
    • cf_vector_seq_index – stores monotonically increasing sequence IDs.
  4. Test Coverage

    VectorIndexSerializerTest

    Locks down the byte-level format for:

    • sequence id
    • dirty marker id
    • vector index value
    • sequence entry record
      This guards against accidental format changes and so preserves backward compatibility.
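
A byte-level format lock of this kind usually compares serializer output against a fixed "golden" byte string and then round-trips it. The sketch below shows the idea for the 9-byte value only. `VectorValueGolden` and its methods are hypothetical and do not reproduce `VectorIndexSerializerTest` or the `BinarySerializer` API, and big-endian encoding is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a golden-bytes check; not the PR's test class.
final class VectorValueGolden {

    // Value: [8B sequence][1B vectorStateCode]
    static byte[] encode(long sequence, byte stateCode) {
        return ByteBuffer.allocate(9).putLong(sequence).put(stateCode).array();
    }

    static long sequenceOf(byte[] value) {
        requireLength(value);
        return ByteBuffer.wrap(value).getLong();
    }

    static byte stateCodeOf(byte[] value) {
        requireLength(value);
        return value[8];
    }

    // Lower-case hex rendering, convenient for comparing against a golden string.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    private static void requireLength(byte[] value) {
        if (value.length != 9) {
            throw new IllegalArgumentException("expected 9 bytes, got " + value.length);
        }
    }
}
```

With this layout, `hex(encode(1L, (byte) 2))` yields `000000000000000102`. A test that pins such a string fails as soon as field order, width or endianness changes.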


Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • xxx

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

JisoLya and others added 18 commits October 31, 2025 19:03
…che#2893)

* docs(pd): update test commands and improve documentation clarity

* Update README.md

---------

Co-authored-by: imbajin <jin@apache.org>
* update(store): fix some problem and clean up code

- chore(store): clean some comments
- chore(store): using Slf4j instead of System.out to print log
- update(store): update more reasonable timeout setting
- update(store): add close method for CopyOnWriteCache to avoid potential memory leak
- update(store): delete duplicated beginTx() statement
- update(store): extract parameter for compaction thread pool(move to configuration file in the future)
- update(store): add default logic in AggregationFunctions
- update(store): fix potential concurrency problem in QueryExecutor

* Update hugegraph-store/hg-store-common/src/main/java/org/apache/hugegraph/store/query/func/AggregationFunctions.java

---------

Co-authored-by: Peng Junzhi <78788603+Pengzna@users.noreply.github.com>
* fix(store): fix duplicated definition log root
…p ci & remove duplicate module (apache#2910)

* add missing license and remove binary license.txt

* remove dist in commons

* fix tinkerpop test open graph panic and other bugs

* empty commit to trigger ci
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. store Store module tests Add or improve test cases labels Nov 23, 2025
# This is the 1st commit message:

add Licensed to files

# This is the commit message apache#2:

feat(server): support vector index in graphdb  (apache#2856)

* feat(server): Add the vector index type and the detection of related fields to the index label.

* fix code format

* add annsearch API

* add doc to explain the plan

delete redundency in vertexapi
@codecov

codecov bot commented Nov 23, 2025

Codecov Report

❌ Patch coverage is 0.22624% with 441 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (vector-index@c92710c). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...java/org/apache/hugegraph/api/auth/ManagerAPI.java | 0.00% | 105 Missing ⚠️ |
| ...apache/hugegraph/structure/HugeVectorIndexMap.java | 0.00% | 46 Missing ⚠️ |
| ...hugegraph/backend/serializer/BinarySerializer.java | 0.00% | 39 Missing ⚠️ |
| ...n/java/org/apache/hugegraph/core/GraphManager.java | 0.00% | 33 Missing ⚠️ |
| ...va/org/apache/hugegraph/api/filter/PathFilter.java | 0.00% | 22 Missing ⚠️ |
| ...apache/hugegraph/type/define/IndexVectorState.java | 0.00% | 20 Missing ⚠️ |
| ...he/hugegraph/store/client/query/QueryExecutor.java | 0.00% | 15 Missing ⚠️ |
| ...he/hugegraph/backend/tx/GraphIndexTransaction.java | 0.00% | 14 Missing ⚠️ |
| .../apache/hugegraph/store/util/CopyOnWriteCache.java | 0.00% | 14 Missing ⚠️ |
| .../java/org/apache/hugegraph/api/auth/TargetAPI.java | 0.00% | 10 Missing ⚠️ |

... and 34 more
Additional details and impacted files
@@              Coverage Diff               @@
##             vector-index   #2913   +/-   ##
==============================================
  Coverage                ?   1.49%           
  Complexity              ?      21           
==============================================
  Files                   ?     782           
  Lines                   ?   65240           
  Branches                ?    8353           
==============================================
  Hits                    ?     975           
  Misses                  ?   64181           
  Partials                ?      84           


Labels

size:XXL This PR changes 1000+ lines, ignoring generated files. store Store module tests Add or improve test cases

Projects

Status: In progress

7 participants