
Commit 783ea0f

Authored by Lijiachen1018, ygwpz, harrisonyhq, qyh111, and lijiachen19
[rebase]Dev-ucm-v1 rebase to develop (#453)
* [opt] refactor uc connector (#364): refactor ucm_connector
* [Feat] Implement kv cache broadcast in MLA (#367)
  * Implement kv cache broadcast in MLA in ucm_connector
  * [Style] Change wait for broadcast into a single task method
* [feature] add ucm mock connector (#375)
  * add ucm mock connector
  * fix chunk prefill bug
* [Feat] Support getting the launch config from yaml (#377)
  * Support launch from config file
  * [Docs] Update documents for launch with yaml
  * [Fix] Change load-only-on-first-rank into a configuration option
  * [Feat] Add support for hit ratio in yaml
  * [Fix] Fix load-only-on-first-rank in the non-MLA scene
* [fix] refuse monkey patch (#383)
* [bugfix] fix GQA bug (#384)
* [bugfix] fix end == 0 bug (#385)
* [feature] optimize generate_tensor (#396)
* [Fix] fix MLA bug when there is no broadcast in wait for save (#398)
* [feat] adapt GQA & modify config.yaml (#407)
  * adapt GQA & modify config.yaml
  * move process to UCMDirectConnector
  * fix comment
  * modify hash function
  * fix style
  * code style and modify hash
  * init parent_block_hash_value
* [feat] Adapt vllm_ascend_0110 and add configurable options (#415)
  * avoid type conversion in init kvcache
* [patch] separate sparse patch (#417)
* [bugfix] Support tensor parallelism across servers (#420)
* [Feat] UCM supports metrics display online via Grafana and Prometheus (#414)
  * Build metrics frame
  * add metrics (ucm_obser.py + metrics_configs.yaml)
  * Implementation of metrics logger on the C++ side for storing and retrieving stats
  * Provide simple Grafana dashboard and fix bugs
  * change the log position of UCM metrics
  * modify grafana.json
  * Remove configs to examples and add license
* [feat] Merge develop to dev-ucm-v1 and fix code style (#428)
  * [fix] fix sparse attention (#397): fix Ascend attention
  * [opt] Share Infra implementation and unify status codes (#399): share infra module
  * [bugfix] Fix ESA to be compatible with the latest NFSStore (#401)
  * release v0.1.0rc4 (#402)
  * [opt] Remove unused C++ implementation of dramstore (#406)
  * [Fix] remove DRAM docs and modify the quick-start doc (#411); modify index.md
  * [Feature] Added a performance testing tool based on the PyTest testing framework (#295)
  * [Misc] Add cpp-linter.yml (#422)
  * [docs] add metrics doc (#416); modify metrics.md
  * [perf] Modify CUDA SIMD and add a Triton hash encoder (#408)
  * fix cpp code style
* add env variable ENABLE_SPARSE (#430)
* Fix(patch): fix patch for vllm-ascend (#433) (see volcengine/verl#2564)
* [bugfix] fix accuracy problem with chunked prefill (#438)
* [bugfix] fix num_schedule-tokens=1 (#442); simplify the code
* [fix] Fix sparse patch (#444)
* [bugfix] fix the Metrics module using a non-existent variable self.rank (#445)
* [Feature] Add an access bandwidth test script for ucm_connector (#418)
* [bugfix] adapt vllm 0.9.1 (#446)
* [Fix] Set the multiprocessing start method of the test tool to 'spawn' and add NPU cleanup (#447)
* [fix] Adapt all sparse-attention methods to the new connector (#441); adapt the YAML configuration
* [docs] renew docs for v1 (#448)
* set version to 0.1.0 (#450)
* [Feature] GSA adapts nfsStore (#451): adapt nfsstore, fix codestyle

Co-authored-by: ygwpz <543529648@qq.com>
Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>
Co-authored-by: qyh111 <qiuyuhao1@huawei.com>
Co-authored-by: lijiachen19 <lijiachen19@huawei.com>
Co-authored-by: sumingZero <58885253+sumingZero@users.noreply.github.com>
Co-authored-by: flesher0813 <1208954694@qq.com>
Co-authored-by: Mag1c.H <hemajun815@163.com>
Co-authored-by: Fang Run <Fang_Run@126.com>
Co-authored-by: MaxWang <wangwenxin21@huawei.com>
Co-authored-by: hero0307 <tianxuehan0307@163.com>
Co-authored-by: t00939662 <tianxuehan@huawei.com>
Co-authored-by: ML <85485147+Menglths@users.noreply.github.com>
Co-authored-by: ShiXiaolei <indirashi@163.com>
Co-authored-by: zhou-haitao <74044944+zhou-haitao@users.noreply.github.com>
Co-authored-by: zbb200819 <1130072360@qq.com>
1 parent 52fe5a7 · commit 783ea0f


60 files changed: +5,226 additions, −2,598 deletions

docs/source/getting-started/installation_npu.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -57,7 +57,7 @@ cd ..
 ```
 
 ## Setup from docker
-Download the pre-built docker image provided or build unified-cache-management docker image by commands below:
+Download the pre-built `vllm-ascend` docker image or build unified-cache-management docker image by commands below:
 ```bash
 # Build docker image using source code, replace <branch_or_tag_name> with the branch or tag name needed
 git clone --depth 1 --branch <branch_or_tag_name> https://github.com/ModelEngine-Group/unified-cache-management.git
````

docs/source/getting-started/quick_start.md

Lines changed: 14 additions & 7 deletions
````diff
@@ -59,7 +59,17 @@ First, specify the python hash seed by:
 export PYTHONHASHSEED=123456
 ```
 
-Run the following command to start the vLLM server with the Qwen/Qwen2.5-14B-Instruct model:
+Create a config yaml like following and save it to your own directory:
+```yaml
+# UCM Configuration File Example
+# Refer to file unified-cache-management/examples/ucm_config_example.yaml for more details
+ucm_connector_name: "UcmNfsStore"
+
+ucm_connector_config:
+  storage_backends: "/mnt/test"
+```
+
+Run the following command to start the vLLM server with the Qwen/Qwen2.5-14B-Instruct model and your config file path:
 
 ```bash
 # Change the model path to your own model path
@@ -73,14 +83,11 @@ vllm serve ${MODEL_PATH} \
   --port 7800 \
   --kv-transfer-config \
   '{
-    "kv_connector": "UnifiedCacheConnectorV1",
-    "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
+    "kv_connector": "UCMConnector",
+    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
     "kv_role": "kv_both",
     "kv_connector_extra_config": {
-      "ucm_connector_name": "UcmNfsStore",
-      "ucm_connector_config": {
-        "storage_backends": "/home/test"
-      }
+      "UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"
     }
   }'
 ```
````
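Once the server is up, it exposes vLLM's standard OpenAI-compatible HTTP API on the chosen port. Below is a minimal smoke test as a sketch; it assumes port 7800 from the example above and that the model is registered under the same path passed to `vllm serve` (no `--served-model-name` is shown), so adjust both to your setup.

```python
# Minimal smoke test for the server launched above (standard library only).
# Assumptions: port 7800 as in the example; the model name equals the path
# passed to `vllm serve`. Adjust both as needed.
import json
import urllib.request

payload = {
    "model": "/home/models/Qwen2.5-14B-Instruct",  # placeholder for ${MODEL_PATH}
    "prompt": "Explain what a KV cache is in one sentence.",
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:7800/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["text"])
```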

docs/source/user-guide/prefix-cache/nfs_store.md

Lines changed: 11 additions & 10 deletions
````diff
@@ -87,8 +87,15 @@ To use the NFS connector, you need to configure the `connector_config` dictionary
 
 ### Example:
 
-```python
-kv_connector_extra_config={"ucm_connector_name": "UcmNfsStore", "ucm_connector_config":{"storage_backends": "/mnt/test1", "transferStreamNumber": 32}}
+Create a config yaml like following and save it to your own directory:
+```yaml
+# UCM Configuration File Example
+# Refer to file unified-cache-management/examples/ucm_config_example.yaml for more details
+ucm_connector_name: "UcmNfsStore"
+
+ucm_connector_config:
+  storage_backends: "/mnt/test"
+  transferStreamNumber: 32
 ```
 
 ## Launching Inference
````
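For reference, the YAML above parses into the same nested mapping that used to be passed inline via `kv_connector_extra_config`. A quick sketch with PyYAML; the config path is a placeholder:

```python
# Sketch: inspect what the UCM config file parses to (requires PyYAML).
import yaml

with open("/workspace/unified-cache-management/examples/ucm_config_example.yaml") as f:
    cfg = yaml.safe_load(f)

# Expected shape, mirroring the old inline kv_connector_extra_config:
# {'ucm_connector_name': 'UcmNfsStore',
#  'ucm_connector_config': {'storage_backends': '/mnt/test',
#                           'transferStreamNumber': 32}}
print(cfg["ucm_connector_name"], cfg["ucm_connector_config"])
```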
````diff
@@ -101,7 +108,7 @@ To start **offline inference** with the NFS connector, modify the script `examples/offline_inference.py`
 # In examples/offline_inference.py
 ktc = KVTransferConfig(
     ...
-    kv_connector_extra_config={"ucm_connector_name": "UcmNfsStore", "ucm_connector_config":{"storage_backends": "/mnt/test1", "transferStreamNumber": 32}}
+    kv_connector_extra_config={"UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"}
 )
 ```
````
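Put together, a self-contained offline run might look like the sketch below. The connector class and module path are copied from the online-serving example later in this same file; the model path is a placeholder, and `LLM(..., kv_transfer_config=...)` follows vLLM's standard offline API.

```python
# Sketch of a complete offline run using the NFS connector via a UCM config file.
# Connector name/module mirror the online example in this doc; paths are placeholders.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="UnifiedCacheConnectorV1",
    kv_connector_module_path="ucm.integration.vllm.uc_connector",
    kv_role="kv_both",
    kv_connector_extra_config={
        "UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"
    },
)

llm = LLM(model="/home/models/Qwen2.5-14B-Instruct", kv_transfer_config=ktc)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```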
````diff
@@ -131,13 +138,7 @@ vllm serve /home/models/Qwen2.5-14B-Instruct \
     "kv_connector": "UnifiedCacheConnectorV1",
     "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
     "kv_role": "kv_both",
-    "kv_connector_extra_config": {
-      "ucm_connector_name": "UcmNfsStore",
-      "ucm_connector_config": {
-        "storage_backends": "/mnt/test",
-        "transferStreamNumber": 32
-      }
-    }
+    "kv_connector_extra_config": {"UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"}
 }'
 ```
````

docs/source/user-guide/sparse-attention/esa.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -9,6 +9,7 @@ ESA provides developers with an intuitive example of how to implement their own
 ### Basic Usage
 ESA can be launched using the following command:
 ```shell
+export ENABLE_SPARSE=TRUE
 export MODEL_PATH="/path/to/model" # For example: /home/models/Qwen2.5-14B-Instruct
 export DATASET_PATH="/path/to/longbench/multifieldqa_zh.jsonl" # For example: /home/data/Longbench/data/multifieldqa_zh.jsonl
 python examples/offline_inference_esa.py
````
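Note the new `export ENABLE_SPARSE=TRUE` line: per the commit message, sparse attention is now gated behind this environment variable (#430), and the same export is added to the GSA, KVComp, and KVstar docs below. How UCM reads the flag is not part of this diff; the snippet below only illustrates the usual truthy-env-gate pattern, with a hypothetical `sparse_enabled()` helper.

```python
# Illustration only: a typical truthy env-flag gate. UCM's actual handling of
# ENABLE_SPARSE is not shown in this diff; the helper name is hypothetical.
import os

def sparse_enabled() -> bool:
    return os.environ.get("ENABLE_SPARSE", "").strip().upper() in {"1", "TRUE", "YES"}

if sparse_enabled():
    print("sparse-attention patches would be applied here")
```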

docs/source/user-guide/sparse-attention/gsa.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -107,6 +107,8 @@ ktc = KVTransferConfig(
 Thus, an example command for launching the online LLM service is as follows:
 
 ```shell
+export ENABLE_SPARSE=TRUE
+
 vllm serve /home/models/DeepSeek-R1-Distill-Qwen-32B \
   --served-model-name DeepSeek-R1-Distill-Qwen-32B \
   --max-model-len 131000 \
````
docs/source/user-guide/sparse-attention/kvcomp.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -97,6 +97,7 @@ This design ensures both **efficiency** and **accuracy** by preserving essential
 KVComp is part of the UCM Sparse Attention module. For installation instructions, please refer to the [UCM's top-level README](https://github.com/ModelEngine-Group/unified-cache-management). Once UCM is installed, KVComp is naturally supported by running the following example python scripts.
 
 ```bash
+export ENABLE_SPARSE=TRUE
 python ucm/sandbox/sparse/kvcomp/offline_inference_kvcomp.py
 ```
````

docs/source/user-guide/sparse-attention/kvstar.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -32,6 +32,7 @@ For long-sequence inference, KVstar achieves the following with minimal accuracy
 ### Basic Usage
 KVstar can be launched using the following command:
 ```shell
+export ENABLE_SPARSE=TRUE
 export MODEL_PATH="/path/to/model" # For example: /home/models/Qwen2.5-14B-Instruct
 export DATASET_PATH="/path/to/longbench/multifieldqa_zh.jsonl" # For example: /home/data/Longbench/data/multifieldqa_zh.jsonl
 export DATA_DIR="/path/to/data"
````
