
Commit ca48f45

Sync chromasql from 71ba2494b4c3a7907e1c96415d8c545372a90609
1 parent 1c4d45e commit ca48f45


48 files changed, 2588 lines added, 524 lines deleted

.gitignore

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Build artifacts
dist/
build/
*.egg-info/

# Python bytecode
__pycache__/
*.py[cod]
*.pyo

# Virtual environments
.venv/
venv/

# Environment files
.env

# Miscellaneous
*.log

site/

CLI_README.md

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
# ChromaSQL Server CLI

The ChromaSQL Server CLI provides an easy way to spin up a server that exposes `MultiCollectionService` via an HTTP API. This lets you query one or more ChromaDB collections (local or cloud) using the ChromaSQL query language.

## Installation

The CLI is included in the chromasql package. Once the package is installed, it provides the `chromasql-server` command:

```bash
# Install the chromasql package
cd chromasql
pip install -e .

# Or install from the main project
poetry install
```

## Usage

### Basic Usage

Start a server with a single local collection:

```bash
poetry run chromasql-server --client "local:/path/to/collection"
```

### Multiple Collections

Start a server with multiple collections by repeating `--client`:

```bash
poetry run chromasql-server \
  --client "local:/path/to/collection1" \
  --client "local:/path/to/collection2" \
  --client "local:/path/to/collection3"
```

### Cloud Collections

Start a server with a cloud-hosted collection:

```bash
poetry run chromasql-server \
  --client "cloud:my-tenant:my-database:env:CHROMA_API_KEY"
```

Note: Use the `env:VAR_NAME` syntax to reference an environment variable that holds the API key, instead of passing the key inline.
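Here is a minimal sketch of how an `env:VAR_NAME` reference could be resolved. It assumes that a value with the `env:` prefix names an environment variable and that any other value is a literal key. `resolve_secret` is a hypothetical helper for illustration, not a chromasql function.

```python
import os


def resolve_secret(value: str) -> str:
    """Hypothetical helper: resolve an ``env:VAR_NAME`` reference.

    Values starting with ``env:`` are looked up in the process environment;
    any other value is assumed to be a literal secret and returned unchanged.
    """
    prefix = "env:"
    if not value.startswith(prefix):
        return value
    var_name = value[len(prefix):]
    try:
        return os.environ[var_name]
    except KeyError:
        # Fail loudly instead of connecting with an empty API key.
        raise KeyError(f"environment variable {var_name!r} is not set") from None
```

Raising on an unset variable, rather than returning an empty string, surfaces a missing `CHROMA_API_KEY` at startup instead of as an authentication error later.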
### YAML Configuration

For complex setups, use a YAML configuration file:

```yaml
# collections.yaml
collections:
  - type: local
    name: my_local_collection
    persist_dir: /path/to/local/collection
    discriminator_field: model_name
    model_registry_target: my.module.registry:MODEL_REGISTRY
    embedding_model: text-embedding-3-small
    collection_name: my_collection

  - type: cloud
    name: my_cloud_collection
    tenant: my-tenant
    database: my-database
    api_key: env:CHROMA_API_KEY
    query_config_path: /path/to/query_config.json
    discriminator_field: model_name
    model_registry_target: my.module.registry:MODEL_REGISTRY
    embedding_model: text-embedding-3-small
```

Start the server with the configuration file:

```bash
poetry run chromasql-server --config-file collections.yaml
```
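A parsed configuration file can be sanity-checked before the server starts. The sketch below is hypothetical:

- `validate_collections` is not part of chromasql.
- The required keys for each `type` are assumptions inferred from the example above, not the CLI's actual schema.

```python
from __future__ import annotations

from typing import Any

# Assumption: required keys inferred from the example collections.yaml;
# the real CLI schema may require more (or fewer) fields.
REQUIRED_KEYS: dict[str, set[str]] = {
    "local": {"name", "persist_dir"},
    "cloud": {"name", "tenant", "database", "api_key"},
}


def validate_collections(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed collections config."""
    entries = config.get("collections")
    if not isinstance(entries, list) or not entries:
        return ["'collections' must be a non-empty list"]

    problems: list[str] = []
    seen_names: set[str] = set()
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"collections[{index}]: expected a mapping")
            continue
        kind = entry.get("type")
        if kind not in REQUIRED_KEYS:
            problems.append(f"collections[{index}]: unknown type {kind!r}")
            continue
        missing = REQUIRED_KEYS[kind] - set(entry)
        if missing:
            problems.append(f"collections[{index}]: missing {sorted(missing)}")
        # Assumption: names must be unique, since ?collection=<name> selects one.
        name = entry.get("name")
        if name in seen_names:
            problems.append(f"collections[{index}]: duplicate name {name!r}")
        elif name is not None:
            seen_names.add(name)
    return problems
```

To check the file above, parse it first, for example with PyYAML's `yaml.safe_load`. PyYAML is an assumed extra dependency here.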
### Server Options

Customize the server behavior:

```bash
poetry run chromasql-server \
  --client "local:/path/to/collection" \
  --host 0.0.0.0 \
  --port 9000 \
  --reload \
  --verbose
```

Options:
- `--host HOST`: Server host (default: `127.0.0.1`)
- `--port PORT`: Server port (default: `8000`)
- `--reload`: Enable auto-reload for development
- `--verbose`, `-v`: Enable verbose logging
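For reference, the documented flags and defaults map directly onto Python's `argparse`. This is a hypothetical declaration written from the option list above, plus `--client` and `--config-file` from earlier sections. It is not the actual `chromasql-server` source, and it does not model how the real CLI combines `--client` with `--config-file`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented chromasql-server options."""
    parser = argparse.ArgumentParser(prog="chromasql-server")
    # --client can be repeated, once per collection; values accumulate in a list.
    parser.add_argument("--client", action="append", metavar="SPEC",
                        help='"local:<path>" or "cloud:<tenant>:<database>:<key>"')
    parser.add_argument("--config-file", metavar="PATH",
                        help="YAML file describing the collections")
    parser.add_argument("--host", default="127.0.0.1", help="Server host")
    parser.add_argument("--port", type=int, default=8000, help="Server port")
    parser.add_argument("--reload", action="store_true",
                        help="Enable auto-reload for development")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser
```

Using `action="append"` is what lets `--client` be passed several times, as in the multiple-collections example.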
## API Endpoints

Once the server is running, the following endpoints are available:

### Health Check

```http
GET /api/chromasql/health
```

Returns the server's health status.

### List Collections

```http
GET /api/chromasql/indices
```

Returns metadata for all configured collections, including:
- Collection name and display name
- Embedding model
- Document counts
- Model registry (field schemas)
- System metadata fields

### Execute Query

```http
POST /api/chromasql/execute?collection=<collection_name>
```

Executes a ChromaSQL query against a specific collection.

**Request Body:**
```json
{
  "query": "SELECT * FROM ModelName WHERE metadata.field = 'value' TOPK 10;",
  "limit": 500,
  "output_format": "json"
}
```

**Response:**
```json
{
  "query": "SELECT * FROM ModelName...",
  "total_rows": 10,
  "collections_queried": 1,
  "rows": [
    {"id": "doc1", "content": "...", "metadata": {...}},
    ...
  ],
  "rows_returned": 10
}
```
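Below is a minimal Python client for the execute endpoint that uses only the standard library.

- The endpoint path and body fields come from this section.
- The function names are hypothetical.
- The `limit` and `output_format` defaults copy the example request body. They do not document the server's own defaults.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any


def build_execute_request(base_url: str, collection: str, query: str,
                          limit: int = 500,
                          output_format: str = "json") -> urllib.request.Request:
    """Build a POST request for /api/chromasql/execute?collection=<name>."""
    params = urllib.parse.urlencode({"collection": collection})
    url = f"{base_url.rstrip('/')}/api/chromasql/execute?{params}"
    body = json.dumps({"query": query, "limit": limit,
                       "output_format": output_format}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def execute_query(base_url: str, collection: str, query: str,
                  timeout: float = 30.0, **kwargs: Any) -> dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_execute_request(base_url, collection, query, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `execute_query("http://localhost:8888", "test_collection", "SELECT * FROM TestDocument TOPK 5;", limit=100)` should return a dict shaped like the response above. Non-2xx responses raise `urllib.error.HTTPError`.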
## Collection Requirements

Each collection directory must contain:

1. **query_config.json**: Chroma query configuration
   ```json
   {
     "model_to_collections": {
       "ModelName": {
         "collections": ["collection_name"],
         "total_documents": 100
       }
     }
   }
   ```

2. **chroma_data/**: ChromaDB persistent storage directory

3. **Model Registry**: an importable Python module exposing `MODEL_REGISTRY` (for local collections), referenced via `model_registry_target`
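The required layout can be checked before starting the server. This sketch uses a hypothetical `check_collection_dir` helper. It verifies the two on-disk entries listed above and reads `total_documents` per model from `query_config.json`, assuming the JSON shape shown in item 1. It does not open the ChromaDB store or validate the model registry.

```python
from __future__ import annotations

import json
from pathlib import Path


def check_collection_dir(persist_dir: str | Path) -> dict[str, int]:
    """Verify the expected layout and return {model name: total_documents}.

    Raises FileNotFoundError if query_config.json or chroma_data/ is missing.
    """
    root = Path(persist_dir)
    config_path = root / "query_config.json"
    data_dir = root / "chroma_data"
    if not config_path.is_file():
        raise FileNotFoundError(f"missing {config_path}")
    if not data_dir.is_dir():
        raise FileNotFoundError(f"missing directory {data_dir}")
    config = json.loads(config_path.read_text(encoding="utf-8"))
    models = config.get("model_to_collections", {})
    # Missing counts are reported as 0 rather than treated as an error.
    return {name: int(info.get("total_documents", 0)) for name, info in models.items()}
```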
## Example: Test Collection

Create a test collection:

```bash
# Run the test collection creation script
poetry run python workdir/test_collection/create_test_collection.py

# Start the server with the test collection
poetry run chromasql-server \
  --config-file workdir/test_collection/config.yaml \
  --port 8888
```

Test the endpoints:

```bash
# Health check
curl http://localhost:8888/api/chromasql/health

# List collections
curl http://localhost:8888/api/chromasql/indices | jq

# Execute a query
curl -X POST "http://localhost:8888/api/chromasql/execute?collection=test_collection" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT * FROM TestDocument WHERE metadata.category = '\''programming'\'' TOPK 5;",
    "limit": 100
  }' | jq
```

## CLI Arguments Reference

### Client Specification Format

**Local Collection:**
```
--client "local:<path_to_persist_dir>"
```

**Cloud Collection:**
```
--client "cloud:<tenant>:<database>:<api_key_or_env_ref>"
```
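This sketch shows how the two `--client` formats could be parsed. It makes two assumptions:

- The API-key part is everything after the database, so an `env:VAR_NAME` reference keeps its own colon.
- Local paths may contain colons.

`parse_client_spec`, `LocalSpec` and `CloudSpec` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSpec:
    persist_dir: str


@dataclass(frozen=True)
class CloudSpec:
    tenant: str
    database: str
    api_key_ref: str  # literal key or "env:VAR_NAME", left unresolved here


def parse_client_spec(spec: str) -> LocalSpec | CloudSpec:
    """Hypothetical parser for the documented --client formats."""
    kind, sep, rest = spec.partition(":")
    if not sep or not rest:
        raise ValueError(f"invalid client spec: {spec!r}")
    if kind == "local":
        # Everything after "local:" is the path, so the path may contain ':'.
        return LocalSpec(persist_dir=rest)
    if kind == "cloud":
        # Split at most twice so the key part may itself be "env:VAR".
        parts = rest.split(":", 2)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected cloud:<tenant>:<database>:<key>, got {spec!r}")
        tenant, database, key = parts
        return CloudSpec(tenant, database, key)
    raise ValueError(f"unknown client type {kind!r} in {spec!r}")
```

Splitting the cloud form at most twice (`split(":", 2)`) keeps `env:CHROMA_API_KEY` intact as a single field.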
### YAML Configuration Format

See the [YAML Configuration](#yaml-configuration) section above for the complete schema.

## Developer Experience

The CLI offers a developer experience similar to that of the factory pattern in `adri_agents/app/server_factory.py`:

1. **Configuration-driven**: Define collections via CLI arguments or YAML
2. **Multi-collection support**: Host multiple collections in one server
3. **Flexible deployment**: Local development or cloud-hosted collections
4. **Type-safe**: Pydantic models for configuration and responses
5. **FastAPI-based**: Auto-generated OpenAPI docs at `/docs`

## Troubleshooting

### Import Errors

If you see import errors related to `indexer` or `adri_agents`:

- Make sure you're running the CLI from the main project: `poetry run chromasql-server`
- Ensure all dependencies are installed: `poetry install`

### Collection Not Found

If you see "Unknown collection" errors:

- Verify that the collection name matches the key in your `env_map` or YAML config
- Check that `query_config.json` exists in the collection directory
- Ensure the ChromaDB data directory exists and is accessible

### Query Execution Errors

If queries fail:

- Verify that `model_registry_target` points to a valid Python module
- Check that `discriminator_field` matches your metadata fields
- Ensure `embedding_model` is consistent with your indexed data
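To check the first point, the `model_registry_target` string can be imported directly. The sketch assumes the `module.path:ATTRIBUTE` form used in the YAML example. `load_target` is a hypothetical helper, and chromasql's own loading logic may differ.

```python
import importlib
from typing import Any


def load_target(target: str) -> Any:
    """Import a "package.module:ATTRIBUTE" string and return the attribute."""
    module_path, sep, attr_name = target.partition(":")
    if not sep or not module_path or not attr_name:
        raise ValueError(f"expected 'module.path:ATTRIBUTE', got {target!r}")
    # Raises ModuleNotFoundError if the module is not importable here.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError:
        raise AttributeError(
            f"module {module_path!r} has no attribute {attr_name!r}"
        ) from None
```

Run `load_target("my.module.registry:MODEL_REGISTRY")` in the same environment as the server, for example via `poetry run python`. This reproduces an import failure without starting the server.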
## Next Steps

- Add authentication middleware for production deployments
- Implement rate limiting and request throttling
- Add metrics and monitoring endpoints
- Add support for batch query execution
- Add WebSocket support for streaming results

CONTRIBUTING.md

Lines changed: 2 additions & 1 deletion
@@ -143,8 +143,9 @@ clause. ChromaSQL provides a generic `MetadataFieldRouter` adapter:
 
 ```python
 from pathlib import Path
-from chromasql.adapters import AsyncMultiCollectionAdapter, MetadataFieldRouter
+from chromasql.adapters import MetadataFieldRouter
 from chromasql.multi_collection import execute_multi_collection
+from indexer.query_lib.async_multi_collection_adapter import AsyncMultiCollectionAdapter
 from indexer.vectorize_lib.query_client import AsyncMultiCollectionQueryClient
 from indexer.vectorize_lib.query_config import load_query_config
 