Context
Currently, the system lacks visibility into individual user token usage. A flat subscription model risks significant cost overruns due to usage disparities between user groups (e.g., standard undergraduate use vs. heavy usage by PhD students/developers).
We urgently need to implement per-user cost tracking to distinguish heavy users from light users, paving the way for future tiered pricing or quota limits.
Goals
- Short-term: Accurately track and record input/output token consumption per user.
- Long-term: Establish a unified AI Gateway (LiteLLM) to handle automatic pricing, cost calculation, and user-defined API keys.
- Legacy Debt: Address the billing visibility and stability issues within the current Python-based MCP (Model Context Protocol) service.
Proposed Solutions
Option A: MVP (Quick Implementation)
- Direct Logging: Parse the `usage` field from the LLM API response in the backend and write token counts directly to the database (a minimal parsing sketch follows this list).
- Retroactive Calculation: Implement a Python cronjob to read historical chat logs and calculate past token consumption using an offline tokenizer.
- Note: Embedding costs are negligible and can be ignored or calculated separately.
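For the direct-logging path, a minimal sketch of what the backend extraction could look like, assuming an OpenAI-compatible response shape (the `usage` field names follow the OpenAI spec; `TokenRecord` and `ExtractUsage` are illustrative names, not existing code):

```go
package billing

import (
	"encoding/json"
	"time"
)

// Usage mirrors the OpenAI-compatible `usage` object most providers return.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// completionResponse keeps only the fields we need from the provider payload.
type completionResponse struct {
	Model string `json:"model"`
	Usage Usage  `json:"usage"`
}

// TokenRecord is the row persisted per request (schema sketched in Phase 1).
type TokenRecord struct {
	UserID       string
	Model        string
	InputTokens  int
	OutputTokens int
	CreatedAt    time.Time
}

// ExtractUsage parses a non-streaming response body into a record ready
// for the database write.
func ExtractUsage(userID string, body []byte) (*TokenRecord, error) {
	var resp completionResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return &TokenRecord{
		UserID:       userID,
		Model:        resp.Model,
		InputTokens:  resp.Usage.PromptTokens,
		OutputTokens: resp.Usage.CompletionTokens,
		CreatedAt:    time.Now().UTC(),
	}, nil
}
```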
Option B: AI Gateway (LiteLLM) - Recommended
Deploy LiteLLM as a unified gateway to manage requests to OpenRouter and other providers.
- Billing Integration: Utilize the cost/usage information returned in LiteLLM response headers.
- Challenge: Verify whether the streaming API accurately returns cost headers; if not, the cost must be calculated at the end of the stream.
- User Attribution: Inject the user ID into request headers to leverage LiteLLM's metadata/tagging for user-level statistics (interceptor sketch follows this list).
- Custom Model Support: Implement a Go interceptor to handle logic for custom API keys. If the model is not in the whitelist, dynamically construct the config for the user-provided key.
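A rough sketch of the interceptor as an `http.RoundTripper` wrapper. The header name `X-User-ID` and the `whitelisted` lookup are placeholders; the actual header/metadata key that LiteLLM maps to its tagging feature must be confirmed against the deployed version:

```go
package gateway

import "net/http"

// userIDHeader is a placeholder; confirm the header/metadata key that the
// deployed LiteLLM version maps to its tagging feature.
const userIDHeader = "X-User-ID"

// whitelisted reports whether a model is served with our own provider keys.
// The lookup source (config, DB) is left open here.
var whitelisted = map[string]bool{"gpt-4o": true}

// AttributionTransport injects the caller's user ID into every outbound
// gateway request, and swaps in a user-provided key for models outside
// the whitelist.
type AttributionTransport struct {
	Base       http.RoundTripper
	UserID     string
	UserAPIKey string // set only when the user brings their own key
	Model      string
}

func (t *AttributionTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	clone := req.Clone(req.Context())
	clone.Header.Set(userIDHeader, t.UserID)
	if !whitelisted[t.Model] && t.UserAPIKey != "" {
		// Non-whitelisted model: forward the user's own key instead of
		// the platform key so the cost lands on their account.
		clone.Header.Set("Authorization", "Bearer "+t.UserAPIKey)
	}
	base := t.Base
	if base == nil {
		base = http.DefaultTransport
	}
	return base.RoundTrip(clone)
}
```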
Option C: MCP Service Refactor
The current Python-based MCP service poses significant risks:
- Billing Black Box: It runs independently, making it difficult to track internal token consumption (especially for "Deep Research" tasks).
- Instability: High failure rate for function calls and poor handling of edge cases.
- Decision:
  - Plan A: Temporarily disable MCP, as it is not production-ready.
  - Plan B: Migrate core logic to the main Go service to leverage Go's concurrency control and unified billing logic.
Action Plan
Phase 1: Data Infrastructure
- Database Schema: Design `user_token_usage` or `billing_records` tables (a schema sketch follows this list).
- Data Collection:
  - Update the Go backend to extract and store `usage` data from API responses.
  - Create a script/cronjob to backfill usage data for historical messages using an offline tokenizer.
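One possible shape for the `user_token_usage` table, sketched as a GORM model (the cost column and index choices are assumptions to settle during schema review; a plain SQL migration works equally well):

```go
package billing

import "time"

// UserTokenUsage is one candidate shape for the per-request usage table.
// Apply via db.AutoMigrate(&UserTokenUsage{}) or an equivalent migration.
type UserTokenUsage struct {
	ID           uint   `gorm:"primaryKey"`
	UserID       string `gorm:"index;size:64"`
	RequestID    string `gorm:"uniqueIndex;size:64"` // idempotency for retries and backfill
	Model        string `gorm:"size:128"`
	InputTokens  int
	OutputTokens int
	CostUSD      float64   // filled from the gateway in Option B, else computed offline
	CreatedAt    time.Time `gorm:"index"` // enables per-user, time-window aggregation
}
```

The unique index on `RequestID` keeps the write idempotent, so retries and the backfill cronjob cannot double-count a message.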
Phase 2: AI Gateway Integration (LiteLLM)
- Infra: Deploy the LiteLLM Pod and configure ConfigMaps.
- Middleware Development:
  - Implement the Go interceptor to inject the user ID into headers.
  - Implement logic for user-defined keys (dynamic config generation for non-whitelisted models).
- Billing Sync: Parse the LiteLLM response headers (cost/usage) and sync them to the billing database (a parsing sketch follows this list).
- Testing: Verify billing accuracy under `stream: true` mode.
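For the billing-sync step, a sketch of pulling the gateway-computed cost out of a proxied response. Recent LiteLLM releases document a `x-litellm-response-cost` response header, but the exact name should be verified against the deployed version:

```go
package gateway

import (
	"net/http"
	"strconv"
)

// costHeader is the response header LiteLLM uses for the computed request
// cost; verify the exact name against the deployed LiteLLM release.
const costHeader = "x-litellm-response-cost"

// ResponseCost pulls the gateway-computed cost (in USD) out of a proxied
// response. ok is false when the header is absent, e.g. on streamed
// responses where the cost may only be computable after the final chunk.
func ResponseCost(resp *http.Response) (usd float64, ok bool) {
	raw := resp.Header.Get(costHeader)
	if raw == "" {
		return 0, false
	}
	usd, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, false
	}
	return usd, true
}
```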
Phase 3: MCP Governance & Migration
- Audit: Review existing Python MCP code to identify essential features vs. spaghetti code.
- Refactor: Rewrite core MCP logic in Go and integrate it into the main service pipeline.
- Deprecate: Decommission the unstable Python MCP service.
Discussion
- Streaming Costs: If the gateway cannot return real-time costs during streaming, should we implement a token counter in the worker layer as a fallback? (A possible fallback sketch follows this list.)
- Resourcing: The MCP migration is labor-intensive. Should we consider assigning this to interns (undergrads) to assist with the Python-to-Go migration or unit testing?
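If we do take the fallback route for streaming, one way the worker-layer counter could look, using the community `tiktoken-go` port; counts are exact only for models tiktoken covers, and an approximation otherwise:

```go
package worker

import (
	"strings"

	"github.com/pkoukk/tiktoken-go"
)

// CountStreamedTokens approximates output token usage after a stream ends
// by re-encoding the accumulated completion text offline. This is a
// fallback for when the gateway returns no cost header on stream: true.
func CountStreamedTokens(model string, chunks []string) (int, error) {
	enc, err := tiktoken.EncodingForModel(model)
	if err != nil {
		// Unknown model: fall back to a generic encoding.
		enc, err = tiktoken.GetEncoding("cl100k_base")
		if err != nil {
			return 0, err
		}
	}
	full := strings.Join(chunks, "")
	return len(enc.Encode(full, nil, nil)), nil
}
```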