
Implement Granular Token Billing and User Usage Tracking #58

@Junyi-99

Description

Context

Currently, the system has no visibility into individual users' token usage. A flat subscription model risks significant cost overruns because of usage disparities between user groups (e.g., standard undergraduate use vs. heavy usage by PhD students and developers).
We urgently need to implement per-user cost tracking to distinguish Heavy users from Light users, paving the way for future tiered pricing or quota limits.

Goals

  1. Short-term: Accurately track and record input/output token consumption per user.
  2. Long-term: Establish a unified AI Gateway (LiteLLM) to handle automatic pricing, cost calculation, and user-defined API keys.
  3. Legacy Debt: Address the billing visibility and stability issues within the current Python-based MCP (Model Context Protocol) service.

Proposed Solutions

Option A: MVP (Quick Implementation)

  • Direct Logging: Parse the usage field from the LLM API response in the backend and write token counts directly to the database.
  • Retroactive Calculation: Implement a Python Cronjob to read historical chat logs and calculate past token consumption using an offline tokenizer.
  • Note: Embedding costs are negligible and can be ignored or calculated separately.
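The direct-logging step in Option A could look roughly like the sketch below. The struct mirrors the `usage` object that OpenAI-compatible APIs (including OpenRouter) return in non-streaming responses; the `user_token_usage` table name and the idea of keying by user ID are assumptions from this issue, and actually persisting the row is left to the caller:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenUsage mirrors the "usage" object in OpenAI-compatible chat
// completion responses.
type tokenUsage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

type chatResponse struct {
	Usage tokenUsage `json:"usage"`
}

// extractUsage pulls per-request token counts out of a raw response body.
// The backend would then write these counts to user_token_usage (name
// assumed) keyed by the requesting user's ID.
func extractUsage(body []byte) (tokenUsage, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return tokenUsage{}, err
	}
	return resp.Usage, nil
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":128,"completion_tokens":512,"total_tokens":640}}`)
	u, err := extractUsage(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("in=%d out=%d total=%d\n", u.PromptTokens, u.CompletionTokens, u.TotalTokens)
}
```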

Option B: AI Gateway (LiteLLM) - Recommended

Deploy LiteLLM as a unified gateway to manage requests to OpenRouter and other providers.

  • Billing Integration: Utilize the Cost/Usage information returned in LiteLLM headers.

  • Challenge: Verify if the Streaming API accurately returns cost headers; if not, calculation must occur at the end of the stream.

  • User Attribution: Inject User ID into request headers to utilize LiteLLM's Metadata/Tagging for user-level statistics.

  • Custom Model Support: Implement a Go Interceptor to handle logic for custom API keys. If the model is not in the whitelist, dynamically construct the config for the user-provided key.

Option C: MCP Service Refactor

The current Python-based MCP service poses significant risks:

  1. Billing Black Box: It runs independently, making it difficult to track internal token consumption (especially for "Deep Research" tasks).
  2. Instability: High failure rate for function calls and poor handling of edge cases.
  3. Decision:
      • Plan A: Temporarily disable MCP, as it is not production-ready.
      • Plan B: Migrate core logic to the main Go service to leverage Go's concurrency control and unified billing logic.

Action Plan

Phase 1: Data Infrastructure

  • Database Schema: Design a user_token_usage or billing_records table.
  • Data Collection:
      • Update Go backend to extract and store usage data from API responses.
      • Create a script/cronjob to backfill usage data for historical messages using an offline tokenizer.
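The backfill step can be sketched as below. `estimateTokens` is a stand-in for the offline tokenizer (the ~4-characters-per-token heuristic is only a rough approximation for English text; the real cronjob should use the actual tokenizer for each model), and the `message` shape is an assumption about the chat-log schema:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens is a placeholder for a real offline tokenizer.
// It uses the rough rule of thumb of ~4 characters per token.
func estimateTokens(text string) int {
	n := utf8.RuneCountInString(text)
	return (n + 3) / 4 // ceil(runes / 4)
}

// message is an assumed minimal shape for a historical chat-log row.
type message struct {
	UserID string
	Text   string
}

// backfillUsage aggregates estimated token counts per user over
// historical messages; the result would be written to the usage table.
func backfillUsage(msgs []message) map[string]int {
	totals := make(map[string]int)
	for _, m := range msgs {
		totals[m.UserID] += estimateTokens(m.Text)
	}
	return totals
}

func main() {
	msgs := []message{
		{UserID: "alice", Text: "Explain transformers in one paragraph."},
		{UserID: "bob", Text: "hi"},
	}
	fmt.Println(backfillUsage(msgs))
}
```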

Phase 2: AI Gateway Integration (LiteLLM)

  • Infra: Deploy LiteLLM Pod and configure ConfigMaps.

  • Middleware Development:
      • Implement Go Interceptor to inject User ID into headers.
      • Implement logic for User-Defined Keys (dynamic config generation for non-whitelisted models).

  • Billing Sync: Parse LiteLLM response headers (Cost/Usage) and sync to the billing database.

  • Testing: Verify billing accuracy under stream: true mode.

Phase 3: MCP Governance & Migration

  • Audit: Review existing Python MCP code to identify essential features vs. spaghetti code.
  • Refactor: Rewrite core MCP logic in Go and integrate it into the main service pipeline.
  • Deprecate: Decommission the unstable Python MCP service.

Discussion

  • Streaming Costs: If the Gateway cannot return real-time costs during streaming, should we implement a token counter in the Worker layer as a fallback?
  • Resourcing: The MCP migration is labor-intensive. Should we consider assigning this to interns (undergrads) to assist with the Python-to-Go migration or unit testing?

Metadata

Status: Design / Spec