Feature Proposal: Tool Response Compression for Token Optimization #11

@yaniv-apigene

Rationale

Large JSON responses from MCP tools consume significant tokens when passed to LLMs, impacting three critical dimensions:

  1. Accuracy: Token limits force truncation of valuable data, reducing context completeness
  2. Cost: API pricing scales with token count, so larger responses cost more (especially when they fill large context windows)
  3. Latency: Larger payloads increase transmission and processing time

By compressing JSON responses, we can reduce token consumption while preserving all data, leading to better accuracy, lower costs, and faster response times.

Expected Token Size Improvement

Based on analysis of typical tool responses, we expect a 30%-60% token reduction depending on the tool and output size. Tools that return large arrays of uniform objects should see the highest reductions, because each key name is emitted once per array instead of once per object.

Tools with Highest Impact:

  1. list_llm_models
  2. search_traces
  3. search_spans
  4. find_errors (nested arrays)
  5. get_llm_expensive_traces
  6. get_llm_slow_traces
  7. list_llm_tools
  8. get_trace (nested spans array)

Implementation Approach

Column/Row JSON Format (Recommended)

Convert arrays of objects with identical keys into tabular format:

// Before
{"models": [
  {"model": "gpt-4", "provider": "openai", "count": 48},
  {"model": "gpt-3.5", "provider": "openai", "count": 12}
]}

// After
{"models": {
  "columns": ["model", "provider", "count"],
  "rows": [["gpt-4", "openai", 48], ["gpt-3.5", "openai", 12]]
}}
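The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the names to_columnar and from_columnar are hypothetical, and the sketch assumes an array is only converted when every element is an object with exactly the same keys.

```python
from __future__ import annotations

from typing import Any


def to_columnar(records: list[Any]) -> Any:
    """Convert a list of uniform dicts to {"columns": [...], "rows": [[...], ...]}.

    The input is returned unchanged when it is empty, contains non-dict
    items, or the dicts do not all share the same keys, because the
    tabular form could not represent it without losing information.
    """
    if not records or not all(isinstance(r, dict) for r in records):
        return records
    columns = list(records[0].keys())
    if any(set(r.keys()) != set(columns) for r in records):
        return records
    return {
        "columns": columns,
        "rows": [[r[c] for c in columns] for r in records],
    }


def from_columnar(table: dict[str, Any]) -> list[dict[str, Any]]:
    """Inverse of to_columnar: rebuild the original list of dicts."""
    return [dict(zip(table["columns"], row)) for row in table["rows"]]
```

The inverse function is not needed by an LLM client, which can read the table directly, but it makes a round-trip test easy: from_columnar(to_columnar(models)) should equal the original list.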

Benefits:

  • ✅ Returns valid JSON (MCP-compatible)
  • ✅ Human-readable
  • ✅ No client-side decoding needed
  • ✅ Works with existing JSON parsers

Alternative: TOON Format

See TOON format specification for details.

Implementation Plan

Where to Apply Changes

Apply compression to 8 tool response functions:

  1. src/opentelemetry_mcp/tools/list_models.py - list_models()
  2. src/opentelemetry_mcp/tools/search.py - search_traces()
  3. src/opentelemetry_mcp/tools/search_spans.py - search_spans()
  4. src/opentelemetry_mcp/tools/errors.py - find_errors()
  5. src/opentelemetry_mcp/tools/expensive_traces.py - get_expensive_traces()
  6. src/opentelemetry_mcp/tools/slow_traces.py - get_slow_traces()
  7. src/opentelemetry_mcp/tools/list_llm_tools.py - list_llm_tools()
  8. src/opentelemetry_mcp/tools/trace.py - get_trace() (compress nested spans array)

Steps

  1. Add compression utility: Implement tabular compression function (convert arrays of uniform objects to column/row format)
  2. Apply in tool functions: Compress arrays before returning JSON (only if the resulting size reduction exceeds a threshold, e.g., 5%)
  3. Optional: Add config option to enable/disable compression
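One way the three steps could fit together is sketched below. Everything here is an assumption for illustration: the helper names, the min_savings and enabled parameters, and the use of compact-JSON character count as a rough stand-in for token count (a real implementation might measure with a tokenizer instead). The walk is recursive so that nested arrays, such as the spans inside a trace, are also converted.

```python
from __future__ import annotations

import json
from typing import Any


def _tabulate(value: Any) -> Any:
    """Recursively rewrite lists of uniform dicts as {"columns", "rows"}."""
    if isinstance(value, dict):
        return {k: _tabulate(v) for k, v in value.items()}
    if isinstance(value, list):
        items = [_tabulate(v) for v in value]
        if items and all(isinstance(i, dict) for i in items):
            columns = list(items[0].keys())
            if all(set(i.keys()) == set(columns) for i in items):
                rows = [[i[c] for c in columns] for i in items]
                return {"columns": columns, "rows": rows}
        return items
    return value


def _size(value: Any) -> int:
    # Character count of compact JSON; only a proxy for token count.
    return len(json.dumps(value, separators=(",", ":")))


def compact_json(data: Any, min_savings: float = 0.05, enabled: bool = True) -> Any:
    """Return the tabular form only if it shrinks the payload by more than min_savings."""
    if not enabled:  # hypothetical config switch (step 3)
        return data
    candidate = _tabulate(data)
    savings = 1 - _size(candidate) / _size(data)
    return candidate if savings > min_savings else data
```

The threshold matters for small results: an array with a single object gets larger in tabular form, so the original is returned. The size comparison uses compact separators; serializing the result with indent=2 would put every row value on its own line and erode much of the saving.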

Example Pattern

import json

from opentelemetry_mcp.compression import compact_json  # proposed utility (step 1)

async def list_models(...) -> str:
    # ... existing logic ...
    result = {
        "count": len(models_list),
        "models": models_list,
    }
    # Convert uniform arrays to column/row form before serializing
    compressed_result = compact_json(result)
    return json.dumps(compressed_result, indent=2)

Open Questions

  1. Should compression be enabled by default or opt-in?
  2. What minimum compression ratio threshold should be used?
  3. Should there be a configuration option to disable compression?
