Skip to content

Conversation

@ChinmayBansal
Copy link
Contributor

@ChinmayBansal ChinmayBansal commented Sep 12, 2025

Related Issues

Proposed Changes:

This PR implements reasoning support for Cohere's chat generator following the parent issue #9700 requirements. The implementation adds the ability to extract and store model reasoning content in the ReasoningContent field of ChatMessage.

Key Features Added:

  • Reasoning extraction function (_extract_reasoning_from_response) that
    supports multiple formats:
    • XML-style tags: <thinking>, <reasoning>
    • Section headers: ## Reasoning, ## Thinking, ## My reasoning
    • Natural language patterns: "Let me think", "Step by step", "First,"
    • Intelligent separator detection to determine reasoning boundaries
  • Dual-path response parsing that handles both regular responses and tool
    call scenarios
  • Streaming support with custom
    _convert_streaming_chunks_to_chat_message_with_reasoning function
  • Tool call compatibility ensuring reasoning works alongside tool calls
    without conflicts
  • Content cleaning that removes reasoning markers from the main response
    text
  • Length filtering with 30-character minimum threshold to avoid false
    positives

The implementation follows the established patterns from Ollama (#2200) and
Google GenAI (#2212) PRs, ensuring consistency across providers.

How did you test it?

  • Unit tests: Added comprehensive test suite (TestReasoningExtraction)
    covering:
    • Various reasoning pattern formats (<thinking>, <reasoning>,
      step-by-step)
    • Edge cases (no reasoning, short reasoning ignored)
    • Natural language reasoning detection
  • Integration tests: Added TestCohereChatGeneratorReasoning class with:
    • End-to-end reasoning extraction with Command A Reasoning model
      (lightweight)
    • Mock response testing for consistent behavior
    • Tool call compatibility verification
  • Manual verification: All unit tests pass, integration tests show expected
    rate limiting (API working correctly)
  • Code quality: All linting (hatch run fmt) and type checking (hatch run test:types) pass

Notes for the reviewer

  • The reasoning extraction logic is in _extract_reasoning_from_response()
    function (lines ~355-435)
  • Key integration points are in _parse_response() for both regular and tool
    call paths (lines ~194-220)
  • Streaming support uses a post-processing approach since Cohere embeds
    reasoning in response text
  • Test cases use the lightweight command-a-reasoning-111b-2024-10-03 model as
    recommended
  • Implementation handles the Cohere-specific thinking parameter via
    generation_kwargs={"thinking": True}

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:,feat:, build:, chore:, ci:, docs:,style:, refactor:, perf:,test:.

@ChinmayBansal ChinmayBansal requested a review from a team as a code owner September 12, 2025 06:32
@ChinmayBansal ChinmayBansal requested review from sjrl and removed request for a team September 12, 2025 06:32
@github-actions github-actions bot added integration:cohere type:documentation Improvements or additions to documentation labels Sep 12, 2025
@ChinmayBansal ChinmayBansal force-pushed the feat/cohere-reasoning-support branch from 9bce185 to 52ae02b Compare September 16, 2025 16:36
@ChinmayBansal
Copy link
Contributor Author

@sjrl I have addressed your feedback. The changes I have made are using the native method while keeping the fallback method if the native method does not apply reasoning content. I have also updated the test cases.

Let me know if more changes are needed!

@sjrl
Copy link
Contributor

sjrl commented Nov 18, 2025

Hey @ChinmayBansal I noticed in your PR description you also mention adding support for reasoning content when streaming as well, but I don't see any changes in the code to reflect this. Is this something you could also add?

@ChinmayBansal
Copy link
Contributor Author

@sjrl I had included it in the text-based fallback extraction thinking it was needed for streaming. After doing some more research, I found that streaming API does support thinking with chunk.delta.message.content.thinking in content delta events similar to the non streaming structure. I did remove the fallback logic. The current implementation now uses only the native API for non streaming responses. For streaming, it would have to come through content delta events but Haystack's StreamingChunk does not currently have a reasoning field to handle this. This could be added as a followup if needed.

@sjrl
Copy link
Contributor

sjrl commented Nov 19, 2025

For streaming, it would have to come through content delta events but Haystack's StreamingChunk does not currently have a reasoning field to handle this. This could be added as a followup if needed.

StreamingChunk does have a field for this called reasoning. The docs are out of date on this but if your check our API docs here https://docs.haystack.deepset.ai/reference/data-classes-api#streamingchunk you can see that StreamingChunk has a reasoning field.

I'd recommend checking out how it's used in the OpenAIRresponsesChatGenerator here https://github.com/deepset-ai/haystack/blob/0100a4849abbe1cbaa6d2b1cdbb3e1d2a471c3ff/haystack/components/generators/chat/openai_responses.py#L579

@ChinmayBansal
Copy link
Contributor Author

@sjrl I have added streaming support for reasoning content. I handled the content-deltas now and created reasoning content objects for thinking deltas. Since the StreamingChunk can have one of content/tool_calls/reasoning set at a time, I prioritized thinking content over text when both are there.

I also added test cases and followed the OpenAI implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration:cohere type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cohere - reasoning support

2 participants