
Conversation

@isaacbmiller (Collaborator) commented on Dec 4, 2025

Will be split into 2 PRs. Do not merge

Refactors both dspy.History and dspy.ReAct.

Goal

Multi-turn chat has been a weaker part of the DSPy DX: it is not clear how you should maintain history, nor is it particularly easy to do so. These PRs should make it easier to build multi-turn (and optionally tool-using) chatbots.

This PR is the start of efforts to address the concerns raised in #8798.

Overview

History

dspy.History used to support only a single input format.

All inputs and outputs had to match the current signature; otherwise each entry was filtered down to the matching fields. This was done to guarantee that we could split history into proper user/assistant pairs. We now relax that constraint and give users more flexibility in how they collect and display messages.

History now supports four modes (sketched in the example after this list):

  1. signature (the old default) - filters messages strictly to the current signature
  2. demos - labels input/output fields via a nested dict, e.g. `{"input_fields": {"k": "v"}, "output_fields": {"k": "v"}}`, but does not require them to match a signature. This still allows user/assistant splitting.
  3. flat - a dict of key/value pairs; all messages go into the user side
  4. raw - explicit user/assistant messages
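
A minimal sketch of what messages for each mode might look like, using the dspy.History(messages=...) constructor from the testing script below. The exact field names and whether the mode is set explicitly or inferred are assumptions here, not the final API:

import dspy

# 1. signature (old default): keys match the active signature's fields.
sig_history = dspy.History(messages=[{"question": "What is 2+2?", "answer": "4"}])

# 2. demos: nested input/output dicts, independent of any particular signature,
#    which still allows user/assistant splitting.
demo_history = dspy.History(messages=[
    {"input_fields": {"request": "What is 2+2?"}, "output_fields": {"result": "4"}},
])

# 3. flat: plain key/value pairs, all rendered on the user side.
flat_history = dspy.History(messages=[{"note": "The user prefers short answers."}])

# 4. raw: explicit chat-style role/content messages.
raw_history = dspy.History(messages=[
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
])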

ReAct

dspy.ReAct uses two signatures internally: ReAct.react and ReAct.extract. The react step runs "{input_fields}, trajectory: str -> tool: ToolCall" in a loop, and the extract step then runs "{input_fields}, trajectory: str -> {output_fields}".
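
A rough sketch of that control flow, with the two signatures treated as callables. This is not ReAct's actual implementation; the helper and attribute names (format_trajectory, step.tool.name, step.tool.args, the "finish" tool) are assumptions for illustration only:

def format_trajectory(trajectory):
    # Placeholder serialization of (tool_call, observation) pairs into a string.
    return "\n".join(f"{call} -> {obs}" for call, obs in trajectory)


def react_sketch(react_step, extract_step, tools, max_iters, **input_fields):
    trajectory = []
    for _ in range(max_iters):
        # ReAct.react: "{input_fields}, trajectory: str -> tool: ToolCall"
        step = react_step(**input_fields, trajectory=format_trajectory(trajectory))
        if step.tool.name == "finish":
            break
        observation = tools[step.tool.name](**step.tool.args)
        trajectory.append((step.tool, observation))
    # ReAct.extract: "{input_fields}, trajectory: str -> {output_fields}"
    return extract_step(**input_fields, trajectory=format_trajectory(trajectory))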

Testing

Primary testing so far has used the following script to verify that the messages are still constructed properly. Before merging, I will run tool-use benchmarks on both the new and old versions across a suite of differently sized models to ensure the new versions are at parity or better.

import dspy

# `interpreter` is assumed to be a dspy.PythonInterpreter, since the script calls
# interpreter.execute(...) and interpreter.shutdown(); an LM is assumed to be
# configured separately, e.g. dspy.configure(lm=dspy.LM(...)).
interpreter = dspy.PythonInterpreter()


def run_python(code: str) -> str:
    """Execute Python code and return the output."""
    return interpreter.execute(code)


try:
    while True:
        coding_agent = dspy.ReAct("request: str -> result: str", tools=[run_python])
        history = dspy.History(messages=[])
        user_input = "What is 2+2 (use python)"
        result = coding_agent(request=user_input, trajectory=history)
        # Carry the first turn's trajectory forward as the history for turn two.
        history = dspy.History(messages=history.messages + result.trajectory.messages)
        user_input = "Multiply the result of my last calculation by 3 (use python)"
        result = coding_agent(request=user_input, trajectory=history)
        dspy.inspect_history(2)
        break
except (KeyboardInterrupt, EOFError):
    print("\nExiting... Shutting down interpreter.")
finally:
    interpreter.shutdown()

@isaacbmiller force-pushed the isaac/dspy-react-refactor branch from 393e4ba to b24b448 on December 4, 2025 at 15:56.
@isaacbmiller (Collaborator, Author) commented:

@codex review


@chatgpt-codex-connector (bot) left a comment:


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +134 to +138
# Only infer if mode is the default "flat" and messages clearly match another mode
if self.mode == "flat":
    inferred = self._infer_mode_from_msg(self.messages[0])
    if inferred in {"raw", "demo"}:
        object.__setattr__(self, "mode", inferred)


P1: Preserve signature histories instead of flattening

The validator only upgrades the default mode from flat to raw or demo, never to signature, so a typical signature-style history like {"question": ..., "answer": ...} remains in flat mode and _format_flat_history emits a single user message. Multi-turn prompts that previously produced user/assistant pairs (e.g., the default History(messages=[{"question": "...", "answer": "..."}]) used in tests/predict/test_predict.py) now collapse the assistant reply into the next user prompt, dropping prior assistant turns from the LM context. Consider keeping signature as the default or inferring signature mode when both input and output keys are present.
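
A minimal illustration of the scenario, reusing the example history from the comment above (the rendered prompts are paraphrased, not exact):

import dspy

history = dspy.History(messages=[{"question": "What is 2+2?", "answer": "4"}])
# Previously ("signature" mode): rendered as a user/assistant pair, so "4"
# appears as an assistant turn in the LM context.
# New default ("flat" mode): both keys are emitted in a single user message,
# so the earlier assistant reply never shows up as an assistant turn.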


Comment on lines +169 to +173
    return value
try:
    return json.dumps(value)
except (TypeError, ValueError):
    return str(value)


P1: Keep tool observations' custom payloads

Tool outputs are serialized with json.dumps and a str(value) fallback, so non‑JSON DSPy types (e.g., a tool returning dspy.Image objects or tuples) are recorded as plain strings like (Image(...)). Because split_message_content_for_custom_types never sees the custom-type markers, multimodal tool responses disappear from the trajectory (e.g., test_tool_observation_preserves_custom_type would no longer surface image_url parts to the LM). The serializer should preserve DSPy Type encodings instead of stringifying them.

