Commit b6115d4
feat(gepa): add tool description optimization for multi-agent systems (#8928)
* feat(gepa): add tool description optimization for multi-agent systems
- Add optimize_tool_descriptions parameter (default False) to GEPA
- Extract tool descriptions from all nested modules via named_sub_modules()
- Apply optimized descriptions in DspyAdapter.build_program()
- Enables holistic optimization of tools across main and subagent modules
- Tests: 4 new tests, all 16 pass (4 new + 12 existing)
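The nested-module extraction described above can be sketched with plain-Python stand-ins (`Tool`, `Module`, and `collect_tool_descriptions` are simplified stand-ins for `dspy.Tool`, `dspy.Module`, and the `named_sub_modules()`-based traversal; all names here are illustrative, not the PR's exact code):

```python
class Tool:  # stand-in for dspy.Tool
    def __init__(self, name, desc):
        self.name, self.desc = name, desc

class Module:  # stand-in for dspy.Module
    pass

def collect_tool_descriptions(module, path="self"):
    """Walk nested modules and gather {component_key: description} for
    every Tool found, so tools on subagents are optimized holistically."""
    descriptions = {}
    for attr, value in vars(module).items():
        if isinstance(value, Tool):
            descriptions[f"{path}.{attr}"] = value.desc
        elif isinstance(value, list):
            for tool in value:
                if isinstance(tool, Tool):
                    descriptions[f"{path}.{attr}.{tool.name}"] = tool.desc
        elif isinstance(value, Module):
            descriptions.update(collect_tool_descriptions(value, f"{path}.{attr}"))
    return descriptions

# A main module holding a subagent: both sets of tools are discovered.
subagent = Module()
subagent.tools = [Tool("search", "Searches the web.")]
main = Module()
main.subagent = subagent
main.calculator = Tool("calculator", "Evaluates math.")
descs = collect_tool_descriptions(main)
```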
* style: fix ruff formatting (trailing whitespace)
* style: apply ruff formatting fixes
* feat(gepa): implement tool-specific proposer for tool descriptions
- Add ToolProposer with GenerateImprovedToolDescription signature
- Implement routing logic to separate tools from signatures
- Tools use ToolProposer, signatures use custom or parent default
- Backward compatible: preserves existing custom_instruction_proposer behavior
- Add test verifying routing splits components correctly
* docs(gepa): clean up multi-agent example code
- Define tool functions outside class for clarity
- Match structure of simple ReAct example
- Add clear comments explaining architecture
- Make code more readable and maintainable
* refactor(gepa): simplify tool reflective dataset with ReAct context reuse
Tools now copy ReAct's reflective data with tool-specific annotation
instead of complex trajectory extraction. This 15-line approach reuses
ReAct's existing context (thoughts, tool calls, observations) and adds
focused annotation for each tool.
Implementation:
- Tools receive full ReAct reflective examples (same trajectory context)
- Feedback prefixed: [Optimizing tool: 'X'] for focused optimization
- Reflection LM sees complete multi-step execution traces per tool
Benefits:
- Simpler: 15 lines vs 70+ line extraction approach
- Reuses code: No duplicate trajectory formatting logic
- Same context: Tools see full ReAct execution traces
- Clean: Removed all debug output
Tests:
- 4 focused tests following GEPA patterns (removed 1 redundant)
- 226KB fixture with 34 LM + 6 reflection calls
- All tests passing with gpt-5-nano traces
Documentation:
- Updated GEPA_Advanced.md with implementation details
- Explains reflective dataset construction approach
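The reuse-plus-annotation approach above can be illustrated with a minimal sketch (the dict shape and helper name are assumptions; the feedback prefix format follows the commit message):

```python
import copy

def build_tool_reflective_data(react_examples, tool_name):
    """Copy ReAct's reflective examples and prefix the feedback with a
    tool-specific annotation so the reflection LM focuses on one tool,
    while still seeing the full multi-step trajectory."""
    annotated = []
    for example in react_examples:
        example = copy.deepcopy(example)
        example["Feedback"] = f"[Optimizing tool: '{tool_name}'] {example['Feedback']}"
        annotated.append(example)
    return annotated

react_examples = [{"Inputs": {"question": "Capital of France?"},
                   "Feedback": "Chose the wrong tool first."}]
tool_examples = build_tool_reflective_data(react_examples, "search")
```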
* fix(gepa): unify custom proposer routing for tools
* docs(gepa): clarify tool reflection prompt
* test: streamline GEPA tool optimization tests
* fix(gepa): streamline tool proposer formatting
* test(gepa): drop legacy dummy tool fixture
* docs(gepa): add tool-specific reflection prompt and metric example
- Add GenerateImprovedToolDescriptionFromFeedback signature documentation
- Include tool-aware metric example showing trajectory access
- Document tool prefix annotation in feedback
- Note component_selector applies to both signatures and tools
- Fix 'fundamentally' language per reviewer feedback
* docs(gepa): fix implementation details with accurate code flow
- Separate Pass 1 (predictor examples) and Pass 2 (tool aggregation)
- Clarify Generated Outputs includes full trajectory for ReAct
- Fix feedback annotation format to [Tool 'name' from 'predictor_key']
- Add Component Identification & Proposer Routing section
- Explain dual-proposer independence (custom proposer doesn't affect tool proposer)
- Use consistent terminology: 'predictor' and 'signature instructions'
* docs(gepa): remove backward compatibility note
Per reviewer feedback, backward compatibility should be implicit
* docs(gepa): improve usage examples with optimization visualization
- Add component_selector='all' to optimize all components together
- Show how to view optimized tool descriptions
- Add example output demonstrating improvement from vague to specific descriptions
- Remove unnecessary comments for cleaner examples
* docs(gepa): add design rationale comments for tool context sharing
- Document why full ReAct trajectory is shared with all tools
- Explain rationale: tool interdependencies, selection patterns, workflow context
- Add concrete example of optimization benefit
- Describe alternative considered (tool-specific filtering) and rejection reasoning
- Add future work section on joint tool optimization
- Present two architectural approaches: separate proposer vs extending ReAct proposer
- Include implementation details, benefits, challenges, and decision rationale
* docs(gepa): add tool optimization links to overview and parameter docs
- Add Tool Description Optimization section to GEPA overview.md with link to advanced guide
- Add documentation link to optimize_tool_descriptions parameter in gepa.py
- Addresses reviewer feedback to make tool optimization more discoverable
* docs(gepa): refine tool optimization scenarios and remove implementation details
- Restructure 'When to Use' as numbered list (1-5) per reviewer feedback
- Move section after implementation details for better flow
- Remove tool: prefix implementation detail from component identification
- Explain tool discovery via ReAct modules in user-friendly terms
- Add custom proposer compatibility clarification
- Address optional PR feedback items (11 & 13)
* docs(gepa): clarify future work section in code comments
- Add note that proposed architecture details may change
- Expand challenges with counterpoints and questions
- Mark implementation notes as optional to avoid overengineering
* refactor(gepa): unify ReAct optimization as single module
Treat ReAct as ONE unified module containing react predictor, extract
predictor, and tools as subcomponents, respecting both GEPA's module-level
optimization abstraction and DSPy's ReAct module design.
Before:
- Tools optimized separately from react/extract (multiple components)
- Each component had separate reflective dataset (3x redundant trajectories)
- Violated DSPy's ReAct abstraction (tools are subcomponents, not peers)
After:
- ReAct module optimized as single "react_module" component
- Joint optimization of react instruction + extract instruction + tool descriptions
- One reflective dataset per ReAct execution (no redundant trajectories)
- Respects GEPA's dict[str, str] contract (JSON config as string value)
Architecture:
- ReActModuleProposer: Handles entire ReAct module optimization
- Dynamic signature generation: Creates output fields for each tool/parameter
- Optional fields: Extract, tool descriptions, tool args (only improve what needs fixing)
- JSON config: {"react": "...", "extract": "...", "tools": {...}}
Benefits:
- Eliminates duplicate trajectories (addresses gepa#97)
- Coherent improvements (LM sees how components work together)
- Respects both GEPA and DSPy abstractions
- Enables cold-start optimization (tool args always available based on schema)
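The `dict[str, str]` contract mentioned above can be shown concretely (field names follow the commit's `{"react": ..., "extract": ..., "tools": ...}` example; the exact schema is the PR's and may differ in detail):

```python
import json

# One ReAct module is a single GEPA component; its whole config is one
# JSON string value, which keeps GEPA's dict[str, str] candidate contract.
module_config = {
    "react": "Think step by step, then choose exactly one tool.",
    "extract": "Read the trajectory and extract the final answer.",
    "tools": {
        "search": {
            "desc": "Search the web for current facts.",
            "arg_desc": {"query": "Keywords to search for."},
        },
    },
}
candidate = {"react_module": json.dumps(module_config)}

# The proposer parses the string back into structured components.
restored = json.loads(candidate["react_module"])
```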
* test(gepa): add end-to-end ReAct module optimization test
Adds comprehensive test proving GEPA can optimize ReAct modules end-to-end:
- Baseline with minimal tool descriptions achieves 0% accuracy
- After optimization, achieves 100% accuracy
- Tests unified ReAct architecture (react + extract + tools as one module)
Key features:
- Uses stable SHA256 hashing for deterministic fixture replay
- Avoids Python's PYTHONHASHSEED randomization issues
- 189KB fixture with security check passed (no API keys/tokens)
- Verifies all components are optimized (react, extract, tool descriptions)
* fix(gepa): enable arg description optimization for ReAct tools
* chore: remove legacy test_gepa_tool_optimization.py
This test file was for the old architecture where tools were optimized
separately from ReAct modules. With the unified ReAct optimization approach,
this test is replaced by test_gepa_react_optimization.py which tests the
new architecture where ReAct modules (react + extract + tools) are optimized
as a single unified component.
* fix: restore accidentally removed score mismatch warning
* test: update fixture after arg description optimization fix
Regenerates the fixture to match commit 3418b59, which changed how
tool arg descriptions are optimized. Reduces LM calls from 26→22
by making the optimization process more efficient.
* fix(test): use JSON-based hashing for cross-version fixture stability
- Replace repr()-based hashing with json.dumps(sort_keys=True)
- Fixes CI failures caused by Python version differences (3.12.9 vs 3.12.11)
- repr() formatting can differ between Python micro versions
- JSON spec is standardized and stable across all versions
- Regenerate fixture with new hashing approach
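The fix above boils down to hashing a canonical JSON encoding instead of `repr()`; a self-contained sketch (function name illustrative):

```python
import hashlib
import json

def stable_fixture_key(request):
    """Key LM requests by SHA-256 of json.dumps(..., sort_keys=True).
    The JSON encoding is specified, so keys are identical across Python
    micro versions and PYTHONHASHSEED values, unlike repr()-based keys,
    whose formatting can drift between interpreter releases."""
    payload = json.dumps(request, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Key order does not matter: sort_keys=True canonicalizes the payload.
key_a = stable_fixture_key({"role": "user", "content": "hi"})
key_b = stable_fixture_key({"content": "hi", "role": "user"})
```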
* refactor(gepa): rename optimize_tool_descriptions to optimize_react_components
- Rename parameter to better reflect that we optimize all ReAct components
- Components include: react instructions, extract instructions, tool descriptions, and tool argument descriptions
- Update all code references, tests, and documentation
- No functional changes, pure rename for clarity
* docs(gepa): improve 'What is optimize_react_components?' section
- Clarify that specialized optimization applies only to dspy.ReAct modules
- Explain ReAct module structure (react predictor, extract predictor, tools)
- List all 4 optimizable components with clear descriptions
- Specify react instruction always optimized, others optional based on failures
- Simplify language: 'contradict' vs 'work together' instead of complex terms
- Add link to ReAct documentation for deeper dive
* docs(gepa): replace outdated tool-specific prompt with actual ReAct optimization prompt
- Rename section: 'Tool-Specific Reflection Prompt' → 'ReAct Optimization Prompt'
- Replace GenerateImprovedToolDescriptionFromFeedback (doesn't exist) with GenerateImprovedReActDescriptionsFromFeedback (actual implementation)
- Show that prompt receives ALL components (react, extract, tools) and optimizes jointly
- Update metric example: tool_feedback_metric → react_metric for clarity
- Remove outdated notes about tool-specific prefixes and component_selector behavior
- Clarify that tool descriptions/args are added dynamically via signature.append()
* docs(gepa): simplify 'How It Works' section with accurate routing behavior
* docs(gepa): remove outdated Implementation Details section
* docs(gepa): replace theoretical scenarios with real user pain points
* docs(gepa): fix usage examples reference to match updated scenarios
* docs(gepa): update inspect section to show all 4 ReAct components with correct syntax
* docs(gepa): rewrite Section 8 with accurate custom proposer behavior for ReAct
- Clarify custom proposer receives ALL components (regular + ReAct)
- Add realistic signature with ReAct failure patterns and component types
- Use exact naming from implementation: examples_with_feedback, component_reflective_data, propose_instruction
- Show _format_examples() helper matching real markdown formatting
- Remove regular component handling to keep example focused on ReAct
- Test code example validates successfully
- Fix contradiction: optimize_react_components must be True (not irrelevant)
* docs(gepa): clarify custom proposer behavior in routing section
Change 'overrides the default routing' to 'receives all components and handles the optimization logic' to avoid confusion with optimize_react_components, which still controls discovery and serialization.
* docs(gepa): remove discouraging recommendation from custom proposer section
Users reading this section want to learn how to implement custom proposers for ReAct; don't discourage them from doing so.
* fix(gepa): fix top-level ReAct module lookup and remove tool name sanitization
- Fix ReAct module lookup to handle top-level modules correctly
Previously failed to match 'self' path for top-level ReAct instances
- Remove tool name sanitization entirely
Tool names are now used as-is in dynamic signatures
Removed _sanitize_name() function and all calls to it
Simplifies code and avoids surprising behavior
- Skip failing test_gepa_react_optimization
Hash-based fixtures are fragile across Python versions
- Add debug logging to trace processing for troubleshooting
* refactor(gepa): unify ReAct module key handling and use constant
- Replace all magic string 'react_module' with REACT_MODULE_PREFIX constant
- Unify path normalization pattern across gepa.py and gepa_utils.py
- Rename 'prefix' to 'normalized_path' for clarity
- Simplify module lookup by using consistent normalization
- Remove awkward OR clause in ReAct module matching logic
This makes the codebase more maintainable with a single source of truth
for the module prefix and consistent naming throughout.
* test(gepa): add ReAct module detection tests for nested structures
- Add 3 comprehensive detection tests: single ReAct, mixed workflow (2 ReAct + ChainOfThought), orchestrator with 2 workers
- Tests validate full path preservation (bug fix validation)
- Uses monkey patching to capture base_program from gepa.optimize
- Helper functions for DRY: setup spy, create optimizer, assert detection
- Validates all ReAct components: react, extract, tools, tool metadata
* test(gepa): add comprehensive ReAct detection and reconstruction tests
Detection tests (3):
- test_single_react_module_detection: top-level ReAct module
- test_multi_react_workflow_detection: mixed ReAct + ChainOfThought (bug fix validation)
- test_nested_react_orchestrator_worker_detection: orchestrator with 2 workers as tools
Reconstruction tests (3):
- test_build_program_single_react: single ReAct module
- test_build_program_multi_react_workflow: mixed workflow with ReAct + non-ReAct
- test_build_program_orchestrator_with_workers: complex nested structure
Helper functions (12):
- setup_spy_for_base_program: captures base_program from gepa.optimize
- simple_metric_for_detection/reconstruction: test metrics
- create_gepa_optimizer_for_detection: creates optimizer
- assert_react_module_detected/updated: validates ReAct modules
- assert_regular_module_detected/updated: validates non-ReAct modules
- mock_optimized_react_module: mocks optimized candidate
- create_*_program: 3 reusable program builders
Validates:
- Full path preservation (bug fix)
- All 4 ReAct components (react, extract, tools, arg_desc)
- Non-ReAct module handling
- Deepcopy verification (original unchanged)
- Both detection and reconstruction phases
* test(gepa): add reflective dataset tests for multi-agent trajectory validation
Adds 2 new tests validating make_reflective_dataset captures complete trajectories:
- test_make_reflective_dataset_single_react: Single ReAct module
- test_make_reflective_dataset_orchestrator_with_workers: Multi-agent system (3 modules)
New helpers:
- simple_feedback: Reusable feedback function (consolidates 5 duplicates)
- assert_reflective_example_has_trajectory: Validates trajectory completeness
Tests validate:
- Complete trajectory capture (all iterations with thoughts/tools/observations)
- No duplicate/missing iterations
- Full path preservation in multi-agent systems
- Each module's trajectory captured separately
Improvements:
- Clean up docstrings and remove redundant comments
- Fix whitespace linter warnings (9 auto-fixed)
- Reduce from 1054 to 975 lines
All 8 tests passing (6 detection/reconstruction + 2 new reflective dataset)
* test(gepa): verify tool arg descriptions propagate to args schema
- Update assert_react_module_updated to check tool.args['param']['description']
- Add arg_desc to test cases for comprehensive validation
- Expose bug: GEPA updates arg_desc but not tool.args (what renders in prompts)
* fix(gepa): propagate arg_desc updates to tool.args for prompt rendering
tool.arg_desc is only used during Tool.__init__; updating it after creation
has no effect on prompts. Only tool.args is rendered, so GEPA must update
args for optimized descriptions to appear in prompts.
Fixes the bug where reflection LM improves tool parameter descriptions but
they don't show in actual prompts because arg_desc changes weren't propagated
to the args schema.
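The bug and its fix can be sketched with a stand-in `Tool` (a minimal mock of `dspy.Tool`'s behavior as described above: `arg_desc` seeds `args` in `__init__` and is never read again, so later updates must be written into `args` directly; helper name is illustrative):

```python
class Tool:  # minimal stand-in for dspy.Tool
    def __init__(self, args, arg_desc=None):
        self.args = args
        self.arg_desc = arg_desc or {}
        # arg_desc is only consumed here, at construction time.
        for name, desc in self.arg_desc.items():
            if name in self.args:
                self.args[name]["description"] = desc

def apply_optimized_arg_desc(tool, optimized):
    """The fix: write improved descriptions into tool.args, the schema
    that is actually rendered into prompts."""
    for name, desc in optimized.items():
        tool.arg_desc[name] = desc
        if name in tool.args:
            tool.args[name]["description"] = desc

tool = Tool(args={"query": {"type": "string"}},
            arg_desc={"query": "Search query."})
apply_optimized_arg_desc(tool, {"query": "Keywords; keep them short."})
```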
* test(gepa): remove fixture-based test and unused dependencies
* test(gepa): remove unused fixture file
* style: fix ruff linting issues (import formatting, whitespace, bare except)
* refactor(test): rename setup_spy_for_base_program to setup_capture_for_base_program for clarity
* docs(gepa): clarify why Tool.func uses placeholder lambda in proposer
* refactor(gepa): make all ReAct components optional with None default for selective optimization
* docs(gepa): clarify 'LM' as 'reflection LM' in comments for precision
* refactor(gepa): refine reflection prompt to guide concise, focused ReAct component optimization
Update the ReAct proposer's reflection signature to guide the LM toward more
appropriate output granularity and selective optimization.
Changes:
- Add context that components are progressively optimized across iterations
- Change 'and' to 'and/or' for abstraction/specificity (allows flexibility)
- Refine field descriptions to guide output style:
* 'ReAct instruction for reasoning and tool selection' (functional context)
* 'Extract instruction for answer extraction' (functional context)
* 'Purpose of tool' (focuses on high-level what/why, not verbose how)
* 'Usage of parameter' (focuses on specific usage, not essay)
The goal is to prevent overly verbose LM outputs (multi-paragraph tool/param
descriptions) while preserving exploration capability. Field descriptions now
provide functional context ('for reasoning', 'purpose', 'usage') that naturally
guides appropriate scope without being prescriptive about format or length.
This allows the reflection LM to determine the right level of detail based on
what's needed to fix failures, aligned with GEPA's general meta-prompt philosophy.
* docs(gepa): revise ReAct metric example to be general and extensible
Replace prescriptive 'minimize tool calls' example with educational progression
that shows users how to write effective metrics without forcing specific objectives.
Changes:
- Show simple metric first (just correctness feedback)
- Then show trajectory-based metric (accessing agent execution)
- Use clear for-loop instead of list comprehension for readability
- Follow DSPy docs conventions: answer_match variable, example/pred naming
- Remove 'minimize tool calls' directive - let users decide their objectives
- Add bullet points explaining what trajectory can reveal (tool selection,
reasoning quality, efficiency) without prescribing how to use it
- Rename section to 'Writing Metrics for ReAct Optimization' (more actionable)
This aligns with GEPA's philosophy: provide general, extensible patterns that
users can adapt to their specific needs. Detailed examples can be shown in
tutorials rather than API documentation.
Addresses PR review comment 5 about prescriptive objectives in documentation.
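The documented progression might look like the sketch below (plain dicts stand in for `dspy.Example`/`dspy.Prediction`, and the return shape is simplified; a real GEPA metric returns a score with textual feedback):

```python
def simple_metric(example, pred):
    """Step 1: correctness-only feedback."""
    answer_match = example["answer"].lower() == pred["answer"].lower()
    feedback = "Correct." if answer_match else (
        f"Expected '{example['answer']}' but got '{pred['answer']}'.")
    return {"score": 1.0 if answer_match else 0.0, "feedback": feedback}

def react_metric(example, pred):
    """Step 2: also surface the agent's execution trajectory, which can
    reveal tool selection, reasoning quality, and efficiency."""
    result = simple_metric(example, pred)
    lines = [result["feedback"]]
    for step in pred.get("trajectory", []):  # clear for-loop, per the docs
        lines.append(f"Called {step['tool']} -> observed: {step['observation']}")
    result["feedback"] = "\n".join(lines)
    return result

pred = {"answer": "Paris",
        "trajectory": [{"tool": "search", "observation": "Paris is the capital."}]}
result = react_metric({"answer": "paris"}, pred)
```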
* docs(gepa): replace custom proposer example with reference to ReActModuleProposer
Address PR review comment 6 by simplifying the custom proposer documentation.
Changes:
- Replace long inline implementation example with clickable GitHub link
- Point to ReActModuleProposer as reference implementation
- Add bulleted list of what the reference shows (parsing, dynamic signatures, etc.)
- Keep essential JSON structure and interface documentation
- Remove 100+ lines of redundant code example
Benefits:
- Less overwhelming for users (no duplicate code)
- Single source of truth (reference implementation)
- Clickable link to actual working code on GitHub
- Users can copy/modify real implementation instead of example
Addresses PR comment from @LakshyAAAgrawal about using reference instead
of full implementation example.
* docs(gepa): make custom proposer section more approachable and clear
Improve the custom proposer documentation to be more user-friendly while
maintaining technical accuracy.
Changes:
- Warmer, more inviting opening ("best way to start")
- Concrete example with 'search' tool instead of generic placeholders
- Plain English explanations for each component ("How the agent reasons...")
- Clear separation: "What you can improve" vs "What to preserve"
- Simpler code example with inline comments explaining ReAct vs regular
- Concise "reference shows how to" bullets (3 key points)
- More approachable tone without sacrificing precision
This makes the advanced feature more accessible to users who need custom
optimization logic beyond the defaults.
Follows up on the previous commit addressing PR comment about custom proposer example.
* docs(gepa): update ReAct reflection prompt to match current implementation
Sync documentation with actual reflection prompt after bd4cdac:
- Add 'These components are progressively optimized' context
- Change to 'and/or specificity' for flexibility
- Update output field types to 'str | None' with default=None
- Refine field descriptions ('for reasoning and tool selection', 'for answer extraction')
- Add note about dynamic field descriptions ('Purpose of tool', 'Usage of parameter')
This ensures docs accurately reflect the current prompt design that guides
appropriate granularity without being prescriptive.
* feat(gepa): warn when ReAct modules detected but optimization disabled
Add warning message when GEPA detects ReAct modules in the program but
optimize_react_components=False. This helps users discover the ReAct
optimization feature.
Changes:
- Always traverse modules to detect ReAct instances
- If optimize_react_components=False, warn for each ReAct module found
- Shows module path to help users identify what would be optimized
- No behavioral changes when optimize_react_components=True
Addresses maintainer feedback to make the feature more discoverable.
* test(gepa): fix DummyLM configuration and remove exception swallowing
- Configure DummyLM with proper ReAct response format (next_thought, next_tool_name, next_tool_args)
- Remove try/except blocks that silently swallowed exceptions
- Add explanatory comments for why compile should now succeed
- Increase DummyLM repetitions (10→20) to support GEPA iterations
Addresses review feedback from @LakshyAAAgrawal requesting removal of
unexplained exception handling that masked real bugs.
All 8 tests now pass deterministically without silent failures.
* test(gepa): add failing tests for generic tool optimization
- Add 4 core tests for tool optimization beyond ReAct
- test_detect_single_tool: single Tool input field
- test_detect_tool_list: multiple tools with ordering
- test_skip_predictor_without_tools: negative case (passing)
- test_update_tool_and_predictor: reconstruction path
Tests use class-based signatures (required for type detection).
Currently failing (TDD approach); implementation next.
* refactor(gepa): rename optimize_react_components to enable_tool_optimization
Rename flag to reflect generalization beyond ReAct modules:
- optimize_react_components → enable_tool_optimization
- Update documentation to mention custom predictors using dspy.Tool
- Update warning message to use new flag name
This prepares for upcoming feature: generic tool optimization for any
predictor using dspy.Tool, not just dspy.ReAct modules.
* refactor(gepa): extract nested function to private method
Move build_propose_new_texts() from nested function in __init__ to
_build_propose_new_texts() private method per maintainer feedback.
Also simplify LM context handling by using unified context manager
pattern instead of if/else branching (18 lines → 6 lines).
Changes:
- Extract _build_propose_new_texts() as private class method
- Simplify LM context: use 'with dspy.context(lm=self.reflection_lm or dspy.settings.lm)'
- Clean up __init__ (110+ line nested function → 1-line method call)
Benefits:
- Cleaner class structure (easier to scan __init__)
- Methods testable in isolation
- Reduced code duplication (-26 lines net)
- Addresses maintainer feedback: 'move helper function out as private method'
* feat(gepa): detect tool-using predictors via type checking
- Add type-based detection for predictors using dspy.Tool
- Initialize tool-using predictors with JSON structure
- Add inline helper function is_tool_field() for recursive type checking
- Handle Union/Optional types containing Tool
- Enable generic tool optimization beyond dspy.ReAct
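The recursive type check might look like this (`Tool` is a stand-in for `dspy.Tool`; `typing.get_args` unwraps Optional/Union forms and container parameters):

```python
import typing

class Tool:  # stand-in for dspy.Tool
    pass

def is_tool_field(annotation):
    """Return True if the annotation is Tool, or any generic/Union type
    that contains Tool, e.g. list[Tool], dict[str, Tool], Optional[Tool]."""
    if annotation is Tool:
        return True
    # get_args(list[Tool]) -> (Tool,); get_args(Optional[Tool]) -> (Tool, NoneType)
    return any(is_tool_field(arg) for arg in typing.get_args(annotation))
```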
* test(gepa): update ReAct tests for predictor-name-based keys
- Move inline imports to top of file
- Rename module_path → predictor_name for clarity
- Update all assertions to use full predictor names (e.g., extract.predict)
- Update feedback_map keys to match predictor names
- Simplify multi-agent test assertions (20+ lines → 10 lines)
All 8 ReAct optimization tests now passing with new key structure.
* test(gepa): use explicit predictor keys in tool optimization tests
- Replace unpacking pattern with explicit predictor names
- Remove duplicate inline imports (already at top)
- Use TOOL_MODULE_PREFIX:pred consistently across tests
- Improve test docstrings for clarity
All 3 tool tests still passing (1 skipped intentionally).
* feat(gepa): extract tools from runtime traces
Runtime tool discovery:
- Import Tool type for isinstance() checks
- Initialize tools_by_predictor dict to collect unique tools
- Add extract_tools_from_value() recursive helper function
- Extract tools from predictor trace inputs during iteration
- Handle single Tool, list[Tool], dict[str, Tool] structures
- Serialize tools to candidate JSON after all traces processed
Implements runtime tool discovery (Change 2).
Captures dynamically injected tools from actual usage patterns.
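A sketch of the recursive helper named above (`Tool` is again a stand-in; the real code isinstance-checks `dspy.Tool`):

```python
class Tool:  # stand-in for dspy.Tool
    def __init__(self, name):
        self.name = name

def extract_tools_from_value(value):
    """Recursively collect Tool instances from a trace input value,
    which may be a single Tool, a list[Tool], or a dict[str, Tool]."""
    if isinstance(value, Tool):
        return [value]
    if isinstance(value, list):
        return [t for item in value for t in extract_tools_from_value(item)]
    if isinstance(value, dict):
        return [t for item in value.values() for t in extract_tools_from_value(item)]
    return []

# Handles nested structures seen in predictor trace inputs.
tools = extract_tools_from_value({"tools": [Tool("search")], "extra": Tool("calc")})
```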
* feat(gepa): detect tool-using predictors at compile time
- Import TOOL_MODULE_PREFIX constant
- Detect predictors with dspy.Tool input fields
- Create prefixed keys: tool_module:{predictor_name}
- Use actual predictor name as JSON config key
Pairs with tool extraction (fe19dac). Together they implement
compile-time detection + runtime extraction for generic tool modules.
* refactor(gepa): use predictor identity for ReAct detection
- Find extract/react predictors by object identity (not paths)
- Use actual predictor names as JSON config keys
- Module key uses extract_predictor_name for consistency
- Clearer comments about dynamic predictor names
More robust than path-based matching. Config keys are now actual
predictor names (e.g., "multi_agent.react", "multi_agent.extract.predict")
instead of generic "react"/"extract".
* test(gepa): refactor ReAct tests to use dynamic predictor names
- Add get_predictor_name() helper using object identity
- Remove all hardcoded predictor name strings
- Update mock_optimized_react_module() to accept react_module parameter
- Use expected_* naming convention for clarity
- All 11 tests passing with fully dynamic approach
* refactor(gepa): generalize proposer to support both ReAct and tool modules
- Rename ReActModuleProposer → ToolModuleProposer
- Rename signature to GenerateImprovedToolModuleDescriptionsFromFeedback
- Make base signature generic (current_predictor_instruction)
- Dynamically add extract fields only for ReAct modules
- Use prefix checks (REACT_MODULE_PREFIX) for reliable type detection
- Support both 1-predictor (tool) and 2-predictor (ReAct) modules
- Update routing to handle both TOOL_MODULE_PREFIX and REACT_MODULE_PREFIX
- Clean variable names: primary_predictor_key, extract_predictor_key
- Update all docstrings to reflect tool-using modules (not just ReAct)
* refactor(gepa): eliminate create-delete pattern in base_program build
- Process ReAct modules first, then individual predictors
- Skip predictors already part of module configs (check inside JSON)
- Remove redundant base_program.pop() calls
- No duplicate enable_tool_optimization checks
* refactor(gepa): eliminate ReAct coupling in build_program
Replace ReAct-specific logic with generic approach:
Before:
- isinstance(ReAct) checks
- Direct access to module.react/module.extract/module.tools
- Separate if/elif branches for instruction updates
After:
- Program-level __dict__ traversal to find tools
- Unified aggregation: plain strings → module config overrides
- Single application loop (no duplication)
Why __dict__ traversal:
Tools can be declared as single attributes (self.tool), lists
(self.tools=[...]), or dicts (self.tools={...}), and nested in
any dspy.Module. Traversing __dict__ finds all tools regardless
of how they're structured, without coupling to specific module types.
This makes the code resilient to ReAct internal changes and works
for any module using dspy.Tool.
* refactor(gepa): apply code cleanup principles consistently
- Use tuple syntax for startswith() (more Pythonic)
- Remove unnecessary try-except for JSON parsing (we control the source)
These follow the same principles applied in build_program refactor.
* refactor(gepa): unify config extraction patterns
- Use isinstance(v, str) for predictor filtering (type-based)
- Use .get("tools", {}) for tools extraction (more Pythonic)
Both changes make the code more consistent and resilient to
config structure changes.
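The two patterns read as follows (the config shape is assumed from the unified JSON config described in earlier commits):

```python
import json

config = json.loads(
    '{"react": "Reason, then act.", "extract": "Extract the answer.",'
    ' "tools": {"search": {"desc": "Finds facts."}}}'
)

# Type-based filtering: predictor instructions are the plain-string values.
predictor_instructions = {k: v for k, v in config.items() if isinstance(v, str)}

# .get with a default tolerates configs that carry no tools at all.
tools = config.get("tools", {})
```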
* refactor(gepa): remove verbose logs and consolidate comments
Remove ~25 debug/info logs per maintainer feedback:
- Internal routing/processing logs
- Trace processing details
- Reflective example breakdowns
- Config building verbosity
Consolidate multi-line comments into concise single lines while
preserving important context (WHY, not WHAT).
* docs(gepa): clarify ReAct trace workaround with TODO
Document that this is a workaround for ReAct's multiple predictor
calls with partial trajectories. After PR #8999 merges, we should
test if we can remove this and use extract predictor trace directly.
* test(gepa): remove deprecated ReAct-specific tests and refactor tool optimization tests
* feat(gepa): add assertion for ReAct two-predictor design
Fail fast with clear error if DSPy's ReAct design changes (missing extract.predict).
Better than silently skipping broken modules.
* test(gepa): add DSPy ReAct design docs and improve test consistency
- Add header note documenting DSPy's two-predictor ReAct design
- Remove test_react_trace_aggregation (was testing DSPy internals)
- Move test tool fixtures to top for reuse
- Fix test_selective_optimization style:
- Simplify docstring to one-liner
- Remove verbose inline comments
- Fix assertion to use program.tools reference (clearer)
- Add consistent GEPA iteration comments
* fix(test): remove trailing whitespace and extra blank lines
* refactor(gepa): clarify tool proposer output field descriptions
* refactor(gepa): treat args as canonical for tool arg descriptions
* refactor(gepa): tolerate missing arg descriptions when applying tool configs
* refactor(gepa): use args as sole source of tool arg descriptions
* test(gepa): drop arg_desc expectations from tool optimization tests
* refactor(gepa): refine reflection prompts for tool optimization
Improve instructions for the reflection LM to focus on reinforcing successful patterns and providing progressively optimized updates for predictor instructions and tool descriptions.
* refactor(gepa): improve tool extraction robustness and observability
Move tool extraction logic to evaluate() loop for immediate capture. Fix overwrite risk by merging discovered tools with existing config. Improve logging and docstrings for better maintainability.
* refactor(gepa): simplify initialization logic
Move helper function outside loop and simplify predictor deduplication check by validating keys before parsing JSON.
* refactor(gepa): remove ReAct trace workaround
Use standard trace selection logic (prioritizing failures) for all modules including ReAct. The extractor logic workaround is no longer needed as we handle aggregated duplicates differently.
* chore(gepa): clean up whitespace and style changes from tool optimization PR
* chore: restore .gitignore to match main
* docs(gepa): document tool optimization flag in overview
* docs(gepa): clarify enable_tool_optimization and custom proposers
* docs(gepa): update tool module optimization prompt to match actual code
* docs(gepa): update How Tool Optimization Works section
* docs(gepa): update When to Use Tool Optimization section
* docs(gepa): update custom proposers section for tool optimization
* docs(gepa): update usage examples with correct tool patterns and interfaces
* docs(gepa): remove redundant metrics section
* refactor(gepa): use absolute import for ToolModuleProposer
* docs(gepa): update tool optimization doc link
* docs(gepa): replace eval() example with get_weather tool
* fix(gepa): change ReAct detection log from warning to info
* refactor(gepa): extract _propose_component_texts as private method
* refactor(gepa): TODO out generic tool module optimization, keep ReAct only
* refactor(gepa): remove generic tool module detection, keep ReAct only
* refactor(gepa): improve naming and extract tool update methods
* refactor(gepa): remove unused TOOL_MODULE_PREFIX and rename to tool_components
* refactor(gepa): rename ToolModuleProposer to ToolProposer
* docs(gepa): update tool optimization docs for ReAct-only support
* refactor(gepa): unify prefix to TOOL_MODULE_PREFIX for all tool-using modules
- Rename REACT_MODULE_PREFIX to TOOL_MODULE_PREFIX
- Single abstraction for tool modules (ReAct now, generic later)
- Use count-based detection for extract predictor instead of prefix check
- Update docs to reflect new naming
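The count-based detection mentioned above can be sketched as classifying a tool module's predictors by position rather than by matching a name prefix. The `classify_predictors` helper and role labels are hypothetical, assuming a ReAct module contributes exactly two predictors (the reasoning/acting one first, the answer-extraction one second):

```python
# Hypothetical sketch: identify the extract predictor by its position
# within the tool module rather than by a name-prefix check.
TOOL_MODULE_PREFIX = "tool_module"  # single prefix for all tool-using modules

def classify_predictors(names: list[str]) -> dict[str, str]:
    """Label a tool module's predictors by position: first is react, rest extract."""
    roles = {}
    for i, name in enumerate(names):
        roles[name] = "react" if i == 0 else "extract"
    return roles

print(classify_predictors([f"{TOOL_MODULE_PREFIX}.react", f"{TOOL_MODULE_PREFIX}.extract"]))
```

Keying on position keeps the detection working even if predictor names change, which is the point of dropping the prefix check.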
* docs(gepa): remove CustomAgent example, keep ReAct only
* docs(gepa): update enable_tool_optimization docstring for ReAct-only support
* test(gepa): remove generic tool tests, keep ReAct-only tests
- Remove test_detect_single_tool, test_detect_multiple_tools
- Remove test_apply_optimized_tool_descriptions
- Update REACT_MODULE_PREFIX -> TOOL_MODULE_PREFIX
- Update docstring to reflect ReAct-only support
* refactor(gepa): use local ToolProposer variable, update docs for ReAct-only
- Remove self._tool_proposer instance variable
- Create ToolProposer locally when needed (stateless)
- Update overview.md to say ReAct-only instead of 'any module'
* docs(gepa): update tool optimization docs for ReAct-only support
- Remove generic tool module references, keep ReAct only
- Update JSON structure examples to show both react and extract predictors
- Fix comment in custom proposer example
* some fixes
---------
Co-authored-by: chenmoneygithub <chen.qian@databricks.com>

1 parent ed01c88 · commit b6115d4
File tree (6 files changed, +1055 / -136 lines):
- docs/docs/api/optimizers/GEPA
- dspy/teleprompt/gepa
- tests/teleprompt