Response-Based Fallback for FallbackModel #3640

@sarth6

Description

Feature Request: Response-Based Fallback for FallbackModel

Summary

Add a fallback_on_response parameter to FallbackModel that allows fallback decisions based on inspecting the ModelResponse content, not just exceptions. This enables fallback when a model returns a successful HTTP response but the semantic content indicates failure (e.g., a builtin tool like web_fetch failed to retrieve a URL).

Motivation

Currently, FallbackModel only supports exception-based fallback via the fallback_on parameter:

fallback_on: Callable[[Exception], bool] | tuple[type[Exception], ...] = (ModelAPIError,)

This works well for:

  • API errors (rate limits, 5xx responses)
  • Network failures
  • Authentication issues

However, it cannot handle semantic failures where:

  • The model returns HTTP 200 (no exception raised)
  • But the response content indicates the operation failed

Real-World Example: WebFetchTool with Google Models

When using WebFetchTool with Google's Gemini models, the model may successfully return a response, but the BuiltinToolReturnPart indicates the URL fetch failed:

BuiltinToolReturnPart(
    tool_name='web_fetch',
    content=[{
        'uri': 'https://example.com',
        'url_retrieval_status': 'URL_RETRIEVAL_STATUS_FAILED'  # Not SUCCESS!
    }]
)

In this case, I want to fallback to Anthropic's model (which has different web fetching capabilities), but there's no exception to catch.

Current Workaround

Today, I have to implement manual fallback logic outside of FallbackModel:

class BrandURLExtractionAgent:
    def __init__(self):
        self._google_agent = Agent(model=google_model, builtin_tools=[WebFetchTool()])
        self._anthropic_agent = Agent(model=anthropic_model, builtin_tools=[WebFetchTool()])

    async def get_brand_summary(self, url: str) -> str | None:
        prompt = f"Summarize the brand at {url}"  # illustrative prompt text
        # Try Google first
        try:
            result = await self._google_agent.run(user_prompt=prompt)

            # Manual inspection of response parts
            if self._check_web_fetch_success(result.all_messages(), url):
                return result.output
            # Fall through to fallback...
        except Exception:
            pass

        # Manual fallback to Anthropic
        result = await self._anthropic_agent.run(user_prompt=prompt)
        return result.output

    def _check_web_fetch_success(self, messages: list[ModelMessage], url: str) -> bool:
        for message in messages:
            if isinstance(message, ModelRequest):
                for part in message.parts:
                    if isinstance(part, BuiltinToolReturnPart) and part.tool_name == "web_fetch":
                        # Check if URL was successfully retrieved...
                        pass
        return False

This is verbose, error-prone, and doesn't benefit from FallbackModel's clean abstraction.

Proposed API

Add a new fallback_on_response parameter to FallbackModel.__init__:

from collections.abc import Callable
from pydantic_ai.messages import ModelMessage, ModelResponse

class FallbackModel(Model):
    def __init__(
        self,
        default_model: Model | KnownModelName | str,
        *fallback_models: Model | KnownModelName | str,
        fallback_on: Callable[[Exception], bool] | tuple[type[Exception], ...] = (ModelAPIError,),
        # NEW PARAMETER:
        fallback_on_response: Callable[[ModelResponse, list[ModelMessage]], bool] | None = None,
    ):
        ...

Parameter Signature

fallback_on_response: Callable[[ModelResponse, list[ModelMessage]], bool] | None = None
  • response: ModelResponse - The model's response to inspect
  • messages: list[ModelMessage] - Full message history (needed because BuiltinToolReturnPart lives in ModelRequest, not ModelResponse)
  • Returns bool - True to trigger fallback, False to accept the response

Usage Example

from pydantic_ai import Agent, WebFetchTool
from pydantic_ai.messages import (
    BuiltinToolReturnPart,
    ModelMessage,
    ModelRequest,
    ModelResponse,
)
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.google import GoogleModel


def web_fetch_failed(response: ModelResponse, messages: list[ModelMessage]) -> bool:
    """Return True if web_fetch tool failed to retrieve content."""
    for message in messages:
        if not isinstance(message, ModelRequest):
            continue
        for part in message.parts:
            if isinstance(part, BuiltinToolReturnPart) and part.tool_name == "web_fetch":
                content = part.content
                if isinstance(content, list):
                    for item in content:
                        if not isinstance(item, dict):
                            continue
                        status = item.get("url_retrieval_status")
                        if status and status != "URL_RETRIEVAL_STATUS_SUCCESS":
                            return True  # Trigger fallback
    return False  # Accept response


google_model = GoogleModel('gemini-2.0-flash')
anthropic_model = AnthropicModel('claude-3-5-haiku-latest')

fallback_model = FallbackModel(
    google_model,
    anthropic_model,
    fallback_on_response=web_fetch_failed,
)

agent = Agent(
    model=fallback_model,
    builtin_tools=[WebFetchTool()],
)

# Now if Google's web_fetch fails, automatically falls back to Anthropic!
result = await agent.run("Summarize https://example.com")

Implementation Notes

Changes to FallbackModel.request()

async def request(
    self,
    messages: list[ModelMessage],
    model_settings: ModelSettings | None,
    model_request_parameters: ModelRequestParameters,
) -> ModelResponse:
    exceptions: list[Exception] = []

    for model in self._models:
        try:
            response = await model.request(messages, model_settings, model_request_parameters)

            # NEW: Check response-based fallback condition
            if self._fallback_on_response is not None:
                if self._fallback_on_response(response, messages):
                    # Optionally log: "Fallback triggered by response inspection"
                    continue  # Try next model

            return response

        except Exception as e:
            if self._should_fallback(e):
                exceptions.append(e)
            else:
                raise

    # NOTE: if every model was rejected by response inspection, `exceptions`
    # may be empty; the final error should account for that case.
    raise FallbackExceptionGroup("All models failed", exceptions)
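The control flow above can be exercised end-to-end with stubs. In this toy harness (not the real `FallbackModel`), models are plain async callables and the response is reduced to a string; it only demonstrates the loop semantics: a response the predicate rejects causes the next model to be tried.

```python
import asyncio


async def fallback_request(models, messages, fallback_on_response=None):
    """Toy version of the proposed FallbackModel.request() loop."""
    exceptions = []
    for model in models:
        try:
            response = await model(messages)
        except Exception as e:
            exceptions.append(e)
            continue
        if fallback_on_response is not None and fallback_on_response(response, messages):
            continue  # response looked bad: try the next model
        return response
    # Guard against an empty group when every response was merely rejected:
    raise ExceptionGroup("All models failed", exceptions or [RuntimeError("all responses rejected")])


async def bad_model(messages):
    return "URL_RETRIEVAL_STATUS_FAILED"


async def good_model(messages):
    return "URL_RETRIEVAL_STATUS_SUCCESS"


result = asyncio.run(
    fallback_request(
        [bad_model, good_model],
        messages=[],
        fallback_on_response=lambda resp, msgs: "FAILED" in resp,
    )
)
assert result == "URL_RETRIEVAL_STATUS_SUCCESS"
```

The first model "succeeds" at the transport level, is rejected by the predicate, and the second model's response is returned — exactly the behavior the feature asks for.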

Streaming Considerations

For request_stream(), response-based fallback is more complex since the response arrives incrementally. Options:

  1. Don't support for streaming initially - Document that fallback_on_response only works with non-streaming requests
  2. Inspect after stream completes - Buffer the full response, then check (but this defeats the purpose of streaming for the fallback check)
  3. Support a streaming-aware callback - More complex API

I'd recommend starting with option 1 (non-streaming only) and expanding later if there's demand.
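For completeness, option 2 can be sketched with plain async generators (again a toy, not real pydantic_ai code). Note that the consumer receives nothing until a full candidate stream has been buffered and accepted — which is precisely why this option sacrifices streaming's latency benefit.

```python
import asyncio


async def buffered_stream_with_fallback(streams, accept):
    """Option 2 sketch: fully buffer each candidate stream, then decide.

    `streams` is a list of zero-argument async-generator factories;
    `accept` judges the complete buffered response text.
    """
    for make_stream in streams:
        chunks = [chunk async for chunk in make_stream()]
        if accept("".join(chunks)):
            # Re-emit buffered chunks only after acceptance: the consumer
            # sees no output until the full response is known to be good.
            for chunk in chunks:
                yield chunk
            return
    raise RuntimeError("all streamed responses were rejected")


async def failing():
    yield "fetch "
    yield "FAILED"


async def succeeding():
    yield "fetch "
    yield "ok"


async def main():
    out = []
    async for chunk in buffered_stream_with_fallback(
        [failing, succeeding], accept=lambda text: "FAILED" not in text
    ):
        out.append(chunk)
    return "".join(out)


assert asyncio.run(main()) == "fetch ok"
```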

Alternatives Considered

Alternative 1: Extend fallback_on to accept response

fallback_on: Callable[[Exception | ModelResponse], bool] | ...

Rejected because: Mixing exception and response handling in one callback is confusing and breaks existing type signatures.

Alternative 2: Custom exception wrapping

Wrap response inspection failures in a custom exception that fallback_on can catch.

Rejected because: Requires users to raise synthetic exceptions, which is awkward and doesn't fit the "successful response with bad content" mental model.

Alternative 3: Output validator with retry

Use @agent.output_validator to raise ModelRetry.

Rejected because: This triggers retry with the same model, not fallback to a different model. Also, the inspection often needs to happen at the message/part level, not the final output level.

Additional Context

This pattern is useful beyond WebFetchTool. Other use cases:

  • Citation validation: Fallback if the model's response doesn't include expected citations
  • Tool call validation: Fallback if a required tool wasn't called
  • Content quality checks: Fallback if response is too short, contains refusals, or lacks required structure
  • Provider-specific quirks: Handle cases where one provider returns empty content while another handles the same prompt fine
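A couple of these checks reduce to small pure predicates. The functions below are illustrative only (hypothetical names, operating on plain text and a set of tool names rather than on `ModelResponse` parts, which a real callback would walk as in the usage example above):

```python
def looks_like_refusal(text: str, min_length: int = 20) -> bool:
    """Content-quality check: reject responses that are too short
    or open with a stock refusal phrase."""
    refusal_markers = ("i can't", "i cannot", "i'm unable")
    stripped = text.strip()
    if len(stripped) < min_length:
        return True
    return stripped.lower().startswith(refusal_markers)


def missing_required_tool(called_tools: set[str], required: set[str]) -> bool:
    """Tool-call check: fall back if any required tool was never called."""
    return not required <= called_tools


assert looks_like_refusal("I can't help with that request here.")
assert not looks_like_refusal("Here is a detailed summary of the page content.")
assert missing_required_tool({"web_fetch"}, {"web_fetch", "citation_check"})
```

Since all of these share the `(response, messages) -> bool` shape, one `fallback_on_response` parameter covers the whole family.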

Checklist

  • Add fallback_on_response parameter to FallbackModel.__init__
  • Implement response inspection in FallbackModel.request()
  • Add logging when response-based fallback is triggered
  • Document the feature in the FallbackModel docs
  • Add unit tests for response-based fallback
  • Document streaming limitation (if not supported initially)

If this is something you would accept, I'd be interested in contributing it :)
