Response-Based Fallback for FallbackModel #3640

@sarth6

Description

Feature Request: Response-Based Fallback for FallbackModel

Summary

Add a fallback_on_response parameter to FallbackModel that allows fallback decisions based on inspecting the ModelResponse content, not just exceptions. This enables fallback when a model returns a successful HTTP response but the semantic content indicates failure (e.g., a builtin tool like web_fetch failed to retrieve a URL).

Motivation

Currently, FallbackModel only supports exception-based fallback via the fallback_on parameter:

fallback_on: Callable[[Exception], bool] | tuple[type[Exception], ...] = (ModelAPIError,)

This works well for:

  • API errors (rate limits, 5xx responses)
  • Network failures
  • Authentication issues

However, it cannot handle semantic failures where:

  • The model returns HTTP 200 (no exception raised)
  • But the response content indicates the operation failed

Real-World Example: WebFetchTool with Google Models

When using WebFetchTool with Google's Gemini models, the model may successfully return a response, but the BuiltinToolReturnPart indicates the URL fetch failed:

BuiltinToolReturnPart(
    tool_name='web_fetch',
    content=[{
        'uri': 'https://example.com',
        'url_retrieval_status': 'URL_RETRIEVAL_STATUS_FAILED'  # Not SUCCESS!
    }]
)

In this case, I want to fallback to Anthropic's model (which has different web fetching capabilities), but there's no exception to catch.

Current Workaround

Today, I have to implement manual fallback logic outside of FallbackModel:

class BrandURLExtractionAgent:
    def __init__(self):
        self._google_agent = Agent(model=google_model, builtin_tools=[WebFetchTool()])
        self._anthropic_agent = Agent(model=anthropic_model, builtin_tools=[WebFetchTool()])

    async def get_brand_summary(self, url: str) -> str | None:
        prompt = f"Summarize the brand at {url}"  # illustrative prompt text
        # Try Google first
        try:
            result = await self._google_agent.run(user_prompt=prompt)

            # Manual inspection of response parts
            if self._check_web_fetch_success(result.all_messages(), url):
                return result.output
            # Fall through to fallback...
        except Exception:
            pass

        # Manual fallback to Anthropic
        result = await self._anthropic_agent.run(user_prompt=prompt)
        return result.output

    def _check_web_fetch_success(self, messages: list[ModelMessage], url: str) -> bool:
        for message in messages:
            if isinstance(message, ModelRequest):
                for part in message.parts:
                    if isinstance(part, BuiltinToolReturnPart) and part.tool_name == "web_fetch":
                        # Check if URL was successfully retrieved...
                        pass
        return False

This is verbose, error-prone, and doesn't benefit from FallbackModel's clean abstraction.

Proposed API

Add a new fallback_on_response parameter to FallbackModel.__init__:

from collections.abc import Callable
from pydantic_ai.messages import ModelMessage, ModelResponse

class FallbackModel(Model):
    def __init__(
        self,
        default_model: Model | KnownModelName | str,
        *fallback_models: Model | KnownModelName | str,
        fallback_on: Callable[[Exception], bool] | tuple[type[Exception], ...] = (ModelAPIError,),
        # NEW PARAMETER:
        fallback_on_response: Callable[[ModelResponse, list[ModelMessage]], bool] | None = None,
    ):
        ...

Parameter Signature

fallback_on_response: Callable[[ModelResponse, list[ModelMessage]], bool] | None = None
  • response: ModelResponse - The model's response to inspect
  • messages: list[ModelMessage] - Full message history (needed because BuiltinToolReturnPart lives in ModelRequest, not ModelResponse)
  • Returns bool - True to trigger fallback, False to accept the response

Usage Example

from pydantic_ai import Agent, WebFetchTool
from pydantic_ai.messages import (
    BuiltinToolReturnPart,
    ModelMessage,
    ModelRequest,
    ModelResponse,
)
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.google import GoogleModel


def web_fetch_failed(response: ModelResponse, messages: list[ModelMessage]) -> bool:
    """Return True if web_fetch tool failed to retrieve content."""
    for message in messages:
        if not isinstance(message, ModelRequest):
            continue
        for part in message.parts:
            if isinstance(part, BuiltinToolReturnPart) and part.tool_name == "web_fetch":
                content = part.content
                if isinstance(content, list):
                    for item in content:
                        if not isinstance(item, dict):
                            continue
                        status = item.get("url_retrieval_status")
                        if status and status != "URL_RETRIEVAL_STATUS_SUCCESS":
                            return True  # Trigger fallback
    return False  # Accept response


google_model = GoogleModel('gemini-2.0-flash')
anthropic_model = AnthropicModel('claude-3-5-haiku-latest')

fallback_model = FallbackModel(
    google_model,
    anthropic_model,
    fallback_on_response=web_fetch_failed,
)

agent = Agent(
    model=fallback_model,
    builtin_tools=[WebFetchTool()],
)

# Now if Google's web_fetch fails, automatically falls back to Anthropic!
result = await agent.run("Summarize https://example.com")

Implementation Notes

Changes to FallbackModel.request()

async def request(
    self,
    messages: list[ModelMessage],
    model_settings: ModelSettings | None,
    model_request_parameters: ModelRequestParameters,
) -> ModelResponse:
    exceptions: list[Exception] = []

    for model in self._models:
        try:
            response = await model.request(messages, model_settings, model_request_parameters)

            # NEW: Check response-based fallback condition
            if self._fallback_on_response is not None:
                if self._fallback_on_response(response, messages):
                    # Optionally log: "Fallback triggered by response inspection"
                    continue  # Try next model

            return response

        except Exception as e:
            if self._should_fallback(e):
                exceptions.append(e)
            else:
                raise

    # NOTE: if every model was rejected by response inspection, `exceptions`
    # may be empty; the final error should account for that case.
    raise FallbackExceptionGroup("All models failed", exceptions)
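The control flow above can be exercised end-to-end with stubs. In this toy harness (not the real `FallbackModel`), models are plain async callables and the response is reduced to a string; it only demonstrates the loop semantics: a response the predicate rejects causes the next model to be tried.

```python
import asyncio


async def fallback_request(models, messages, fallback_on_response=None):
    """Toy version of the proposed FallbackModel.request() loop."""
    exceptions = []
    for model in models:
        try:
            response = await model(messages)
        except Exception as e:
            exceptions.append(e)
            continue
        if fallback_on_response is not None and fallback_on_response(response, messages):
            continue  # response looked bad: try the next model
        return response
    # Guard against an empty group when every response was merely rejected:
    raise ExceptionGroup("All models failed", exceptions or [RuntimeError("all responses rejected")])


async def bad_model(messages):
    return "URL_RETRIEVAL_STATUS_FAILED"


async def good_model(messages):
    return "URL_RETRIEVAL_STATUS_SUCCESS"


result = asyncio.run(
    fallback_request(
        [bad_model, good_model],
        messages=[],
        fallback_on_response=lambda resp, msgs: "FAILED" in resp,
    )
)
assert result == "URL_RETRIEVAL_STATUS_SUCCESS"
```

The first model "succeeds" at the transport level, is rejected by the predicate, and the second model's response is returned — exactly the behavior the feature asks for.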

Streaming Considerations

For request_stream(), response-based fallback is more complex since the response arrives incrementally. Options:

  1. Don't support for streaming initially - Document that fallback_on_response only works with non-streaming requests
  2. Inspect after stream completes - Buffer the full response, then check (but this defeats the purpose of streaming for the fallback check)
  3. Support a streaming-aware callback - More complex API

I'd recommend starting with option 1 (non-streaming only) and expanding later if there's demand.
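For completeness, option 2 can be sketched with plain async generators (again a toy, not real pydantic_ai code). Note that the consumer receives nothing until a full candidate stream has been buffered and accepted — which is precisely why this option sacrifices streaming's latency benefit.

```python
import asyncio


async def buffered_stream_with_fallback(streams, accept):
    """Option 2 sketch: fully buffer each candidate stream, then decide.

    `streams` is a list of zero-argument async-generator factories;
    `accept` judges the complete buffered response text.
    """
    for make_stream in streams:
        chunks = [chunk async for chunk in make_stream()]
        if accept("".join(chunks)):
            # Re-emit buffered chunks only after acceptance: the consumer
            # sees no output until the full response is known to be good.
            for chunk in chunks:
                yield chunk
            return
    raise RuntimeError("all streamed responses were rejected")


async def failing():
    yield "fetch "
    yield "FAILED"


async def succeeding():
    yield "fetch "
    yield "ok"


async def main():
    out = []
    async for chunk in buffered_stream_with_fallback(
        [failing, succeeding], accept=lambda text: "FAILED" not in text
    ):
        out.append(chunk)
    return "".join(out)


assert asyncio.run(main()) == "fetch ok"
```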

Alternatives Considered

Alternative 1: Extend fallback_on to accept response

fallback_on: Callable[[Exception | ModelResponse], bool] | ...

Rejected because: Mixing exception and response handling in one callback is confusing and breaks existing type signatures.

Alternative 2: Custom exception wrapping

Wrap response inspection failures in a custom exception that fallback_on can catch.

Rejected because: Requires users to raise synthetic exceptions, which is awkward and doesn't fit the "successful response with bad content" mental model.

Alternative 3: Output validator with retry

Use @agent.output_validator to raise ModelRetry.

Rejected because: This triggers retry with the same model, not fallback to a different model. Also, the inspection often needs to happen at the message/part level, not the final output level.

Additional Context

This pattern is useful beyond WebFetchTool. Other use cases:

  • Citation validation: Fallback if the model's response doesn't include expected citations
  • Tool call validation: Fallback if a required tool wasn't called
  • Content quality checks: Fallback if response is too short, contains refusals, or lacks required structure
  • Provider-specific quirks: Handle cases where one provider returns empty content while another handles the same prompt fine
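A couple of these checks reduce to small pure predicates. The functions below are illustrative only (hypothetical names, operating on plain text and a set of tool names rather than on `ModelResponse` parts, which a real callback would walk as in the usage example above):

```python
def looks_like_refusal(text: str, min_length: int = 20) -> bool:
    """Content-quality check: reject responses that are too short
    or open with a stock refusal phrase."""
    refusal_markers = ("i can't", "i cannot", "i'm unable")
    stripped = text.strip()
    if len(stripped) < min_length:
        return True
    return stripped.lower().startswith(refusal_markers)


def missing_required_tool(called_tools: set[str], required: set[str]) -> bool:
    """Tool-call check: fall back if any required tool was never called."""
    return not required <= called_tools


assert looks_like_refusal("I can't help with that request here.")
assert not looks_like_refusal("Here is a detailed summary of the page content.")
assert missing_required_tool({"web_fetch"}, {"web_fetch", "citation_check"})
```

Since all of these share the `(response, messages) -> bool` shape, one `fallback_on_response` parameter covers the whole family.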

Checklist

  • Add fallback_on_response parameter to FallbackModel.__init__
  • Implement response inspection in FallbackModel.request()
  • Add logging when response-based fallback is triggered
  • Document the feature in the FallbackModel docs
  • Add unit tests for response-based fallback
  • Document streaming limitation (if not supported initially)

If this is something you would accept, I'd be interested in contributing it :)
