Infinite tool call loop: HuggingFaceModel and text-generation-inference #3318

@baughmann

Description

Hello. Needless to say, amazing library. Please let me know if you'd like me to try something or if you need more info.

I've been going through various local model providers trying to find one that works well, and I came across a rather shocking bug when running against Hugging Face's TGI model host.

The problem appears whether I use the OpenAI-"compatible" endpoints or HuggingFaceModel with a custom AsyncInferenceClient and HuggingFaceProvider. Since the latter is probably the official approach, the code below uses it.
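
For reference, the OpenAI-compatible wiring that reproduced the same loop looked roughly like this (a sketch from memory, not a saved cell; OpenAIModel, the port, and the model slug are assumptions based on my setup below):

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Hypothetical reconstruction: point pydantic-ai's OpenAI model class at TGI's
# /v1 endpoint. TGI ignores the API key, but the client requires a value.
openai_style_model = OpenAIModel(
    "/models/meta-llama/Meta-Llama-3-8B-Instruct",
    provider=OpenAIProvider(base_url="http://localhost:8081/v1", api_key="none"),
)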

System Info

curl 127.0.0.1:8080/info | jq:

{
  "model_id": "/models/meta-llama/Meta-Llama-3-8B-Instruct",
  "model_sha": null,
  "model_pipeline_tag": null,
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_tokens": 8191,
  "max_total_tokens": 8192,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "router": "text-generation-router",
  "version": "3.3.4-dev0",
  "sha": "9f38d9305168f4b47c8c46b573f5b2c07881281d",
  "docker_label": "sha-9f38d93"
}

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
| 40%   54C    P2             61W /  450W |   21499MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:48:00.0 Off |                  Off |
| 30%   43C    P2             52W /  450W |   21394MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Setup

Here's the docker-compose.yaml I'm using to start TGI:

services:
  text-generation-inference:
    image: ghcr.io/huggingface/text-generation-inference:latest
    container_name: tgi
    ports:
      - "8081:80"
    volumes:
      - ../../../models:/models:ro
      - tgi-data:/data
    environment:
      - RUST_LOG=info
    # I have also tested with 3.1-8B and 3.2-3B with the same end results
    command: >
      --model-id /models/meta-llama/Meta-Llama-3-8B-Instruct
      --hostname 0.0.0.0
      --port 80
      --trust-remote-code
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
    shm_size: "64g"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  tgi-data:
    driver: local
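
Before running anything, the container can be sanity-checked from the notebook (a small sketch; httpx is already a pydantic-ai dependency, and the host port is 8081 per the compose mapping above):

import httpx

# Quick readiness probe against TGI's /info endpoint; prints the same
# model_id and version shown in the curl output above.
info = httpx.get("http://localhost:8081/info").json()
print(info["model_id"], info["version"])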

Code

All code is running in a Jupyter notebook.

Here's the common setup cell:

from huggingface_hub import AsyncInferenceClient
from pydantic_ai.models.huggingface import HuggingFaceModel
from pydantic_ai.profiles import ModelProfile
from pydantic_ai.profiles._json_schema import InlineDefsJsonSchemaTransformer
from pydantic_ai.providers.huggingface import HuggingFaceProvider
from pydantic_ai.providers.openai import OpenAIProvider

provider = OpenAIProvider(base_url="http://localhost:8081/v1") # Just used to get the model slug
models = await provider.client.models.list()

client = AsyncInferenceClient(base_url="http://localhost:8081/")

print(f"Connected to TGI. Available models: {len(models.data)}")
for model in models.data:
    print(f"  - {model.id}")

# Create the model instance
agent_model = HuggingFaceModel(
    models.data[0].id,
    provider=HuggingFaceProvider(hf_client=client, api_key="None"),
    # Annoyingly, despite this being basically the default profile, Llama 3's tool calls often fall through to the response without this
    profile=ModelProfile(
        supports_tools=True,
        json_schema_transformer=InlineDefsJsonSchemaTransformer
    )
)

Working: Basic requests and history

  1. Create the basic agent
from pydantic_ai import Agent

simple_agent = Agent(model=agent_model)
  2. Make a simple request
simple_result = await simple_agent.run("Tell me a joke.")

simple_result.output # "Why couldn't the bicycle stand up by itself?\n\nBecause it was two-tired!"
  3. Test including previous messages in another simple request
simple_result_2 = await simple_agent.run(message_history=simple_result.all_messages())

simple_result_2.output # 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his field! (get it?)'

Not working (or sometimes "working" with like 20 tool calls)

  1. Create the agent and a basic function
from pydantic_ai import Tool
from datetime import datetime

# Create a simple tool
@Tool
async def get_current_date() -> str:
    """Get the current date.

    Returns:
        str: The current date in YYYY-MM-DD format.
    """
    return datetime.now().strftime("%Y-%m-%d")

# Create an agent with the simple tool
tool_agent = Agent(model=agent_model, tools=[get_current_date])
  2. Make a simple request that should use the tool call
tool_result = await tool_agent.run("What is the current date?")

tool_result.output # 'I apologize for the repetition! According to my system clock, the current date is indeed August 31st, 2025.'
  3. Hmm. 8 seconds for that request? Let's inspect the messages
for message in tool_result.all_messages():
    print(message)

Which yields something like:

ModelRequest(parts=[UserPromptPart(content='What is the current date?', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 70324, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=175, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 288467, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=219, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 505643, tzinfo=datetime.timezone.utc))])
[... 32 more near-identical ToolCallPart/ToolReturnPart round trips elided; input_tokens climbs steadily from 262 to 1595 while the model keeps calling get_current_date with empty args ...]
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1638, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 587324, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[TextPart(content='I apologize for the repetition! According to my system clock, the current date is indeed August 31st, 2025.')], usage=RequestUsage(input_tokens=1521, output_tokens=12), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')

35 tool calls!

Here's a TGI log line from one of the calls:

INFO chat_completions{parameters="GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: true, max_new_tokens: None, return_full_text: None, stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: Some(Json(Object {\"$functions\": Object {\"get_current_date\": Object {\"description\": String(\"<summary>Get the current date.</summary>\\n<returns>\\n<type>str</type>\\n<description>The current date in YYYY-MM-DD format.</description>\\n</returns>\"), \"additionalProperties\": Bool(false), \"properties\": Object {\"_name\": Object {\"type\": String(\"string\"), \"const\": String(\"get_current_date\")}}, \"required\": Array [String(\"_name\")]}, \"no_tool\": Object {\"description\": String(\"Open ended response with no specific tool selected\"), \"additionalProperties\": Bool(false), \"properties\": Object {\"_name\": Object {\"type\": String(\"string\"), \"const\": String(\"no_tool\")}}, \"required\": Array [String(\"_name\")]}}, \"properties\": Object {\"function\": Object {\"anyOf\": Array [Object {\"$ref\": String(\"#/$functions/get_current_date\")}, Object {\"$ref\": String(\"#/$functions/no_tool\")}]}}})), adapter_id: Some(\"/models/meta-llama/Meta-Llama-3-8B-Instruct\") }" total_time="180.268942ms" validation_time="1.161794ms" queue_time="46.08µs" inference_time="179.061248ms" time_per_token="14.92177ms" seed="Some(6476155871046790452)" total_time="349.189932ms" validation_time="948.707µs" queue_time="38.419µs" inference_time="348.202936ms" time_per_token="12.896405ms" seed="Some(5246360728990037330)"}: text_generation_router::server: router/src/server.rs:432: Success
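
To isolate whether TGI or pydantic-ai's request loop is at fault, a single request can be sent straight to TGI's OpenAI-compatible endpoint with the same tool (a diagnostic sketch; the raw openai client and inline tool schema are assumptions, reusing the model slug and port from the setup above):

from openai import AsyncOpenAI

# Hypothetical diagnostic: bypass pydantic-ai entirely and inspect what the
# grammar-constrained tool choice returns for one request.
raw_client = AsyncOpenAI(base_url="http://localhost:8081/v1", api_key="none")
resp = await raw_client.chat.completions.create(
    model=models.data[0].id,
    messages=[{"role": "user", "content": "What is the current date?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_date",
            "description": "Get the current date in YYYY-MM-DD format.",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
)
print(resp.choices[0].finish_reason)
print(resp.choices[0].message.tool_calls)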

Expected behavior

I'd understand if it failed to call the tool, but getting the current date 35 times is a bit much! Ideally, HuggingFaceModel would work with TGI and tool calls.
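
In the meantime, pydantic-ai's usage limits can at least bound the loop (a stopgap sketch assuming the UsageLimits API; it raises an error instead of looping, it doesn't fix the underlying behavior):

from pydantic_ai.usage import UsageLimits

# Caps the run at a handful of model round trips; once exceeded, the run
# raises instead of making 35 tool calls. A guard, not a fix.
tool_result = await tool_agent.run(
    "What is the current date?",
    usage_limits=UsageLimits(request_limit=5),
)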
