Conversation


@challenger71498 challenger71498 commented Dec 26, 2025

This PR fixes a failure in Session cleanup caused by the task cancellation from __aexit__ not being handled properly.

Motivation and Context

Session cleans up the receive streams in the finally clause of BaseSession::_receive_loop.

  • When the session is closed, or the async context is exited (which calls __aexit__), the TaskGroup is cancelled.
  • When a TaskGroup is cancelled, it cancels every child task.
  • The parent of the task we use in the finally clause (such as MemoryObjectStream::send) is its base coroutine _receive_loop, which belongs to the very TaskGroup being closed.
  • Therefore the cleanup process itself is cancelled during shutdown, leaving the receive streams unclosed and leaking them.

I fixed this problem by wrapping the finally clause in a CancelScope with shield=True, which ensures the cleanup logic runs to completion even when cancellation of the TaskGroup has been requested.
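As a minimal sketch (assuming anyio memory streams; the class and attribute names follow the description above rather than being copied from the SDK), the pattern looks like this:

```python
import anyio


class BaseSession:
    async def _receive_loop(self) -> None:
        try:
            async for message in self._read_stream:
                ...  # route the message to the matching response stream
        finally:
            # Shield the cleanup from the cancellation that is tearing down
            # the enclosing TaskGroup, so every pending response stream is
            # closed instead of the cleanup itself being cancelled.
            with anyio.CancelScope(shield=True):
                for stream in self._response_streams.values():
                    await stream.aclose()
                self._response_streams.clear()
```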

Also, cleaning up the receive streams iterates over the stream dictionary, so the dictionary must not change during that loop.

  • Mutating a dictionary while iterating over it raises a RuntimeError, which should never happen in production.
  • BaseSession as-is does not protect the dictionary from changes during cleanup.
  • To handle this minor issue, I added a simple flag that signals the session is cleaning up its resources (see the sketch after this list).
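A hypothetical sketch of that flag; the attribute and helper names here are illustrative, not necessarily the ones used in the actual change:

```python
class BaseSession:
    def __init__(self) -> None:
        self._response_streams: dict[int, object] = {}
        self._closing = False  # set to True once cleanup starts

    def _remove_response_stream(self, request_id: int) -> None:
        # While the finally clause iterates over _response_streams, other
        # coroutines must not mutate the dict, or the loop would raise
        # "RuntimeError: dictionary changed size during iteration".
        if self._closing:
            return
        self._response_streams.pop(request_id, None)
```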

How Has This Been Tested?

I found this issue while debugging a problem where using ADK AgentTool with MCP causes an abstruse error:

RuntimeError: Attempted to exit cancel scope in a different task than it was entered in

After a week of debugging, I realised that the complete fix for this problem is quite large and needs changes in both the MCP SDK and Google ADK; this PR is one of them.

I added a simple test that triggers this issue.

  • In this test:
    • The server never responds.
    • A ClientSession is created and 2 ping requests are sent to the server asynchronously.
    • While the 2 tasks are still waiting (since the server never responds), the async with block is exited.
    • The ClientSession is closed.
    • Assert that the total count of response streams equals 0, not 2.
    • Assert that the closed ClientSession fails to send a ping.
  • BaseSession as-is fails this test: the response stream count is still 2, because the cleanup logic gets cancelled. (A hedged sketch of the test flow appears after this list.)
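A hedged sketch of that test flow; the stream wiring, buffer sizes, and pytest/anyio plumbing here are assumptions, while send_ping and _response_streams come from the SDK and the description above:

```python
import anyio
import pytest
from mcp import ClientSession


async def _ping_ignoring_errors(session: ClientSession) -> None:
    # The ping is expected to fail once the session is torn down; the test
    # only cares that its response stream gets cleaned up.
    try:
        await session.send_ping()
    except Exception:
        pass


@pytest.mark.anyio
async def test_response_streams_closed_on_exit() -> None:
    # The "server" side never reads or replies, so both pings wait forever.
    client_to_server_send, client_to_server_recv = anyio.create_memory_object_stream(10)
    server_to_client_send, server_to_client_recv = anyio.create_memory_object_stream(10)

    async with anyio.create_task_group() as tg:
        async with ClientSession(server_to_client_recv, client_to_server_send) as session:
            tg.start_soon(_ping_ignoring_errors, session)
            tg.start_soon(_ping_ignoring_errors, session)
            await anyio.sleep(0.1)
        # Exiting the async with block must close both pending response streams.
        assert len(session._response_streams) == 0

    # A closed session must refuse to send further requests.
    with pytest.raises(Exception):
        await session.send_ping()
```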

Breaking Changes

  • No breaking changes at all since this is just a minor bugfix.
  • There might be a side effect: since the finally block now runs until every stream is closed, __aexit__ blocks until the finally block finishes executing.
    • If the finally block runs for too long, users might think the session is hanging.
    • But I don't think this will really be a problem: the streams are in-memory rather than over an actual network, and the number of streams won't be huge (roughly ~1000 at most), so it should be fine.
    • If it DOES become a problem, we could simply close the streams in parallel, as sketched below. (I don't want to make that large a change in this PR though.)
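For reference, a purely illustrative sketch of that parallel variant; the helper name and signature are hypothetical and this is not part of the PR:

```python
import anyio
from anyio.streams.memory import MemoryObjectSendStream


async def close_streams_in_parallel(streams: dict[int, MemoryObjectSendStream]) -> None:
    # Close every pending response stream concurrently, still shielded from
    # the cancellation that is tearing down the parent TaskGroup.
    with anyio.CancelScope(shield=True):
        async with anyio.create_task_group() as tg:
            for stream in streams.values():
                tg.start_soon(stream.aclose)
        streams.clear()
```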

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed (no docs needed to be updated)

Additional context

This PR MAY resolve the issues below:

I think this problem is related to a task leak (or indefinite waiting) in Google ADK, since in ADK the tool call awaits forever until the receive stream responds.

The session must close _response_streams properly when exiting the async with scope. The session as-is fails the test because the cleanup logic is forcefully cancelled.
Cleanup logic should not be cancellable via a CancelScope, which is why the shield=True parameter is required.
Since we use a for-loop while cleaning up the _response_streams dict, the dict must not change during cleanup. This is handled easily with a simple flag.
