-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
gh-142155: Fix infinite recursion in shutil.copytree on Windows junctions #142156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
|
||
| def copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, | ||
| ignore_dangling_symlinks=False, dirs_exist_ok=False): | ||
| ignore_dangling_symlinks=False, dirs_exist_ok=False, _seen=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but I think adding a private parameter to a public function is a little odd. I see we added the _seen here because _copytree calls the public copytree, so we need to add it here too.
I wonder if we could just change _copytree to call itself recursively instead of adding this private parameter. This would also avoid triggering the audit recursively (should we do that?).
If we should trigger the audit recursively, maybe we could create another private function for the recursive logic and let _copytree call it, with a public wrapper that keeps the original function signature (without adding the _seen).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It is better to have a private helper for that.
picnixz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to have a recursive guard, use a frozenset() as a default and update the frozenset every time. We do this in typing.py I think or maybe elsewhere (search for recursive_guard in Lib)
Lib/shutil.py
Outdated
| if _seen is None: | ||
| _seen = set() | ||
| src_st = os.stat(src) | ||
| src_id = (src_st.st_dev, src_st.st_ino) | ||
| if src_id in _seen: | ||
| raise Error([(src, dst, "Infinite recursion detected")]) | ||
| _seen = _seen.copy() | ||
| _seen.add(src_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is costly. We do a copy every time.
| except Exception as e: | ||
| # Skip if mklink is not available or fails for any reason | ||
| self.skipTest(f"Failed to create junction: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea. Can't we create a junction from Python directly? (cc @barneygale).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not addressed.
Lib/test/test_shutil.py
Outdated
| src_dir = os.path.join(base_dir, "Source") | ||
| junction_dir = os.path.join(src_dir, "Junction") | ||
| os.makedirs(junction_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use lower-case letters or is it not possible for junctions?
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
picnixz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address all changes before requesting a new review please, either by adding a comment if I was not clear or by changing the code.
| if _seen is None: | ||
| _seen = frozenset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why we do this. Why not just using a default argument? (it is no more mutable so it is fine)
| # Track visited directories to detect cycles (e.g., Windows junctions) | ||
| if _seen is None: | ||
| _seen = frozenset() | ||
| src_st = os.stat(src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call can fail and (can be) expensive. But I assume that we need it right?
|
|
||
| def copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, | ||
| ignore_dangling_symlinks=False, dirs_exist_ok=False): | ||
| ignore_dangling_symlinks=False, dirs_exist_ok=False, _seen=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It is better to have a private helper for that.
| os.makedirs(junction_dir) | ||
|
|
||
| # Create a junction pointing to its parent, creating a cycle | ||
| junction_target = os.path.dirname(junction_dir) # Points to Source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| junction_target = os.path.dirname(junction_dir) # Points to Source | |
| junction_target = os.path.dirname(junction_dir) |
| self.skipTest(f"Failed to create junction: {e}") | ||
|
|
||
| # Create destination directory | ||
| dst_dir = os.path.join(base_dir, "Dest") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dest -> dest
| except Exception as e: | ||
| # Skip if mklink is not available or fails for any reason | ||
| self.skipTest(f"Failed to create junction: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not addressed.
Thanks for the detailed review! It's getting late in my time zone, so I'm going to get some rest now. I will go through all your comments and update the code first thing tomorrow. |
Summary
This PR fixes a critical crash (WinError 206 / WinError 1921) in
shutil.copytreeon Windows when encountering directory junctions that point to a parent directory (recursive cycles).The Problem
On Windows,
shutil.copytreecontains specific logic to force traversal into Directory Junctions by treating them as standard directories (is_symlinkis forced toFalse).However, unlike
os.walkorshutil.rmtree(after recent fixes),copytreelacked a cycle detection mechanism for these traversed junctions. This caused infinite recursion until the path exceeded the OS limit, resulting in an unhandledOSError(Crash) rather than a PythonRecursionError.The Fix
I implemented cycle detection using file system identity (
st_dev,st_ino), similar to how deepcopy handles recursion:_seenset parameter tocopytreeand its helper_copytree.(st.st_dev, st.st_ino)is already in_seen.shutil.Error("Infinite recursion detected")immediately, preventing the crash.Verification
test_copytree_recursive_junctiontoLib/test/test_shutil.py.WinError 206crash.shutil.Error: Infinite recursion detected.