-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
gh-98896: Fix parsing issue in resource_tracker to allow shared memory names containing colons #138473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
13ca553 to
d973cf0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! Sorry for the delay in reviewing.
The fix looks good for colons, but a comment on the issue also mentioned newlines in filenames. Do you want to tackle that, or leave it to another PR?
|
Hi Petr,
Thanks for your email (and also the one you sent later) - it is very kind
of you.
I try to address, indeed, the newlines - by encoding and decoding the
shared_memory name. So a different approach than before. See the updated PR.
Regards,
Rani
…On Tue, Nov 4, 2025 at 2:56 PM Petr Viktorin ***@***.***> wrote:
***@***.**** commented on this pull request.
Hello! Sorry for the delay in reviewing.
The lix looks good for colons, but a comment on the issue also mentioned
*newlines* in filenames. Do you want to tackle that, or leave it to
another PR?
------------------------------
In Lib/multiprocessing/resource_tracker.py
<#138473 (comment)>:
> + parts = line.strip().decode('ascii').split(':')
+ if len(parts) < 3:
+ raise ValueError("malformed resource_tracker message: %r" % (parts,))
+ cmd = parts[0]
+ rtype = parts[-1]
+ name = ':'.join(parts[1:-1])
How about this?
⬇️ Suggested change
- parts = line.strip().decode('ascii').split(':')
- if len(parts) < 3:
- raise ValueError("malformed resource_tracker message: %r" % (parts,))
- cmd = parts[0]
- rtype = parts[-1]
- name = ':'.join(parts[1:-1])
+ cmd, *name_parts, rtype = line.strip().decode('ascii').split(':')
+ name = ':'.join(name_parts)
—
Reply to this email directly, view it on GitHub
<#138473 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AH6O62VUX7TDUCM6PR6DYN333CV67AVCNFSM6AAAAACFRVHJJWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTIMJWGU4DAOBUGY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
encukou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This approach feels ad-hoc; did you consider a standard encoding like JSON lines or pickle?
(Python's JSON encoder won't output newlines unless you ask for indentation; pickle.load reads one object from a file and stops.)
But I can merge this if you don't want to explore those options.
| cmd, name, rtype = line.strip().decode('ascii').split(':') | ||
| cmd, enc_name, rtype = line.rstrip(b'\n').decode('ascii').split(':', 2) | ||
| if rtype == "shared_memory": | ||
| name = base64.urlsafe_b64decode(enc_name.encode('ascii')).decode('utf-8', 'surrogateescape') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decoding is OK with strings.
| name = base64.urlsafe_b64decode(enc_name.encode('ascii')).decode('utf-8', 'surrogateescape') | |
| name = base64.urlsafe_b64decode(enc_name).decode('utf-8', 'surrogateescape') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment. I like your suggestion to consider using JSON to encode the message.
Also, in the original implementation there was a check that the message is not longer than 512 bytes, since writes shorter than PIPE_BUF (512 bytes on POSIX) are guaranteed to be atomic. But there was no check on the name length.
Now I check that the name is at most 255 bytes long, which is the value of NAME_MAX on Linux (including the leading slash that POSIX requires in shared memory and semaphore names).
I still encode the name using Base64, because json.dumps(..., ensure_ascii=True) would otherwise expand each non-ASCII byte into a 6-character escape like \uDC80. Using Base64 ensures that a 255-byte name becomes at most 340 bytes long, so the total JSON message always remains well below 512 bytes.
As a result, the previous runtime check for the message length is now replaced by an assert.
What do you think?
…the sent length below 512
encukou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good catch about still needing base64 for atomicity.
Hopefully, humans won't need to inspect the stream.
| line = raw.rstrip(b'\n') | ||
| try: | ||
| obj = json.loads(line.decode('ascii')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stripping and decoding shouldn't be needed; json.loads can handle bytes and trailing newlines.
| if not isinstance(cmd, str) or not isinstance(rtype, str) or not isinstance(b64, str): | ||
| raise ValueError("malformed resource_tracker fields: %r" % (obj,)) | ||
|
|
||
| enc = b64.encode('ascii') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urlsafe_b64decode can handle bytes as well.
| cmd = obj.get("cmd") | ||
| rtype = obj.get("rtype") | ||
| b64 = obj.get("base64_name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need get with constant arguments:
| cmd = obj.get("cmd") | |
| rtype = obj.get("rtype") | |
| b64 = obj.get("base64_name") | |
| cmd = obj["cmd"] | |
| rtype = obj["rtype"] | |
| b64 = obj["base64_name"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternately, make some fields optional -- then the PROBE message can be shorter:
| cmd = obj.get("cmd") | |
| rtype = obj.get("rtype") | |
| b64 = obj.get("base64_name") | |
| cmd = obj["cmd"] | |
| rtype = obj["rtype"] | |
| b64 = obj.get("base64_name", "") |
|
🤖 New build scheduled with the buildbot fleet by @encukou for commit ea6416e 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F138473%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
Looks good! Thank you for the fix! |
|
Thanks @rani-pinchuk for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
|
Thanks @rani-pinchuk for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
| res = assert_python_failure("-c", code, PYTHONWARNINGS='error') | ||
| self.assertIn(b'DeprecationWarning', res.err) | ||
| self.assertIn(b'is multi-threaded, use of forkpty() may lead to deadlocks in the child', res.err) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been a double line gap. I'll fix it in the backports.
…itrary shared memory names (pythonGH-138473) (cherry picked from commit c6f3dd6) Co-authored-by: Rani Pinchuk <33353578+rani-pinchuk@users.noreply.github.com>
|
GH-141922 is a backport of this pull request to the 3.14 branch. |
…itrary shared memory names (pythonGH-138473) (pythonGH-141922) (cherry picked from commit 64d6bde38ffb97a6f7c7fee82ba98de81911f1b9) Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Rani Pinchuk <33353578+rani-pinchuk@users.noreply.github.com>
|
GH-142014 is a backport of this pull request to the 3.13 branch. |
Shared memory names containing colons were not parsed correctly as the code of
resource_trackerassumed that these names contain no colons.@encukou