gh-119451: Fix a potential denial of service in http.client #119454
```diff
@@ -111,6 +111,11 @@
 _MAXLINE = 65536
 _MAXHEADERS = 100

+
+# Data larger than this will be read in chunks, to prevent extreme
+# overallocation.
+_MIN_READ_BUF_SIZE = 1 << 20
+
 # Header name/value ABNF (http://tools.ietf.org/html/rfc7230#section-3.2)
 #
 # VCHAR = %x21-7E
```
```diff
@@ -642,9 +647,16 @@ def _safe_read(self, amt):
         reading. If the bytes are truly not available (due to EOF), then the
         IncompleteRead exception can be used to detect the problem.
         """
-        data = self.fp.read(amt)
-        if len(data) < amt:
-            raise IncompleteRead(data, amt-len(data))
+        cursize = min(amt, _MIN_READ_BUF_SIZE)
+        data = self.fp.read(cursize)
+        while len(data) < amt:
+            if len(data) < cursize:
+                raise IncompleteRead(data, amt-len(data))
+            # This is a geometric increase in read size (never more than
+            # doubling the current length of data per loop iteration).
+            delta = min(cursize, amt - cursize)
+            data += self.fp.read(delta)
+            cursize += delta
         return data

     def _safe_readinto(self, b):
```
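As a standalone illustration, here is a sketch of the chunked-read strategy from this diff, extracted out of `http.client` so it can be run directly. The function name `safe_read` and the lowered `_MIN_READ_BUF_SIZE` are choices made for this example (the real constant is 1 MiB); the error type is simplified to `ValueError` instead of `http.client.IncompleteRead`:

```python
import io

# Lowered from the real 1 << 20 so the geometric growth is
# visible with small inputs.
_MIN_READ_BUF_SIZE = 8

def safe_read(fp, amt):
    """Read exactly amt bytes from fp, growing the per-call read size
    geometrically so a lying Content-Length cannot force one huge
    upfront allocation."""
    cursize = min(amt, _MIN_READ_BUF_SIZE)
    data = fp.read(cursize)
    while len(data) < amt:
        if len(data) < cursize:
            # The stream delivered less than requested: truncated body.
            raise ValueError(
                "incomplete read: got %d of %d bytes" % (len(data), amt))
        # Never more than double the amount read so far, and never
        # read past amt.
        delta = min(cursize, amt - cursize)
        data += fp.read(delta)
        cursize += delta
    return data

payload = bytes(range(100))
assert safe_read(io.BytesIO(payload), 100) == payload
```

With `amt=100` and a starting chunk of 8, the loop issues reads of 8, 8, 16, 32, and finally 36 bytes, so memory is only ever allocated in proportion to data actually received.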
```diff
@@ -0,0 +1,5 @@
+Fix a potential memory denial of service in the :mod:`http.client` module.
+When connecting to a malicious server, it could cause
+an arbitrary amount of memory to be allocated.
+This could have led to symptoms including a :exc:`MemoryError`, swapping, out
+of memory (OOM) killed processes or containers, or even system crashes.
```
When multiple 1 MB chunks have to be read and joined, `io.BytesIO` would consume less memory and do the job significantly faster. Here's the result of a simple benchmark on Python 3.14.0:

The benchmarking script
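The `io.BytesIO` variant this comment suggests could look roughly like the following. This is a hedged sketch, not the PR's code or the commenter's script; `safe_read_bytesio` and the lowered starting chunk size are assumptions made for the example:

```python
import io

_MIN_READ_BUF_SIZE = 8  # lowered from 1 << 20 for the example

def safe_read_bytesio(fp, amt):
    """Same geometric chunking idea, but accumulating into a BytesIO
    buffer instead of concatenating bytes objects, so already-read
    data need not be copied into a brand-new buffer on every step."""
    buf = io.BytesIO()
    cursize = min(amt, _MIN_READ_BUF_SIZE)
    while buf.tell() < amt:
        chunk = fp.read(min(cursize, amt - buf.tell()))
        if not chunk:
            raise ValueError("incomplete read")
        buf.write(chunk)
        cursize *= 2  # geometric growth of the per-call read size
    return buf.getvalue()

payload = b"x" * 100
assert safe_read_bytesio(io.BytesIO(payload), 100) == payload
```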
Your benchmark reads data in chunks of constant size. This PR reads it in chunks of geometrically increasing size.
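For concreteness, here is a small sketch (`read_sizes` is a hypothetical helper, with a lowered starting chunk size standing in for the 1 MiB `_MIN_READ_BUF_SIZE`) of the successive `fp.read()` sizes the PR's geometric strategy issues:

```python
def read_sizes(amt, start):
    """Yield the sizes of the successive fp.read() calls the PR's
    geometric strategy would make for a body of amt bytes."""
    cursize = min(amt, start)
    yield cursize
    total = cursize
    while total < amt:
        # Each extra read is at most the total read so far (so the
        # buffer never more than doubles), and never runs past amt.
        delta = min(cursize, amt - cursize)
        yield delta
        total += delta
        cursize += delta

print(list(read_sizes(100, 8)))  # [8, 8, 16, 32, 36]
```

A constant-size benchmark would instead issue a flat sequence like 8, 8, 8, ..., so the two strategies make a different number of `read()` calls for the same payload.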
That's true. Here's the changed script with its result:
The updated benchmarking script
I see now why `BytesIO` consumes 25% less memory. If you add `n` new bytes to a buffer of size `n`, then with bytes concatenation you need to allocate a new buffer of size `2*n`, for a total of `4*n` bytes, but `BytesIO` can reallocate the original buffer, for a total of `3*n` bytes. It also benefits from a CPython-specific optimization: `BytesIO.getvalue()` just returns the internal bytes buffer without allocating a new bytes object for the result.

The time difference is perhaps a consequence of this, although it is not important when we read from the internet. But the peak memory consumption is important.