Improve URL host IDNA-encoding performance and compatibility #1632
+127
−83
Improves IDNA encoding performance by converting UTF8 to UTF16 in Swift before calling the UTF16-native ICU functions. Improves compatibility by allowing specific UIDNA errors during `nameToASCII`.
Motivation:
Performance: ICU's `uidna_nameToASCII_UTF8` and `uidna_nameToUnicodeUTF8` are just convenience wrappers around the UTF16-native functions. Performing the conversions to and from UTF16 ourselves in Swift is faster than having ICU do it for us, and we can also use the fact that `nameToASCII` produces ASCII on success to efficiently truncate the returned `UInt16` buffer.
Compatibility: Resolves #1560. As the issue describes, URL handling of ASCII and IDNA-encoded hosts is inconsistent. For instance, if an IDNA-encoded host has a domain label longer than 63 bytes or the entire domain is longer than 255 bytes, `URL` previously returned `nil` because the `uidna` functions indicated the respective non-fatal errors. Hosts without IDNA-encoding don't see this limitation. Allowing these errors also aligns our behavior with Safari and other WHATWG URL parsers.
Modifications:
In cases where we would previously call the UTF8 `uidna` functions, this PR instead performs the UTF8 to UTF16 transcoding in Swift before passing the UTF16 buffer to the ICU function. On `nameToASCII` success, we truncate the returned `UInt16` ASCII elements to `UInt8` and initialize the resulting `String`. On `nameToUnicode` success, we create the `String` from UTF16, which performs the UTF16 to UTF8 transcoding.
Use the same allowed errors for `nameToASCII` that are currently allowed for `nameToUnicode`: `UIDNA_ERROR_EMPTY_LABEL | UIDNA_ERROR_LABEL_TOO_LONG | UIDNA_ERROR_DOMAIN_NAME_TOO_LONG | UIDNA_ERROR_LEADING_HYPHEN | UIDNA_ERROR_TRAILING_HYPHEN | UIDNA_ERROR_HYPHEN_3_4`
Result:
~10% speedup for IDNA encoding and decoding.
Compatibility with WHATWG parsers/browsers and consistent behavior for IDN and non-IDN host name lengths.
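For readers who want a concrete picture of the transcoding steps described under Modifications, here is a minimal sketch. It is not the code from this PR: the helper names `utf16CodeUnits(of:)`, `asciiString(fromUTF16:)`, and `unicodeString(fromUTF16:)` are hypothetical, and the ICU call that would sit between them is left out so the snippet stays self-contained.

```swift
// Hypothetical helpers for illustration only; not the PR's implementation.

/// Transcodes a host string to UTF-16 code units in Swift, the input form
/// a UTF16-native ICU function would expect.
func utf16CodeUnits(of host: String) -> [UInt16] {
    Array(host.utf16)
}

/// Narrows a UInt16 buffer that is expected to hold only ASCII
/// (the nameToASCII success case) to UInt8 and builds a String from it.
/// Returns nil if any code unit is outside the ASCII range.
func asciiString(fromUTF16 units: [UInt16]) -> String? {
    var bytes = [UInt8]()
    bytes.reserveCapacity(units.count)
    for unit in units {
        guard unit < 0x80 else { return nil }
        bytes.append(UInt8(truncatingIfNeeded: unit))
    }
    // ASCII bytes are valid UTF-8, so no repair happens during decoding.
    return String(decoding: bytes, as: UTF8.self)
}

/// Builds a String from arbitrary UTF-16 output (the nameToUnicode case);
/// String performs the UTF-16 to UTF-8 transcoding.
func unicodeString(fromUTF16 units: [UInt16]) -> String {
    String(decoding: units, as: UTF16.self)
}
```

The ASCII guard in `asciiString(fromUTF16:)` is a defensive choice for this sketch. Code that has already confirmed `nameToASCII` succeeded could presumably skip the check and truncate directly.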
Testing:
Added benchmarks for IDNA encoding and decoding.
Added unit test for allowed `nameToASCII` errors.
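The allowed-error check can be pictured as a bitmask test: a result is accepted when every reported error bit falls inside the allowed set. The sketch below is an assumption-laden illustration, not the PR's code. The `IDNAErrors` type and `isAcceptable(_:)` are hypothetical, and the raw bit values are intended to mirror the `UIDNA_ERROR_*` constants in ICU's `uidna.h`; real code would use the ICU constants directly rather than redefining them.

```swift
// Hypothetical stand-in for ICU's UIDNA error bit set; illustration only.
struct IDNAErrors: OptionSet {
    let rawValue: UInt32

    // Bit values assumed to match ICU's uidna.h; verify against the header.
    static let emptyLabel        = IDNAErrors(rawValue: 0x01)
    static let labelTooLong      = IDNAErrors(rawValue: 0x02)
    static let domainNameTooLong = IDNAErrors(rawValue: 0x04)
    static let leadingHyphen     = IDNAErrors(rawValue: 0x08)
    static let trailingHyphen    = IDNAErrors(rawValue: 0x10)
    static let hyphen3And4       = IDNAErrors(rawValue: 0x20)
    // One example of an error that is NOT in the allowed set.
    static let disallowed        = IDNAErrors(rawValue: 0x80)

    /// The errors tolerated for both nameToASCII and nameToUnicode.
    static let allowed: IDNAErrors = [
        .emptyLabel, .labelTooLong, .domainNameTooLong,
        .leadingHyphen, .trailingHyphen, .hyphen3And4,
    ]
}

/// Returns true when every reported error is in the allowed set.
func isAcceptable(_ errors: IDNAErrors) -> Bool {
    errors.subtracting(.allowed).isEmpty
}
```

Under these assumptions, a host whose only problems are an over-long label or domain passes the check, while any error outside the mask still causes rejection.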