<h2><a class="toc-backref" id="examples-bad-utfminus8-input-text" href="#examples-bad-utfminus8-input-text">Bad UTF-8 input text</a></h2><p>This lib makes no effort to handle bad/malformed UTF-8 input text. The behaviour on bad input is currently undefined, and it will likely result in an internal AssertionDefect or some other error.</p>
513
-
<p>What can be done about this is validating the input text before passing it to the match function.</p>
512
+
<h2><a class="toc-backref" id="examples-bad-utfminus8-input-text" href="#examples-bad-utfminus8-input-text">Bad UTF-8 input text</a></h2><p>This lib makes no effort to handle invalid UTF-8 input text (i.e., malformed or corrupted). The behaviour on invalid input is currently undefined, and it will likely result in an internal AssertionDefect or some other error.</p>
513
+
<p>To guard against this, validate the input text first so that invalid input is never passed to the match function.</p>
<h2><a class="toc-backref" id="examples-match-binary-data" href="#examples-match-binary-data">Match binary data</a></h2><p>Matching on arbitrary binary data (i.e., not UTF-8) is not currently supported. Both the regex and the input text are assumed to be valid UTF-8. The input text is treated as UTF-8, and setting the regex to ASCII mode won't help.</p>
518
+
<span class="Identifier">doAssert</span> <span class="Identifier">validateUtf8</span><span class="Punctuation">(</span><span class="StringLit">"</span><span class="EscapeSequence">\xf8</span><span class="EscapeSequence">\xa1</span><span class="EscapeSequence">\xa1</span><span class="EscapeSequence">\xa1</span><span class="EscapeSequence">\xa1</span><span class="StringLit">"</span><span class="Punctuation">)</span> <span class="Operator">!=</span> <span class="Operator">-</span><span class="DecNumber">1</span></pre><p>Note that at the time of writing, Nim's <tt class="docutils literal"><span class="pre"><span class="Identifier">validateUtf8</span></span></tt> <a class="reference external" href="https://github.com/nim-lang/Nim/issues/19333">is not strict enough</a>, so you are better off using <a class="reference external" href="https://github.com/nitely/nim-unicodeplus">nim-unicodeplus's</a> <tt class="docutils literal"><span class="pre"><span class="Identifier">verifyUtf8</span></span></tt> function.</p>
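<p>As a minimal sketch of the validate-before-match advice, the snippet below wraps the standard library's validateUtf8 in a guard. The names isValidUtf8 and matchIfValid are hypothetical helpers, not part of this lib, and the matcher argument stands in for whatever match call you use. Since validateUtf8 may accept some invalid input, treat this as illustrative only.</p>

```nim
import std/unicode

# Hypothetical helper (not part of this lib): std/unicode's validateUtf8
# returns -1 for valid input, otherwise the index of the first bad byte.
proc isValidUtf8(s: string): bool =
  validateUtf8(s) == -1

# Hypothetical guard: the matcher is only called when the text validates.
# `matcher` is a stand-in for the caller's own match call.
proc matchIfValid(text: string, matcher: proc (s: string): bool): bool =
  isValidUtf8(text) and matcher(text)

# Stand-in matcher used only for this illustration.
proc nonEmpty(s: string): bool =
  s.len > 0

doAssert matchIfValid("abc", nonEmpty)
doAssert not matchIfValid("\xf8\xa1\xa1\xa1\xa1", nonEmpty)
```

<p>Swapping the body of isValidUtf8 for a stricter check leaves the guard unchanged.</p>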
519
+
520
+
<h2><a class="toc-backref" id="examples-match-binary-data" href="#examples-match-binary-data">Match binary data</a></h2><p>Matching on arbitrary binary data (i.e., not UTF-8) is not currently supported. Both the regex and the input text are expected to be valid UTF-8. The input text is treated as UTF-8, and setting the regex to ASCII mode won't help.</p>