Commit 9450c6b

improve docs
1 parent 3b86748 commit 9450c6b

1 file changed

+26
-0
lines changed

src/regex.nim

Lines changed: 26 additions & 0 deletions
@@ -283,6 +283,32 @@ the scope, and it contains the submatches for every capture group.
matched = true
doAssert matched

Bad UTF-8 input text
####################

This library makes no attempt to handle malformed UTF-8 input text.
The behaviour on such input is currently undefined, and it will
likely result in an internal AssertionDefect or some other error.

To guard against this, validate the input text before
passing it to the match function.

.. code-block:: nim
    :test:
    import unicode
    # good input text
    doAssert validateUtf8("abc") == -1
    # bad input text
    doAssert validateUtf8("\xf8\xa1\xa1\xa1\xa1") != -1

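The validation step can also be wrapped in a small guard that runs before any matching. The following is a minimal sketch and not part of the library: ``requireValidUtf8`` is a hypothetical helper name, and it relies only on ``validateUtf8`` from ``std/unicode``, which returns -1 for valid input and otherwise the position of the first invalid byte.

```nim
import std/unicode

proc requireValidUtf8(text: string): string =
  ## Hypothetical helper (not part of this library): returns `text`
  ## unchanged when it is valid UTF-8, raises ValueError otherwise.
  let badPos = validateUtf8(text)
  if badPos != -1:
    raise newException(ValueError, "invalid UTF-8 at byte " & $badPos)
  text

# Valid input passes through and can then be handed to the match function.
doAssert requireValidUtf8("abc") == "abc"

# Malformed input is rejected before it reaches the regex engine.
doAssertRaises(ValueError):
  discard requireValidUtf8("\xf8\xa1\xa1\xa1\xa1")
```

Raising early keeps the failure at the call site with a clear message, rather than surfacing later as an internal AssertionDefect. Returning an error value instead of raising would serve equally well.
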
Match binary data
#################

Matching on arbitrary binary data (i.e., data that is not valid UTF-8)
is not currently supported. Both the regex and the input text are
assumed to be valid UTF-8. The input text is always treated as UTF-8,
so setting the regex to ASCII mode won't help.

]##

import std/tables
