Commit e91b43f

Update README.md
1 parent 34e9c24 commit e91b43f

1 file changed: +2 -1 lines changed

README.md

Lines changed: 2 additions & 1 deletion
@@ -86,9 +86,10 @@ See [WordTokenizers.jl](https://github.com/JuliaText/WordTokenizers.jl)
 
 There is an issue though:
 How much are these tokens costing you in memory use?
+The math in this section is a bit hand-wavy and an over-simplification, but it should give you the gist of it.
 
 Originally you had say a 100MB (10⁸ bytes) text file (multiply this out as required).
-Which as a String took-up (10⁸ bytes + 1 pointer (4 or 8 bytes) + 1 length marker (4 or 8 bytes) + null terminating character (total 10⁸ + 9 (or 17) bytes).
+Which as a String took-up (10⁸ bytes + 1 pointer (4 or 8 bytes) + 1 length marker (4 or 8 bytes) + null terminating character (total 10⁸ + 9 (or 16) bytes).
 To simplify the math lets say the average token length was 10 bytes.
 So you had 10⁷ tokens.
 
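
As a quick check of the arithmetic in this hunk, here is a minimal Python sketch. It assumes exactly the accounting the README line states (payload bytes, one pointer, one length marker, one null byte). The helper `string_bytes` and the variable names are hypothetical, and this is not a description of how Julia actually lays out a `String` in memory. Under that accounting the 64-bit overhead sums to 8 + 8 + 1 = 17, so the 16 in the added line presumably leaves the terminator out or counts it differently.

```python
def string_bytes(payload_bytes: int, word_size: int) -> int:
    """Size under the README's simplified model (an assumption, not Julia's real layout):
    payload + one pointer + one length marker + a 1-byte null terminator."""
    return payload_bytes + word_size + word_size + 1


text_bytes = 10**8       # the 100MB text file from the README
avg_token_bytes = 10     # average token length assumed in the README

overhead_32 = string_bytes(0, 4)   # per-string overhead with 4-byte words
overhead_64 = string_bytes(0, 8)   # per-string overhead with 8-byte words
n_tokens = text_bytes // avg_token_bytes

print(f"32-bit overhead: {overhead_32} bytes")
print(f"64-bit overhead: {overhead_64} bytes")
print(f"tokens: {n_tokens}")
```

Either way the per-string overhead is negligible for one 10⁸-byte string; it only starts to matter once the same overhead is paid for each of the 10⁷ tokens.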

0 commit comments
