1 file changed: +2 -1 lines changed
@@ -86,9 +86,10 @@ See [WordTokenizers.jl](https://github.com/JuliaText/WordTokenizers.jl)
There is an issue though:
How much are these tokens costing you in memory use?
+ The math in this section is a bit hand-wavy and an over-simplification, but it should give you the gist of it.

Originally you had say a 100MB (10⁸ bytes) text file (multiply this out as required).
- Which as a String took-up (10⁸ bytes + 1 pointer (4 or 8 bytes) + 1 length marker (4 or 8 bytes) + null terminating character (total 10⁸ + 9 (or 17 ) bytes).
+ Which as a String took-up (10⁸ bytes + 1 pointer (4 or 8 bytes) + 1 length marker (4 or 8 bytes) + null terminating character (total 10⁸ + 9 (or 16 ) bytes).
To simplify the math lets say the average token length was 10 bytes.
So you had 10⁷ tokens.

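As a worked check of the arithmetic in this hunk, here is a sketch that sums the terms exactly as the String line itemizes them: the data, one pointer of size $p$, one length marker of size $\ell$, and a 1-byte null terminator. It assumes the 4-byte figures correspond to a 32-bit build and the 8-byte figures to a 64-bit build, and uses the hunk's own simplification of a 10-byte average token.

$$
\begin{aligned}
\text{String size} &= \underbrace{10^8}_{\text{data}} + \underbrace{p}_{\text{pointer}} + \underbrace{\ell}_{\text{length}} + \underbrace{1}_{\text{null}} \\
\text{32-bit } (p = \ell = 4) &: \quad 10^8 + 4 + 4 + 1 = 10^8 + 9 \text{ bytes} \\
\text{64-bit } (p = \ell = 8) &: \quad 10^8 + 8 + 8 + 1 = 10^8 + 17 \text{ bytes} \\
N_{\text{tokens}} &= \frac{10^8 \text{ bytes}}{10 \text{ bytes/token}} = 10^7 \text{ tokens}
\end{aligned}
$$

With 8-byte fields the itemized terms add up to 17 extra bytes, so a reviewer may want to double-check the `16` on the `+` line against the terms it lists.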