Commit f7d92d9
Review: "Implement DRY penalty" (#645)
* Silence bogus Clippy warning
Clippy's suggestion cannot be implemented because of borrowing issues
* Get rid of unnecessary type annotations
Interesting that Clippy doesn't catch this
* Store default sequence breakers in a slice
It's nicer when the length is not hardcoded
* Make default sequence breakers private
No need to leak this as it's not used elsewhere
* Limit match length
Avoids quadratic runtime and potential DoS with adversarial inputs
Ref oobabooga/text-generation-webui#6047
* "Fix" sequence breaker tokenization
Most tokenizers encode punctuation tokens differently depending on where they occur in the input, and which tokens surround them. With the default sequence breakers, the appropriate encoding usually corresponds to the encoding produced when the token occurs after a word, rather than by itself. To emulate this, prefix the token with "a" before encoding, and extract the final token of the result.
See LostRuins/koboldcpp#982 for a correct solution to this problem.

1 parent 8650d9c commit f7d92d9
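
The "Limit match length" item can be illustrated with a minimal sketch. This is not the code from this commit: the function name `capped_match_lengths`, the `u32` token type, and the `max_match_length` parameter are assumptions made for illustration, and sequence breakers are left out. For every earlier occurrence of the last context token, it walks backwards while the tokens agree, but stops at the cap, so a highly repetitive input costs O(n * cap) instead of O(n^2).

```rust
use std::collections::HashMap;

/// Hypothetical helper (not from the commit): maps each candidate next token
/// to the longest match length, capped at `max_match_length`, between the end
/// of `context` and an earlier occurrence of the same token sequence.
/// Assumes `max_match_length >= 1`.
fn capped_match_lengths(context: &[u32], max_match_length: usize) -> HashMap<u32, usize> {
    let mut lengths = HashMap::new();
    let Some((&last, _)) = context.split_last() else {
        return lengths;
    };
    let end = context.len() - 1;

    // Visit every earlier position that holds the same token as the last one.
    for i in 0..end {
        if context[i] != last {
            continue;
        }
        // Extend the match backwards while tokens agree, stopping at the cap.
        let mut len = 1;
        while len <= i && len < max_match_length && context[i - len] == context[end - len] {
            len += 1;
        }
        // The token that followed the earlier occurrence would extend the repetition.
        let next = context[i + 1];
        let entry = lengths.entry(next).or_insert(0);
        *entry = (*entry).max(len);
    }
    lengths
}
```

With a context consisting of one token repeated 1000 times and a cap of 50, the inner loop runs at most 50 steps per position, which is the bound the cap is meant to provide.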
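
The sequence breaker workaround described above (prefix the breaker with "a", encode, keep the final token) might look roughly like the sketch below. The `Encode` trait and `ToyTokenizer` are hypothetical stand-ins, not the tokenizer API used in the commit; the toy tokenizer only imitates position-dependent encoding by giving the first character of the input a different ID.

```rust
/// Hypothetical minimal tokenizer interface, assumed for this sketch only.
trait Encode {
    fn encode(&self, text: &str) -> Vec<u32>;
}

/// Toy stand-in for a real tokenizer: one token per character, where a
/// character at the start of the input gets a different ID (offset by 1000)
/// than the same character later in the input.
struct ToyTokenizer;

impl Encode for ToyTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.chars()
            .enumerate()
            .map(|(i, ch)| ch as u32 + if i == 0 { 1000 } else { 0 })
            .collect()
    }
}

/// Encode "a" + breaker and keep only the final token, which approximates
/// how the breaker is encoded when it follows a word.
fn sequence_breaker_token<T: Encode>(tokenizer: &T, breaker: &str) -> Option<u32> {
    tokenizer.encode(&format!("a{breaker}")).last().copied()
}
```

With the toy tokenizer, encoding ":" on its own yields the start-of-input ID, while `sequence_breaker_token(&ToyTokenizer, ":")` yields the ID the character receives after a word. As the commit message itself notes, this is an approximation, not a correct solution.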
1 file changed, +14 -14 lines changed