Commit bc83804
Improve training config (#11)
This PR improves the training args used for pretraining MPNet from
several angles:
1. improved default values for the training args[^1], with some updated
to more closely follow the hyperparameters in the MPNet paper
2. clearer, more succinct descriptions of what the core args do and how
to use them
3. A) new options for some existing training args[^2] and B) new CLI
args that expose some previously hardcoded parameters[^3] so the user
can adjust them
[^1]: e.g. gradient clipping, which has become standard in pretraining
since the original repo came out
[^2]: added support for new activation fns "silu" and "relu2"
[^3]: the relative attention hyperparams
`relative_attention_num_buckets` and `max_distance` are hardcoded to
values for 512 ctx; they should be settable by the user w/ reasonable defaults
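
To make the three footnotes concrete, here is a minimal, self-contained sketch of how such options could be exposed on a CLI. It is not the code from this commit: the flag names (`--activation-fn`, `--relative-attention-num-buckets`, `--relative-attention-max-distance`, `--clip-grad-norm`), the `gelu` default, and the numeric defaults are all assumptions chosen for illustration, and the gradient-clipping helper works on a plain list of floats rather than real model gradients.

```python
import argparse
import math


def gelu(x: float) -> float:
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def relu2(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2."""
    return max(0.0, x) ** 2


# Lookup table from CLI choice to activation function.
ACTIVATIONS = {"gelu": gelu, "silu": silu, "relu2": relu2}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flag names and placeholder defaults."""
    parser = argparse.ArgumentParser(description="illustrative pretraining args")
    parser.add_argument(
        "--activation-fn", choices=sorted(ACTIVATIONS), default="gelu",
        help="activation used in the feed-forward layers (default is an assumption)",
    )
    parser.add_argument(
        "--relative-attention-num-buckets", type=int, default=32,
        help="number of relative position buckets (placeholder default)",
    )
    parser.add_argument(
        "--relative-attention-max-distance", type=int, default=128,
        help="distance beyond which positions share a bucket (placeholder default)",
    )
    parser.add_argument(
        "--clip-grad-norm", type=float, default=1.0,
        help="maximum global L2 norm of the gradients (placeholder default)",
    )
    return parser


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]
```

Calling `build_parser().parse_args(["--activation-fn", "relu2"])` returns a namespace whose `activation_fn` can be looked up in `ACTIVATIONS`. A user training with a longer context would pass larger values for the two relative attention flags instead of editing hardcoded constants.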
---------
Signed-off-by: peter szemraj <peterszemraj@gmail.com>
Signed-off-by: Peter Szemraj <peterszemraj+dev@gmail.com>
Co-authored-by: Peter Szemraj <peterszemraj+dev@gmail.com>
1 parent cec50e3 commit bc83804
File tree
8 files changed (+437, -82 lines)
- annotated_mpnet
- data
- modeling
- transformer_modules
- utils
- cli_tools