2 files changed: +2 -2 lines changed
@@ -188,7 +188,7 @@ This subsection lists the third-party dependencies for the different stages of D
 # Future Plans
 
 - [ ] Accelerate attention with Flash-Attention
-- [ ] Expand context length to over 32k
+- [x ] Expand context length to over 32k
 - [ ] Support 4-bit quantization
 - [ ] Support quantized models fine-tuned with GPTQ
 - [ ] Support MoE architecture
@@ -189,7 +189,7 @@ $$ x_{u8} = x_{fp32} / scale + zeropoint $$
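 # Future Plans
 
 - [ ] First-token acceleration: add attention acceleration techniques such as a CPU implementation of Flash-Attention;
-- [ ] Context Length: expand to over 32k;
+- [x ] Context Length: expand to over 32k;
 - [ ] Low-bit quantization support: support 4-bit quantization;
 - [ ] QAT quantization support: support models quantized and fine-tuned with the GPTQ algorithm;
 - [ ] MoE: support MoE models and architectures.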