Releases: modelscope/dash-infer
v2.2.5
What's Changed
- Support single-node prefill-decode (P-D) disaggregation for CUDA.
- Support the Qwen3 model. Currently only dense models are supported; MoE model support is WIP.
- Support expert parallelism (EP) in the MoE op; currently only BF16 and FP16 are supported.
- CPU support does not work in this release; use a v2.1.x version for CPU support.
In Detail
- readme: add citation, update subproject description by @leefige in #70
- fix cugraph dnn moe kernel bug by @laiwenzh in #71
- support MOE EP by @yjc9696 in #73
- support disaggregated prefilling by @laiwenzh in #78
- Fix prompt-related issues by @lddfym in #77
- Build: only build flash attention kernel once. by @kzjeef in #80
- support Qwen3 (Dense) by @yjc9696 in #81
- model: fix cpu compiler error by @kzjeef in #82
- Fix cpu compile by @kzjeef in #83
- Create build-check.yml by @kzjeef in #84
- ci: only trigger cuda release for current version. by @kzjeef in #85
Full Changelog: v2.1.0...v3.0.0-rc1
Known Issue
- CPU support does not work in this release; use a v2.1.x version for CPU support.
v2.1.0
What's Changed
- [JSON mode]: FormatEnforcer use cudaMallocHost for scores buffer by @WangNorthSea in #56
- [A16W8 & A8W8]: further optimization for the Ampere A16W8 fused GEMM kernel; fix LoRA doc by @wyajieha in #58
- [Multimodal]: Support LLM quantization with GPTQ and AXWY by @x574chen in #60
- [PKG]: Reduce package size by only compiling flash-attn src with hdim128 by @laiwenzh in #62
- [MOE]: add high performance moe kernel; fix a16w8 compile bug for sm<80 by @laiwenzh in #67
Full Changelog: v2.0.0...v2.1.0
v2.0.0
What's Changed
- engine: stop and release the model when the engine is released, and remove a deprecated lock
- sampling: heavily refactor generate_op, removing its dependency on global tensors
- prefix cache: fix several bugs, improve evict performance
- json mode: update the lmfe-cpp patch, add process_logits, sampling with top_k and top_p
- span-attention: move span_attn decoderReshape to init
- lora: add docs, fix typos
- ubuntu: add an Ubuntu dockerfile, fix install dir error
- bugfix: fix multi-batch repetition-penalty bug
Full Changelog: v1.3.0...v2.0.0
v2.0.0-rc3
Some bug fixes:
- fix UUID crash issue
- update LoRA implementation
- set page size by parameter
- delete deprecated files
v2.0.0-rc2
release script: reduce python wheel size (#46)
v1.3.0
Highlight
- Support Baichuan-7B and Baichuan2-7B & 13B by @WangNorthSea in #38
Full Changelog: v1.2.1...v1.3.0
v1.2.1
v1.2.0
Expand context length to 32K and support flash attention on the Intel AVX-512 platform.
- remove currently unsupported cache mode
- examples: update the Qwen prompt template, add a print function to examples
- support glm-4-9b-chat
- change to size_t to avoid overflow when the sequence is long
- update README since we now support 32K context length
- add flash attention on the Intel AVX-512 platform