Commit 1f23b2e

Add compilation config 3 to model config, remove model info for deleted weights
1 parent a1f8b3e commit 1f23b2e

2 files changed: +89 -12 lines changed

vec_inf/config/README.md

Lines changed: 0 additions & 12 deletions
@@ -24,12 +24,6 @@ More profiling metrics coming soon!
 | [`CodeLlama-70b-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-hf) | 4x a40 | - tokens/s | - tokens/s |
 | [`CodeLlama-70b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-Instruct-hf) | 4x a40 | - tokens/s | - tokens/s |
 
-### [Databricks: DBRX](https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962)
-
-| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
-|:----------:|:----------:|:----------:|:----------:|
-| [`dbrx-instruct`](https://huggingface.co/databricks/dbrx-instruct) | 8x a40 (2 nodes, 4 a40/node) | 107 tokens/s | 904 tokens/s |
-
 ### [Google: Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
@@ -104,12 +98,6 @@ More profiling metrics coming soon!
 |:----------:|:----------:|:----------:|:----------:|
 | [`Phi-3-medium-128k-instruct`](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) | 2x a40 | - tokens/s | - tokens/s |
 
-### [Aaditya Ura: Llama3-OpenBioLLM](https://huggingface.co/aaditya/Llama3-OpenBioLLM-70B)
-
-| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
-|:----------:|:----------:|:----------:|:----------:|
-| [`Llama3-OpenBioLLM-70B`](https://huggingface.co/aaditya/Llama3-OpenBioLLM-70B) | 4x a40 | - tokens/s | - tokens/s |
-
 ### [Nvidia: Llama-3.1-Nemotron](https://huggingface.co/collections/nvidia/llama-31-nemotron-70b-670e93cd366feea16abc13d8)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
