
@QuantumPlayDev

As ik mentioned, some logs would be useful. Thanks for posting your full command. I'll try to answer what you asked and add some thoughts.

Thoughts

  1. I'm pretty sure you have a Zen5 chip, so for max PP (prompt processing) make sure to recompile and use the quants listed in this PR: #710. You should be in good shape if you're using my ubergarm IQ5_K.
  2. For max PP throughput, increase the batch sizes, e.g. -ub 4096 -b 4096 is a pretty good spot if you don't OOM.
  3. Remove -amb, as it only applies to -mla style quants.
  4. I assume you're compiling with -DGGML_SCHED_MAX_COPIES=1, which can be useful for multi-GPU setups in general to reduce OOM (though I'm pretty sure it mostly matters for MLA quants).
  5. Set --threads-batch 24 or the e…
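As a rough sketch of tips 2 to 5, here is a small Python helper that rewrites an existing command line. It drops -amb and its value, sets -ub/-b to 4096 and --threads-batch to 24, and shows the cmake configure line with -DGGML_SCHED_MAX_COPIES=1. The binary path, model filename and starting flags are hypothetical placeholders, so double-check flag names against your build's --help.

```python
import shlex

# Configure step for tip 4 (the "build" directory name is an assumption).
CMAKE_CONFIGURE = ["cmake", "-B", "build", "-DGGML_SCHED_MAX_COPIES=1"]

# Values from tips 2 and 5; each flag takes exactly one value.
TUNED_FLAGS = {"-ub": "4096", "-b": "4096", "--threads-batch": "24"}
# Flags to drop entirely (tip 3), mapped to how many values each consumes.
DROP_FLAGS = {"-amb": 1}


def tune_args(args: list[str]) -> list[str]:
    """Return a copy of args with -amb removed and batch/thread flags set."""
    out: list[str] = []
    i = 0
    while i < len(args):
        flag = args[i]
        if flag in DROP_FLAGS:
            i += 1 + DROP_FLAGS[flag]  # skip the flag and its value
            continue
        if flag in TUNED_FLAGS:
            i += 2  # drop the old value; the tuned one is appended below
            continue
        out.append(flag)
        i += 1
    for flag, value in TUNED_FLAGS.items():
        out += [flag, value]
    return out


# Hypothetical binary and model path, for illustration only.
original = shlex.split(
    "./build/bin/llama-server -m /models/model-IQ5_K.gguf -amb 512 -ub 512 -t 24"
)
print(shlex.join(CMAKE_CONFIGURE))
print(shlex.join(tune_args(original)))
```

Only the listed flags are touched; everything else in your command is passed through unchanged, so you can diff the two lines before running anything.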

Answer selected by QuantumPlayDev
Category: Q&A
6 participants