Feature Request: RPC upstream changes from main lcpp #978

@Panchovix

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Hello, thanks for all your work as always.

I was wondering if the RPC updates and changes from mainline lcpp could be ported into iklcpp.

Nowadays you can get pretty good performance with RPC, even when also offloading to RAM.

I have these examples of GLM 4.6 fully in VRAM on lcpp: ggml-org/llama.cpp#16625 (reply in thread)

I also tried, for example, DeepSeek R1 0528 Q3_K_XL, offloading about 25 layers to CPU, 5 layers to an RPC device (CUDA), and 30 layers to the main PC (CUDA), and saw only about a 10-20% performance penalty.
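For reference, this kind of multi-machine split uses mainline llama.cpp's RPC backend: an `rpc-server` runs on each worker machine, and the main process connects to it via `--rpc`. A minimal sketch, assuming a mainline build with RPC enabled; the IP address, port, model filename, and layer count below are made-up examples, not values from this issue:

```shell
# On the worker machine: start the RPC server, listening on all
# interfaces (port is arbitrary; 50052 is just a common example).
./rpc-server --host 0.0.0.0 --port 50052

# On the main PC: point llama.cpp at the worker with --rpc.
# Layers offloaded via -ngl are distributed across the local CUDA
# device(s) and the RPC device(s); the rest stay on CPU.
./llama-cli \
    -m DeepSeek-R1-0528-Q3_K_XL.gguf \
    --rpc 192.168.1.50:50052 \
    -ngl 35 \
    -p "Hello"
```

Multiple workers can be listed comma-separated in `--rpc`, which is how a setup like the one above (CPU + remote CUDA + local CUDA) is expressed.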

Motivation

This would let people use multiple PCs to offload more onto devices like CUDA GPUs when, as in my case, a model no longer fits on a single system.

The speed penalty is pretty low, so it might be worth considering.

Possible Implementation

Not sure exactly, beyond porting the RPC code from mainline lcpp into this repo.

As @ubergarm mentioned here (https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/13#691bb8df9287514645b7cc35), the relevant changes seem to start at commit ggml-org/llama.cpp#16276.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests