[Platform] Introduce CachedPlatform #416
Conversation
$metadata->add('cached', true);
$metadata->add('prompt_cache_key', $options['prompt_cache_key']);
$metadata->add('cached_prompt_count', $data['prompt_eval_count']);
$metadata->add('cached_completion_count', $data['eval_count']);
Wouldn't it make sense to group this data into a DTO, like the existing TokenUsage, and then add that DTO to the metadata, or perhaps even reuse said DTO?
I'm not convinced about using an object here; we're only storing integers, so I don't see the benefit to be honest 🤔
@OskarStark @chr-hertel Any thoughts?
I agree, it would be great to have an object like CacheUsage, similar to TokenUsage.
Yes CacheUsage would be a good fit
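Building on this suggestion, such a value object could look roughly like the following. This is a hypothetical sketch only: the class name, property names, and metadata key are assumptions, not the final API.

```php
<?php

// Hypothetical value object grouping the cache-related metadata,
// mirroring the shape of TokenUsage. All names are illustrative.
final class CacheUsage
{
    public function __construct(
        public readonly ?string $promptCacheKey = null,
        public readonly ?int $cachedPromptTokens = null,
        public readonly ?int $cachedCompletionTokens = null,
    ) {
    }
}

// Instead of four scalar metadata entries, a single object would be stored:
// $metadata->add('cache_usage', new CacheUsage(
//     $options['prompt_cache_key'],
//     $data['prompt_eval_count'],
//     $data['eval_count'],
// ));
```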
$firstCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);

$result = $firstCall->getResult();

$this->assertSame('Hello world', $result->getContent());
$this->assertSame(10, $result->getMetadata()->get('cached_prompt_count'));
$this->assertSame(10, $result->getMetadata()->get('cached_completion_count'));

$secondCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);

$secondResult = $secondCall->getResult();

$this->assertSame('Hello world', $secondResult->getContent());
Suggested change (run both invocations first, then assert):

$firstCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);
$secondCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);
$firstResult = $firstCall->getResult();
$secondResult = $secondCall->getResult();
$this->assertSame('Hello world', $firstResult->getContent());
$this->assertSame(10, $firstResult->getMetadata()->get('cached_prompt_count'));
$this->assertSame(10, $firstResult->getMetadata()->get('cached_completion_count'));
$this->assertSame('Hello world', $secondResult->getContent());
Let's zoom out a bit here, for two reasons:

Ollama does "context caching" and/or K/V caching: it stores the X latest messages for the model window (or pending tokens to speed up TTFT); it's not a cache that returns the generated response if the request already exists.

Well, because that's the one I use the most and the easiest to implement first, but we can integrate it for every platform if that's the question; we just need to use the API contract, and both Anthropic and OpenAI already do it natively 🤔 If the question is: could we implement it at the platform layer for every platform without relying on API calls, well, that's not a big deal to be honest and we could easily integrate it 🙂

What do you think about having it as a decorator…

I like the idea of…
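The decorator idea discussed in this thread could be shaped roughly as follows. The PlatformInterface signature shown here is a simplified assumption for illustration, not the actual contract of the component, and the key-derivation strategy is likewise an assumption.

```php
<?php

use Symfony\Contracts\Cache\CacheInterface;
use Symfony\Contracts\Cache\ItemInterface;

// Simplified stand-in for the real platform contract (assumption).
interface PlatformInterface
{
    public function invoke(object $model, mixed $input, array $options = []): mixed;
}

// Decorates any platform and serves repeated identical requests from cache,
// so every bridge benefits without API-level support.
final class CachedPlatform implements PlatformInterface
{
    public function __construct(
        private readonly PlatformInterface $decorated,
        private readonly CacheInterface $cache,
    ) {
    }

    public function invoke(object $model, mixed $input, array $options = []): mixed
    {
        // Derive a stable key from the model class, payload, and options.
        $key = hash('xxh128', serialize([$model::class, $input, $options]));

        return $this->cache->get(
            $key,
            fn (ItemInterface $item): mixed => $this->decorated->invoke($model, $input, $options),
        );
    }
}
```

Because any Symfony cache adapter can be injected, this approach stays bridge-agnostic and needs no per-platform API support.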
Changes here would also belong in CachedPlatform, so every bridge can benefit from this decorator.
    vectorizer: 'ai.vectorizer.mistral_embeddings'
    store: 'ai.store.memory.research'

Cached platform
Suggested change:
- Cached platform
+ Cached Platform
---------------

Thanks to Symfony's Cache component, platforms can be decorated and use any cache adapter,
this platform allows to reduce network calls / resource comsumption:
Suggested change:
- this platform allows to reduce network calls / resource comsumption:
+ this platform allows to reduce network calls / resource consumption:
echo $firstResult->getContent().\PHP_EOL;

$secondResult = $cachedPlatform->invoke('gpt-4o-mini', new MessageBag(Message::ofUser('What is the capital of France?')));
Maybe something like:

// This call will not be executed against the API
$secondResult = $cachedPlatform->invoke('gpt-4o-mini', new MessageBag(Message::ofUser('What is the capital of France?')));
# For using Ollama
- OLLAMA_HOST_URL=http://localhost:11434
+ OLLAMA_HOST_URL=http://127.0.0.1:11434
Can we move this to an extra PR please?
->children()
    ->stringNode('platform')->isRequired()->end()
    ->stringNode('service')->isRequired()->end()
    ->stringNode('cache_key')->end()
Can/should we provide a default key here?
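If a default were provided, the node definition could carry it directly; the default value shown here is purely an illustrative assumption, not a proposal from the PR.

```php
// Hypothetical: give 'cache_key' a fallback instead of leaving it unset.
->stringNode('cache_key')->defaultValue('ai_platform')->end()
```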
"symfony/process": "^7.3|^8.0",
"symfony/var-dumper": "^7.3|^8.0"
},
"suggest": {
Please remove; in Symfony we decided against a "suggest" section in composer.json files.
Hi 👋🏻
This PR aims to introduce a caching layer for the Ollama platform (as OpenAI, Anthropic, and others already do).