-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[ESQL][Inference] Introduce usage limits for COMPLETION and RERANK #139074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @afoucret, I've created a changelog YAML for you. |
afdfd7c to
2f2c354
Compare
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Hi @afoucret, I've created a changelog YAML for you. |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
ff1608f to
b414480
Compare
| Source.readFrom((PlanStreamInput) in), | ||
| in.readNamedWriteable(LogicalPlan.class), | ||
| in.readNamedWriteable(Expression.class), | ||
| in.getTransportVersion().supports(ESQL_INFERENCE_ROW_LIMIT_TRANSPORT_VERSION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to introduce a new transport version, in fact we don't need this method at all.
now that RERANK and COMPLETION have an implicit limit - they will always be executed on the coordinator, meaning we never need to send them to the data nodes.
so we can further simplify this, remove the NamedWritable and just throw an exception if we need to serialize them (which would be a bug and a code path that should never be reached):
elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Row.java
Lines 36 to 44 in 8f72b23
| @Override | |
| public void writeTo(StreamOutput out) { | |
| throw new UnsupportedOperationException("not serialized"); | |
| } | |
| @Override | |
| public String getWriteableName() { | |
| throw new UnsupportedOperationException("not serialized"); | |
| } |
ChangePoint, Fuse, Fork etc are also not serialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely removed the serialization logic for Rerank and Completion logical and physical plans.
| try (var resp = run(query)) { | ||
| List<List<Object>> values = getValuesList(resp); | ||
| // Should be limited by the default row limit (100) | ||
| assertThat(values.size(), lessThanOrEqualTo(100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the index we create always has 6 documents - so we are not really testing the limit enforcement.
let's change the createAndPopulateTestIndex so that we create an index with more than 100 documents when we test COMPLETION and more than 1000 when we test RERANK.
and here we should check that we get exactly 100 documents back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test have been updated.
|
|
||
| try (var resp = run(query)) { | ||
| List<List<Object>> values = getValuesList(resp); | ||
| assertThat(values.size(), lessThanOrEqualTo(customLimit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's check we get exactly customLimit docs back (take a look at the other comment that suggests changing createAndPopulateTestIndex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests have been updated.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceService.java
Show resolved
Hide resolved
|
Docs preview LGTM, nice! |
…k (do not escape the coordinator node anymore).
| public EsqlStatement parse( | ||
| String query, | ||
| QueryParams params, | ||
| SettingsValidationContext settingsValidationCtx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should consider embedding inferenceSettings in SettingsValidationContext?
I know that SettingsValidationContext serves a different purpose, so maybe we can just have a ValidationContext that is initialized in EsqlSession and can receive whatever context is necessary for validation during parsing?
Happy to do this as a separate follow up and not in this PR, since it would increase the scope.
Added implicits configurable Limit for
COMPLETIONandRERANKUpdated documentation
Others:
Closes: #136861