Commit 3d754d2

[ESQL][Inference] Introduce usage limits for COMPLETION and RERANK (#139074)
1 parent 489dc0c commit 3d754d2

37 files changed: +1136 -456 lines changed

benchmarks/src/main/java/org/elasticsearch/benchmark/_nightly/esql/QueryPlanningBenchmark.java

Lines changed: 2 additions & 1 deletion
@@ -27,6 +27,7 @@
 import org.elasticsearch.xpack.esql.index.EsIndex;
 import org.elasticsearch.xpack.esql.index.IndexResolution;
 import org.elasticsearch.xpack.esql.inference.InferenceResolution;
+import org.elasticsearch.xpack.esql.inference.InferenceSettings;
 import org.elasticsearch.xpack.esql.optimizer.LogicalOptimizerContext;
 import org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizer;
 import org.elasticsearch.xpack.esql.parser.EsqlParser;
@@ -126,7 +127,7 @@ public void setup() {
 }

 private LogicalPlan plan(EsqlParser parser, Analyzer analyzer, LogicalPlanOptimizer optimizer, String query) {
-var parsed = parser.parseQuery(query, new QueryParams(), telemetry);
+var parsed = parser.parseQuery(query, new QueryParams(), telemetry, new InferenceSettings(Settings.EMPTY));
 var analyzed = analyzer.analyze(parsed);
 var optimized = optimizer.optimize(analyzed);
 return optimized;

docs/changelog/139074.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 139074
+summary: "[ESQL][Inference] Introduce usage limits for COMPLETION and RERANK"
+area: ES|QL
+type: enhancement
+issues: []

docs/reference/query-languages/esql/_snippets/commands/layout/completion.md

Lines changed: 33 additions & 1 deletion
@@ -6,9 +6,38 @@ stack: preview 9.1.0

 The `COMPLETION` command allows you to send prompts and context to a Large Language Model (LLM) directly within your ES|QL queries, to perform text generation tasks.

-:::{important}
+:::::{important}
 **Every row processed by the COMPLETION command generates a separate API call to the LLM endpoint.**

+::::{tab-set}
+
+:::{tab-item} 9.3.0+
+
+Starting in version 9.3.0, `COMPLETION` automatically limits processing to **100 rows by default** to prevent accidental high consumption and costs. This limit is applied before the `COMPLETION` command executes.
+
+If you need to process more rows, you can adjust the limit using the cluster setting:
+```
+PUT _cluster/settings
+{
+"persistent": {
+"esql.command.completion.limit": 500
+}
+}
+```
+
+You can also disable the command entirely if needed:
+```
+PUT _cluster/settings
+{
+"persistent": {
+"esql.command.completion.enabled": false
+}
+}
+```
+:::
+
+:::{tab-item} 9.1.x - 9.2.x
+
 Be careful to test with small datasets first before running on production data or in automated workflows, to avoid unexpected costs.

 Best practices:
@@ -19,6 +48,9 @@ Best practices:
 4. **Monitor usage**: Track your LLM API consumption and costs.
 :::

+::::
+:::::
+
 **Syntax**

 ::::{tab-set}
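The 9.3.0+ tab in the `completion.md` change above states two things: each row produces one LLM endpoint call, and a row cap (100 by default) is applied before `COMPLETION` runs. The arithmetic that follows from those two statements can be sketched in Java as below. This is an illustration, not code from the commit: the class and method names are hypothetical, and treating a disabled command as a rejected query is an assumption, since the documentation only says the command can be disabled.

```java
// Hypothetical illustration only: these names do not appear in the commit.
final class CompletionCallSketch {
    // Default row cap quoted in the COMPLETION documentation for 9.3.0+.
    static final int DEFAULT_COMPLETION_LIMIT = 100;

    /**
     * Estimates how many LLM endpoint calls a COMPLETION command makes,
     * assuming one call per row and a cap applied before the command runs.
     */
    static long estimatedCalls(long inputRows, int limit, boolean enabled) {
        if (enabled == false) {
            // Assumption: a disabled command rejects the query outright.
            throw new IllegalStateException("COMPLETION is disabled");
        }
        if (inputRows < 0 || limit < 0) {
            throw new IllegalArgumentException("rows and limit must be non-negative");
        }
        // Rows beyond the cap never reach the command, so they cost nothing.
        return Math.min(inputRows, (long) limit);
    }
}
```

Under these assumptions, a query matching 250 rows would make at most 100 calls with the default cap, and at most 250 once `esql.command.completion.limit` is raised to 500.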

docs/reference/query-languages/esql/_snippets/commands/layout/rerank.md

Lines changed: 47 additions & 0 deletions
@@ -7,6 +7,53 @@ stack: preview 9.2.0
 The `RERANK` command uses an inference model to compute a new relevance score
 for an initial set of documents, directly within your ES|QL queries.

+:::::{important}
+**RERANK processes each row through an inference model, which impacts performance and costs.**
+
+::::{tab-set}
+
+:::{tab-item} 9.3.0+
+
+Starting in version 9.3.0, `RERANK` automatically limits processing to **1000 rows by default** to prevent accidental high consumption. This limit is applied before the `RERANK` command executes.
+
+If you need to process more rows, you can adjust the limit using the cluster setting:
+```
+PUT _cluster/settings
+{
+"persistent": {
+"esql.command.rerank.limit": 5000
+}
+}
+```
+
+You can also disable the command entirely if needed:
+```
+PUT _cluster/settings
+{
+"persistent": {
+"esql.command.rerank.enabled": false
+}
+}
+```
+:::
+
+:::{tab-item} 9.2.x
+
+No automatic row limit is applied. **You should always use `LIMIT` before or after `RERANK` to control the number of documents processed**, to avoid accidentally reranking large datasets which can result in high latency and increased costs.
+
+For example:
+```esql
+FROM books
+| WHERE title:"search query"
+| SORT _score DESC
+| LIMIT 100 // Limit to top 100 results before reranking
+| RERANK "search query" ON title WITH { "inference_id" : "my_rerank_endpoint" }
+```
+:::
+
+::::
+:::::
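The `rerank.md` text above combines two controls: an explicit `LIMIT` written in the query (the 9.2.x advice) and the configured cap added in 9.3.0 (1000 rows by default). The Java sketch below shows how the two would combine when the explicit `LIMIT` comes before `RERANK`. It assumes the configured cap behaves like an ordinary `LIMIT` placed directly before the command, which is an interpretation of "applied before the `RERANK` command executes" and not something the diff shows. All names are hypothetical.

```java
import java.util.OptionalInt;

// Hypothetical illustration only: these names do not appear in the commit.
final class RerankLimitSketch {
    // Default row cap quoted in the RERANK documentation for 9.3.0+.
    static final int DEFAULT_RERANK_LIMIT = 1000;

    /**
     * Returns the maximum number of rows that can reach RERANK, assuming the
     * configured cap acts like a LIMIT placed directly before the command.
     * An explicit LIMIT earlier in the query can only lower that number.
     */
    static int rowsReachingRerank(OptionalInt explicitLimit, int configuredLimit) {
        if (configuredLimit < 0) {
            throw new IllegalArgumentException("configured limit must be non-negative");
        }
        // Two consecutive limits let through only as many rows as the smaller one.
        return explicitLimit.isPresent()
            ? Math.min(explicitLimit.getAsInt(), configuredLimit)
            : configuredLimit;
    }
}
```

With the documented default, the example query's `LIMIT 100` would still send 100 rows to the reranker, while a query with no `LIMIT` would be capped at 1000.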
+
 **Syntax**

 ```esql

x-pack/plugin/esql/build.gradle

Lines changed: 4 additions & 0 deletions
@@ -55,8 +55,10 @@ dependencies {
 }
 testImplementation project(':test:framework')
 testImplementation(testArtifact(project(xpackModule('core'))))
+testImplementation(testArtifact(project(xpackModule('inference'))))
 testImplementation project(path: xpackModule('enrich'))
 testImplementation project(path: xpackModule('spatial'))
+testImplementation project(path: xpackModule('inference'))
 testImplementation project(path: xpackModule('kql'))
 testImplementation project(path: xpackModule('mapper-unsigned-long'))

@@ -72,6 +74,8 @@ dependencies {
 testImplementation('org.webjars.npm:fontsource__roboto-mono:4.5.7')

 internalClusterTestImplementation project(":modules:mapper-extras")
+internalClusterTestImplementation project(xpackModule('inference:qa:test-service-plugin'))
+internalClusterTestImplementation(testArtifact(project(xpackModule('inference'))))
 }

 tasks.named("dependencyLicenses").configure {

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/EsqlTestUtils.java

Lines changed: 9 additions & 1 deletion
@@ -27,6 +27,7 @@
 import org.elasticsearch.common.collect.Iterators;
 import org.elasticsearch.common.lucene.BytesRefs;
 import org.elasticsearch.common.regex.Regex;
+import org.elasticsearch.common.settings.ClusterSettings;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.common.unit.ByteSizeValue;
 import org.elasticsearch.common.util.BigArrays;
@@ -112,6 +113,7 @@
 import org.elasticsearch.xpack.esql.index.IndexResolution;
 import org.elasticsearch.xpack.esql.inference.InferenceResolution;
 import org.elasticsearch.xpack.esql.inference.InferenceService;
+import org.elasticsearch.xpack.esql.inference.InferenceSettings;
 import org.elasticsearch.xpack.esql.optimizer.LogicalOptimizerContext;
 import org.elasticsearch.xpack.esql.parser.EsqlParser;
 import org.elasticsearch.xpack.esql.parser.QueryParam;
@@ -568,7 +570,7 @@ public static LogicalOptimizerContext unboundLogicalOptimizerContext() {
 mock(ProjectResolver.class),
 mock(IndexNameExpressionResolver.class),
 null,
-new InferenceService(mock(Client.class)),
+new InferenceService(mock(Client.class), createMockClusterService()),
 new BlockFactoryProvider(PlannerUtils.NON_BREAKING_BLOCK_FACTORY),
 TEST_PLANNER_SETTINGS,
 new CrossProjectModeDecider(Settings.EMPTY)
@@ -577,6 +579,12 @@ public static LogicalOptimizerContext unboundLogicalOptimizerContext() {
 private static ClusterService createMockClusterService() {
 var service = mock(ClusterService.class);
 doReturn(new ClusterName("test-cluster")).when(service).getClusterName();
+doReturn(Settings.EMPTY).when(service).getSettings();
+
+// Create ClusterSettings with the required inference settings
+var clusterSettings = new ClusterSettings(Settings.EMPTY, new java.util.HashSet<>(InferenceSettings.getSettings()));
+doReturn(clusterSettings).when(service).getClusterSettings();
+
 return service;
 }

x-pack/plugin/esql/qa/testFixtures/src/main/resources/completion.csv-spec

Lines changed: 14 additions & 0 deletions
@@ -59,3 +59,17 @@ title:text | completion:keyword
 War and Peace | THIS IS A PROMPT: WAR AND PEACE
 War and Peace (Signet Classics) | THIS IS A PROMPT: WAR AND PEACE (SIGNET CLASSICS)
 ;
+
+completion followed by stats
+required_capability: completion
+required_capability: match_operator_colon
+
+FROM books METADATA _score
+| WHERE title:"war and peace" AND author:"Tolstoy"
+| COMPLETION CONCAT("This is a prompt: ", title) WITH { "inference_id" : "test_completion" }
+| STATS count=COUNT(*), avg_completion_length = AVG(LENGTH(completion))
+;
+
+count:long | avg_completion_length:double
+4 | 50.75
+;

x-pack/plugin/esql/qa/testFixtures/src/main/resources/rerank.csv-spec

Lines changed: 15 additions & 0 deletions
@@ -294,3 +294,18 @@ The Lord of the Rings - Boxed Set | 3.76885509490
 Return of the King Being the Third Part of The Lord of the Rings | 3.6248698234558105 | 9.000900317914784E-4 | 0.001396648003719747
 // end::combine-result[]
 ;
+
+reranker followed by stats
+required_capability: rerank
+required_capability: match_operator_colon
+
+FROM books METADATA _score
+| WHERE title:"war and peace" AND author:"Tolstoy"
+| SORT _score DESC, book_no ASC
+| RERANK "war and peace" ON title WITH { "inference_id" : "test_reranker" }
+| STATS count_book = COUNT(*) WHERE _score >= 0.03
+;
+
+count_book:long
+2
+;
