
Commit 808c553

Update open-inference-protocol-v2.openapi.yaml (#6925)
Added a note detailing the readiness-probe deadlock and how to avoid it.
1 parent 2341148 commit 808c553

File tree

1 file changed: +17 -0 lines changed


docs-gb/apis/inference/open-inference-protocol-v2.openapi.yaml

Lines changed: 17 additions & 0 deletions
@@ -136,6 +136,23 @@ paths:
  inferencing. The model name and (optionally) version must be available
  in the URL. If a version is not provided the server may choose a version
  based on its own policies.
+
+ The model readiness endpoints report that an individual model is
+ loaded and ready to serve. They are intended only to give customers visibility
+ into the model's state and are not intended to be used as a Kubernetes
+ readiness probe for the MLServer container.
+
+ If you use a model-specific health endpoint for the container readiness probe,
+ this can cause a deadlock with the current implementation, because:
+ - the Seldon agent does not begin model download until the Pod's IP is visible in Endpoints;
+ - the IP of the Pod is only published after the Pod is Ready, which requires all readiness checks to have passed;
+ - the MLServer container only becomes Ready once the model is loaded.
+
+ This would result in the agent never downloading the model and the Pod never becoming Ready.
+
+ For container-level readiness checks we recommend the server-level readiness
+ endpoints. These indicate that the MLServer process is up and accepting
+ health checks, and they do not deadlock the agent/model loading flow.
  x-gitbook-description-document:
    object: document
    data:
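
To make the recommendation above concrete, below is a minimal sketch of a Kubernetes readiness probe for the MLServer container that targets the server-level readiness endpoint (/v2/health/ready) rather than a model-specific one (/v2/models/{model_name}/ready). The container name, image reference, and port 8080 (MLServer's usual default HTTP port) are illustrative assumptions, not values taken from this file.

# Sketch of a Pod spec fragment; container name, image, and port are assumptions.
containers:
  - name: mlserver
    image: seldonio/mlserver:latest   # placeholder image reference
    ports:
      - name: http
        containerPort: 8080
    readinessProbe:
      # Recommended: server-level readiness. This only reports that the
      # MLServer process is up and serving HTTP, so the Pod can become
      # Ready before any model is loaded.
      httpGet:
        path: /v2/health/ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    # Avoid: path: /v2/models/my-model/ready ("my-model" is a placeholder).
    # A model-specific probe recreates the deadlock described above, since
    # the model is only downloaded after the Pod's IP appears in Endpoints,
    # which in turn requires the Pod to already be Ready.

With the server-level probe the Pod becomes Ready as soon as MLServer is serving, its IP is published to Endpoints, and the Seldon agent can proceed to download and load the model; the model-specific readiness endpoints remain useful for observing an individual model's state.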
