The {lcs-name} ({lcs-short}) and Llama Stack deploy together as sidecar containers to augment {rhdh-short} functionality.

{lcs-short} serves as the Llama Stack service intermediary, managing configurations for key components. These components include the Large Language Model (LLM), inference providers, tool runtime providers, safety providers, and Retrieval Augmented Generation (RAG) settings.

* **{lcs-short}** manages authentication, user feedback collection, MCP server configuration, and caching.
* Llama Stack provides the inference functionality that {lcs-short} uses to process requests. The service requires a **Kubernetes Secret** to securely store environment variables for the chosen LLM provider, as shown in the following example.
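+
A minimal sketch of such a Secret follows; the Secret name and the environment variable key are illustrative assumptions that depend on the chosen LLM provider:
+
[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: llama-stack-secret          # hypothetical name
type: Opaque
stringData:
  # The variable name depends on the provider; OPENAI_API_KEY is one common example.
  OPENAI_API_KEY: <your-api-key>
----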
* The {ls-brand-name} plugin in {rhdh-short} sends prompts and receives LLM responses through the {lcs-short} sidecar. {lcs-short} then uses the Llama Stack sidecar service to perform inference and MCP tool calling.

[NOTE]
====
{ls-brand-name} is a Developer Preview release. You must manually deploy the {lcs-name} and Llama Stack sidecar containers, and install the {ls-brand-name} plugin on your {rhdh-short} instance.
====
// modules/developer-lightspeed/con-rag-embeddings.adoc
[id="con-rag-embeddings_{context}"]
= Retrieval Augmented Generation embeddings

The {product} documentation set serves as the Retrieval-Augmented Generation (RAG) data source.

RAG initialization occurs through an init container, which copies the RAG data to a shared volume. The Llama Stack sidecar then mounts this volume to access the data. The Llama Stack service uses the resulting RAG embeddings in the vector database as a tool. This tool allows the service to provide references to the product documentation during the inference process, as sketched in the following example.
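
A minimal sketch of this init-container pattern follows; the image reference, volume name, and mount paths are illustrative assumptions rather than the shipped deployment manifest:

[source,yaml]
----
spec:
  template:
    spec:
      initContainers:
        - name: copy-rag-content
          image: quay.io/example/rag-content:latest   # hypothetical image
          command: ["sh", "-c", "cp -a /rag/. /shared/rag/"]
          volumeMounts:
            - name: rag-data
              mountPath: /shared/rag
      containers:
        - name: llama-stack
          volumeMounts:
            - name: rag-data
              mountPath: /rag/data                    # hypothetical path read by Llama Stack
      volumes:
        - name: rag-data
          emptyDir: {}
----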
// modules/developer-lightspeed/con-supported-architecture.adoc
[id="con-supported-architecture_{context}"]
= Supported architecture for {ls-brand-name}

{ls-short} is available as a plugin on all platforms that host {product-very-short}. It requires two sidecar containers: the {lcs-name} and the Llama Stack service.

The {lcs-short} container acts as the intermediary layer, which interfaces with and manages the Llama Stack service.
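
A minimal sketch of the resulting container layout follows; container names and image references are illustrative assumptions, not the shipped manifests:

[source,yaml]
----
spec:
  containers:
    - name: backstage-backend    # the Developer Hub application container
      image: rhdh:latest         # hypothetical image reference
    - name: lightspeed-stack     # the intermediary sidecar
      image: lightspeed-stack:latest
    - name: llama-stack          # the inference sidecar managed by lightspeed-stack
      image: llama-stack:latest
----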
.Additional resources
* link:https://access.redhat.com/support/policy/updates/developerhub[{product} Life Cycle and supported platforms]
// modules/developer-lightspeed/proc-configuring-mcp-components.adoc
[id="proc-configure-mcp-components_{context}"]
= Configuring the Model Context Protocol (MCP)

To leverage Model Context Protocol (MCP) servers, you must configure them within the Llama Stack service. Configuring the MCP in {lcs-short} enables the deployed {lcs-short} and Llama Stack to use external tools and retrieve real-time data. This integration allows the virtual assistant to execute complex actions and incorporate current operational context into its responses.

Configuration requires synchronizing settings across three components: the Llama Stack tool definition, the {lcs-short} MCP server endpoint definition, and the {ls-brand-name} plugin MCP authorization token.
.Procedure
. Configure the Llama Stack tool definition by defining the MCP provider in the `tool_runtime` section of the Llama Stack configuration file (`run.yaml`), as shown in the following code:
+
[source,yaml]
----
providers:
  tool_runtime:
  # The diff elides the lines above `config: {}`; this is the typical MCP
  # provider entry from upstream Llama Stack. Verify it against your run.yaml.
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config: {}
----
. Configure the {lcs-short} MCP server endpoint by defining the MCP server endpoints in the {lcs-short} configuration file (`lightspeed-stack.yaml`), as shown in the following code:
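+
The following is a minimal sketch of an endpoint definition, assuming the `mcp_servers` list format used by upstream Lightspeed Core; the `url` value is a placeholder:
+
[source,yaml]
----
mcp_servers:
  - name: mcp::backstage                 # must match the plugin's mcpServers entry
    provider_id: model-context-protocol  # must match the Llama Stack definition
    url: http://localhost:3000/mcp       # placeholder URL (assumption)
----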
+
[IMPORTANT]
====
`provider_id` must match the Llama Stack definition (`model-context-protocol`).
====
. Configure the {ls-brand-name} plugin authorization token by specifying the MCP servers and providing the token for authentication in the {ls-brand-name} plugin configuration file (`lightspeed-app-config.yaml`), as shown in the following code:
+
[source,yaml]
----
lightspeed:
  mcpServers:
    - name: mcp::backstage
      token: ${MCP_TOKEN}
----
+
[IMPORTANT]
====
`name` must match the {lcs-name} definition (`mcp::backstage`).
====
// modules/developer-lightspeed/proc-gathering-feedback.adoc
[id="proc-gathering-feedback_{context}"]
= Gathering feedback in {ls-short}

Feedback collection is an optional feature configured on the {lcs-short}. This feature gathers user feedback by providing thumbs-up/down ratings and text comments directly from the chat window.

{lcs-short} collects the feedback, the user's query, and the response of the model, storing the data as a JSON file on the local file system of the Pod. A platform administrator must later collect and analyze this data to assess model performance and improve the user experience.

The collected data resides in the cluster where {product-very-short} and {lcs-short} are deployed, making it accessible only to platform administrators for that cluster. For data removal, users must request this action from their platform administrator, as {company-name} neither collects nor accesses this data.
.Procedure
. To enable or disable feedback collection, in the {lcs-short} configuration file (`lightspeed-stack.yaml`), add the following settings:
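+
The following is a minimal sketch, assuming the `user_data_collection` keys used by upstream Lightspeed Core; verify the exact key names against your {lcs-short} version:
+
[source,yaml]
----
user_data_collection:
  feedback_enabled: true                    # set to false to disable collection
  feedback_storage: "/tmp/data/feedback"    # Pod-local directory for the JSON files (assumption)
----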