feat(fai-chat): prompt and retriev #5449
base: app
The latest updates on your projects. Learn more about Vercel for GitHub.
1 Skipped Deployment
🚀 FAI Chat Lambda Preview Deployed
Your Lambda function has been deployed to a preview environment!
🔗 Preview URL: https://5hbbxu66vi.execute-api.us-east-1.amazonaws.com/dev2
📝 Available Endpoints:
📋 Example Usage:
# Test health endpoint
curl "https://5hbbxu66vi.execute-api.us-east-1.amazonaws.com/dev2/health"
# Test chat endpoint (currently returns hardcoded response)
curl -X POST "https://5hbbxu66vi.execute-api.us-east-1.amazonaws.com/dev2/chat"
🏷️ Stack Name:
ℹ️ Note: This preview will be automatically destroyed when the PR is closed or merged.
Additional Suggestion:
The product and version attributes are requested from Turbopuffer but never extracted into the document metadata, so they'll always be empty when accessed by other code.
📝 Patch Details
diff --git a/servers/fai-lambda/fai-chat/src/retrieval/turbopuffer_retriever.py b/servers/fai-lambda/fai-chat/src/retrieval/turbopuffer_retriever.py
index 5d92990a3..b37be2c18 100644
--- a/servers/fai-lambda/fai-chat/src/retrieval/turbopuffer_retriever.py
+++ b/servers/fai-lambda/fai-chat/src/retrieval/turbopuffer_retriever.py
@@ -22,7 +22,7 @@ from .turbopuffer_query_filters import (
     build_turbopuffer_filters,
 )
-TURBOPUFFER_INCLUDE_ATTRIBUTES = ["document", "title", "url", "id", "product", "version"]
+TURBOPUFFER_INCLUDE_ATTRIBUTES = ["document", "title", "url", "id", "product", "version", "source"]
 class TurbopufferRetriever(RAGRetriever):
@@ -263,6 +263,12 @@ class TurbopufferRetriever(RAGRetriever):
                 metadata["title"] = row.title
             if hasattr(row, "url") and row.url:
                 metadata["url"] = row.url
+            if hasattr(row, "product") and row.product:
+                metadata["product"] = row.product
+            if hasattr(row, "version") and row.version:
+                metadata["version"] = row.version
+            if hasattr(row, "source") and row.source:
+                metadata["source"] = row.source
             score = getattr(row, "$dist", 0.0)
Analysis
Missing metadata extraction for product, version, and source fields
What fails: The _parse_turbopuffer_results() method in TurbopufferRetriever only extracts title and url from Turbopuffer query results. The product and version attributes are included in TURBOPUFFER_INCLUDE_ATTRIBUTES and requested from Turbopuffer but are never copied into the metadata, and source is not requested at all. Code in system.py and documentation_search.py attempts to access these fields from document metadata, but they are always empty.
How to reproduce:
- Call any retrieval method (retrieve, batch_retrieve, etc.) that queries Turbopuffer with TURBOPUFFER_INCLUDE_ATTRIBUTES
- Access the returned RetrievedDocument metadata in system.py line 130 or documentation_search.py for the product field
- The value will be an empty string or None even if product data exists in Turbopuffer
Example code that demonstrates the issue:
# In turbopuffer_retriever.py _parse_turbopuffer_results method (lines 252-278)
# Current behavior: product and version are never extracted
metadata = {}
if hasattr(row, "title") and row.title:
    metadata["title"] = row.title
if hasattr(row, "url") and row.url:
    metadata["url"] = row.url
# product and version from row are ignored

# In system.py line 130 that tries to use the metadata:
product = doc.metadata.get("product", "") if doc.metadata else ""  # Always returns ""

What should happen: When Turbopuffer returns rows with product, version, and source attributes, these should be extracted and added to the metadata dictionary in _parse_turbopuffer_results(), similar to how title and url are currently handled. Additionally, source should be added to TURBOPUFFER_INCLUDE_ATTRIBUTES since it is accessed in the codebase but not currently requested.
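The extraction described above can also be written as a loop over attribute names, which keeps the list of copied fields in one place. This is a minimal sketch, not the code from the patch: METADATA_ATTRIBUTES and extract_metadata are hypothetical names, and SimpleNamespace stands in for the Turbopuffer row type, which is not shown here.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Hypothetical constant: optional row attributes to copy into metadata.
METADATA_ATTRIBUTES = ("title", "url", "product", "version", "source")


def extract_metadata(row: Any) -> Dict[str, Any]:
    """Copy each present, non-empty attribute from a result row into a dict."""
    metadata: Dict[str, Any] = {}
    for name in METADATA_ATTRIBUTES:
        # Equivalent to `hasattr(row, name) and row.<name>`: missing or
        # falsy values (None, "") are skipped.
        value = getattr(row, name, None)
        if value:
            metadata[name] = value
    return metadata


# Stand-in row (assumption): the real row object comes from the Turbopuffer client.
row = SimpleNamespace(
    title="Quickstart",
    url="https://example.com/docs",
    product="sdk",
    version="",  # empty values are not copied
)
metadata = extract_metadata(row)
# metadata == {"title": "Quickstart", "url": "https://example.com/docs", "product": "sdk"}
```

With this shape, a consumer such as doc.metadata.get("product", "") would receive "sdk" for the row above, while version and source stay absent because the row carries no value for them.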
Verification:
- Turbopuffer query API documentation confirms that any attribute can be requested via include_attributes and will be present in the response
- Current code in system.py line 130 and documentation_search.py explicitly accesses these fields from metadata
No description provided.