-
Notifications
You must be signed in to change notification settings - Fork 838
fix(exp): Add run in github experiment #3459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a sample-app script to run GPT‑4 research tasks and extends the Experiment SDK with GitHub-backed bulk run flows, new Pydantic models for aggregated task results and GitHub context, and a concurrency helper to execute tasks locally before submitting results to the backend. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Sample App\n(run_research_experiment.py)
participant SDK as Experiment SDK\n(experiment.py)
participant Worker as Local Executor
participant OpenAI as OpenAI API
participant Backend as Traceloop Backend
App->>SDK: init Traceloop client
App->>App: main() coroutine
App->>SDK: client.experiment.run(... or run_in_github ...)
activate SDK
rect rgb(248,250,255)
note over SDK: choose flow based on env (GITHUB_ACTIONS / GITHUB_REF)
end
SDK->>SDK: parse dataset -> rows
SDK->>Worker: _execute_tasks(rows, task)
activate Worker
loop per row (concurrent)
Worker->>App: call research_task(row)
App->>OpenAI: generate_research_response(question) (async)
OpenAI-->>App: assistant response
App-->>Worker: return TaskResult (input, output) or error
end
deactivate Worker
rect rgb(235,255,235)
note over SDK: build RunInGithubRequest with GithubContext (if GitHub) or prepare local results
end
SDK->>Backend: submit run (task_results + context/metadata)
Backend-->>SDK: RunInGithubResponse (experiment_id, experiment_slug, run_id)
deactivate SDK
SDK-->>App: return run details
App->>App: print experiment slug & run ID
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
packages/sample-app/sample_app/experiment/run_research_experiment.py (2)
55-78: Running this script locally will raise due torun_in_githubguardBecause
main()always callsclient.experiment.run_in_github, running this module directly outside GitHub Actions will raiseRuntimeError. That’s acceptable for a CI-only sample, but if you expect local runs, consider:
- Catching the
RuntimeErrorinmain()and printing a friendly message, or- Adding a short note in the module docstring that it must be run in GitHub Actions.
23-40: Reuse module-levelAsyncOpenAIclient and add error handling for API callsCreating a new
AsyncOpenAIclient on every call adds connection overhead. Create a single module-level client instance and reuse it across function calls. Additionally, wrapchat.completions.createcalls in error handling to catch transient failures.Recommended improvements:
- Use
async with AsyncOpenAI() as client:to ensure automatic client cleanup, or create a module-level instance with explicit shutdown.- Configure timeout (default 5sec is too short for LLM calls; use timeout=60 for typical requests).
- Catch specific exceptions:
APIConnectionError,APITimeoutError,RateLimitError,APIStatusErrorto handle network issues, rate limits, and API errors appropriately.packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)
37-94:experiment_metadataplumbing inrun()and_init_experiment()looks correctThe new
experiment_metadataparameter is threaded fromrun()into_init_experiment()and then intoInitExperimentRequest, matching the existing pattern used forexperiment_run_metadata. Only minor nits:
- Docstring line “(an experiment holds all the experinent runs)” has a small typo (“experinent”).
Otherwise this change is consistent and safe.
277-299: HTTP error handling is reasonable; consider a library-specific exception typePosting to
"/experiments/run-in-github"and raising a genericExceptionwhenresponse is Noneis functionally correct given the currentHTTPClient.postcontract. For clearer error handling and easier catching by SDK consumers, you might eventually:
- Introduce an SDK-specific exception type (e.g.,
ExperimentRequestError) and raise that instead of bareException.- Optionally include the HTTP status/error details in the exception where available.
This is a polish item and not blocking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
packages/sample-app/sample_app/experiment/run_research_experiment.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(7 hunks)packages/traceloop-sdk/traceloop/sdk/experiment/model.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/experiment/model.pypackages/sample-app/sample_app/experiment/run_research_experiment.pypackages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (3)
packages/traceloop-sdk/traceloop/sdk/experiment/model.py (4)
RunInGithubRequest(88-98)RunInGithubResponse(101-106)TaskResult(71-76)GithubContext(79-85)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
get_version_jsonl(165-170)packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
post(23-37)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
223-226: Avoid specifying long messages outside the exception class
(TRY003)
247-250: Avoid specifying long messages outside the exception class
(TRY003)
294-296: Create your own exception
(TRY002)
294-296: Avoid specifying long messages outside the exception class
(TRY003)
387-387: Do not catch blind exception: Exception
(BLE001)
406-406: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
🔇 Additional comments (1)
packages/sample-app/sample_app/experiment/run_research_experiment.py (1)
15-20: Good use of env vars for Traceloop client configuration
TRACELOOP_API_KEYandTRACELOOP_BASE_URLare correctly sourced from environment variables, so no secrets are hardcoded and this aligns with the Python guidelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 4048c6c in 1 minute and 59 seconds. Click for details.
- Reviewed
36lines of code in1files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:241
- Draft comment:
The function now checks that GITHUB_EVENT_NAME equals 'pull_request'. Ensure this restriction is intentional; consider documenting or supporting additional event types (e.g., 'pull_request_target') if needed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment violates multiple rules: 1) It asks the author to "ensure" something is intentional, which is explicitly mentioned as not useful in the rules ("Do NOT ask the PR author to confirm their intention"). 2) It's speculative - it suggests "consider...supporting additional event types" without clear evidence that this is needed. 3) The comment doesn't point to a definite bug or issue - it's more of a "have you thought about X?" type of comment. The function name isrun_in_githuband the error message clearly states it's for pull_request workflows, so the restriction appears deliberate. The comment doesn't provide strong evidence of an actual problem. Perhaps the author intentionally restricted this to pull_request events for a specific reason, and suggesting support for pull_request_target could be valid if there's a clear use case. However, without evidence that this is actually needed or that the current restriction causes problems, this is just speculation. The critique doesn't change my assessment. The comment is still asking the author to confirm their intention ("ensure this restriction is intentional") and making speculative suggestions ("consider...supporting additional event types") without demonstrating a clear problem. These are exactly the types of comments the rules say to remove. This comment should be deleted because it asks the author to confirm their intention and makes speculative suggestions without clear evidence of a problem. It violates the rule against asking authors to "ensure" things are intentional.
2. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:400
- Draft comment:
The try/except block for individual task execution was removed. Without this, any unexpected exception from a task may abort the entire _execute_tasks execution. Confirm if this change is intentional. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_uJCerDA2pP8b4vVP
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
packages/sample-app/sample_app/experiment/run_research_experiment.py (1)
43-52: Previous typo has been corrected.The misspelled key
"sentenece"reported in earlier reviews has been fixed to"sentence"on line 51.packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (3)
244-250: Previous validation concerns have been addressed.The explicit check for
github_event_name == "pull_request"(line 245) ensures this method only runs in the correct GitHub workflow context, addressing the previous review's concern about insufficient event verification.
252-264: Previous type safety concerns have been addressed.The validation sequence now ensures
pr_numberis extracted and verified (lines 255-258) before constructingpr_url(line 264), eliminating the type mismatch risk flagged in earlier reviews.
411-413: Previous bug in exception handling has been fixed.The broken
completed_task.task_inputaccess and redundant outer try/except block reported in earlier reviews have been removed. The code now correctly collectsTaskResultobjects returned byrun_single_row, which already handles exceptions internally.
🧹 Nitpick comments (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py (1)
23-40: Consider reusing the AsyncOpenAI client instance.Creating a new
AsyncOpenAIclient on every call togenerate_research_responseadds unnecessary overhead. The OpenAI client is designed to be reused and maintains its own connection pool.Refactor to create the client once at module level:
from traceloop.sdk import Traceloop +from openai import AsyncOpenAI # Initialize Traceloop client client = Traceloop.init( app_name="research-experiment-ci-cd", api_key=os.getenv("TRACELOOP_API_KEY"), api_endpoint=os.getenv("TRACELOOP_BASE_URL"), ) +# Initialize OpenAI client once +openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY")) + async def generate_research_response(question: str) -> str: """Generate a research response using OpenAI""" - openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY")) - response = await openai_client.chat.completions.create(packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
302-306: Consider adding more specific error details.When the API request fails,
self._http_client.postreturnsNoneand prints the actual error to stderr. The exception message here doesn't preserve the underlying cause.Consider capturing and including the underlying error details if the HTTP client can be enhanced to return/raise them, or at minimum advise users to check stderr for the actual failure reason:
if response is None: raise Exception( - f"Failed to submit experiment '{experiment_slug}' for GitHub execution. " + f"Failed to submit experiment '{experiment_slug}' for GitHub execution. " + "Check stderr output above for the detailed error from the API request." )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/experiment/run_research_experiment.pypackages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧬 Code graph analysis (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
Traceloop(37-275)init(49-206)get(259-275)packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)
run_in_github(188-307)run(37-186)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
get_version_jsonl(165-170)packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
post(23-37)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
223-226: Avoid specifying long messages outside the exception class
(TRY003)
246-250: Avoid specifying long messages outside the exception class
(TRY003)
259-262: Avoid specifying long messages outside the exception class
(TRY003)
303-305: Create your own exception
(TRY002)
303-305: Avoid specifying long messages outside the exception class
(TRY003)
396-396: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (4)
packages/sample-app/sample_app/experiment/run_research_experiment.py (2)
16-20: LGTM: Proper secret handling.The client initialization correctly retrieves API credentials from environment variables, adhering to the security guideline against hardcoding secrets.
55-74: LGTM: Clean CI/CD orchestration.The main function correctly orchestrates the GitHub experiment flow. Letting exceptions propagate is appropriate for CI/CD scripts, as it will fail the workflow with a clear error message.
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)
44-44: LGTM: Clean API extension for experiment metadata.The
experiment_metadataparameter is properly added to therun()signature, documented, and correctly passed through to_init_experiment.Also applies to: 57-57, 92-92
389-400: LGTM: Appropriate exception handling pattern.The broad
Exceptioncatch on line 396 is the correct pattern here. It ensures that any failure in user-provided task functions is captured and converted to aTaskResultwith the error field set, allowing the experiment to continue processing remaining rows.
|
|
||
| return results, errors | ||
|
|
||
| async def run_in_github( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a private method and then in the general run you decide whether to call run_in_github or run_local based on the env vars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 6764781 in 1 minute and 9 seconds. Click for details.
- Reviewed
85lines of code in2files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/experiment/run_research_experiment.py:1
- Draft comment:
The file docstring still mentions 'run_in_github' even though the code now calls 'run()'. Please update the documentation to reflect the new usage. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:37
- Draft comment:
The 'run' method’s return type depends on the environment (tuple for local runs vs. RunInGithubResponse in GitHub Actions). Update the type annotation and docstring to clarify this behavior. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:37
- Draft comment:
Document the additional parameters ('aux', 'stop_on_error', 'wait_for_results') in the 'run' method's docstring. Also, note that these parameters are only applicable for local execution. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:65
- Draft comment:
Clarify that when running in GitHub Actions (i.e. when GITHUB_ACTIONS is true), parameters such as 'experiment_metadata', 'aux', 'stop_on_error', and 'wait_for_results' are not forwarded and are thus unused. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, explaining the behavior of the code when running in GitHub Actions. It does not suggest any changes, ask for confirmation, or point out potential issues. It violates the rule against making purely informative comments.
Workflow ID: wflow_vHaVuqKywUYrVmUm
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
37-86: Return type annotation is incorrect - method returns different types.The
run()method is annotated to returnTuple[List[TaskResponse], List[str]](line 49), but:
_run_in_github()returnsRunInGithubResponse(line 248)_run_locally()returnsTuple[List[TaskResponse], List[str]](line 101)This breaks type safety and will cause issues for callers expecting a consistent return type.
Consider using a Union type or introducing a wrapper response type:
+from typing import Union + async def run( self, task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]], dataset_slug: Optional[str] = None, dataset_version: Optional[str] = None, evaluators: Optional[List[EvaluatorDetails]] = None, experiment_slug: Optional[str] = None, experiment_metadata: Optional[Dict[str, Any]] = None, related_ref: Optional[Dict[str, str]] = None, aux: Optional[Dict[str, str]] = None, stop_on_error: bool = False, wait_for_results: bool = True, -) -> Tuple[List[TaskResponse], List[str]]: +) -> Union[Tuple[List[TaskResponse], List[str]], RunInGithubResponse]:Alternatively, consider a unified response model that both paths return.
🧹 Nitpick comments (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py (1)
23-40: Consider handling potentialNonecontent from OpenAI response.
response.choices[0].message.contentcan beNoneif the model returns no content (e.g., when using function calls or tool use). Since this is a sample app demonstrating the workflow, this may be acceptable, but be aware it could cause downstream issues ifNonepropagates to evaluators.packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
424-438: Docstring mentions non-existent parameterstop_on_error.The docstring at line 434 documents a
stop_on_errorparameter that doesn't exist in the method signature.async def _execute_tasks( self, rows: List[Dict[str, Any]], task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]], ) -> List[TaskResult]: """Execute tasks locally with concurrency control Args: rows: List of dataset rows to process task: Function to run on each row - stop_on_error: Whether to stop on first error Returns: List of TaskResult objects with inputs, outputs, and errors """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.pypackages/sample-app/sample_app/experiment/run_research_experiment.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (3)
packages/traceloop-sdk/traceloop/sdk/experiment/model.py (5)
RunInGithubRequest(88-98)RunInGithubResponse(101-106)TaskResult(71-76)GithubContext(79-85)TaskResponse(10-15)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
get_version_jsonl(165-170)packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
post(23-37)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
275-278: Avoid specifying long messages outside the exception class
(TRY003)
298-302: Avoid specifying long messages outside the exception class
(TRY003)
311-314: Avoid specifying long messages outside the exception class
(TRY003)
355-357: Create your own exception
(TRY002)
355-357: Avoid specifying long messages outside the exception class
(TRY003)
448-448: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/sample-app/sample_app/experiment/run_research_experiment.py (2)
16-20: LGTM - API keys properly sourced from environment variables.The client initialization correctly retrieves sensitive values from environment variables, following security best practices. Based on coding guidelines, this is the correct approach for handling API keys.
43-52: LGTM - Typo has been fixed.The previously flagged typo
"sentenece"has been corrected to"sentence".packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (4)
4-20: LGTM - Imports are appropriate for the new GitHub integration.The added imports for
osand the new model types (RunInGithubRequest,RunInGithubResponse,TaskResult,GithubContext) align with the new functionality.
89-238: LGTM - Local execution logic is well-structured.The
_run_locally()method properly handles:
- Experiment initialization with metadata
- Concurrent task execution with semaphore (50 concurrent tasks)
- Error aggregation with optional
stop_on_errorbehavior- Both synchronous and async evaluator execution modes
291-323: GitHub context handling is now robust.The implementation properly:
- Validates
GITHUB_EVENT_NAME == "pull_request"before proceeding (line 297)- Extracts PR number from
GITHUB_REFonly after validation- Ensures all required values exist before constructing
GithubContextThis addresses the previous review concerns about
pr_urlbeingNoneand insufficient event verification.
461-467: Previoustask_inputbug has been fixed.The exception handling has been simplified - errors are now captured within
run_single_row()and returned asTaskResultobjects with theerrorfield set. This removes the need for the outer try/except that previously referenced the non-existentcompleted_task.task_inputattribute.
packages/sample-app/sample_app/experiment/run_research_experiment.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed d36d712 in 1 minute and 15 seconds. Click for details.
- Reviewed
136lines of code in1files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:39
- Draft comment:
Task parameter now expects an async function (Awaitable). Ensure documentation clearly states that 'task' must be async. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:170
- Draft comment:
Good explicit type annotation for 'eval_results' (Dict[str, Union[Dict[str, Any], str]]); this removes the need for previous type ignores. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:396
- Draft comment:
Allowing 'task_input' to be Optional is beneficial. Ensure that all callers handle None values appropriately. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:313
- Draft comment:
Robust GitHub environment validation is implemented; consider logging details when required env vars (e.g. GITHUB_REF, GITHUB_REPOSITORY) are missing to aid debugging. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:461
- Draft comment:
There's an inconsistency: 'run_single_row' accepts an Optional row while 'run_with_semaphore' types row as Dict. Consider aligning these for type consistency. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
6. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:350
- Draft comment:
The use of 'related_ref' passed as 'experiment_run_metadata' may be confusing. Consider aligning the naming for clarity. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_ALse0DALmd5g2igu
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py (1)
24-53: Consider reusing theAsyncOpenAIclient instead of creating it per call
generate_research_responseinstantiatesAsyncOpenAIon every invocation, which is fine for a sample but adds overhead if the dataset is large. You could create a single module-level async client (similarly to the Traceloop client) and reuse it insidegenerate_research_responsefor better efficiency.Example shape:
-from openai import AsyncOpenAI +from openai import AsyncOpenAI @@ -async def generate_research_response(question: str) -> str: - """Generate a research response using OpenAI""" - openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY")) +openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY")) + + +async def generate_research_response(question: str) -> str: + """Generate a research response using OpenAI""" @@ - response = await openai_client.chat.completions.create( + response = await openai_client.chat.completions.create(Please confirm against the current AsyncOpenAI client docs that reusing a single client instance matches recommended usage for your version.
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
428-471: Fix_execute_tasksdocstring and consider tightening exception handlingTwo small issues here:
The docstring mentions a
stop_on_errorargument, but_execute_tasksonly takesrowsandtask. That’s misleading and looks like a copy/paste from_run_locally. Please update or remove that line so the docstring matches the actual signature.The inner
run_single_rowcurrently uses a bareexcept Exceptionand wraps the error intoTaskResult.error. That’s acceptable for a defensive boundary, but if you know which exceptions are expected fromtask, narrowing the exception type (or at least logging unexpected ones separately) would improve debuggability and align better with linting guidance.- Args: - rows: List of dataset rows to process - task: Function to run on each row - stop_on_error: Whether to stop on first error + Args: + rows: List of dataset rows to process + task: Function to run on each rowPlease confirm against your linting configuration (e.g., Ruff/Flake8 rules like BLE001) how strict you want to be about broad
Exceptioncatches in this module.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/sample-app/sample_app/experiment/run_research_experiment.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/experiment/run_research_experiment.pypackages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/experiment/run_research_experiment.py (2)
packages/traceloop-sdk/traceloop/sdk/experiment/model.py (1)
RunInGithubResponse(101-106)packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
run(37-89)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
200-200: Do not catch blind exception: Exception
(BLE001)
201-201: Use explicit conversion flag
Replace with conversion flag
(RUF010)
278-281: Avoid specifying long messages outside the exception class
(TRY003)
301-305: Avoid specifying long messages outside the exception class
(TRY003)
314-317: Avoid specifying long messages outside the exception class
(TRY003)
359-361: Create your own exception
(TRY002)
359-361: Avoid specifying long messages outside the exception class
(TRY003)
452-452: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
🔇 Additional comments (1)
packages/sample-app/sample_app/experiment/run_research_experiment.py (1)
56-77: Main experiment flow and GitHub/local handling look solidThe
main()coroutine wires the sample together cleanly, and theisinstance(response, RunInGithubResponse)guard correctly handles the union return type fromclient.experiment.runin GitHub vs local environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 49e92f2 in 1 minute and 4 seconds. Click for details.
- Reviewed
50lines of code in1files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:73
- Draft comment:
Good: The run() method now correctly passes the 'aux' parameter to _run_in_github. Ensure tests validate that auxiliary metadata is handled as expected. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:247
- Draft comment:
The signature of _run_in_github now includes the 'aux' parameter with the proper type. This maintains consistency across calls. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:330
- Draft comment:
The new experiment_run_metadata dictionary correctly aggregates both 'related_ref' and 'aux'. Consider including documentation on the expected structure for 'aux' if not already documented elsewhere. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:347
- Draft comment:
The RunInGithubRequest payload now correctly passes experiment_run_metadata (which includes both 'related_ref' and 'aux'). This fixes the prior issue of only passing related_ref. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_xH5s1SiOFnsvFUFa
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
68-77:experiment_metadatais still not passed to_run_in_github.The
run()method acceptsexperiment_metadata(line 44) and correctly passes it to_run_locally(line 85), but it's omitted when calling_run_in_github. Callers who supplyexperiment_metadatawill have it silently ignored in GitHub runs.if os.getenv("GITHUB_ACTIONS"): return await self._run_in_github( task=task, dataset_slug=dataset_slug, dataset_version=dataset_version, evaluators=evaluators, experiment_slug=experiment_slug, + experiment_metadata=experiment_metadata, related_ref=related_ref, aux=aux, )
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
454-465: Consider aligning type hints for consistency.
run_single_rowacceptsOptional[Dict[str, Any]](line 454), butTaskResult.inputis typed asDict[str, Any](non-optional) per the model. IfrowwereNone, Pydantic validation would fail at line 458/463.Since
rowsisList[Dict[str, Any]]and never containsNone, this works in practice, but the type hints are inconsistent.- async def run_single_row(row: Optional[Dict[str, Any]]) -> TaskResult: + async def run_single_row(row: Dict[str, Any]) -> TaskResult:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (4)
packages/traceloop-sdk/traceloop/sdk/experiment/model.py (5)
RunInGithubRequest(88-98)RunInGithubResponse(101-106)TaskResult(71-76)GithubContext(79-85)TaskResponse(10-15)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
get_version_jsonl(170-175)packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
post(23-37)packages/traceloop-sdk/traceloop/sdk/fetcher.py (1)
post(64-65)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
201-201: Do not catch blind exception: Exception
(BLE001)
202-202: Use explicit conversion flag
Replace with conversion flag
(RUF010)
281-284: Avoid specifying long messages outside the exception class
(TRY003)
304-308: Avoid specifying long messages outside the exception class
(TRY003)
317-320: Avoid specifying long messages outside the exception class
(TRY003)
368-370: Create your own exception
(TRY002)
368-370: Avoid specifying long messages outside the exception class
(TRY003)
461-461: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (4)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (4)
1-21: LGTM!Imports are appropriate for the new GitHub execution feature. The new model imports and type additions align with the async patterns and new API contracts.
92-240: LGTM!The local execution path is well-structured with proper concurrency control via semaphore and comprehensive error handling. The evaluation flow correctly handles both sync and async modes.
242-372: LGTM! Past review concerns have been addressed.The implementation now correctly:
- Validates
GITHUB_EVENT_NAME == "pull_request"before proceeding (line 303)- Ensures
pr_urlis always set by requiringpr_numberextraction success (line 316)- Merges user-provided
experiment_metadatawithcreated_from: "github"(lines 331-334)- Uses the same
experiment_run_metadataconstruction pattern as_run_locally(lines 336-340)
374-418: LGTM!The
_init_experimentand_create_taskupdates correctly support the new metadata propagation. TheOptional[Dict[str, Any]]type fortask_inputappropriately handles cases where experiments may run without input data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 724f0d5 in 1 minute and 18 seconds. Click for details.
- Reviewed
12lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:444
- Draft comment:
Removed 'stop_on_error' from the docstring since it is not an accepted parameter in _execute_tasks. Ensure that error handling is managed in the caller if needed. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_s5phOBDydc0SjYXR
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)
65-67: DocstringReturnssection is incomplete.The return type is
Tuple[List[TaskResponse], List[str]] | RunInGithubResponse, but the docstring only documents the tuple format. Consider documenting both return shapes so SDK users know what to expect in CI vs local execution.Returns: - Tuple of (results, errors). Returns ([], []) if wait_for_results is False + - Local execution: Tuple of (results, errors). Returns ([], []) if wait_for_results is False + - GitHub execution: RunInGithubResponse with experiment_id, run_id, and status
469-471: Minor type hint inconsistency.
run_with_semaphoreparameter is typed asDict[str, Any]butrun_single_rowacceptsOptional[Dict[str, Any]]. For consistency:- async def run_with_semaphore(row: Dict[str, Any]) -> TaskResult: + async def run_with_semaphore(row: Optional[Dict[str, Any]]) -> TaskResult:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧠 Learnings (1)
📚 Learning: 2025-08-04T15:35:30.188Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.
Applied to files:
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
201-201: Do not catch blind exception: Exception
(BLE001)
202-202: Use explicit conversion flag
Replace with conversion flag
(RUF010)
281-284: Avoid specifying long messages outside the exception class
(TRY003)
304-308: Avoid specifying long messages outside the exception class
(TRY003)
317-320: Avoid specifying long messages outside the exception class
(TRY003)
368-370: Create your own exception
(TRY002)
368-370: Avoid specifying long messages outside the exception class
(TRY003)
460-460: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (7)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (7)
1-21: LGTM!Imports are well-organized and appropriate for the new GitHub execution functionality.
92-240: LGTM!The local execution path properly propagates all parameters including
experiment_metadata, handles concurrency with semaphore, and correctly implementsstop_on_errorandwait_for_resultsbehavior.
242-372: LGTM! Previous review feedback addressed.The method now:
- Validates
pull_requestevent explicitly (line 303)- Propagates
experiment_metadataandauxparameters- Builds
experiment_run_metadataconsistently with_run_locally- Merges user-provided
experiment_metadatawithcreated_from: "github"Based on learnings, the
Exceptionraising pattern at line 368 is consistent with the established SDK error handling conventions.
374-399: LGTM!The
experiment_metadataparameter is correctly added and propagated to the request body.
401-418: LGTM!The
task_inputparameter type correctly updated toOptional[Dict[str, Any]].
420-435: No concerns.Existing JSONL parsing logic is unchanged.
437-479: Previous critical bug fixed.The
_execute_tasksmethod now correctly awaits and appends results without the brokencompleted_task.task_inputaccess. Error handling is properly encapsulated inrun_single_row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 7ad43b6 in 1 minute and 35 seconds. Click for details.
- Reviewed
13lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:50
- Draft comment:
The removed docstring lines provided clarity on GitHub vs. local execution. Consider retaining a brief note that the method routes to a GitHub-specific flow when GITHUB_ACTIONS is set, to aid users in understanding the behavior. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% The comment is suggesting that documentation be added back or retained. This is a documentation suggestion, not a code change. The rules state "Do NOT comment unless there is clearly a code change required" and "Do NOT make comments that are obvious or unimportant." The behavior is already evident from the return type annotation and the method implementation itself. The comment uses "Consider retaining" which is a soft suggestion rather than identifying a clear issue. Additionally, the removed documentation was somewhat redundant given that the code itself is self-documenting with the environment variable check and the two separate method calls. The comment is about documentation style/preference rather than a functional issue. The removed documentation might actually be helpful for users who are reading the docstring to understand the method's behavior without diving into the implementation. The dual behavior based on environment variables might not be immediately obvious from just the return type, and explicit documentation could improve the developer experience. While documentation can be helpful, the rules explicitly state not to comment unless there's a clear code change required. This is a documentation preference, not a functional issue. The behavior is already clear from the code structure and return types. If the author removed this documentation, they likely had a reason (perhaps to keep the docstring more concise or because the behavior is self-evident from the implementation). This comment should be deleted. It's suggesting documentation changes rather than identifying a code issue. The comment doesn't point to a bug or required code change, just a preference for more detailed documentation. This violates the rule that comments should only be made when there's clearly a code change required.
Workflow ID: wflow_ff9K6bZnVxzSqsEW
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)
63-64: Update docstring to document both return types.The
Returnssection only describes the tuple shape(results, errors), but when running in GitHub Actions, the method returnsRunInGithubResponsewith different fields (experiment_id,experiment_slug,run_id). Document both shapes so callers know what to expect in each context.
442-444: Remove stale docstring parameter.The docstring's Args section does not reference a
stop_on_errorparameter, but based on past review context, there was a stale reference. Ensure the docstring accurately reflects the current method signature.
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
66-88: Verify environment variable detection for consistency.The branch condition uses
os.getenv("GITHUB_ACTIONS")to detect GitHub CI, which returnsNone(falsy) when unset. Inside_run_in_github, however, lines 278-282 verify the same variable but could theoretically be skipped if called directly. While the current code paths ensure_run_in_githubis only reached whenGITHUB_ACTIONSis set, consider whether the redundant validation in_run_in_github(lines 278-282) is necessary or can be removed for clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py(10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧠 Learnings (1)
📚 Learning: 2025-08-04T15:35:30.188Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3219
File: packages/traceloop-sdk/traceloop/sdk/datasets/dataset.py:357-360
Timestamp: 2025-08-04T15:35:30.188Z
Learning: In the traceloop SDK Dataset class, the established error handling pattern is that HTTP client methods return None on failure (after catching and logging RequestException), and the Dataset API methods check for None return values and raise Exception with descriptive messages. The update_row_api method is inconsistent with this pattern.
Applied to files:
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (3)
packages/traceloop-sdk/traceloop/sdk/experiment/model.py (5)
RunInGithubRequest(88-98)RunInGithubResponse(101-106)TaskResult(71-76)GithubContext(79-85)TaskResponse(10-15)packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
get_version_jsonl(218-223)packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
post(23-37)
🪛 Ruff (0.14.6)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
199-199: Do not catch blind exception: Exception
(BLE001)
200-200: Use explicit conversion flag
Replace with conversion flag
(RUF010)
279-282: Avoid specifying long messages outside the exception class
(TRY003)
302-306: Avoid specifying long messages outside the exception class
(TRY003)
315-318: Avoid specifying long messages outside the exception class
(TRY003)
366-368: Create your own exception
(TRY002)
366-368: Avoid specifying long messages outside the exception class
(TRY003)
458-458: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
🔇 Additional comments (4)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (4)
329-332: LGTM: Experiment metadata merge pattern.The merge pattern correctly combines user-provided metadata with the GitHub marker, addressing the past review concern about hardcoded metadata.
334-338: LGTM: Metadata structure normalized.The
experiment_run_metadataconstruction now matches the pattern used in_run_locally(lines 123-127), ensuring consistent metadata shape across both execution paths and addressing the past concern about asymmetry.
301-318: LGTM: GitHub context validation is robust.The addition of explicit
GITHUB_EVENT_NAMEvalidation (line 301) and the RuntimeError whenpr_numbercannot be extracted (lines 314-318) address the past review concerns about type mismatches and insufficient event verification. The code now ensures thatpr_urlis always non-null when constructingGithubContext, making it compatible with the model'sstrtype annotation.
473-475: LGTM: asyncio.Task bug fixed.The critical bug flagged in past reviews—accessing
completed_task.task_input—has been resolved. The current implementation correctly awaits each task and appends the returnedTaskResultwithout attempting to access non-existent attributes.
🧪 Traceloop Experiment ResultsExperiment: research-exp Evaluators Results
|
feat(instrumentation): ...orfix(instrumentation): ....Important
Add GitHub Actions support for running experiments with local task execution and result submission in the
Experimentclass.run_in_github()method inExperimentclass to execute tasks locally and submit results for GitHub CI/CD.RunInGithubRequest,RunInGithubResponse,TaskResult, andGithubContextmodels inmodel.pyto support GitHub-specific experiment execution.run_research_experiment.pyscript to demonstrate running experiments in GitHub Actions with local task execution and result submission.run()method inexperiment.pynow checks for GitHub Actions environment and delegates to_run_in_github()if detected._execute_tasks()inexperiment.py._run_in_github()to ensure execution in pull request workflows.run()and_run_locally()to reflect async task handling.This description was created by
for 7ad43b6. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.