@keugenek keugenek commented Dec 5, 2025

Changes

This PR creates a Lakeflow Job for long-running Apps Codegen Evals. The job clones this repo, generates sample Databricks apps using the current CLI MCP, runs evals on them, and publishes the results to MLflow.

Testing

evgenii.kniazev@FP424MF2FY cli % cd experimental/apps-mcp/evals
evgenii.kniazev@FP424MF2FY evals % databricks auth login
✔ Databricks profile name [DEFAULT]: █
Profile DEFAULT was successfully saved
evgenii.kniazev@FP424MF2FY evals % databricks bundle validate -t dev
Name: apps-mcp-evals
Target: dev
Workspace:..

Validation OK!
evgenii.kniazev@FP424MF2FY evals % databricks bundle deploy -t dev
Building apps_mcp_evals...
Uploading dist/apps_mcp_evals-0.1.0-py3-none-any.whl...
Uploading bundle files to /..
Deploying resources...
Updating deployment state...
Deployment complete!
evgenii.kniazev@FP424MF2FY evals % databricks bundle run -t dev apps_eval_job
Run URL: ...

2025-12-05 17:00:16 "[dev evgenii_kniazev] [dev] Apps-MCP Continuous Evals" RUNNING

@keugenek keugenek requested review from a team and lennartkats-db as code owners December 5, 2025 17:03
@keugenek keugenek marked this pull request as draft December 5, 2025 17:04
- Change requires-python from >=3.11 to >=3.10
- Replace str | None union syntax with Optional[str] for 3.10 compat
- Remove unused databricks-sdk and tqdm dependencies

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
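
For reference, the annotation style this commit describes can be sketched as below. This is a minimal illustration only: `describe_profile` is a hypothetical helper, not a function from this PR. `Optional[str]` is the `typing`-module spelling of "a string or `None`".

```python
from typing import Optional


def describe_profile(profile: Optional[str] = None) -> str:
    """Return the profile name, falling back to DEFAULT when none is given.

    Hypothetical helper used only to show the Optional[str] annotation.
    """
    if profile is None:
        return "DEFAULT"
    return profile
```

Calling `describe_profile()` returns `"DEFAULT"`, and `describe_profile("dev")` returns `"dev"`.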
@keugenek keugenek requested review from fjakobs and igrekun December 5, 2025 17:23
- Remove bundle run dependency (databricks CLI not available in serverless)
- Clone appdotbuild-agent repo and install klaudbiusz deps
- Handle case of no apps gracefully - log sample metrics to MLflow
- Job successfully validates infrastructure and logs to MLflow

Note: Full eval requires Python 3.12+ or pre-populated apps

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
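
The "no apps" fallback described in this commit might look roughly like the sketch below. All names here (`SAMPLE_METRICS`, `collect_metrics`, `log_all`) and the metric keys are assumptions for illustration, not the contents of `run_evals.py`. The logger is passed in as a callable so the sketch runs without MLflow installed; in a real job it could be `mlflow.log_metric`, which accepts a key and a numeric value.

```python
from typing import Callable, Dict, List

# Hypothetical placeholder metrics logged when no apps are available,
# so the run still proves that the job and the logging path work.
SAMPLE_METRICS: Dict[str, float] = {"apps_evaluated": 0.0, "infrastructure_ok": 1.0}


def collect_metrics(app_scores: List[float]) -> Dict[str, float]:
    """Aggregate per-app scores, or fall back to sample metrics if there are none."""
    if not app_scores:
        return dict(SAMPLE_METRICS)
    return {
        "apps_evaluated": float(len(app_scores)),
        "mean_score": sum(app_scores) / len(app_scores),
    }


def log_all(metrics: Dict[str, float], log_metric: Callable[[str, float], None]) -> None:
    """Send every metric to the supplied logger, one call per metric."""
    for name, value in metrics.items():
        log_metric(name, value)
```

With an empty list of scores, `collect_metrics` returns the sample metrics instead of raising on a division by zero, which is the graceful path the commit mentions.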
- Add apps_generation_job.job.yml with single-node Docker cluster
- Add generate_apps.py orchestrator using klaudbiusz framework
- Add init/setup_generation.sh to install Dagger and Python deps
- Update run_evals.py to read apps from UC Volume
- Add variables for CLI binary and generated apps volumes

Generation uses databricks experimental apps-mcp as the MCP server,
built from this repo for Linux x86_64.

Prerequisites:
- Create secret: databricks secrets put-secret apps-mcp-evals anthropic-api-key
- Upload CLI: GOOS=linux GOARCH=amd64 go build -o databricks-linux .
             databricks fs cp databricks-linux /Volumes/main/evals/artifacts/

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Use the main.default.apps_mcp_artifacts and main.default.apps_mcp_generated
volumes, which were created successfully.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
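
Reading generated apps from a UC Volume can be sketched as follows. Unity Catalog volumes are exposed as paths of the form `/Volumes/<catalog>/<schema>/<volume>`, so `main.default.apps_mcp_generated` would map to `/Volumes/main/default/apps_mcp_generated`. The function name `list_generated_apps` and the assumption that each app lives in its own subdirectory are illustrative, not taken from `run_evals.py`.

```python
from pathlib import Path
from typing import List


def list_generated_apps(volume_root: str) -> List[str]:
    """Return the sorted names of app directories under a volume path.

    Assumes one subdirectory per generated app (an assumption of this sketch).
    Returns an empty list when the path is missing, so callers can take
    the "no apps" branch instead of failing.
    """
    root = Path(volume_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```

On a cluster with the volume attached, this could be called as `list_generated_apps("/Volumes/main/default/apps_mcp_generated")`.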