_**Note: This PR and the code have been created with [spec-kit](https://github.com/github/spec-kit), and it's a test balloon for SDD. Feel free to shortcut the review, but this kind of spec-oriented review is at the heart of spec-driven development, so it's also good to get an initial feeling for that methodology. Feel free to also comment on the process in general below. No bits and bytes have been harmed so far; this is only the result of the initial specification step. See this [nice diagram](github/spec-kit#468 (reply in thread)) for a first look at SDD. See the section "About SDD" for another quick intro to the SDD review process.**_

## Overview

This PR implements the **specification phase** for the External Provider Injection feature, as described in the design document: https://hackmd.io/@7FIgxbJfSliRvRtcYpoXFw/SyswYGr6le/edit

**⚠️ Note**: This is the **first phase** containing only specifications - **no implementation code** is included in this PR.

### 🎯 Key Architectural Decisions

1. **Two-phase init container architecture**: Install (N containers) → Merge (1 container)
2. **CRD-ordered execution**: Init containers run in user-specified order (not alphabetical)
3. **extra-providers.yaml schema**: Forward-compatible design enabling future migration to native LlamaStack support
4. **Merge tool binary**: Included in operator image for Phase 2 init container
5. **Configuration merge order**: User ConfigMap `run.yaml` (if it exists) → External providers. On conflict, external providers take precedence over the ConfigMap and a WARNING is logged.

## What's in This PR

This PR introduces comprehensive specification documents generated using `spec-kit`:

### 📋 Core Specification Documents

- **[`spec.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md)** ⭐
  - **Primary review focus**: Core requirements, user stories, and acceptance criteria (32 functional requirements)
- **[`plan.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/plan.md)**
  - Implementation approach with two-phase init container architecture
- **[`tasks.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/tasks.md)**
  - Breakdown of 110 tasks across 7 implementation phases
- **[`integration-points.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/integration-points.md)**
  - Detailed integration with existing codebase (10 integration points)
- **[`extra-providers-schema.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/extra-providers-schema.md)**
  - Forward-compatible schema for future LlamaStack support
- **[`pr-strategy.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/pr-strategy.md)**
  - 8-PR breakdown strategy to manage implementation complexity

## 📖 Review Roadmap

### Recommended Review Order (30-45 minutes total)

Follow this order for an efficient and thorough review:

<details>
<summary>Roadmap for doing this PR review</summary>
<p>

#### Step 1: Quick Context (5 minutes)

1. Read the [**Purpose** section in `spec.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#purpose)
2. Skim the [**User Scenarios & Testing** section](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#user-scenarios--testing) to understand the 3 user stories

#### Step 2: Deep Dive - Requirements (15 minutes) ⭐ MOST IMPORTANT

1. **Read [`spec.md` - Functional Requirements](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#functional-requirements)**
   - Focus on: Do these requirements make sense?
   - Check: Are requirements testable and unambiguous?
   - Look for: Missing requirements or edge cases
2. **Read [`spec.md` - User Story Acceptance Scenarios](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#user-story-1---deploy-custom-provider-priority-p1)**
   - Validate: Are scenarios realistic and complete?
   - Consider: What scenarios are we missing?

#### Step 3: Architecture Validation (10 minutes)

1. **Skim [`integration-points.md` - Init Container Architecture](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/integration-points.md#init-container-architecture-phase-1---current)**
   - Understand: Two-phase flow (Install → Merge)
   - Validate: Does this align with Kubernetes patterns?
2. **Read [`spec.md` - Extra Providers Configuration](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#extra-providers-configuration-extra-providersyaml)**
   - Check: Does the schema make sense?
   - Consider: Is it forward-compatible?
3. **Read [`spec.md` - Merge Order and Precedence](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#merge-order-and-precedence)**
   - Validate: Does conflict resolution make sense?
   - Check: Are warnings sufficient for override cases?

#### Step 4: Implementation Planning (5-10 minutes) - OPTIONAL

1. **Skim [`pr-strategy.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/pr-strategy.md#pr-breakdown-8-prs)**
   - Validate: Is the 8-PR breakdown reasonable?
   - Check: Are PR sizes manageable for review?
2. **Skim [`tasks.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/tasks.md#phase-1-setup-shared-infrastructure)**
   - Understand: Task organization across 7 phases
   - Check: Does sequencing make sense?

#### Step 5: Edge Cases & Error Handling (5 minutes)

1. **Read [`spec.md` - Edge Cases](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#edge-cases)**
   - Validate: Are these the right edge cases?
   - Consider: What are we missing?
2. **Read [`spec.md` - Error Message Requirements](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#error-message-requirements)**
   - Check: Are error formats clear and actionable?

### 🎯 Key Review Questions

As you review, consider:

**Requirements (spec.md)**:
- [ ] Are the 3 user stories clear and independently testable?
- [ ] Are the 32 functional requirements complete and unambiguous?
- [ ] Are error message formats actionable for users?
- [ ] Are edge cases properly identified?
- [ ] Do the merge order and conflict resolution make sense?

**Architecture (integration-points.md)**:
- [ ] Does the two-phase init container approach make sense?
- [ ] Are the 10 integration points in the right places?
- [ ] Is the extra-providers.yaml schema forward-compatible?

**Planning (tasks.md, pr-strategy.md)**:
- [ ] Is the 110-task breakdown reasonable?
- [ ] Is the 8-PR strategy manageable for reviewers?

**Process**:
- [ ] Is the SDD approach working well?
- [ ] How can we improve the specification review process?

## Review Guidance

### 🔍 What to Focus On

**Primary review areas** (in order of importance):

1. **[`spec.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md)** - User stories, functional requirements, and acceptance criteria
   - Are the 3 user stories clear and testable?
   - Are the 32 functional requirements complete?
   - Are error message formats actionable?
   - Does the configuration merge order make sense?
2. **Use cases and edge cases** (in [`spec.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/spec.md#user-scenarios--testing))
   - Do the acceptance scenarios cover the critical paths?
   - Are edge cases properly identified?
3. **[`integration-points.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/integration-points.md)**
   - Does the integration strategy make sense?
   - Are the 10 integration points in the right places?
   - Does the two-phase init container architecture align with the existing codebase?

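For reviewers weighing the merge-order question, the precedence rule stated under Key Architectural Decisions (external providers win over the ConfigMap, with a WARNING logged) can be pictured with a minimal Python sketch. This is an illustration only, not the planned merge tool: the function name `merge_providers`, the list-of-dicts shape, and the `provider_id` / `provider_type` keys are assumptions made for the example, and the actual schema is whatever `extra-providers-schema.md` defines.

```python
import logging

logger = logging.getLogger("merge-sketch")


def merge_providers(configmap_providers, external_providers):
    """Merge two provider lists; external entries win on duplicate IDs.

    Both arguments are lists of dicts carrying a hypothetical
    "provider_id" key. The ConfigMap entries form the base; each
    external entry is applied on top in the order given.
    """
    # Start from the ConfigMap entries, keyed by ID (insertion order is kept).
    merged = {p["provider_id"]: p for p in configmap_providers}
    for provider in external_providers:
        pid = provider["provider_id"]
        if pid in merged:
            # Conflict: the external provider replaces the ConfigMap entry.
            logger.warning("external provider %r overrides the ConfigMap entry", pid)
        merged[pid] = provider
    return list(merged.values())


# Hypothetical inputs: "b" is defined in both sources.
configmap = [
    {"provider_id": "a", "provider_type": "inline::x"},
    {"provider_id": "b", "provider_type": "inline::y"},
]
external = [
    {"provider_id": "b", "provider_type": "remote::z"},
    {"provider_id": "c", "provider_type": "remote::w"},
]
merged_example = merge_providers(configmap, external)
```

In this sketch the overridden entry keeps its original position and new external entries are appended. Whether the real merge should preserve ordering this way is one of the details worth checking against the Merge Order and Precedence section of `spec.md`.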
**Secondary review areas** (optional deep dive):

- [`plan.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/plan.md) - Implementation approach and code examples
- [`tasks.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/tasks.md) - Task organization and dependencies
- [`pr-strategy.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/pr-strategy.md) - PR breakdown strategy

</p>
</details>

### 📚 Understanding the Spec-Kit Structure

The specifications follow a structured format:

- **spec.md**: WHAT we're building (requirements, contracts, success criteria)
- **plan.md**: HOW we'll build it (technical approach, architecture, examples)
- **tasks.md**: WHEN and in what ORDER (task breakdown, phases, dependencies)

## About SDD (Specification-Driven Development)

This PR demonstrates the **SDD workflow** where we:

1. ✅ **Specify first**: Create detailed specifications before writing code
2. ⏳ **Review specs**: Get alignment on requirements and design (← **you are here**)
3. ⏳ **Implement**: Build according to validated specifications
4. ⏳ **Verify**: Ensure implementation matches spec

### 🤔 SDD Review Process

This may be different from typical code reviews:

- **Focus on requirements clarity** rather than implementation details
- **Check for ambiguity** in specifications
- **Validate assumptions** about existing behavior
- **Ensure testability** of acceptance criteria
- **Challenge architectural decisions** before code is written

**We're all learning this process**, so feedback on the SDD approach itself is equally welcome!

## Questions for Reviewers

1. Are the user stories (spec.md) clear and complete?
2. Does the two-phase init container architecture make sense?
3. Do the configuration merge order and conflict resolution make sense?
4. Are there integration points we're missing?
5. Does the extra-providers.yaml schema feel future-proof?
6. Is the 8-PR breakdown strategy (pr-strategy.md) reasonable?
7. **Process feedback**: How can we improve the SDD review workflow?

## Next Steps

After this PR is approved:

1. Begin implementation following the 8-PR strategy (see [`pr-strategy.md`](https://github.com/rhuss/llama-stack-k8s-operator/blob/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1/pr-strategy.md))
2. Start with the first PR: Foundation - CRD Schema & Types (~150 lines)
3. Each implementation PR will reference back to these specifications

## Related Links

- Design document: https://hackmd.io/@7FIgxbJfSliRvRtcYpoXFw/SyswYGr6le/edit
- Specification directory: [`specs/001-deploy-time-providers-l1/`](https://github.com/rhuss/llama-stack-k8s-operator/tree/001-deploy-time-providers-l1/specs/001-deploy-time-providers-l1)

Signed-off-by: Roland Huß <rhuss@redhat.com>
 config/manifests/bases/llama-stack-k8s-operator.clusterserviceversion.yaml
 .DS_Store
++
 +# SDD and Claude Code directories
 +.claude
 +.specify