Commit 4c9d30b

feat: Enhance Claude Sonnet 4.5 support with 1M context window and tiered pricing (#209)
* feat: Enhance Claude Sonnet 4.5 support with 1M context window and tiered pricing
* doc: updated changeset

Co-authored-by: Magesh <mageshmscss@gmail.com>
1 parent f50841e commit 4c9d30b

15 files changed: +197 -87 lines changed

.changeset/kind-games-remain.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+---
+"hai-build-code-generator": patch
+---
+
+Enhanced support for Claude Sonnet 4.5, extending its maximum context window to 1 million tokens and enabling tiered pricing for more flexible usage models.
```
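The "tiered pricing" the changeset mentions means per-token rates that step up once a request's context exceeds a tier boundary. A hypothetical sketch of that idea, using the $3-6 input / $15-22.50 output range and the 200K boundary implied elsewhere in this commit; the actual `CLAUDE_SONNET_1M_TIERS` shape in `@shared/api` may differ:

```typescript
// Hypothetical tiered-pricing sketch: rates step up above a 200K-token tier.
// Prices mirror the $3-6 / $15-22.50 range quoted in the model-selection
// guide; the real CLAUDE_SONNET_1M_TIERS structure may differ.
interface PricingTier {
	contextWindow: number // requests at or below this size use these rates
	inputPrice: number // USD per million input tokens
	outputPrice: number // USD per million output tokens
}

const tiers: PricingTier[] = [
	{ contextWindow: 200_000, inputPrice: 3, outputPrice: 15 },
	{ contextWindow: 1_000_000, inputPrice: 6, outputPrice: 22.5 },
]

function ratesFor(promptTokens: number): PricingTier {
	// pick the first tier whose ceiling covers the prompt size
	return tiers.find((t) => promptTokens <= t.contextWindow) ?? tiers[tiers.length - 1]
}
```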

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -5,7 +5,7 @@ body:
   - type: markdown
     attributes:
       value: |
-        **Important:** All bug reports must be reproducible using Claude 4 Sonnet. HAI uses complex prompts so less capable models may not work as expected.
+        **Important:** All bug reports must be reproducible using Claude Sonnet 4.5. HAI uses complex prompts so less capable models may not work as expected.
   - type: textarea
     id: what-happened
     attributes:
@@ -36,7 +36,7 @@ body:
     attributes:
       label: Provider/Model
       description: What provider and model were you using when the issue occurred?
-      placeholder: "e.g., anthropic/claude-3.7-sonnet, gemini:gemini-2.5-pro-exp-03-25"
+      placeholder: "e.g., anthropic/claude-sonnet-4.5, gemini:gemini-2.5-pro-exp-03-25"
     validations:
       required: true
   - type: textarea
```

hai-docs/getting-started/model-selection-guide.mdx

Lines changed: 4 additions & 4 deletions
```diff
@@ -9,7 +9,7 @@ New models drop constantly, so this guide focuses on what's working well with HA
 
 | Model | Context Window | Input Price* | Output Price* | Best For |
 |-------|---------------|--------------|---------------|----------|
-| **Claude Sonnet 4** | 1M tokens | $3-6 | $15-22.50 | Reliable tool usage, complex codebases |
+| **Claude Sonnet 4.5** | 1M tokens | $3-6 | $15-22.50 | Reliable tool usage, complex codebases |
 | **Qwen3 Coder** | 256K tokens | $0.20 | $0.80 | Coding tasks, open source flexibility |
 | **Gemini 2.5 Pro** | 1M+ tokens | TBD | TBD | Large codebases, document analysis |
 | **GPT-5** | 400K tokens | $1.25 | $10 | Latest OpenAI tech, three modes |
@@ -57,9 +57,9 @@ New models drop constantly, so this guide focuses on what's working well with HA
 
 | If you want... | Use this |
 |----------------|----------|
-| Something that just works | Claude Sonnet 4 |
+| Something that just works | Claude Sonnet 4.5 |
 | To save money | DeepSeek V3 or Qwen3 variants |
-| Huge context windows | Gemini 2.5 Pro or Claude Sonnet 4 |
+| Huge context windows | Gemini 2.5 Pro or Claude Sonnet 4.5 |
 | Open source | Qwen3 Coder, Z AI GLM 4.5, or Kimi K2 |
 | Latest tech | GPT-5 |
 | Speed | Qwen3 Coder on Cerebras (fastest available) |
@@ -71,6 +71,6 @@ HAI automatically handles context limits with [auto-compact](/features/auto-comp
 
 ## The Bottom Line
 
-Start with **Claude Sonnet 4** if you want reliability. Experiment with **open source options** once you're comfortable to find the best fit for your workflow and budget.
+Start with **Claude Sonnet 4.5** if you want reliability. Experiment with **open source options** once you're comfortable to find the best fit for your workflow and budget.
 
 The landscape moves fast - these recommendations reflect what's working now, but keep an eye on new releases.
```

hai-docs/getting-started/understanding-context-management.mdx

Lines changed: 2 additions & 2 deletions
```diff
@@ -53,7 +53,7 @@ Think of context like a whiteboard you and HAI share:
 - **Context Window** is the size of the whiteboard itself:
   - Measured in tokens (1 token ≈ 3/4 of an English word)
   - Each model has a fixed size:
-    - Claude Sonnet 4: 1,000,000 tokens
+    - Claude Sonnet 4.5: 1,000,000 tokens
     - Qwen3 Coder: 256,000 tokens
     - Gemini 2.5 Pro: 1,000,000+ tokens
     - GPT-5: 400,000 tokens
@@ -77,7 +77,7 @@ HAI provides a visual way to monitor your context window usage through a progres
 - ↑ shows input tokens (what you've sent to the LLM)
 - ↓ shows output tokens (what the LLM has generated)
 - The progress bar visualizes how much of your context window you've used
-- The total shows your model's maximum capacity (e.g., 1M for Claude Sonnet 4)
+- The total shows your model's maximum capacity (e.g., 1M for Claude Sonnet 4.5)
 
 ### When to Watch the Bar
 
```

hai-docs/provider-config/anthropic.mdx

Lines changed: 3 additions & 6 deletions
```diff
@@ -17,11 +17,8 @@ description: "Learn how to configure and use Anthropic Claude models with HAI. C
 HAI supports the following Anthropic Claude models:
 
 - `claude-opus-4-20250514`
-- `claude-opus-4-20250514:thinking` (Extended Thinking variant)
-- `claude-sonnet-4-20250514` (Recommended)
-- `claude-sonnet-4-20250514:thinking` (Extended Thinking variant)
+- `anthropic/claude-sonnet-4.5` (Recommended)
 - `claude-3-7-sonnet-20250219`
-- `claude-3-7-sonnet-20250219:thinking` (Extended Thinking variant)
 - `claude-3-5-sonnet-20241022`
 - `claude-3-5-haiku-20241022`
 - `claude-3-opus-20240229`
@@ -46,8 +43,8 @@ HAI users can leverage this by checking the `Enable Extended Thinking` box below
 
 **Key Aspects of Extended Thinking:**
 
-- **Supported Models:** This feature is available for select models, including variants of Claude Opus 4, Claude Sonnet 4, and Claude Sonnet 3.7. The specific models listed in the "Supported Models" section above with the `:thinking` suffix are pre-configured in HAI to utilize this.
-- **Summarized Thinking (Claude 4):** For Claude 4 models, the API returns a summary of the full thinking process to balance insight with efficiency and prevent misuse. You are billed for the full thinking tokens, not just the summary.
+- **Supported Models:** This feature is available for select models, including Claude Opus 4, Claude Sonnet 4.5, and Claude Sonnet 3.7.
+- **Summarized Thinking (Claude 4):** For Claude 4 and 4.5 models, the API returns a summary of the full thinking process to balance insight with efficiency and prevent misuse. You are billed for the full thinking tokens, not just the summary.
 - **Streaming:** Extended thinking responses, including the `thinking` blocks, can be streamed.
 - **Tool Use & Prompt Caching:** Extended thinking interacts with tool use (requiring thinking blocks to be passed back) and prompt caching (with specific behaviors around cache invalidation and context).
```

src/core/api/providers/anthropic.ts

Lines changed: 8 additions & 6 deletions
```diff
@@ -1,6 +1,6 @@
 import { Anthropic } from "@anthropic-ai/sdk"
 import { Stream as AnthropicStream } from "@anthropic-ai/sdk/streaming"
-import { AnthropicModelId, anthropicDefaultModelId, anthropicModels, CLAUDE_SONNET_4_1M_SUFFIX, ModelInfo } from "@shared/api"
+import { AnthropicModelId, anthropicDefaultModelId, anthropicModels, CLAUDE_SONNET_1M_SUFFIX, ModelInfo } from "@shared/api"
 import { ApiHandler, CommonApiHandlerOptions } from "../index"
 import { withRetry } from "../retry"
 import { ApiStream } from "../transform/stream"
@@ -45,16 +45,18 @@ export class AnthropicHandler implements ApiHandler {
 		const model = this.getModel()
 		let stream: AnthropicStream<Anthropic.RawMessageStreamEvent>
 
-		const modelId = model.id.endsWith(CLAUDE_SONNET_4_1M_SUFFIX)
-			? model.id.slice(0, -CLAUDE_SONNET_4_1M_SUFFIX.length)
-			: model.id
-		const enable1mContextWindow = model.id.endsWith(CLAUDE_SONNET_4_1M_SUFFIX)
+		const modelId = model.id.endsWith(CLAUDE_SONNET_1M_SUFFIX) ? model.id.slice(0, -CLAUDE_SONNET_1M_SUFFIX.length) : model.id
+		const enable1mContextWindow = model.id.endsWith(CLAUDE_SONNET_1M_SUFFIX)
 
 		const budget_tokens = this.options.thinkingBudgetTokens || 0
-		const reasoningOn = !!((modelId.includes("3-7") || modelId.includes("4-")) && budget_tokens !== 0)
+		const reasoningOn = !!(
+			(modelId.includes("3-7") || modelId.includes("4-") || modelId.includes("4-5")) &&
+			budget_tokens !== 0
+		)
 
 		switch (modelId) {
 			// 'latest' alias does not support cache_control
+			case "claude-sonnet-4-5-20250929":
 			case "claude-sonnet-4-20250514":
 			case "claude-3-7-sonnet-20250219":
 			case "claude-3-5-sonnet-20241022":
```
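The renamed `CLAUDE_SONNET_1M_SUFFIX` drives a strip-and-flag pattern that recurs in this commit: the custom suffix is removed so the provider API receives a model id it recognizes, while its presence toggles the 1M context window. A self-contained sketch (the `:1m` suffix value is an assumption taken from the "remove the custom :1m suffix" comment in openrouter-stream.ts):

```typescript
// Sketch of the shared suffix-stripping pattern. The ":1m" value is assumed
// from the openrouter-stream.ts comment; the real constant lives in @shared/api.
const CLAUDE_SONNET_1M_SUFFIX = ":1m"

function resolveModelId(modelId: string): { baseId: string; enable1mContextWindow: boolean } {
	const has1m = modelId.endsWith(CLAUDE_SONNET_1M_SUFFIX)
	return {
		// strip the suffix so the provider API receives the id it expects
		baseId: has1m ? modelId.slice(0, -CLAUDE_SONNET_1M_SUFFIX.length) : modelId,
		enable1mContextWindow: has1m,
	}
}
```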

src/core/api/providers/bedrock.ts

Lines changed: 8 additions & 5 deletions
```diff
@@ -9,7 +9,7 @@ import {
 	InvokeModelWithResponseStreamCommand,
 } from "@aws-sdk/client-bedrock-runtime"
 import { fromNodeProviderChain } from "@aws-sdk/credential-providers"
-import { BedrockModelId, bedrockDefaultModelId, bedrockModels, CLAUDE_SONNET_4_1M_SUFFIX, ModelInfo } from "@shared/api"
+import { BedrockModelId, bedrockDefaultModelId, bedrockModels, CLAUDE_SONNET_1M_SUFFIX, ModelInfo } from "@shared/api"
 import { calculateApiCostOpenAI } from "@utils/cost"
 import { ApiHandler, CommonApiHandlerOptions } from "../"
 import { withRetry } from "../retry"
@@ -119,11 +119,11 @@ export class AwsBedrockHandler implements ApiHandler {
 		// cross region inference requires prefixing the model id with the region
 		const rawModelId = await this.getModelId()
 
-		const modelId = rawModelId.endsWith(CLAUDE_SONNET_4_1M_SUFFIX)
-			? rawModelId.slice(0, -CLAUDE_SONNET_4_1M_SUFFIX.length)
+		const modelId = rawModelId.endsWith(CLAUDE_SONNET_1M_SUFFIX)
+			? rawModelId.slice(0, -CLAUDE_SONNET_1M_SUFFIX.length)
 			: rawModelId
 
-		const enable1mContextWindow = rawModelId.endsWith(CLAUDE_SONNET_4_1M_SUFFIX)
+		const enable1mContextWindow = rawModelId.endsWith(CLAUDE_SONNET_1M_SUFFIX)
 
 		const model = this.getModel()
 
@@ -741,7 +741,10 @@ export class AwsBedrockHandler implements ApiHandler {
 	 */
 	private shouldEnableReasoning(baseModelId: string, budgetTokens: number): boolean {
 		return (
-			(baseModelId.includes("3-7") || baseModelId.includes("sonnet-4") || baseModelId.includes("opus-4")) &&
+			(baseModelId.includes("3-7") ||
+				baseModelId.includes("sonnet-4") ||
+				baseModelId.includes("opus-4") ||
+				baseModelId.includes("sonnet-4-5")) &&
 			budgetTokens !== 0
 		)
 	}
```
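One observation on the predicate above: the new `sonnet-4-5` clause is subsumed by the existing `sonnet-4` substring check, since any id containing `sonnet-4-5` necessarily contains `sonnet-4`. The addition is defensive rather than behavior-changing, which a quick check confirms (the Bedrock-style model id below is hypothetical, for illustration only):

```typescript
// The pre-existing "sonnet-4" substring check already matches "sonnet-4-5"
// ids, so the new clause does not change the predicate's result.
function matchesReasoningFamily(baseModelId: string): boolean {
	return baseModelId.includes("3-7") || baseModelId.includes("sonnet-4") || baseModelId.includes("opus-4")
}

// Hypothetical Bedrock-style id for a Sonnet 4.5 model (illustration only).
const hypotheticalId = "anthropic.claude-sonnet-4-5-20250929-v1:0"
```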

src/core/api/transform/openrouter-stream.ts

Lines changed: 16 additions & 5 deletions
```diff
@@ -1,5 +1,10 @@
 import { Anthropic } from "@anthropic-ai/sdk"
-import { CLAUDE_SONNET_4_1M_SUFFIX, ModelInfo, openRouterClaudeSonnet41mModelId } from "@shared/api"
+import {
+	CLAUDE_SONNET_1M_SUFFIX,
+	ModelInfo,
+	openRouterClaudeSonnet41mModelId,
+	openRouterClaudeSonnet451mModelId,
+} from "@shared/api"
 import OpenAI from "openai"
 import { isGPT5ModelFamily } from "../../prompts/system-prompt/utils"
 import { convertToOpenAiMessages } from "./openai-format"
@@ -20,16 +25,18 @@ export async function createOpenRouterStream(
 		...convertToOpenAiMessages(messages),
 	]
 
-	const isClaudeSonnet41m = model.id === openRouterClaudeSonnet41mModelId
-	if (isClaudeSonnet41m) {
+	const isClaudeSonnet1m = model.id === openRouterClaudeSonnet41mModelId || model.id === openRouterClaudeSonnet451mModelId
+	if (isClaudeSonnet1m) {
 		// remove the custom :1m suffix, to create the model id openrouter API expects
-		model.id = model.id.slice(0, -CLAUDE_SONNET_4_1M_SUFFIX.length)
+		model.id = model.id.slice(0, -CLAUDE_SONNET_1M_SUFFIX.length)
 	}
 
 	// prompt caching: https://openrouter.ai/docs/prompt-caching
 	// this was initially specifically for claude models (some models may 'support prompt caching' automatically without this)
 	// handles direct model.id match logic
 	switch (model.id) {
+		case "anthropic/claude-sonnet-4.5":
+		case "anthropic/claude-4.5-sonnet":
 		case "anthropic/claude-sonnet-4":
 		case "anthropic/claude-opus-4.1":
 		case "anthropic/claude-opus-4":
@@ -89,6 +96,8 @@ export async function createOpenRouterStream(
 	// (models usually default to max tokens allowed)
 	let maxTokens: number | undefined
 	switch (model.id) {
+		case "anthropic/claude-sonnet-4.5":
+		case "anthropic/claude-4.5-sonnet":
 		case "anthropic/claude-sonnet-4":
 		case "anthropic/claude-opus-4.1":
 		case "anthropic/claude-opus-4":
@@ -125,6 +134,8 @@ export async function createOpenRouterStream(
 
 	let reasoning: { max_tokens: number } | undefined
 	switch (model.id) {
+		case "anthropic/claude-sonnet-4.5":
+		case "anthropic/claude-4.5-sonnet":
 		case "anthropic/claude-sonnet-4":
 		case "anthropic/claude-opus-4.1":
 		case "anthropic/claude-opus-4":
@@ -186,7 +197,7 @@ export async function createOpenRouterStream(
 			? { provider: { order: ["groq", "together", "baseten", "parasail", "novita", "deepinfra"], allow_fallbacks: false } }
 			: {}),
 		// limit providers to only those that support the 1m context window
-		...(isClaudeSonnet41m ? { provider: { order: ["anthropic", "amazon-bedrock"], allow_fallbacks: false } } : {}),
+		...(isClaudeSonnet1m ? { provider: { order: ["anthropic", "google-vertex/global"], allow_fallbacks: false } } : {}),
 	})
 
 	return stream
```
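The final hunk swaps the pinned provider list (Bedrock out, Vertex global in) using a conditional object spread: when the flag is false, spreading `{}` adds no keys at all. A minimal sketch of that pattern in isolation (the request shape here is simplified, not the actual OpenRouter payload):

```typescript
// Sketch: pin the request to 1M-capable providers via a conditional spread.
// When isClaudeSonnet1m is false, ...{} contributes nothing to the object.
function buildProviderRouting(isClaudeSonnet1m: boolean): Record<string, unknown> {
	return {
		stream: true,
		...(isClaudeSonnet1m ? { provider: { order: ["anthropic", "google-vertex/global"], allow_fallbacks: false } } : {}),
	}
}
```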

src/core/controller/models/refreshOpenRouterModels.ts

Lines changed: 16 additions & 6 deletions
```diff
@@ -6,7 +6,12 @@ import axios from "axios"
 import cloneDeep from "clone-deep"
 import fs from "fs/promises"
 import path from "path"
-import { CLAUDE_SONNET_4_1M_TIERS, clineMicrowaveAlphaModelInfo, openRouterClaudeSonnet41mModelId } from "@/shared/api"
+import {
+	CLAUDE_SONNET_1M_TIERS,
+	clineMicrowaveAlphaModelInfo,
+	openRouterClaudeSonnet41mModelId,
+	openRouterClaudeSonnet451mModelId,
+} from "@/shared/api"
 import { Controller } from ".."
 
 type OpenRouterSupportedParams =
@@ -109,6 +114,8 @@ export async function refreshOpenRouterModels(
 	})
 
 	switch (rawModel.id) {
+		case "anthropic/claude-sonnet-4.5":
+		case "anthropic/claude-4.5-sonnet":
 		case "anthropic/claude-sonnet-4":
 		case "anthropic/claude-3-7-sonnet":
 		case "anthropic/claude-3-7-sonnet:beta":
@@ -204,11 +211,14 @@ export async function refreshOpenRouterModels(
 	models[rawModel.id] = modelInfo
 
 	// add custom :1m model variant
-	if (rawModel.id === "anthropic/claude-sonnet-4") {
-		const claudeSonnet41mModelInfo = cloneDeep(modelInfo)
-		claudeSonnet41mModelInfo.contextWindow = 1_000_000 // limiting providers to those that support 1m context window
-		claudeSonnet41mModelInfo.tiers = CLAUDE_SONNET_4_1M_TIERS
-		models[openRouterClaudeSonnet41mModelId] = claudeSonnet41mModelInfo
+	if (rawModel.id === "anthropic/claude-sonnet-4" || rawModel.id === "anthropic/claude-sonnet-4.5") {
+		const claudeSonnet1mModelInfo = cloneDeep(modelInfo)
+		claudeSonnet1mModelInfo.contextWindow = 1_000_000 // limiting providers to those that support 1m context window
+		claudeSonnet1mModelInfo.tiers = CLAUDE_SONNET_1M_TIERS
+		// sonnet 4
+		models[openRouterClaudeSonnet41mModelId] = claudeSonnet1mModelInfo
+		// sonnet 4.5
+		models[openRouterClaudeSonnet451mModelId] = claudeSonnet1mModelInfo
 	}
 }
```
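The registration step above deep-clones the base model info before overriding `contextWindow` and `tiers`, so mutations on the 1M variant cannot leak back into the base entry. A minimal sketch of the same pattern using Node's built-in `structuredClone` in place of the `clone-deep` package, with `ModelInfo` reduced to the fields the diff touches:

```typescript
// Reduced ModelInfo: only the fields this commit's diff touches.
interface ModelInfoSketch {
	contextWindow: number
	tiers?: Array<{ contextWindow: number; inputPrice: number; outputPrice: number }>
}

// Register a ":1m" variant without mutating the base entry.
function add1mVariant(models: Record<string, ModelInfoSketch>, baseId: string, variantId: string): void {
	const variant = structuredClone(models[baseId]) // independent deep copy, like cloneDeep
	variant.contextWindow = 1_000_000 // providers limited to those supporting 1M context
	models[variantId] = variant
}
```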

src/extension.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -426,7 +426,7 @@ export async function activate(context: vscode.ExtensionContext) {
 
 	// Register the command handlers
 	context.subscriptions.push(
-		vscode.commands.registerCommand("cline.addToChat", async (range?: vscode.Range, diagnostics?: vscode.Diagnostic[]) => {
+		vscode.commands.registerCommand("hai.addToChat", async (range?: vscode.Range, diagnostics?: vscode.Diagnostic[]) => {
 			const context = await getContextForCommand(range, diagnostics)
 			if (!context) {
 				return
```
