Commit a2d9245

fix: type checking
1 parent a6f76fe commit a2d9245

File tree: 13 files changed (+1830, −2 lines)

docs/howtos/customizations/customize_models.md

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,9 @@ Ragas may use a LLM and or Embedding for evaluation and synthetic data generatio

- If you are using Langchain, you can pass the Langchain LLM and Embeddings directly and Ragas will wrap it with `LangchainLLMWrapper` or `LangchainEmbeddingsWrapper` as needed.

+!!! tip "Batch API Support"
+    OpenAI models (ChatOpenAI, AzureChatOpenAI) automatically support [Batch Evaluation](../metrics/batch_evaluation.md) for up to 50% cost savings on large-scale evaluations. The `LangchainLLMWrapper` automatically detects batch support and enables cost-optimized evaluation workflows.
+
## Examples

- [Azure OpenAI](#azure-openai)
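
For reference, the tip added above boils down to the following minimal sketch; `supports_batch_api()` is the detection hook documented in the new batch_evaluation.md page added by this commit:

```python
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

# Wrapping an OpenAI chat model; batch support is detected automatically.
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
print(llm.supports_batch_api())  # True for ChatOpenAI / AzureChatOpenAI
```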

docs/howtos/customizations/index.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ How to customize various aspects of Ragas to suit your needs.
- [Adapt metrics to target language](./metrics/_metrics_language_adaptation.md)
- [Trace evaluations with Observability tools](metrics/tracing.md)
- [Train and align metric](./metrics/train_your_own_metric.md)
+- [Batch evaluation for cost optimization](./metrics/batch_evaluation.md) 🆕


## Testset Generation

docs/howtos/customizations/metrics/_cost.md

Lines changed: 38 additions & 1 deletion
@@ -1,6 +1,43 @@
# Understand Cost and Usage of Operations

-When using LLMs for evaluation and test set generation, cost will be an important factor. Ragas provides you some tools to help you with that.
+When using LLMs for evaluation and test set generation, cost will be an important factor. Ragas provides several tools to help you optimize costs, including **Batch API support** for up to 50% savings on large-scale evaluations.
+
+## Cost Optimization Strategies
+
+### 1. Use Batch API for Large Evaluations (50% Savings)
+
+For non-urgent evaluation workloads, Ragas supports OpenAI's Batch API, which provides 50% cost savings:
+
+```python
+from ragas.batch_evaluation import BatchEvaluator, estimate_batch_cost_savings
+from ragas.metrics import Faithfulness
+from langchain_openai import ChatOpenAI
+from ragas.llms import LangchainLLMWrapper
+
+# Setup batch-capable LLM
+llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
+faithfulness = Faithfulness(llm=llm)
+
+# Estimate cost savings
+cost_info = estimate_batch_cost_savings(
+    sample_count=1000,
+    metrics=[faithfulness],
+    regular_cost_per_1k_tokens=0.15,  # GPT-4o-mini cost
+    batch_discount=0.5  # 50% savings
+)
+
+print(f"Regular cost: ${cost_info['regular_cost']}")
+print(f"Batch cost: ${cost_info['batch_cost']}")
+print(f"Savings: ${cost_info['savings']} ({cost_info['savings_percentage']}%)")
+
+# Run batch evaluation (samples: evaluation samples prepared earlier)
+evaluator = BatchEvaluator(metrics=[faithfulness])
+results = evaluator.evaluate(samples, wait_for_completion=True)
+```
+
+Learn more about [Batch Evaluation](batch_evaluation.md).
+
+### 2. Monitor Token Usage

## Understanding `TokenUsageParser`

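As a companion to the "Monitor Token Usage" strategy above, usage of the existing `TokenUsageParser` machinery documented on that page looks roughly like this sketch; `get_token_usage_for_openai` is the parser Ragas ships for OpenAI models:

```python
from ragas import evaluate
from ragas.cost import get_token_usage_for_openai

# Pass a parser so the evaluation result can report token usage.
result = evaluate(
    dataset=eval_dataset,
    metrics=[faithfulness],
    token_usage_parser=get_token_usage_for_openai,
)
print(result.total_tokens())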
docs/howtos/customizations/metrics/batch_evaluation.md

Lines changed: 305 additions & 0 deletions
@@ -0,0 +1,305 @@
# Batch Evaluation for Cost Optimization

When running large-scale evaluations, cost can be a significant factor. Ragas now supports OpenAI's Batch API, which offers **up to 50% cost savings** compared to regular API calls, making it ideal for non-urgent evaluation workloads.

## What is Batch Evaluation?

OpenAI's Batch API allows you to submit multiple requests for asynchronous processing at half the cost of synchronous requests. Batch jobs are processed within 24 hours and have separate rate limits, making them perfect for large-scale evaluations where immediate results aren't required.

### Key Benefits

- **50% Cost Savings** on both input and output tokens
- **Higher Rate Limits** that don't interfere with real-time usage
- **Guaranteed Processing** within 24 hours (often much sooner)
- **Large Scale Support** up to 50,000 requests per batch

## Quick Start

### Basic Batch Evaluation

```python
import os
from ragas.batch_evaluation import BatchEvaluator, estimate_batch_cost_savings
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

# Ensure you have your OpenAI API key set
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Setup LLM with batch support (automatically detected)
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
faithfulness = Faithfulness(llm=llm)

# Prepare your evaluation samples
samples = [
    SingleTurnSample(
        user_input="What is the capital of France?",
        response="The capital of France is Paris.",
        retrieved_contexts=["Paris is the capital city of France."]
    ),
    # ... more samples
]

# Create batch evaluator
evaluator = BatchEvaluator(metrics=[faithfulness])

# Run batch evaluation (blocks until completion)
results = evaluator.evaluate(samples, wait_for_completion=True)

# Check results
for result in results:
    print(f"Metric: {result.metric_name}")
    print(f"Job ID: {result.job_id}")
    print(f"Success Rate: {result.success_rate:.2%}")
    print(f"Sample Count: {result.sample_count}")
```

### Cost Estimation

Before running batch evaluations, you can estimate your cost savings:

```python
from ragas.batch_evaluation import estimate_batch_cost_savings

# Estimate costs for 1000 samples
cost_info = estimate_batch_cost_savings(
    sample_count=1000,
    metrics=[faithfulness],
    regular_cost_per_1k_tokens=0.15,  # GPT-4o-mini input cost
    batch_discount=0.5  # 50% savings
)

print(f"Regular API Cost: ${cost_info['regular_cost']}")
print(f"Batch API Cost: ${cost_info['batch_cost']}")
print(f"Total Savings: ${cost_info['savings']} ({cost_info['savings_percentage']}%)")
```

### Asynchronous Batch Evaluation

For non-blocking operations, use async evaluation:

```python
import asyncio

async def run_batch_evaluation():
    evaluator = BatchEvaluator(metrics=[faithfulness])

    # Submit jobs without waiting
    results = await evaluator.aevaluate(
        samples=samples,
        wait_for_completion=False  # Don't block
    )

    # Jobs are submitted, check back later
    for result in results:
        print(f"Submitted job {result.job_id} for {result.metric_name}")

# Run async evaluation
asyncio.run(run_batch_evaluation())
```
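
The doc leaves "check back later" open-ended. One plausible pattern, assuming `result.job_id` is the underlying OpenAI batch id, is to poll with the plain OpenAI SDK (`client.batches.retrieve` is a standard endpoint there):

```python
from openai import OpenAI

client = OpenAI()

# Assumption: job_id maps directly to an OpenAI batch id.
for result in results:
    batch = client.batches.retrieve(result.job_id)
    print(f"{result.metric_name}: {batch.status}")
```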

## Checking Batch Support

Not all LLMs support batch evaluation. Here's how to check:

```python
# Check if metric supports batch evaluation
if faithfulness.supports_batch_evaluation():
    print(f"{faithfulness.name} supports batch evaluation")
else:
    print(f"{faithfulness.name} requires regular API")

# Check LLM batch support
if llm.supports_batch_api():
    print("✅ LLM supports batch processing")
else:
    print("❌ LLM does not support batch processing")
```

## Supported Models

Currently, batch evaluation is supported for:
- OpenAI models (ChatOpenAI, AzureChatOpenAI)
- All metrics that use these LLMs

### Supported Metrics

- ✅ Faithfulness (partial support)
- 🔄 More metrics coming soon...

Metrics that do not yet support batch evaluation automatically fall back to regular API calls, as the sketch below illustrates.
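
A minimal sketch of that fallback (the second metric is hypothetical here; `FactualCorrectness` stands in for any metric without batch support yet):

```python
from ragas.metrics import FactualCorrectness

factual_correctness = FactualCorrectness(llm=llm)  # no batch support yet

# Faithfulness goes through the Batch API; FactualCorrectness silently
# falls back to regular API calls within the same run.
evaluator = BatchEvaluator(metrics=[faithfulness, factual_correctness])
results = evaluator.evaluate(samples)
```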

## Configuration Options

### BatchEvaluator Parameters

```python
evaluator = BatchEvaluator(
    metrics=metrics,
    max_batch_size=1000,   # Max samples per batch
    poll_interval=300.0,   # Status check interval (5 minutes)
    timeout=86400.0        # Max wait time (24 hours)
)
```

### Custom Metadata

Add metadata to track your batch jobs:

```python
results = evaluator.evaluate(
    samples=samples,
    metadata={
        "experiment": "model_comparison",
        "version": "v1.0",
        "dataset": "production_qa"
    }
)
```

## Best Practices

### When to Use Batch Evaluation

**Ideal for:**
- Large-scale evaluations (100+ samples)
- Non-urgent evaluation workloads
- Cost optimization scenarios
- Regular evaluation pipelines

**Avoid for:**
- Real-time evaluation needs
- Interactive applications
- Small datasets (<50 samples)
- Time-sensitive workflows

### Optimization Tips

1. **Batch Size**: Use 1000-5000 samples per batch for optimal performance
2. **Model Selection**: Use cost-effective models like `gpt-4o-mini`
3. **Concurrent Processing**: Submit multiple metrics simultaneously
4. **Monitoring**: Set up logging for long-running jobs (a combined sketch follows the snippet below)

```python
import logging

# Enable batch evaluation logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('ragas.batch_evaluation')
```
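
Putting tips 1 and 4 together (a sketch using only the `BatchEvaluator` parameters documented above):

```python
import logging

logging.basicConfig(level=logging.INFO)  # tip 4: surface job progress

# Tip 1: larger batches; coarser polling keeps the logs readable.
evaluator = BatchEvaluator(
    metrics=[faithfulness],
    max_batch_size=5000,
    poll_interval=600.0,  # check status every 10 minutes
)
results = evaluator.evaluate(samples)
```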

## Error Handling

```python
try:
    results = evaluator.evaluate(samples)

    for result in results:
        if result.errors:
            print(f"❌ Errors in {result.metric_name}:")
            for error in result.errors:
                print(f"  - {error}")
        else:
            print(f"{result.metric_name}: {result.success_rate:.2%} success")

except Exception as e:
    print(f"Batch evaluation failed: {e}")
```

## Low-Level Batch API

For advanced use cases, you can use the low-level batch API directly:

```python
from ragas.llms.batch_api import create_batch_api, BatchRequest
from openai import OpenAI

# Direct batch API usage
client = OpenAI()
batch_api = create_batch_api(client)

# Create custom requests
requests = [
    BatchRequest(
        custom_id="eval-1",
        body={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Evaluate this response..."}]
        }
    )
]

# Submit batch job
batch_job = batch_api.create_batch(requests)
print(f"Batch job created: {batch_job.batch_id}")

# Monitor progress
status = batch_job.get_status()
print(f"Status: {status.value}")

# Retrieve results when complete
if status.value == "completed":
    results = batch_job.get_results()
    for result in results:
        print(f"Response for {result.custom_id}: {result.response}")
```

## Troubleshooting

### Common Issues

**Issue**: "Batch API not supported for this LLM"

```python
# Solution: Use an OpenAI-based LLM
from langchain_openai import ChatOpenAI
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
```

**Issue**: "Metric does not support batch evaluation"

```python
# Solution: Check metric support or wait for future updates
if not metric.supports_batch_evaluation():
    print(f"Metric {metric.name} will use regular API")
```

**Issue**: Timeout waiting for batch completion

```python
# Solution: Use non-blocking evaluation or increase the timeout
results = evaluator.evaluate(
    samples,
    wait_for_completion=False  # Don't wait
)
# Or increase the timeout
evaluator = BatchEvaluator(metrics=metrics, timeout=172800.0)  # 48 hours
```

## Migration from Regular Evaluation

Converting existing evaluations to use batch processing is simple:

### Before (Regular API)

```python
from ragas import evaluate
from ragas.metrics import Faithfulness

results = evaluate(
    dataset=eval_dataset,
    metrics=[Faithfulness(llm=llm)]
)
```

### After (Batch API)

```python
from ragas.batch_evaluation import BatchEvaluator
from ragas.metrics import Faithfulness

# Convert dataset to samples if needed
samples = [sample for sample in eval_dataset]

evaluator = BatchEvaluator(metrics=[Faithfulness(llm=llm)])
results = evaluator.evaluate(samples)
```

The Batch API provides significant cost savings while maintaining the same evaluation quality, making it an excellent choice for large-scale evaluation workloads.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ nav:
          - Write your own Metrics - (advanced): howtos/customizations/metrics/_write_your_own_metric_advanced.md
          - Train and Align Metrics: howtos/customizations/metrics/train_your_own_metric.md
          - Systematic Approach for Prompt Optimization: howtos/applications/prompt_optimization.md
+         - Batch Evaluation: howtos/customizations/metrics/batch_evaluation.md
      - Testset Generation:
          - Non-English Testset Generation: howtos/customizations/testgenerator/_language_adaptation.md
          - Persona Generation: howtos/customizations/testgenerator/_persona_generator.md
