Commit c09a576

docs: update adaptive crawler docs and cache defaults; remove deprecated examples (unclecode#1330)

- Replace BaseStrategy with CrawlStrategy in custom strategy examples (DomainSpecificStrategy, HybridStrategy)
- Remove “Custom Link Scoring” and “Caching Strategy” sections no longer aligned with current library
- Revise memory pruning example to use adaptive.get_relevant_content and index-based retention of top 500 docs
- Correct Quickstart note: default cache mode is CacheMode.BYPASS; instruct enabling with CacheMode.ENABLED
1 parent 8bb0e68 commit c09a576

2 files changed (+10 additions, -42 deletions)

docs/md_v2/advanced/adaptive-strategies.md

Lines changed: 9 additions & 41 deletions
````diff
@@ -126,30 +126,6 @@ Factors:
 - URL depth (fewer slashes = higher authority)
 - Clean URL structure
 
-### Custom Link Scoring
-
-```python
-class CustomLinkScorer:
-    def score(self, link: Link, query: str, state: CrawlState) -> float:
-        # Prioritize specific URL patterns
-        if "/api/reference/" in link.href:
-            return 2.0 # Double the score
-
-        # Deprioritize certain sections
-        if "/archive/" in link.href:
-            return 0.1 # Reduce score by 90%
-
-        # Default scoring
-        return 1.0
-
-# Use with adaptive crawler
-adaptive = AdaptiveCrawler(
-    crawler,
-    config=config,
-    link_scorer=CustomLinkScorer()
-)
-```
-
 ## Domain-Specific Configurations
 
 ### Technical Documentation
````
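The unchanged context lines list URL depth and clean URL structure as authority factors. As a rough illustration of what those two signals mean, here is a toy heuristic. It is not crawl4ai's scoring code and is not wired into `AdaptiveCrawler`; the function name `url_authority_hint`, the `1 / (1 + depth)` formula and the 0.5 penalty for query strings or fragments are all assumptions made for this sketch.

```python
from urllib.parse import urlparse


def url_authority_hint(url: str) -> float:
    """Toy authority signal: shallower, cleaner URLs score higher.

    Illustrative only; this is not how crawl4ai scores links.
    """
    parsed = urlparse(url)
    # Count non-empty path segments: "/guide/intro" -> ["guide", "intro"] -> depth 2
    depth = len([segment for segment in parsed.path.split("/") if segment])
    # Fewer slashes (lower depth) gives a higher score
    score = 1.0 / (1 + depth)
    # Treat query strings and fragments as a less "clean" URL structure
    if parsed.query or parsed.fragment:
        score *= 0.5
    return score


for url in [
    "https://docs.example.com/",
    "https://docs.example.com/guide/intro",
    "https://docs.example.com/guide/intro?page=2",
]:
    print(url, round(url_authority_hint(url), 3))
```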
````diff
@@ -230,8 +206,12 @@ config = AdaptiveConfig(
 
 # Periodically clean state
 if len(state.knowledge_base) > 1000:
-    # Keep only most relevant
-    state.knowledge_base = get_top_relevant(state.knowledge_base, 500)
+    # Keep only the top 500 most relevant docs
+    top_content = adaptive.get_relevant_content(top_k=500)
+    keep_indices = {d["index"] for d in top_content}
+    state.knowledge_base = [
+        doc for i, doc in enumerate(state.knowledge_base) if i in keep_indices
+    ]
 ```
 
 ### Parallel Processing
````
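The revised pruning snippet assumes each entry returned by `adaptive.get_relevant_content(top_k=500)` carries an `"index"` key pointing back into `state.knowledge_base`. The same index-based retention can be sketched on plain lists; `prune_by_index` is a hypothetical helper, and the input dicts below only mimic the shape the diff relies on.

```python
from typing import Any, Dict, List, TypeVar

T = TypeVar("T")


def prune_by_index(knowledge_base: List[T], top_content: List[Dict[str, Any]]) -> List[T]:
    """Keep only documents whose position appears as an "index" in top_content."""
    # A set makes each membership check O(1) while scanning the knowledge base
    keep_indices = {entry["index"] for entry in top_content}
    # Survivors keep their original order, not their relevance rank
    return [doc for i, doc in enumerate(knowledge_base) if i in keep_indices]


# Hypothetical stand-in for adaptive.get_relevant_content(top_k=2)
knowledge_base = ["doc-a", "doc-b", "doc-c", "doc-d"]
top_content = [{"index": 3}, {"index": 1}]
print(prune_by_index(knowledge_base, top_content))  # ['doc-b', 'doc-d']
```

Note that the surviving documents come back in knowledge-base order rather than ranked by relevance, which is usually what you want when other state refers to documents by position.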
````diff
@@ -252,18 +232,6 @@ tasks = [
 results = await asyncio.gather(*tasks)
 ```
 
-### Caching Strategy
-
-```python
-# Enable caching for repeated crawls
-async with AsyncWebCrawler(
-    config=BrowserConfig(
-        cache_mode=CacheMode.ENABLED
-    )
-) as crawler:
-    adaptive = AdaptiveCrawler(crawler, config)
-```
-
 ## Debugging & Analysis
 
 ### Enable Verbose Logging
````
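The unchanged context line `results = await asyncio.gather(*tasks)` starts every task at once. When many crawls are queued, a common way to cap how many run concurrently is an `asyncio.Semaphore`. The sketch below assumes nothing about crawl4ai: `fake_crawl` is a hypothetical placeholder for a real adaptive crawl, and the limit of 2 is arbitrary.

```python
import asyncio
from typing import List


async def fake_crawl(query: str) -> str:
    # Hypothetical stand-in for one adaptive crawl; sleeps instead of fetching pages
    await asyncio.sleep(0.01)
    return f"done: {query}"


async def run_bounded(queries: List[str], limit: int = 2) -> List[str]:
    semaphore = asyncio.Semaphore(limit)

    async def guarded(query: str) -> str:
        # At most `limit` crawls hold the semaphore at the same time
        async with semaphore:
            return await fake_crawl(query)

    # gather returns results in input order, whatever order the tasks finish in
    return list(await asyncio.gather(*(guarded(q) for q in queries)))


results = asyncio.run(run_bounded(["auth", "rate limits", "webhooks", "pagination"]))
print(results)
```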
````diff
@@ -322,9 +290,9 @@ with open("crawl_analysis.json", "w") as f:
 ### Implementing a Custom Strategy
 
 ```python
-from crawl4ai.adaptive_crawler import BaseStrategy
+from crawl4ai.adaptive_crawler import CrawlStrategy
 
-class DomainSpecificStrategy(BaseStrategy):
+class DomainSpecificStrategy(CrawlStrategy):
     def calculate_coverage(self, state: CrawlState) -> float:
         # Custom coverage calculation
         # e.g., weight certain terms more heavily
````
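`DomainSpecificStrategy.calculate_coverage` is left as a stub whose comment suggests weighting certain terms more heavily. One way to express that idea, sketched without crawl4ai: a hypothetical `weighted_term_coverage` function that takes a term-to-weight map and raw document strings instead of a `CrawlState`, and counts a term as covered when it appears as a case-insensitive substring of any document. The substring matching and the example weights are assumptions for illustration.

```python
from typing import Dict, Iterable


def weighted_term_coverage(term_weights: Dict[str, float], documents: Iterable[str]) -> float:
    """Share of total term weight whose term occurs in at least one document."""
    total = sum(term_weights.values())
    if total <= 0:
        return 0.0
    # Lowercase once so matching is case-insensitive
    corpus = [doc.lower() for doc in documents]
    covered = sum(
        weight
        for term, weight in term_weights.items()
        if any(term.lower() in doc for doc in corpus)
    )
    return covered / total


weights = {"authentication": 3.0, "token": 2.0, "changelog": 1.0}
docs = ["How authentication works", "Refreshing an access token"]
print(round(weighted_term_coverage(weights, docs), 3))  # 0.833
```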
````diff
@@ -351,7 +319,7 @@ adaptive = AdaptiveCrawler(
 ### Combining Strategies
 
 ```python
-class HybridStrategy(BaseStrategy):
+class HybridStrategy(CrawlStrategy):
     def __init__(self):
         self.strategies = [
             TechnicalDocStrategy(),
````
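The hunk shows `HybridStrategy` holding a list of sub-strategies but cuts off before showing how their results are merged. One plausible approach is a weighted average of each part's coverage. In the sketch below, `HybridCoverage`, `has_api_docs` and `volume` are hypothetical stand-ins that operate on lists of strings rather than the library's `CrawlStrategy` and `CrawlState` types.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical: each sub-strategy is reduced to a coverage function over document texts
CoverageFn = Callable[[Sequence[str]], float]


class HybridCoverage:
    """Combine several coverage functions with a weighted average (illustrative sketch)."""

    def __init__(self, parts: List[Tuple[CoverageFn, float]]):
        self.parts = parts
        self.total_weight = sum(weight for _, weight in parts)
        if self.total_weight <= 0:
            raise ValueError("weights must sum to a positive number")

    def calculate_coverage(self, documents: Sequence[str]) -> float:
        weighted = sum(fn(documents) * weight for fn, weight in self.parts)
        return weighted / self.total_weight


def has_api_docs(documents: Sequence[str]) -> float:
    # 1.0 once any document mentions "api", else 0.0
    return 1.0 if any("api" in doc.lower() for doc in documents) else 0.0


def volume(documents: Sequence[str]) -> float:
    # Saturates at 1.0 after four documents
    return min(len(documents) / 4, 1.0)


hybrid = HybridCoverage([(has_api_docs, 2.0), (volume, 1.0)])
print(round(hybrid.calculate_coverage(["API reference", "Tutorial"]), 3))  # 0.833
```

Dividing by the total weight keeps the combined score in [0, 1] as long as every part returns values in that range, so the weights only decide how much each signal counts.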

docs/md_v2/core/quickstart.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -79,7 +79,7 @@ if __name__ == "__main__":
     asyncio.run(main())
 ```
 
-> IMPORTANT: By default cache mode is set to `CacheMode.ENABLED`. So to have fresh content, you need to set it to `CacheMode.BYPASS`
+> IMPORTANT: By default cache mode is set to `CacheMode.BYPASS` to have fresh content. Set `CacheMode.ENABLED` to enable caching.
 
 We’ll explore more advanced config in later tutorials (like enabling proxies, PDF output, multi-tab sessions, etc.). For now, just note how you pass these objects to manage crawling.
 
````
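In crawl4ai the cache mode is a per-run setting on `CrawlerRunConfig`. A minimal configuration fragment for opting back into caching, following the updated Quickstart note; the URL in the comment is a placeholder.

```python
from crawl4ai import CrawlerRunConfig, CacheMode

# Opt in to caching for this run (the updated note states the default is CacheMode.BYPASS)
run_config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)

# Inside an `async with AsyncWebCrawler() as crawler:` block:
# result = await crawler.arun(url="https://example.com", config=run_config)
```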
