@@ -4,73 +4,34 @@ This guide covers proxy configuration and security features in Crawl4AI, includi
 
 ## Understanding Proxy Configuration
 
-Crawl4AI supports proxy configuration at two levels:
-
-### BrowserConfig.proxy_config
-Sets proxy at the **browser level** - affects all pages/tabs in that browser instance. Use this when:
-- You want all crawls from this browser to use the same proxy
-- You're using a single proxy for the entire session
-- You need persistent proxy settings across multiple crawls
-
-### CrawlerRunConfig.proxy_config
-Sets proxy at the **request level** - can be different for each crawl operation. Use this when:
-- You want per-request proxy control
-- You're implementing proxy rotation
-- Different URLs need different proxies
+Crawl4AI recommends configuring proxies per request through `CrawlerRunConfig.proxy_config`. This gives you precise control, enables rotation strategies, and keeps examples simple enough to copy, paste, and run.
 
 ## Basic Proxy Setup
 
-### Browser-Level Proxy (BrowserConfig)
-
-Configure proxies that apply to the entire browser session:
+Configure proxies that apply to each crawl operation:
 
 ```python
-from crawl4ai import AsyncWebCrawler, BrowserConfig
-
-# Using dictionary configuration
-browser_config = BrowserConfig(proxy_config={
-    "server": "http://proxy.example.com:8080"
-})
-
-# Using ProxyConfig object
-from crawl4ai import ProxyConfig
-proxy = ProxyConfig(server="http://proxy.example.com:8080")
-browser_config = BrowserConfig(proxy_config=proxy)
-
-# Using string (auto-parsed)
-browser_config = BrowserConfig(proxy_config="http://proxy.example.com:8080")
-
-async with AsyncWebCrawler(config=browser_config) as crawler:
-    result = await crawler.arun(url="https://example.com")
-```
-
-### Request-Level Proxy (CrawlerRunConfig)
-
-Configure proxies that can be customized per crawl operation:
+import asyncio
+from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, ProxyConfig
 
-```python
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
+run_config = CrawlerRunConfig(proxy_config=ProxyConfig(server="http://proxy.example.com:8080"))
+# run_config = CrawlerRunConfig(proxy_config={"server": "http://proxy.example.com:8080"})
+# run_config = CrawlerRunConfig(proxy_config="http://proxy.example.com:8080")
 
-# Using dictionary configuration
-run_config = CrawlerRunConfig(proxy_config={
-    "server": "http://proxy.example.com:8080"
-})
 
-# Using ProxyConfig object
-from crawl4ai import ProxyConfig
-proxy = ProxyConfig(server="http://proxy.example.com:8080")
-run_config = CrawlerRunConfig(proxy_config=proxy)
+async def main():
+    browser_config = BrowserConfig()
+    async with AsyncWebCrawler(config=browser_config) as crawler:
+        result = await crawler.arun(url="https://example.com", config=run_config)
+        print(f"Success: {result.success} -> {result.url}")
 
-# Using string (auto-parsed)
-run_config = CrawlerRunConfig(proxy_config="http://proxy.example.com:8080")
 
-browser_config = BrowserConfig()
-async with AsyncWebCrawler(config=browser_config) as crawler:
-    result = await crawler.arun(url="https://example.com", config=run_config)
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
-!!! note "Priority Order"
-    When both `BrowserConfig.proxy_config` and `CrawlerRunConfig.proxy_config` are set, `CrawlerRunConfig.proxy_config` takes precedence for that specific crawl operation.
+!!! note "Why request-level?"
+    `CrawlerRunConfig.proxy_config` keeps each request self-contained, so swapping proxies or rotation strategies is just a matter of building a new run configuration.
 
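+For instance, two crawls in the same session can each use a different proxy simply by passing different run configs. A minimal sketch (the proxy endpoints are placeholders):
+
+```python
+import asyncio
+from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
+
+# Hypothetical proxy endpoints; substitute your own.
+config_a = CrawlerRunConfig(proxy_config="http://proxy-a.example.com:8080")
+config_b = CrawlerRunConfig(proxy_config="http://proxy-b.example.com:8080")
+
+
+async def main():
+    async with AsyncWebCrawler() as crawler:
+        # Each request carries its own proxy settings.
+        result_a = await crawler.arun(url="https://example.com", config=config_a)
+        result_b = await crawler.arun(url="https://example.org", config=config_b)
+        print(result_a.success, result_b.success)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+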
 ## Supported Proxy Formats
 
@@ -100,27 +61,33 @@ proxy5 = ProxyConfig.from_string("192.168.1.1:8080:user:pass")
 For proxies requiring authentication:
 
 ```python
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
-
-# Using dictionary
-run_config = CrawlerRunConfig(proxy_config={
-    "server": "http://proxy.example.com:8080",
-    "username": "your_username",
-    "password": "your_password"
-})
+import asyncio
+from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, ProxyConfig
 
-# Using ProxyConfig object
-from crawl4ai import ProxyConfig
-proxy = ProxyConfig(
-    server="http://proxy.example.com:8080",
-    username="your_username",
-    password="your_password"
+run_config = CrawlerRunConfig(
+    proxy_config=ProxyConfig(
+        server="http://proxy.example.com:8080",
+        username="your_username",
+        password="your_password",
+    )
 )
-run_config = CrawlerRunConfig(proxy_config=proxy)
+# Or dictionary style:
+# run_config = CrawlerRunConfig(proxy_config={
+#     "server": "http://proxy.example.com:8080",
+#     "username": "your_username",
+#     "password": "your_password",
+# })
 
-browser_config = BrowserConfig()
-async with AsyncWebCrawler(config=browser_config) as crawler:
-    result = await crawler.arun(url="https://example.com", config=run_config)
+
+async def main():
+    browser_config = BrowserConfig()
+    async with AsyncWebCrawler(config=browser_config) as crawler:
+        result = await crawler.arun(url="https://example.com", config=run_config)
+        print(f"Success: {result.success} -> {result.url}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
 ## Environment Variable Configuration
@@ -149,9 +116,10 @@ Crawl4AI supports automatic proxy rotation to distribute requests across multipl
 
 ### Proxy Rotation (recommended)
 ```python
+import asyncio
+import re
 from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, ProxyConfig
 from crawl4ai.proxy_strategy import RoundRobinProxyStrategy
-import re
 
 async def main():
     # Load proxies from environment
@@ -195,7 +163,8 @@ async def main():
         else:
             print(f"❌ Request {i+1}: Failed - {result.error_message}")
 
-asyncio.run(main())
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
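+If you'd rather not use environment variables, the same rotation can be driven from an inline list, reusing only the pieces shown above. A minimal sketch (the proxy strings are placeholders):
+
+```python
+from crawl4ai import CrawlerRunConfig, ProxyConfig
+from crawl4ai.proxy_strategy import RoundRobinProxyStrategy
+
+# Placeholder proxies in "ip:port:user:pass" form.
+proxies = [
+    ProxyConfig.from_string("192.168.1.1:8080:user:pass"),
+    ProxyConfig.from_string("192.168.1.2:8080:user:pass"),
+]
+run_config = CrawlerRunConfig(proxy_rotation_strategy=RoundRobinProxyStrategy(proxies))
+```
+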
 ## SSL Certificate Analysis
@@ -204,58 +173,69 @@ Combine proxy usage with SSL certificate inspection for enhanced security analys
 
 ### Per-Request SSL Certificate Analysis
 ```python
+import asyncio
 from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
 
-# Configure proxy with SSL certificate fetching per request
 run_config = CrawlerRunConfig(
     proxy_config={
         "server": "http://proxy.example.com:8080",
         "username": "user",
-        "password": "pass"
+        "password": "pass",
     },
-    fetch_ssl_certificate=True  # Enable SSL certificate analysis for this request
+    fetch_ssl_certificate=True,  # Enable SSL certificate analysis for this request
 )
 
-browser_config = BrowserConfig()
-async with AsyncWebCrawler(config=browser_config) as crawler:
-    result = await crawler.arun(url="https://example.com", config=run_config)
-
-    if result.success:
-        print(f"✅ Crawled via proxy: {result.url}")
-
-        # Analyze SSL certificate
-        if result.ssl_certificate:
-            cert = result.ssl_certificate
-            print("🔒 SSL Certificate Info:")
-            print(f"   Issuer: {cert.issuer}")
-            print(f"   Subject: {cert.subject}")
-            print(f"   Valid until: {cert.valid_until}")
-            print(f"   Fingerprint: {cert.fingerprint}")
-
-            # Export certificate
-            cert.to_json("certificate.json")
-            print("💾 Certificate exported to certificate.json")
-        else:
-            print("⚠️ No SSL certificate information available")
+
+async def main():
+    browser_config = BrowserConfig()
+    async with AsyncWebCrawler(config=browser_config) as crawler:
+        result = await crawler.arun(url="https://example.com", config=run_config)
+
+        if result.success:
+            print(f"✅ Crawled via proxy: {result.url}")
+
+            # Analyze SSL certificate
+            if result.ssl_certificate:
+                cert = result.ssl_certificate
+                print("🔒 SSL Certificate Info:")
+                print(f"   Issuer: {cert.issuer}")
+                print(f"   Subject: {cert.subject}")
+                print(f"   Valid until: {cert.valid_until}")
+                print(f"   Fingerprint: {cert.fingerprint}")
+
+                # Export certificate
+                cert.to_json("certificate.json")
+                print("💾 Certificate exported to certificate.json")
+            else:
+                print("⚠️ No SSL certificate information available")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
 ## Security Best Practices
 
 ### 1. Proxy Rotation for Anonymity
 ```python
+from crawl4ai import CrawlerRunConfig, ProxyConfig
+from crawl4ai.proxy_strategy import RoundRobinProxyStrategy
+
 # Use multiple proxies to avoid IP blocking
 proxies = ProxyConfig.from_env("PROXIES")
 strategy = RoundRobinProxyStrategy(proxies)
 
 # Configure rotation per request (recommended)
 run_config = CrawlerRunConfig(proxy_rotation_strategy=strategy)
 
-# If you want a single static proxy across all requests, set a fixed ProxyConfig at browser-level:
-# browser_config = BrowserConfig(proxy_config=proxies[0])
+# For a fixed proxy across all requests, build one run config with a static
+# proxy_config and reuse it for every crawl:
+static_run_config = CrawlerRunConfig(proxy_config=proxies[0])
 ```
 
 ### 2. SSL Certificate Verification
 ```python
+from crawl4ai import CrawlerRunConfig
+
 # Always verify SSL certificates when possible
 # Per-request (affects specific requests)
 run_config = CrawlerRunConfig(fetch_ssl_certificate=True)
@@ -270,30 +250,24 @@ export PROXIES="ip1:port1:user1:pass1,ip2:port2:user2:pass2"
 
 ### 4. SOCKS5 for Enhanced Security
 ```python
-# Prefer SOCKS5 proxies for better protocol support
-# Browser-level
-browser_config = BrowserConfig(proxy_config="socks5://proxy.example.com:1080")
+from crawl4ai import CrawlerRunConfig
 
-# Or request-level
+# Prefer SOCKS5 proxies for better protocol support
run_config = CrawlerRunConfig(proxy_config="socks5://proxy.example.com:1080")
 ```
 
 ## Migration from Deprecated `proxy` Parameter
 
 !!! warning "Deprecation Notice"
-    The `proxy` parameter in `BrowserConfig` is deprecated. Use `proxy_config` in either `BrowserConfig` or `CrawlerRunConfig` instead.
+    The legacy `proxy` argument on `BrowserConfig` is deprecated. Configure proxies through `CrawlerRunConfig.proxy_config` so each request fully describes its network settings.
 
 ```python
-# Old (deprecated)
-browser_config = BrowserConfig(proxy="http://proxy.example.com:8080")
-
-# You will see a warning similar to:
-# DeprecationWarning: BrowserConfig.proxy is deprecated and ignored. Use proxy_config instead.
-
-# New (recommended) - Browser-level default
-browser_config = BrowserConfig(proxy_config="http://proxy.example.com:8080")
+# Old (deprecated) approach
+# from crawl4ai import BrowserConfig
+# browser_config = BrowserConfig(proxy="http://proxy.example.com:8080")
 
-# Or request-level override (takes precedence per request)
+# New (preferred) approach
+from crawl4ai import CrawlerRunConfig
 run_config = CrawlerRunConfig(proxy_config="http://proxy.example.com:8080")
 ```
 
@@ -311,23 +285,20 @@ def safe_proxy_repr(proxy: ProxyConfig):
 
 ### Common Issues
 
-1. **Proxy Connection Failed**
-   - Verify proxy server is accessible
-   - Check authentication credentials
-   - Ensure correct protocol (http/https/socks5)
-
-2. **SSL Certificate Errors**
-   - Some proxies may interfere with SSL inspection
-   - Try different proxy or disable SSL verification if necessary
-
-3. **Environment Variables Not Loading**
-   - Ensure PROXIES variable is set correctly
-   - Check comma separation and format: `ip:port:user:pass,ip:port:user:pass`
+???+ question "Proxy connection failed"
+    - Verify the proxy server is reachable from your network.
+    - Double-check authentication credentials.
+    - Ensure the protocol matches (`http`, `https`, or `socks5`).
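+
+    To rule out Crawl4AI itself, a stdlib-only connectivity check can help. A minimal sketch (the proxy URL is a placeholder):
+
+    ```python
+    import urllib.request
+
+    # Hypothetical proxy endpoint; substitute your own.
+    proxy = "http://your_username:your_password@proxy.example.com:8080"
+    opener = urllib.request.build_opener(
+        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
+    )
+    try:
+        with opener.open("https://example.com", timeout=10) as resp:
+            print(f"Proxy reachable, status {resp.status}")
+    except Exception as exc:
+        print(f"Proxy check failed: {exc}")
+    ```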
 
-4. **Proxy Rotation Not Working**
-   - Verify proxies are loaded: `len(proxies) > 0`
-   - Check proxy strategy is set on `CrawlerRunConfig` via `proxy_rotation_strategy`
-   - Ensure `proxy_config` is a valid `ProxyConfig` (when using a static proxy)
+???+ question "SSL certificate errors"
+    - Some proxies break SSL inspection; switch proxies if you see repeated failures.
+    - Consider temporarily disabling certificate fetching to isolate the issue.
 
-<!-- Removed duplicate Supported Proxy Formats section (already covered above) -->
+???+ question "Environment variables not loading"
+    - Confirm `PROXIES` (or your custom env var) is set before running the script.
+    - Check formatting: `ip:port:user:pass,ip:port:user:pass`.
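+
+    A quick way to confirm the variable is visible to Python:
+
+    ```python
+    import os
+
+    raw = os.getenv("PROXIES")
+    print(raw)  # should print the comma-separated proxy list, not None
+    ```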
 
+???+ question "Proxy rotation not working"
+    - Ensure `ProxyConfig.from_env()` actually loaded entries (`len(proxies) > 0`).
+    - Attach `proxy_rotation_strategy` to `CrawlerRunConfig`.
+    - Validate the proxy definitions you pass into the strategy.
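+
+    For example, verify the proxies loaded and attach the strategy in one place:
+
+    ```python
+    from crawl4ai import CrawlerRunConfig, ProxyConfig
+    from crawl4ai.proxy_strategy import RoundRobinProxyStrategy
+
+    proxies = ProxyConfig.from_env("PROXIES")
+    assert len(proxies) > 0, "No proxies loaded from PROXIES"
+
+    run_config = CrawlerRunConfig(
+        proxy_rotation_strategy=RoundRobinProxyStrategy(proxies)
+    )
+    ```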