@@ -13,5 +13,24 @@ Current Features of the crawler include:
 4. Filter Duplicates.
 5. Filter URLs that fail a HEAD request.
 6. User specifiable max timeout between two successive url requests.
+
+
+Pipeline Overview
+
+The pipeline consists of the following stages:
+1. Ingest
+2. Link Absolution (resolve relative links to absolute URLs)
+3. Protocol Filter
+4. Duplicate Filter
+5. Invalid URL Filter (URLs whose HEAD request fails)
+6. Make GET Request
+7a. Send to Output Adapter
+7b. Check for Timeout (gap between two successive outputs on this channel)
+8. Max Links Crawled Limit Filter
+9. Depth Limit Filter
+10. Parse Page for more URLs
+
+Note: The output of 7b is fed to 8.
+1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7b -> 8 -> 9 -> 10 -> 1
 */
 package octopus
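
Stages 2 to 4 of the pipeline described in the comment above can be sketched as goroutines chained by channels. This is a minimal illustration under stated assumptions, not the octopus implementation: the `link` type, the function names (`absolutize`, `protocolFilter`, `duplicateFilter`, `crawlable`), and the `http`/`https` allow-list are all hypothetical and do not come from the diff.

```go
package main

import (
	"fmt"
	"net/url"
)

// link is a hypothetical unit of work: an href plus the page it was found on.
type link struct {
	base string // URL of the page that contained the href
	href string // href as found in the page, possibly relative
}

// absolutize resolves each href against its base page (stage 2).
// Links whose base or href cannot be parsed are dropped.
func absolutize(in <-chan link) <-chan *url.URL {
	out := make(chan *url.URL)
	go func() {
		defer close(out)
		for l := range in {
			base, err := url.Parse(l.base)
			if err != nil {
				continue
			}
			ref, err := url.Parse(l.href)
			if err != nil {
				continue
			}
			out <- base.ResolveReference(ref)
		}
	}()
	return out
}

// protocolFilter forwards only URLs whose scheme is in the allow-list (stage 3).
func protocolFilter(in <-chan *url.URL, allowed ...string) <-chan *url.URL {
	ok := make(map[string]bool, len(allowed))
	for _, s := range allowed {
		ok[s] = true
	}
	out := make(chan *url.URL)
	go func() {
		defer close(out)
		for u := range in {
			if ok[u.Scheme] {
				out <- u
			}
		}
	}()
	return out
}

// duplicateFilter forwards each distinct URL string exactly once (stage 4).
func duplicateFilter(in <-chan *url.URL) <-chan string {
	seen := make(map[string]struct{})
	out := make(chan string)
	go func() {
		defer close(out)
		for u := range in {
			s := u.String()
			if _, dup := seen[s]; dup {
				continue
			}
			seen[s] = struct{}{}
			out <- s
		}
	}()
	return out
}

// crawlable feeds a fixed batch of links through stages 2 to 4
// and collects whatever survives all three.
func crawlable(links []link) []string {
	in := make(chan link)
	go func() {
		defer close(in)
		for _, l := range links {
			in <- l
		}
	}()
	var result []string
	for s := range duplicateFilter(protocolFilter(absolutize(in), "http", "https")) {
		result = append(result, s)
	}
	return result
}

func main() {
	links := []link{
		{"https://example.com/a/", "b.html"},
		{"https://example.com/a/", "/a/b.html"},
		{"https://example.com/a/", "mailto:someone@example.com"},
		{"https://example.com/a/", "../c"},
	}
	for _, s := range crawlable(links) {
		fmt.Println(s)
	}
}
```

With the sample input in `main`, the two spellings of `b.html` resolve to the same absolute URL and collapse into one entry, and the `mailto:` link is dropped by the protocol filter. Because each stage owns its output channel and closes it when its input is exhausted, shutdown propagates down the chain without extra signalling. The feedback edge from stage 10 back to stage 1 is omitted here, since a cyclic pipeline needs an explicit termination condition such as the link and depth limits of stages 8 and 9.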