Commit 0962c36

#avmxf - Octopus Pipeline Overview
1 parent d175631 commit 0962c36

1 file changed: +19 -0 lines changed

octopus/doc.go

Lines changed: 19 additions & 0 deletions
@@ -13,5 +13,24 @@ Current Features of the crawler include:
 4. Filter Duplicates.
 5. Filter URLs that fail a HEAD request.
 6. User specifiable max timeout between two successive url requests.
+
+
+Pipeline Overview
+
+The overview of the Pipeline is given below:
+1. Ingest
+2. Link Absolution
+3. Protocol Filter
+4. Duplicate Filter
+5. Invalid Url Filter (Urls whose HEAD request Fails)
+6. Make GET Request
+7a. Send to Output Adapter
+7b. Check for Timeout (gap between two output on this channel).
+8. Max Links Crawled Limit Filter
+9. Depth Limit Filter
+10. Parse Page for more URLs.
+
+Note: The output from 7b. is fed to 8.
+1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7b -> 8 -> 9 -> 10 -> 1
 */
 package octopus
