
Commit bb4367f

v 0.0.2 - CrawlRate, RespectRobots

- Addition of CrawlRate and option to respect robots.txt to CrawlOptions

Signed-off-by: Rahul Thomas <thomas.rah@husky.neu.edu>

1 parent f481793

File tree

1 file changed: +8, -1 lines


octopus/models.go

Lines changed: 8 additions & 1 deletion
@@ -1,6 +1,9 @@
 package octopus
 
-import "io"
+import (
+	"io"
+	"time"
+)
 
 // Node is used to represent each crawled link and its associated depth of crawl.
 type Node struct {
@@ -26,11 +29,15 @@ type webOctopus struct {
 // IncludeBody - Include the response Body in the crawled Node (for further processing).
 // OpAdapter is a user specified concrete implementation of an Output Adapter. The crawler
 // will pump output onto the implementation's channel returned by its Consume method.
+// CrawlRate is the rate at which requests will be made.
+// RespectRobots (unimplemented) choose whether to respect robots.txt or not.
 type CrawlOptions struct {
 	DepthPerLink       int16
 	MaxLinksCrawled    int64
 	StayWithinBaseHost bool
 	BaseURLString      string
+	CrawlRate          time.Duration
+	RespectRobots      bool
 	IncludeBody        bool
 	OpAdapter          OutputAdapter
 }
