Open
Description
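In some situations canCrawl() wrongly returns true regardless of the robots.txt rules. In the example below, the first call for a host gives the wrong answer and the second gives the right one: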
const robotsParser = require('robots-txt-parser')
const robots = robotsParser()
const test = async () => {
  await robots.useRobotsFor('https://google.com/')
  console.log(await robots.canCrawl('https://google.com/search'))
}
(async () => {
  await test() // Logs true, the wrong answer
  await test() // Logs false, the right answer
})()

I haven't dug into the cause, but my guess is that this.active only gets set when the link is already in the cache, and not when it is freshly fetched, here:
robots-txt-parser/src/robots.js
Lines 74 to 92 in a510f1a
useRobotsFor(url, callback) {
  const link = util.formatLink(url);
  if (this.isCached(link)) {
    this.active = link;
    if (util.isFunction(callback)) {
      return callback();
    }
    return Promise.resolve();
  }
  const fetch = this.fetch(url);
  if (util.isFunction(callback)) {
    return fetch.then(callback);
  }
  return fetch;
}
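
If that guess is right, a fix might look roughly like the sketch below. It is not the library's real code: MiniRobots, fetchRobots, the two useRobotsFor variants, and the prefix-based disallow rules are hypothetical stand-ins that only mirror the shape of the snippet above. It also assumes canCrawl() allows everything when no host is active. The one meaningful change is that the fixed variant sets active after a fresh fetch completes as well as on a cache hit.

```javascript
// Hypothetical stand-in for the parser, NOT the library's real implementation.
// `fetchRobots` is an assumed stub that replaces the network request.
class MiniRobots {
  constructor(fetchRobots) {
    this.fetchRobots = fetchRobots;
    this.cache = new Map();
    this.active = '';
  }

  isCached(link) {
    return this.cache.has(link);
  }

  // Fetch the rules for a link and store them in the cache.
  async fetch(link) {
    const rules = await this.fetchRobots(link);
    this.cache.set(link, rules);
    return rules;
  }

  // Same shape as the snippet above: active is set only on a cache hit.
  useRobotsForBuggy(link) {
    if (this.isCached(link)) {
      this.active = link;
      return Promise.resolve();
    }
    return this.fetch(link);
  }

  // Possible fix: also set active once a fresh fetch has completed.
  useRobotsForFixed(link) {
    if (this.isCached(link)) {
      this.active = link;
      return Promise.resolve();
    }
    return this.fetch(link).then(() => {
      this.active = link;
    });
  }

  // Assumption: with no active host, every path counts as crawlable.
  canCrawl(path) {
    const rules = this.cache.get(this.active);
    if (!rules) return true;
    return !rules.disallow.some((prefix) => path.startsWith(prefix));
  }
}

// Usage with a stubbed robots.txt that disallows /search.
const stubFetch = async () => ({ disallow: ['/search'] });

(async () => {
  const buggy = new MiniRobots(stubFetch);
  await buggy.useRobotsForBuggy('https://example.com');
  console.log(buggy.canCrawl('/search')); // true: active was never set
  await buggy.useRobotsForBuggy('https://example.com');
  console.log(buggy.canCrawl('/search')); // false: cache hit set active

  const fixed = new MiniRobots(stubFetch);
  await fixed.useRobotsForFixed('https://example.com');
  console.log(fixed.canCrawl('/search')); // false on the first call
})();
```

The callback branch from the original method is left out to keep the sketch short; the same change would presumably apply there, setting active before the callback runs.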