Bug (in useRobotsFor()?) causes canCrawl() to sometimes return incorrect result #5

@Trott

In some situations, canCrawl() wrongly returns true regardless of the rules in robots.txt. Example:

const robotsParser = require('robots-txt-parser')
const robots = robotsParser()

const test = async () => {
  await robots.useRobotsFor('https://google.com/')

  console.log(await robots.canCrawl('https://google.com/search'))
}

(async () => {
  await test() // Logs true, the wrong answer
  await test() // Logs false, the right answer
})()

I haven't dug into the cause, but my guess is that active is only set when the link is already in the cache, and not when it is freshly fetched, here:

useRobotsFor(url, callback) {
  const link = util.formatLink(url);
  if (this.isCached(link)) {
    this.active = link;
    if (util.isFunction(callback)) {
      return callback();
    }
    return Promise.resolve();
  }
  const fetch = this.fetch(url);
  if (util.isFunction(callback)) {
    return fetch.then(callback);
  }
  return fetch;
}
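
If that guess is right, one possible fix is to set active on the fresh-fetch path as well, once the fetch resolves. The sketch below is a minimal stand-in and not the library's actual source: RobotsSketch, the injected fetchRobots function, and the use of the URL host in place of util.formatLink are all assumptions made to keep the example self-contained.

```javascript
// Hypothetical stand-in for the parser class; names and internals are assumptions.
class RobotsSketch {
  // fetchRobots: hypothetical async function that resolves to parsed rules for a URL.
  constructor(fetchRobots) {
    this.fetchRobots = fetchRobots;
    this.cache = new Map();
    this.active = '';
  }

  isCached(link) {
    return this.cache.has(link);
  }

  useRobotsFor(url, callback) {
    // Stand-in for util.formatLink(url): key the cache by host.
    const link = new URL(url).host;
    if (this.isCached(link)) {
      this.active = link;
      if (typeof callback === 'function') {
        return callback();
      }
      return Promise.resolve();
    }
    // Proposed change: also set active once a fresh fetch has completed.
    const fetched = this.fetchRobots(url).then((rules) => {
      this.cache.set(link, rules);
      this.active = link;
    });
    if (typeof callback === 'function') {
      return fetched.then(callback);
    }
    return fetched;
  }
}
```

With this change, active would already point at the right host after the first useRobotsFor() call, so a following canCrawl() would presumably consult the fetched rules instead of falling through to a default.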
