Description
When you try to get the sitemap for a website like https://www.sainsburys.co.uk, it returns an empty array. But I have checked https://www.sainsburys.co.uk/robots.txt, and the sitemap URL is listed there.
So I did a little digging and found out the server was denying the request. The response was this.
`https://www.sainsburys.co.uk/robots.txt
<TITLE>Access Denied</TITLE>Access Denied
You don't have permission to access "http://www.sainsburys.co.uk/robots.txt" on this server.
Reference #18.878f7b5c.1723733159.22628e9
https://errors.edgesuite.net/18.878f7b5c.1723733159.22628e9
`
I can see that no headers were sent when requesting the robots.txt URL. So I added the following headers in the get.concat call, and it worked for me.
`headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8', 'Accept-Language': 'en-US,en;q=0.9', 'Accept-Encoding': 'gzip, deflate, br' }`
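To make the idea concrete, here is a rough sketch of the same approach outside the library. It is not the library's actual code: the helper names `buildRobotsRequest`, `extractSitemapUrls` and `fetchSitemapUrls` are made up for illustration, and it uses Node's built-in `fetch` (Node 18+) instead of `get.concat`. `Accept-Encoding` is left out of the sketch so that `fetch` can negotiate compression itself.

```javascript
// Browser-like headers taken from this issue (Accept-Encoding omitted on purpose).
const BROWSER_HEADERS = {
  'User-Agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Accept':
    'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.9',
};

// Hypothetical helper: build the robots.txt URL and headers for a site.
// Caller-supplied headers override the defaults.
function buildRobotsRequest(siteUrl, extraHeaders = {}) {
  const url = new URL('/robots.txt', siteUrl).href;
  return { url, headers: { ...BROWSER_HEADERS, ...extraHeaders } };
}

// Hypothetical helper: collect the values of all "Sitemap:" lines
// (the directive name is matched case-insensitively).
function extractSitemapUrls(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap\s*:\s*(\S+)/i))
    .filter(Boolean)
    .map((match) => match[1]);
}

// Hypothetical helper: fetch robots.txt with the headers above.
// A non-2xx response is thrown as an error instead of
// silently producing an empty array.
async function fetchSitemapUrls(siteUrl) {
  const { url, headers } = buildRobotsRequest(siteUrl);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`robots.txt request failed with status ${response.status}`);
  }
  return extractSitemapUrls(await response.text());
}
```

With something like this, `fetchSitemapUrls('https://www.sainsburys.co.uk')` would send the browser-like headers, and a denied request would show up as an error rather than as an empty result. Whether a given server accepts these particular headers is something I have only checked for the site above.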
I'd be happy to contribute a fix, as it is a small change.
Regards,
Vamse.