Facing issue regarding deltafetch #30

@gopal1414

Kindly help me with the issue below:

I tried to crawl data using DeltaFetch, but I am facing the following issue:

My DB file is updated both times when I run the crawler with the commands below:
$ scrapy crawl quotes -a deltafetch_reset=1
$ scrapy crawl quotes -a deltafetch_reset=0

My DB file is not updated when I use the command below:
$ scrapy crawl quotes
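
One possible explanation for both `-a` runs behaving the same: Scrapy passes `-a` spider arguments as strings, so `deltafetch_reset=0` arrives as the string "0". The sketch below assumes the middleware simply checks the attribute for truthiness (I have not confirmed this in the DeltaFetch source); `reset_requested` is a hypothetical helper, not part of scrapy-deltafetch.

```python
def reset_requested(spider_args):
    """Hypothetical stand-in for a truthiness check on the spider argument.

    spider_args mimics the attributes set from `-a name=value` options,
    which always arrive as strings.
    """
    return bool(spider_args.get("deltafetch_reset", False))


# "-a deltafetch_reset=1", "-a deltafetch_reset=0", and no argument at all
for args in ({"deltafetch_reset": "1"}, {"deltafetch_reset": "0"}, {}):
    print(args, "->", reset_requested(args))
```

If that assumption holds, any non-empty value (including "0") would request a reset, and only omitting the argument would leave the existing DB in place.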

Below are the changes I have made in the settings.py file:
SPIDER_MIDDLEWARES = {
    'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': True,
    'scrapy_deltafetch.DeltaFetch': 100,
}

COOKIES_ENABLED = True
COOKIES_DEBUG = True
DELTAFETCH_ENABLED = True
DELTAFETCH_DIR = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'
DOTSCRAPY_ENABLED = True

Please find my code below:

import scrapy
from selenium import webdriver
from w3lib.url import url_query_parameter


class QuotesSpider(scrapy.Spider):
    name = "quotes_git"

    def start_requests(self):
        urls = [
            'https://www.wikipedia.org/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, meta={'deltafetch_key': url_query_parameter(url, 'abc001')}, callback=self.parse)

    def parse(self, response):
        print('testing')
        print(response.url)
        self.driver = webdriver.Chrome('/home/administrator/Downloads/Gopal/Crawling/Crawling/spiders/chromedriver')

        self.driver.get(response.url)
        print('check point1')

        title = self.driver.title
        print(title)

        filename = 'sample_git.txt'
        with open(filename, 'wb') as f:
            f.write(response.url + title)
        print('done')
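
A side note on the `deltafetch_key` in the spider above: the start URL has no query string, so looking up the `abc001` parameter should give no value. The sketch below mimics that lookup with the standard library only; `query_parameter` is a hypothetical helper that approximates `url_query_parameter`, and the second URL is made up for comparison.

```python
from urllib.parse import urlparse, parse_qs


def query_parameter(url, name, default=None):
    """Return the first value of query parameter `name`, or `default`."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else default


# The URL from the spider: no query string, so there is nothing to look up.
print(query_parameter("https://www.wikipedia.org/", "abc001"))

# A made-up URL that does carry the parameter, for comparison.
print(query_parameter("https://www.wikipedia.org/?abc001=42", "abc001"))
```

If the key ends up empty like this, I assume the middleware would fall back to its own default key for the request, but I have not verified that.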
