Web Scraper with Selenium

Code style: Black

Technologies

  • Python 3.9: Base programming language for development
  • Django Framework: Web framework used to build the application
  • Django Rest Framework: Toolkit used to build the application's APIs
  • Bash Scripting: Convenience scripts for a smoother development experience
  • Selenium: A free, open-source browser automation framework that works across different browsers and platforms; used here to drive the scraping
  • Celery: A distributed task queue used to run background jobs such as the PDF downloads (see the sketch after this list)
  • Flower: A web-based tool for monitoring and administering Celery clusters
  • SQLite: Relational database used for development
  • Redis: A NoSQL data store that serves as the Celery broker and result backend
  • Github Actions: Continuous integration and deployment
  • Docker Engine and Docker Compose: Containerization of the application and orchestration of its services
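
To show how Celery and Redis fit together in a setup like this, here is a minimal sketch of a task module. The module name, task name, broker URL, and download logic are illustrative assumptions, not the repository's actual code.

```python
# tasks.py: minimal illustration only; names, broker URL, and download logic
# are assumptions, not the repository's actual code.
import pathlib
import urllib.request

from celery import Celery

# Redis acts as both the message broker and the result backend.
app = Celery(
    "webscraper",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)


@app.task
def download_pdf(url: str) -> str:
    """Hypothetical background job: fetch one PDF and save it under pdfs/."""
    target = pathlib.Path("pdfs") / url.rsplit("/", 1)[-1]
    target.parent.mkdir(exist_ok=True)
    urllib.request.urlretrieve(url, target)
    return str(target)
```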

A Simple Architecture

Architecture diagram of the Web Scraper (image).

Getting Started

Getting started with this project is simple: all you need is Git and Docker Engine installed on your machine.

  • Clone the repository: git clone https://github.com/olacodes/webscraper.git
  • Change into the project directory: cd webscraper
  • Run docker-compose up --build
    • NB: Running this command for the first time downloads all the Docker images and third-party packages the app needs, so the first build can take five minutes or more; subsequent builds finish in a blink of an eye.

At this point, your project should be up and running with the following servers: the web app on http://localhost:8000, Flower on http://localhost:5555, and Selenium on http://localhost:4444 (each described below).

Exploring The App

Make sure that all of the above servers are running before you start exploring the project. Once they are up, let's have fun with the app!

Web Scraper

  • Go to http://localhost:8000 in your browser

  • Click the Scrape button to scrape data (PDF URLs) from greenbooklive.com

  • Click the Download button to download the PDF files; a sketch of how the downloads could be queued in the background follows this list.

    • NB: These files are saved under the pdfs/ directory at the project root
  • You can click on About to read more about the Scraper app.
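
To make the Scrape and Download steps concrete, here is a hedged sketch of how the scraped PDF URLs could be handed off to background downloads. The helper function and the imported download_pdf task (from the Celery sketch above) are illustrative assumptions, not the app's actual code.

```python
# Illustrative sketch only; the helper and the imported task are assumptions.
from tasks import download_pdf  # hypothetical task from the Celery sketch above


def queue_downloads(pdf_urls: list[str]) -> None:
    """Queue one background download per scraped PDF URL.

    Each .delay() call returns immediately; the Celery worker picks the job
    up and saves the file under pdfs/, and progress can be watched in Flower.
    """
    for url in pdf_urls:
        download_pdf.delay(url)
```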

Flower

You can also monitor and administer the PDF download background jobs with Flower. Go to http://localhost:5555 in your browser.

Log in with username debug and password secret.

Selenium

The Selenium process runs on http://localhost:4444.
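
As a rough illustration (not the repository's actual code), a script could talk to this Selenium service along the following lines; the target page, browser options, and link filtering are assumptions.

```python
# selenium_probe.py: illustrative sketch; page URL and filtering are assumptions.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Connect to the Selenium container exposed by docker-compose on port 4444.
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=webdriver.ChromeOptions(),
)

try:
    driver.get("https://www.greenbooklive.com/")  # assumed starting page
    # Collect every link that points at a PDF document.
    pdf_urls = [
        a.get_attribute("href")
        for a in driver.find_elements(By.TAG_NAME, "a")
        if (a.get_attribute("href") or "").lower().endswith(".pdf")
    ]
    print(f"Found {len(pdf_urls)} PDF links")
finally:
    driver.quit()
```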

License

The MIT License - Copyright (c) 2022 - Present, WebScraper.

Author

Sodiq Olatunde
