- Python 3.9: Base programming language for development
- Django: Web framework used to build the application
- Django REST Framework: Toolkit for building the application's web API
- Bash scripting: Convenience scripts for a smoother development workflow
- Selenium: A free, open-source automated testing framework used to validate web applications across different browsers and platforms
- Celery: A simple, flexible, and reliable distributed system for processing large numbers of tasks
- Flower: A web-based tool for monitoring and administering Celery clusters
- SQLite: Relational database used by the application in development
- Redis: An in-memory NoSQL data store that serves as the Celery broker and result backend
- GitHub Actions: Continuous integration and deployment
- Docker Engine and Docker Compose: Containerization of the application and orchestration of its services
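To illustrate how Redis can act as both the Celery broker and the result backend, here is a minimal settings sketch. It is not taken from this repository: the `REDIS_HOST` and `REDIS_PORT` environment variables and the `redis_url` helper are hypothetical, and the `CELERY_`-prefixed names assume the common Django setup where Celery reads its configuration from Django settings with the `CELERY` namespace.

```python
import os


def redis_url(db, env=os.environ):
    """Build a Redis URL for the given database number.

    REDIS_HOST and REDIS_PORT are hypothetical variable names; inside
    Docker Compose the host is usually the Redis service name.
    """
    host = env.get("REDIS_HOST", "localhost")
    port = env.get("REDIS_PORT", "6379")
    return f"redis://{host}:{port}/{db}"


# Assumed layout: database 0 carries task messages (broker),
# database 1 stores task results (result backend).
CELERY_BROKER_URL = redis_url(0)
CELERY_RESULT_BACKEND = redis_url(1)
```

Keeping the broker and the results in separate Redis databases is optional; a single database also works.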
Getting started with this project is simple: all you need is Git and Docker Engine (with Docker Compose) installed on your machine.
- Clone the repository:
  git clone https://github.com/olacodes/webscraper.git
- Change directory:
  cd webscraper
- Run:
  docker-compose up --build
  - NB: Running the above command for the first time downloads all the Docker images and third-party packages the app needs. The first build can take 5 minutes or more; subsequent builds are much faster.
At this point, your project should be up and running, with the following servers started:
- Django Development Server: http://localhost:8000
- Redis Server: redis://localhost:6379
- Flower: http://localhost:5555
- Selenium: http://localhost:4444
Make sure that all of the above servers are running before you start exploring the project. Once they are up, let's have fun with the app!
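One way to confirm that the servers are up is to check that each port accepts a TCP connection. The script below is a minimal sketch and not part of the project: the ports come from the list above, while the service labels and function names are made up for illustration. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports from the server list above; the keys are illustrative labels only.
SERVICES = {
    "django": 8000,
    "redis": 6379,
    "flower": 5555,
    "selenium": 4444,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service label to whether its port is accepting connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'DOWN'}")
```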
- Go to http://localhost:8000 in your browser.
- Click the Scrape button to scrape data (PDF URLs) from greenbooklive.com.
- Click the Download button to download the PDF files.
  - NB: These files will be saved in the pdfs/ directory at the project root.
- You can click About to read more about the Scraper app.
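The scrape and download steps above boil down to picking out PDF links and deciding where each file lands under pdfs/. The sketch below shows one way that mapping could look. It is an illustration only, not the project's actual code: `is_pdf_url` and `pdf_destination` are hypothetical helpers, and example.com stands in for a real scraped link.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def is_pdf_url(url):
    """Return True if the URL path ends in .pdf (case-insensitive)."""
    return urlparse(url).path.lower().endswith(".pdf")


def pdf_destination(url, root="pdfs"):
    """Return the path under `root` where the PDF at `url` would be saved."""
    # Decode percent-escapes and keep only the last path component,
    # so query strings and directories do not leak into the file name.
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return str(PurePosixPath(root) / name)


links = [
    "https://example.com/files/certificate.pdf",
    "https://example.com/about",
]
targets = [pdf_destination(u) for u in links if is_pdf_url(u)]
```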
You can also monitor and administer the PDF download background jobs with Flower. Go to http://localhost:5555 in your browser.
Log in with username: debug and password: secret
The Selenium process is running at http://localhost:4444
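Besides the web UI, Flower exposes a REST API, so task status can also be read from a script. The following is a rough sketch under stated assumptions: it assumes the API accepts the same HTTP basic auth credentials as the UI, which depends on the Flower version and configuration, and `build_flower_request` and `list_tasks` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def build_flower_request(path, username, password,
                         base_url="http://localhost:5555"):
    """Build a request for a Flower endpoint with a basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(base_url + path)
    req.add_header("Authorization", f"Basic {token}")
    return req


def list_tasks(username="debug", password="secret"):
    """Fetch the tasks known to Flower as a dict keyed by task id.

    Requires the Flower server to be running and its API to be enabled.
    """
    req = build_flower_request("/api/tasks", username, password)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)
```

With the stack running, calling `list_tasks()` should return the download tasks along with their states.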
The MIT License - Copyright (c) 2022 - Present, WebScraper.
Sodiq Olatunde
