Web Scraper with Selenium

Code style: Black

Technologies

  • Python 3.9: Base programming language for development
  • Django Framework: Web framework used to build the application
  • Django Rest Framework: Toolkit used to build the application's APIs
  • Bash Scripting: Convenience scripts for a smoother development experience
  • Selenium: A free, open-source browser automation framework that works across different browsers and platforms; used here to drive the scraping
  • Celery: A distributed task queue used to run background jobs such as the PDF downloads (see the sketch after this list)
  • Flower: A web-based tool for monitoring and administering Celery clusters
  • SQLite: Relational database used for development
  • Redis: A NoSQL data store that serves as the Celery broker and result backend
  • Github Actions: Continuous integration and deployment
  • Docker Engine and Docker Compose: Containerization of the application and orchestration of its services
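
To show how Celery and Redis fit together in a setup like this, here is a minimal sketch of a task module. The module name, task name, broker URL, and download logic are illustrative assumptions, not the repository's actual code.

```python
# tasks.py: minimal illustration only; names, broker URL, and download logic
# are assumptions, not the repository's actual code.
import pathlib
import urllib.request

from celery import Celery

# Redis acts as both the message broker and the result backend.
app = Celery(
    "webscraper",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)


@app.task
def download_pdf(url: str) -> str:
    """Hypothetical background job: fetch one PDF and save it under pdfs/."""
    target = pathlib.Path("pdfs") / url.rsplit("/", 1)[-1]
    target.parent.mkdir(exist_ok=True)
    urllib.request.urlretrieve(url, target)
    return str(target)
```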

A Simple Architecture

Architecture diagram of the Web Scraper (image).

Getting Started

Getting started with this project is simple: all you need is Git and Docker Engine installed on your machine.

  • Clone the repository: git clone https://github.com/olacodes/webscraper.git
  • Change into the project directory: cd webscraper
  • Run docker-compose up --build
    • NB: Running this command for the first time downloads all the Docker images and third-party packages the app needs, so the first build can take five minutes or more; subsequent builds finish in a blink of an eye.

At this point, your project should be up and running with the following servers: the web app on http://localhost:8000, Flower on http://localhost:5555, and Selenium on http://localhost:4444 (each described below).

Exploring The App

Make sure that all of the above servers are running before you start exploring the project. Once they are up, let's have fun with the app!

Web Scraper

  • Go to http://localhost:8000 in your browser

  • Click the Scrape button to scrape data (PDF URLs) from greenbooklive.com

  • Click the Download button to download the PDF files; a sketch of how the downloads could be queued in the background follows this list.

    • NB: These files are saved under the pdfs/ directory at the project root
  • You can click on About to read more about the Scraper app.
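
To make the Scrape and Download steps concrete, here is a hedged sketch of how the scraped PDF URLs could be handed off to background downloads. The helper function and the imported download_pdf task (from the Celery sketch above) are illustrative assumptions, not the app's actual code.

```python
# Illustrative sketch only; the helper and the imported task are assumptions.
from tasks import download_pdf  # hypothetical task from the Celery sketch above


def queue_downloads(pdf_urls: list[str]) -> None:
    """Queue one background download per scraped PDF URL.

    Each .delay() call returns immediately; the Celery worker picks the job
    up and saves the file under pdfs/, and progress can be watched in Flower.
    """
    for url in pdf_urls:
        download_pdf.delay(url)
```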

Flower

You can also monitor and administer the PDF download background jobs with Flower. Go to http://localhost:5555 in your browser.

Log in with username debug and password secret.

Selenium

The Selenium process runs on http://localhost:4444.
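
As a rough illustration (not the repository's actual code), a script could talk to this Selenium service along the following lines; the target page, browser options, and link filtering are assumptions.

```python
# selenium_probe.py: illustrative sketch; page URL and filtering are assumptions.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Connect to the Selenium container exposed by docker-compose on port 4444.
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=webdriver.ChromeOptions(),
)

try:
    driver.get("https://www.greenbooklive.com/")  # assumed starting page
    # Collect every link that points at a PDF document.
    pdf_urls = [
        a.get_attribute("href")
        for a in driver.find_elements(By.TAG_NAME, "a")
        if (a.get_attribute("href") or "").lower().endswith(".pdf")
    ]
    print(f"Found {len(pdf_urls)} PDF links")
finally:
    driver.quit()
```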

License

The MIT License - Copyright (c) 2022 - Present, WebScraper.

Author

Sodiq Olatunde
