MtA Alumni LinkedIn Scraping

The main.py script dynamically scrapes LinkedIn for Mount Allison University (MtA) alumni data using Selenium. The schema for each alum (using Polars data types) is as follows:

full_name: String
latest_title: String
latest_company: String
mta_degree: String
mta_grad_year: UInt16
location: String
profile_url: String
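The schema above can be mirrored in plain Python when consuming mta_alumni.csv from other code. The sketch below is not taken from main.py: the `AlumRecord` class, `COLUMNS`, and the range check are hypothetical illustrations. In Polars itself, the same schema would be a dict mapping each column name to a dtype such as `pl.String` or `pl.UInt16`.

```python
from dataclasses import asdict, dataclass, fields

UINT16_MAX = 2**16 - 1  # largest value a Polars UInt16 column can hold


@dataclass(frozen=True)
class AlumRecord:
    """Hypothetical mirror of one mta_alumni.csv row (field order follows the schema)."""

    full_name: str
    latest_title: str
    latest_company: str
    mta_degree: str
    mta_grad_year: int  # stored as UInt16 in the Polars DataFrame
    location: str
    profile_url: str

    def __post_init__(self) -> None:
        # Reject years that would not fit in an unsigned 16-bit column.
        if not 0 <= self.mta_grad_year <= UINT16_MAX:
            raise ValueError(f"mta_grad_year {self.mta_grad_year} does not fit in UInt16")


# Column names in schema order, e.g. for a csv.DictWriter header.
COLUMNS = [f.name for f in fields(AlumRecord)]
```

Because `mta_grad_year` is an unsigned 16-bit integer, values outside 0 to 65,535 cannot be stored; the check in `__post_init__` surfaces such values as soon as a record is created, and `asdict` turns a record into a row dict in the same column order.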

(If an alum has multiple MtA education entries, say a Bachelor's followed by a Master's, only the most recent degree is listed. Listed "alumni" with no graduation year, or with a graduation year later than the current year, are excluded, as these may be current students.)

The structured data is then saved to the mta_alumni.csv file (a sample file from a custom run is provided in this repository). Profile URLs collected in the initial phase of the scraping process are also saved to a temporary text file in a temp/ directory, in case LinkedIn flags and bans the account midway through scraping or any other error occurs.
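A minimal sketch of the two post-processing rules described above, under stated assumptions: each profile's education section is assumed to have been parsed into dicts with hypothetical `school`, `degree`, and `grad_year` keys, and the checkpoint file name is made up (the README specifies only a temp/ directory). A profile is dropped when its most recent MtA entry lacks a year or lies in the future; this is one reading of the rule, not necessarily main.py's exact logic.

```python
from __future__ import annotations

import datetime
from pathlib import Path
from typing import Optional

MTA_NAME = "Mount Allison University"
# Hypothetical checkpoint location; only the temp/ directory comes from the README.
CHECKPOINT = Path("temp") / "profile_urls.txt"


def latest_mta_education(entries: list[dict], current_year: int | None = None) -> Optional[dict]:
    """Return the most recent MtA entry, or None if the profile should be excluded."""
    if current_year is None:
        current_year = datetime.date.today().year
    mta = [e for e in entries if e.get("school") == MTA_NAME]
    if not mta:
        return None
    # Entries without a year sort lowest, so any dated entry wins.
    latest = max(mta, key=lambda e: e.get("grad_year") or 0)
    year = latest.get("grad_year")
    # Missing or future years may indicate a current student.
    if year is None or year > current_year:
        return None
    return latest


def save_profile_urls(urls: list[str], path: Path = CHECKPOINT) -> None:
    """Write one URL per line so an interrupted run can resume from this list."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")


def load_profile_urls(path: Path = CHECKPOINT) -> list[str]:
    """Read back URLs written by save_profile_urls, skipping blank lines."""
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Writing the URL list before visiting any profile means the slow second phase can restart from the saved file instead of repeating the "Show more results" crawl.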

This script was created at the behest of MtA's Recruitment and Admissions Coordinator, Curtis Michaelis, for data analysis by the Recruitment and Admissions Office. To run it yourself, you must have Firefox installed and the GeckoDriver executable available in your system PATH. You must also specify the following command-line arguments in the given order:

the email address associated with your LinkedIn account;
the password for your LinkedIn account; and
the maximum number of times to click the "Show more results" button on the alumni page (each click loads approximately 15 to 20 additional profiles).
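The three positional arguments could be parsed with the standard library's argparse, as sketched below. This is an assumption about structure rather than main.py's actual code, and the argument names (`email`, `password`, `max_clicks`) are hypothetical.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the three required positional arguments, in order."""
    parser = argparse.ArgumentParser(description="Scrape LinkedIn for MtA alumni data.")
    parser.add_argument("email", help="email address associated with your LinkedIn account")
    parser.add_argument("password", help="password for your LinkedIn account")
    parser.add_argument(
        "max_clicks",
        type=int,
        help='maximum number of "Show more results" clicks (about 15 to 20 profiles each)',
    )
    args = parser.parse_args(argv)
    # A negative click count is meaningless; exit with a usage error.
    if args.max_clicks < 0:
        parser.error("max_clicks must be non-negative")
    return args
```

For example, `python main.py you@example.com 'your-password' 10` would request up to 10 clicks, or roughly 10 × 15 = 150 to 10 × 20 = 200 profiles beyond those shown initially. A password passed on the command line may be recorded in shell history and visible in process listings; the standard library's getpass module is a common alternative for interactive use.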

(DISCLAIMER: There is always the risk that your LinkedIn account may be flagged and banned should you use this script. I have taken reasonable measures to mimic human behavior, but I cannot guarantee foolproof undetectability.)
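One common way to pace the "Show more results" clicks is to pause for a random interval after each one. The sketch below is hypothetical and is not the author's actual set of measures: `click_show_more`, the 2 to 5 second delay range, and the convention that `click` returns False once the button disappears are all assumptions. Keeping the browser call behind a callable keeps the loop testable without Selenium; with Selenium, `click` might locate the button, call its `.click()` method, and return False on `selenium.common.exceptions.NoSuchElementException`.

```python
import random
import time
from typing import Callable


def click_show_more(
    click: Callable[[], bool],
    max_clicks: int,
    min_delay: float = 2.0,  # assumed lower bound on the pause, in seconds
    max_delay: float = 5.0,  # assumed upper bound on the pause, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Click up to max_clicks times, pausing a random interval after each click.

    Stops early when `click` returns False (e.g. the button is gone).
    Returns the number of successful clicks.
    """
    clicks = 0
    for _ in range(max_clicks):
        if not click():
            break
        clicks += 1
        # Uniformly random pause so clicks are not evenly spaced.
        sleep(random.uniform(min_delay, max_delay))
    return clicks
```

Injecting `sleep` lets tests record the delays instead of waiting for them.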

Luis-Varona/mta-alumni-linkedin-scraping

Folders and files

Latest commit

History

Repository files navigation

MtA Alumni LinkedIn Scraping

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages