This repo contains the assignments for DSCI 553: Foundations and Applications of Data Mining at USC.

Instructor: Professor Wei-Min Shen (Spring 2023)

Follow these instructions to run the scripts locally and on Vocareum. For additional details, see the README of each individual homework.

| Assignment | Topic | Implementation | Concepts | Dataset |
|---|---|---|---|---|
| Homework 0 | Setting up development environment | Python, Scala | Map-Reduce | None |
| Homework 1 | Data Exploration on Yelp Dataset | Python | Map-Reduce | Test, Full |
| Homework 2 | Frequent Item-set Mining | Python | SON Algorithm, Apriori Algorithm, Frequent Item-sets | Simulated, Real-world |
| Homework 3 | Locality Sensitive Hashing (LSH), Collaborative Filtering, Recommendation Systems | Python | Min-Hashing, Locality Sensitive Hashing, Pearson Similarity, Model-based Recommendation System | Training and Validation |
| Homework 4 | Community Detection | Python | Girvan-Newman Algorithm, Label Propagation Algorithm | Graph Data |
| Homework 5 | Processing Data Streams | Python | Bloom Filter, Flajolet-Martin Algorithm, Reservoir Sampling | Seed dataset for stream + Stream Generator |
| Homework 6 | Clustering | Python | Bradley-Fayyad-Reina (BFR) Algorithm | Synthetic dataset |
| Competition Project | Recommendation System | Python | Recommendation Systems | Same as homework 3 |