This project implements a simple sentiment analysis classifier using logistic regression, trained using manual gradient descent, without relying on any machine learning libraries like scikit-learn or TensorFlow. It demonstrates end-to-end model development from raw text to evaluation.
```
├── tweets.txt              # Your labeled tweet dataset
├── sentiment_analysis.py
└── README.md
```
- Logistic Regression from scratch
- Manual Gradient Descent for weight updates
- Text Preprocessing & Word Frequency Vectorization
- Loss vs Epochs graph
- Parameter Convergence plots
- Evaluation with Confusion Matrix & Custom Metrics
The dataset file `tweets.txt` should contain tweets in the following format:

```
I love this product || Positive
This is the worst thing ever || Negative
```

Each line contains a tweet and its sentiment label (`Positive` or `Negative`), separated by `||`.
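A minimal sketch of how `tweets.txt` could be parsed. It assumes `Positive` is encoded as 1 and `Negative` as 0, and that labels are matched case-insensitively; `parse_line`, `load_tweets`, and `LABELS` are hypothetical names, not taken from `sentiment_analysis.py`.

```python
from typing import List, Optional, Tuple

# Hypothetical label encoding (assumption): Positive -> 1, Negative -> 0
LABELS = {"positive": 1, "negative": 0}


def parse_line(line: str) -> Optional[Tuple[str, int]]:
    """Parse one 'tweet || Label' line; return None for blank lines."""
    line = line.strip()
    if not line:
        return None
    # rpartition splits on the LAST '||', so a tweet that itself contains '||' still parses
    text, sep, label = line.rpartition("||")
    if not sep:
        raise ValueError(f"missing '||' separator: {line!r}")
    key = label.strip().lower()
    if key not in LABELS:
        raise ValueError(f"unknown label: {label.strip()!r}")
    return text.strip(), LABELS[key]


def load_tweets(path: str) -> List[Tuple[str, int]]:
    """Read every non-blank line of the dataset file into (text, label) pairs."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                samples.append(parsed)
    return samples
```

For a file containing just the two sample lines, `load_tweets("tweets.txt")` would return `[("I love this product", 1), ("This is the worst thing ever", 0)]`.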
| Component | Complexity | Description |
|---|---|---|
| Vocabulary Build | O(N × L) | N = # of tweets, L = avg. words per tweet |
| Vectorization | O(N × V) | V = vocabulary size |
| Training Loop | O(E × N × V) | E = # of epochs (includes gradient computation) |
| Evaluation | O(N × V) | Same as vectorization, applied to the test set |
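One way the vocabulary build and word-frequency vectorization could look: count how often each word appears in positive and negative tweets, then represent a tweet by the summed counts `(pos_freq, neg_freq)` that are weighted by `w1` and `w2`. This is a sketch under assumptions: the tokenizer (lowercase, alphanumeric and apostrophe tokens, no stop-word removal or stemming) and the names `preprocess`, `build_freqs`, and `extract_features` are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def preprocess(text: str) -> List[str]:
    """Lowercase the tweet and keep runs of letters, digits, and apostrophes (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_freqs(samples: List[Tuple[str, int]]) -> Counter:
    """Count each (word, label) pair over the training tweets: O(N x L)."""
    freqs: Counter = Counter()
    for text, label in samples:
        for word in preprocess(text):
            freqs[(word, label)] += 1
    return freqs


def extract_features(text: str, freqs: Counter) -> Tuple[float, float]:
    """Return (pos_freq, neg_freq): summed positive/negative counts of the tweet's words."""
    pos_freq = neg_freq = 0.0
    for word in preprocess(text):
        # Counter returns 0 for unseen pairs without inserting them
        pos_freq += freqs[(word, 1)]
        neg_freq += freqs[(word, 0)]
    return pos_freq, neg_freq
```

Because lookups go through a hash map, `extract_features` in this sketch costs O(L) per tweet; a vectorizer that touches every vocabulary entry for each tweet would instead cost O(V) per tweet, matching the O(N × V) row above.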
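We use a logistic regression model on two features per tweet, `pos_freq` and `neg_freq`, where

$$\text{predicted} = \sigma(w_1 \cdot \text{pos\_freq} + w_2 \cdot \text{neg\_freq} + \text{bias}), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$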
Gradient descent weight update:

```python
error = predicted - actual
w1 -= learning_rate * error * pos_freq
w2 -= learning_rate * error * neg_freq
bias -= learning_rate * error
```

- Loss vs Epochs: Shows how the training loss decreases over epochs
- Parameter Convergence: Plots `w1`, `w2`, and `bias` vs. loss, with circle markers for better interpretability
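The weight update rule can be wrapped in a full training loop. This is a minimal sketch assuming binary cross-entropy loss (for which the gradient with respect to the pre-sigmoid score is exactly `predicted - actual`, so each parameter's gradient is `error` times its feature), one update per tweet, and zero-initialized parameters. It also records the per-epoch mean loss and parameter values, the kind of history the Loss vs Epochs and Parameter Convergence plots need. `sigmoid`, `train`, the `history` dict, and the default hyperparameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, arranged so math.exp never overflows for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(
    features: Sequence[Tuple[float, float]],
    labels: Sequence[int],
    learning_rate: float = 0.01,
    epochs: int = 100,
) -> Tuple[float, float, float, Dict[str, List[float]]]:
    """Per-tweet gradient descent on (pos_freq, neg_freq); returns w1, w2, bias, history."""
    w1 = w2 = bias = 0.0
    history: Dict[str, List[float]] = {"loss": [], "w1": [], "w2": [], "bias": []}
    eps = 1e-12  # keeps log() finite when predicted saturates at 0 or 1
    for _ in range(epochs):
        total_loss = 0.0
        for (pos_freq, neg_freq), actual in zip(features, labels):
            predicted = sigmoid(w1 * pos_freq + w2 * neg_freq + bias)
            # Binary cross-entropy for this tweet (assumed loss function)
            total_loss -= actual * math.log(predicted + eps) + (1 - actual) * math.log(1 - predicted + eps)
            # Same update rule as in the snippet: gradient = error * feature
            error = predicted - actual
            w1 -= learning_rate * error * pos_freq
            w2 -= learning_rate * error * neg_freq
            bias -= learning_rate * error
        # One entry per epoch: mean loss and end-of-epoch parameters, ready for plotting
        history["loss"].append(total_loss / len(labels))
        history["w1"].append(w1)
        history["w2"].append(w2)
        history["bias"].append(bias)
    return w1, w2, bias, history
```

On raw word counts, large feature values can saturate the sigmoid early, so a small learning rate (or scaling the features) is a reasonable starting point.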
Evaluation is done using a custom implementation, without any external libraries:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
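A minimal sketch of these metrics with no external libraries. It assumes labels are encoded as 0 (Negative) and 1 (Positive) and lays the confusion matrix out as `[[TN, FP], [FN, TP]]` (rows are actual classes, columns are predicted), a layout consistent with the sample output; `confusion_matrix` and `evaluate` are hypothetical names. Zero denominators return 0.0 instead of raising.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int]) -> List[List[int]]:
    """Rows = actual (0 = Negative, 1 = Positive), columns = predicted: [[TN, FP], [FN, TP]]."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def evaluate(matrix: List[List[int]]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the Positive class."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `evaluate([[7, 2], [1, 10]])` gives accuracy 17/20 = 0.85, precision 10/12 ≈ 0.833, recall 10/11 ≈ 0.909, and F1 20/23 ≈ 0.870, which round to the values in the sample output.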
- Python 3.6+
- `matplotlib` (for plotting)

```bash
pip install matplotlib
python sentiment_analysis.py
```

Example output:

```
Weights: w1=0.45, w2=-0.27, bias=0.62
Confusion Matrix:
[[7, 2],
 [1, 10]]
Accuracy: 0.85
Precision: 0.83
Recall: 0.91
F1 Score: 0.87
```

Created by Waqar
Inspired by hands-on ML principles and low-level learning