LibSenti is an end-to-end, AI-powered application that uses machine learning and natural language processing (NLP) to classify the sentiment polarity (Positive, Neutral, or Negative) of student-submitted reviews of IIT and NIT libraries. It combines a robust BERT-based sentiment analysis engine with a visually rich, interactive Streamlit application for real-time review exploration, data insights, and institutional benchmarking.
- **Real-time Review Prediction** (Sentiment Predictor Tab): Enter custom reviews to receive live sentiment predictions with confidence probabilities and visual feedback.
- **Unigram WordCloud Visualization** (Unigram WordClouds Tab): Generates wordclouds for individual institutions from the most frequent single-word terms found in reviews.
- **Bigram WordCloud Comparison** (Bigram WordClouds Tab): Displays a two-institution comparison of the most common word pairs (bigrams) extracted from reviews.
- **Sentiment Pie Chart Comparison** (Pie Chart Comparison Tab): Side-by-side sentiment distribution pie charts for any two selected institutions, with precise percentage labels.
- **IIT vs NIT Sentiment Analysis** (IIT vs NIT Chart Tab): Presents a consolidated sentiment comparison chart contrasting IITs and NITs at a glance.
- **Library Experience Highlights** (Library Experiences Tab): Displays standout user-submitted reviews, both best and worst experiences, curated by sentiment and length.
- Architecture: `BertForSequenceClassification`
- Dataset: IIT & NIT library reviews (labeled Positive, Neutral, Negative)
- Frameworks: PyTorch, Transformers (Hugging Face)
- Accuracy: ~96% on test data
- Label Distribution Handling: probability thresholds for confident classification
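The README describes threshold-based classification but does not show the exact rule, so the following is a minimal sketch of how such logic typically works: accept the top class only when its probability clears a cutoff, otherwise fall back to Neutral. The `0.6` threshold, the `predict_with_threshold` helper name, and the Neutral fallback are illustrative assumptions, not the trained values.

```python
import math

# Class indices follow the label table in this README.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def softmax(logits):
    """Convert raw model logits into probabilities."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_threshold(logits, threshold=0.6):
    """Return the top label only if its probability clears the
    confidence threshold; otherwise fall back to Neutral.
    (The 0.6 cutoff is illustrative, not the trained value.)"""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] >= threshold:
        return ID2LABEL[top], probs[top]
    return "Neutral", probs[1]

# A confident prediction passes through; a flat distribution falls back.
label, confidence = predict_with_threshold([0.2, 0.1, 2.5])  # → ("Positive", ~0.84)
```

In the real app, `logits` would come from the fine-tuned model's forward pass rather than being hand-written.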
```
LibSenti/
├── assets/
│   ├── wordclouds/                          # Wordcloud PNGs for each IIT/NIT
│   └── iit_vs_nit_sentiment_comparison.png
├── saved_model/                             # Trained BERT model and tokenizer
├── app.py                                   # Main Streamlit application
├── train_model.py                           # Script to train the BERT model
├── cleaned_iit+nit_library_reviews.csv
├── sentiment_iit_library_reviews.csv
└── README.md                                # This file
```
| Sentiment Label | Class ID |
|---|---|
| Negative | 0 |
| Neutral | 1 |
| Positive | 2 |
Class imbalance is addressed using weighted loss during training and probability thresholds during prediction.
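The README says the training loss is weighted but does not specify the weighting scheme. A common choice, shown here as an assumption, is inverse-frequency weighting, where rarer classes receive proportionally larger weights; the resulting list can be passed to `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Compute per-class weights inversely proportional to class
    frequency, normalized so a perfectly balanced dataset yields
    weight 1.0 for every class. The weight list is ordered by
    class index (0 = Negative, 1 = Neutral, 2 = Positive)."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    weights = {c: total / (n_classes * counts[c]) for c in counts}
    return [weights[c] for c in sorted(counts)]

# Toy distribution: Negative-heavy reviews; the rare Neutral class
# gets the largest weight so its errors count more in the loss.
weights = inverse_frequency_weights([0, 0, 0, 0, 1, 2, 2, 2])
```

The actual weights used by `train_model.py` depend on the real label distribution of the review dataset.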
- Base Model: `bert-base-uncased`
- Framework: Hugging Face Transformers + PyTorch
- Strategy: fine-tuned with weighted cross-entropy loss to handle class imbalance
- Prediction: threshold logic for better handling of imbalanced classes
- Training: Hugging Face `Trainer` API with evaluation metrics and early stopping
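As a rough configuration sketch (not the project's verbatim `train_model.py`), a `Trainer` setup with per-epoch evaluation and early stopping might look like the following; `train_ds` and `eval_ds` are assumed to be tokenized datasets prepared elsewhere in the script, and all argument values are illustrative.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments, EarlyStoppingCallback)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # Negative / Neutral / Positive

args = TrainingArguments(
    output_dir="./saved_model",
    evaluation_strategy="epoch",      # evaluate after every epoch
    save_strategy="epoch",
    load_best_model_at_end=True,      # required for early stopping
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,           # assumed: tokenized training split
    eval_dataset=eval_ds,             # assumed: tokenized validation split
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
trainer.save_model("./saved_model")
```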
Run the training script:

```bash
python train_model.py
```

This script handles:
- Preprocessing
- Tokenization
- Model fine-tuning
- Class weight balancing
- Model saving to `./saved_model/`
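The preprocessing step above is not spelled out in this README; as a rough illustration, review cleaning before tokenization often looks like the hypothetical helper below (lowercasing, URL removal, whitespace normalization), not necessarily what `train_model.py` actually does.

```python
import re

def clean_review(text):
    """Minimal review normalization: lowercase, strip URLs, drop
    stray symbols while keeping basic punctuation that BERT's
    tokenizer can use, and collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    text = re.sub(r"[^a-z0-9.,!?'\s]", " ", text)  # drop stray symbols
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    return text

cleaned = clean_review("The  IIT library is GREAT!!  Visit https://example.com")
# → "the iit library is great!! visit"
```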
Launch the application:

```bash
streamlit run app.py
```
- **Review Classifier** (Sentiment Predictor Tab): Enter any library review and instantly receive a sentiment prediction (Positive, Neutral, or Negative).
- **Sentiment Probabilities** (Sentiment Predictor Tab): Visualize the confidence score for each sentiment with interactive progress bars to assess prediction certainty.
- **WordCloud Comparator**:
  - Unigram WordClouds Tab: select and compare two institutions to explore the most frequent individual keywords.
  - Bigram WordClouds Tab: compare the most common two-word combinations to find phrase patterns in reviews.
- **Sentiment Pie Chart Comparison** (Pie Chart Comparison Tab): Loads side-by-side sentiment distribution charts for the selected institutions for intuitive visual analysis.
- **IIT vs NIT Overall Chart** (IIT vs NIT Chart Tab): A comparative sentiment distribution chart for analyzing trends across all IITs vs NITs.
- **Library Experience Highlights** (Library Experiences Tab): Shows handpicked positive and negative user reviews with institution tags and styled formatting.
- **Add LIME/SHAP Explainability for BERT**: integrate model-interpretation techniques to explain why a review was labeled positive or negative.
- **Include More Institutions**: expand the dataset to cover regional universities, IIITs, NLUs, and other public libraries for broader benchmarking.
- **Review Metadata Integration**: include attributes such as review date, source, device, or student/staff tag for richer context and filtering.
- **Clustering or Topic Modeling**: apply LDA or BERTopic to identify trending topics or issues discussed across institutions.
- **Sentiment Timeline Analysis**: show how sentiment for a specific institution evolves over time (e.g., semester-wise or pre/post renovation).
- **User Feedback Module**: allow users to correct or rate the model's predictions to improve performance and trust.
- **Multilingual Support**: add language detection and support for Hindi, Tamil, etc., using a multilingual model such as `xlm-roberta-base`.
Aman Srivastava (amansri345@gmail.com)