feat: Add NLP techniques and GUI to Social Media Analyzer #22
Conversation
This commit introduces two major enhancements to the Social Media Analyzer:
- Integration of NLP techniques for more sophisticated scam and fake news detection.
- A new web-based GUI built with React to replace the command-line interface.

Backend changes:
- Added `nltk` and `textblob` for NLP tasks.
- Integrated sentiment analysis into the scam detector to identify messages with strong negative sentiment.
- Enhanced the fake news detector with Named Entity Recognition (NER) to identify organizations and people mentioned in articles.
- Created a Flask API to expose the analyzer's functionality to the frontend.

Frontend changes:
- Created a new React application with components for:
  - Scam Analyzer
  - Fake News Analyzer
- The GUI allows users to analyze text and URLs in a user-friendly interface.
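For context on the sentiment integration described above, here is a minimal sketch of what a TextBlob-based check can look like; the threshold value is an assumption, though the indicator string matches the one asserted in the test suite below.

```python
from textblob import TextBlob

def detect_negative_sentiment(text, threshold=-0.5):
    """Flag text whose sentiment polarity falls below an (assumed) threshold."""
    polarity = TextBlob(text).sentiment.polarity  # ranges from -1.0 to 1.0
    if polarity < threshold:
        return "Strong negative sentiment detected in text."
    return None
```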
Reviewer's Guide

This pull request augments the Social Media Analyzer by transforming the CLI into a Flask-based API, integrating NLP techniques in both scam and fake-news detectors, and layering a React GUI on top for interactive analysis.

Class diagram for updated scam and fake news detectors

```mermaid
classDiagram
class ScamDetector {
    +analyze_text_for_scams(text_content, platform, api_key)
    -Sentiment Analysis (TextBlob)
    -Keyword-based checks
    -Regex-based checks
    -Financial Identifiers
    -Phone Numbers
    score: float
    indicators_found: list
    urls_analyzed_details: list
}
class FakeNewsDetector {
    +analyze_url_for_fake_news(url)
    -Named Entity Recognition (NLTK)
    -Fake news domain check
    -Clickbait pattern check
    score: float
    indicators_found: list
    named_entities: dict
}
ScamDetector <.. FlaskAPI
FakeNewsDetector <.. FlaskAPI
```
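The `<..` dependencies in the diagram correspond to the Flask layer delegating to the two detector modules. A hedged sketch of that wiring, assuming the module paths from the repository layout and using the `/analyze/scam` and `/analyze/fake-news` endpoints named in the Sourcery summary:

```python
from flask import Flask, jsonify, request

# Module paths are assumptions based on the file locations seen in this PR.
from social_media_analyzer.scam_detector import analyze_text_for_scams
from social_media_analyzer.fake_news_detector import analyze_url_for_fake_news

app = Flask(__name__)

@app.route("/analyze/scam", methods=["POST"])
def analyze_scam():
    # Delegate straight to the detector; the API layer adds no analysis logic.
    payload = request.get_json(force=True)
    return jsonify(analyze_text_for_scams(payload.get("text", "")))

@app.route("/analyze/fake-news", methods=["POST"])
def analyze_fake_news():
    payload = request.get_json(force=True)
    return jsonify(analyze_url_for_fake_news(payload.get("url", "")))
```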
File-Level Changes
Insecure Configuration (1)
More info on how to fix Insecure Configuration in Python.
Vulnerable Libraries (1)
More info on how to fix Vulnerable Libraries in Python.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Detected Flask app with debug=True. Do not deploy to production with this flag enabled as it will leak sensitive information. Instead, consider using Flask configuration variables or setting 'debug' using system environment variables. (link)
General comments:
- You should configure CORS (e.g., via flask-cors) on the Flask API so the React frontend can reliably call `/analyze/*` endpoints without cross-origin errors.
- Consider handling missing NLTK data and the GOOGLE_API_KEY at startup (or auto-downloading required corpora) to avoid runtime failures when the environment isn't preconfigured.
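A minimal sketch of how both suggestions could land in the Flask entry point, assuming flask-cors is added as a dependency; the resource paths mirror the NLTK data listed in the detector's docstring:

```python
import os

import nltk
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # lets the React frontend call /analyze/* across origins

# Resource paths mirror the NLTK data listed in analyze_url_for_fake_news.
REQUIRED_NLTK_DATA = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "words": "corpora/words",
}

def ensure_environment():
    """Fail fast (or self-heal) before serving the first request."""
    for package, path in REQUIRED_NLTK_DATA.items():
        try:
            nltk.data.find(path)
        except LookupError:
            nltk.download(package)  # auto-download the missing corpus
    if not os.environ.get("GOOGLE_API_KEY"):
        raise RuntimeError("GOOGLE_API_KEY is not set; URL analysis will fail.")

if __name__ == "__main__":
    ensure_environment()
    app.run(host="0.0.0.0", port=8080)
```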
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- You should configure CORS (e.g., via flask-cors) on the Flask API so the React frontend can reliably call `/analyze/*` endpoints without cross-origin errors.
- Consider handling missing NLTK data and the GOOGLE_API_KEY at startup (or auto-downloading required corpora) to avoid runtime failures when the environment isn't preconfigured.
## Individual Comments
### Comment 1
<location> `text_message_analyzer/app.py:36` </location>
<code_context>
-if __name__ == "__main__":
- app.run(host="0.0.0.0", port=8080)
+if __name__ == '__main__':
+ app.run(debug=True)
</code_context>
<issue_to_address>
**🚨 issue (security):** Enabling debug mode in production can expose sensitive information.
Ensure debug mode is disabled in production, ideally by controlling it with an environment variable.
</issue_to_address>
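One hedged way to do that, reusing the host and port from the pre-existing entry point; `FLASK_DEBUG` is a conventional variable name, not something the PR defines:

```python
import os
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # Debug stays off unless explicitly enabled in the environment,
    # so a production deployment cannot leak the interactive debugger.
    debug = os.environ.get("FLASK_DEBUG", "0") == "1"
    app.run(host="0.0.0.0", port=8080, debug=debug)
```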
### Comment 2
<location> `social_media_analyzer/fake_news_detector.py:29-32` </location>
<code_context>
  score = 0.0
  indicators_found = []
+ named_entities = {
+     "organizations": [],
+     "persons": [],
+ }
  # 1. Check against known fake news domains
</code_context>
<issue_to_address>
**issue:** Named entity extraction does not handle cases where NLTK data is missing.
Catch NLTK exceptions and provide a user-friendly error message or guidance on downloading missing data.
</issue_to_address>
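A sketch of the suggested guard, using a hypothetical `extract_entities_safely` helper; `LookupError` is what NLTK raises when a required corpus or model has not been downloaded:

```python
import nltk

def extract_entities_safely(text_content):
    """Return an NE chunk tree, or an error dict if NLTK data is missing."""
    try:
        tokens = nltk.word_tokenize(text_content)
        tagged = nltk.pos_tag(tokens)
        return nltk.ne_chunk(tagged)
    except LookupError as exc:  # raised when a corpus/model is absent
        return {
            "error": (
                "Missing NLTK data; run nltk.download() for 'punkt', "
                "'averaged_perceptron_tagger', 'maxent_ne_chunker' and 'words'."
            ),
            "detail": str(exc),
        }
```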
### Comment 3
<location> `text_message_analyzer/app.py:36` </location>
<code_context>
app.run(debug=True)
</code_context>
<issue_to_address>
**security (python.flask.security.audit.debug-enabled):** Detected Flask app with debug=True. Do not deploy to production with this flag enabled as it will leak sensitive information. Instead, consider using Flask configuration variables or setting 'debug' using system environment variables.
*Source: opengrep*
</issue_to_address>
### Comment 4
<location> `social_media_analyzer/fake_news_detector.py:23` </location>
<code_context>
def analyze_url_for_fake_news(url):
    """
    Analyzes a URL for indicators of fake news.
    NOTE: This function requires the following NLTK data to be downloaded:
    - 'punkt'
    - 'averaged_perceptron_tagger'
    - 'maxent_ne_chunker'
    - 'words'
    """
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    domain = urlparse(url).netloc.lower()
    score = 0.0
    indicators_found = []
    named_entities = {
        "organizations": [],
        "persons": [],
    }
    # 1. Check against known fake news domains
    if domain in FAKE_NEWS_DOMAINS:
        score += HEURISTIC_WEIGHTS.get("KNOWN_FAKE_NEWS_DOMAIN", 5.0)
        indicators_found.append(f"Domain '{domain}' is a known source of fake news.")
        return {
            "url": url,
            "score": round(score, 2),
            "indicators_found": indicators_found
        }
    # 2. Fetch and analyze content
    try:
        headers = {'User-Agent': 'Mozilla/5.0'}
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=10) as response:
            if response.status == 200:
                html_content = response.read().decode('utf-8', errors='ignore')
                text_content = re.sub(r'<[^>]+>', '', html_content).lower()
                # 3. Analyze text for sensationalist keywords
                for keyword in SENSATIONALIST_KEYWORDS:
                    if keyword in text_content:
                        score += HEURISTIC_WEIGHTS.get("SENSATIONALIST_KEYWORD", 1.0)
                        indicators_found.append(f"Found sensationalist keyword: '{keyword}'")
                # 4. Analyze text for clickbait patterns
                for pattern in CLICKBAIT_PATTERNS:
                    if re.search(pattern, text_content, re.IGNORECASE):
                        score += HEURISTIC_WEIGHTS.get("CLICKBAIT_PATTERN", 1.5)
                        indicators_found.append(f"Found clickbait pattern: '{pattern}'")
                # 5. Named Entity Recognition
                tokens = nltk.word_tokenize(text_content)
                tagged = nltk.pos_tag(tokens)
                entities = nltk.ne_chunk(tagged)
                for entity in entities:
                    if isinstance(entity, nltk.Tree):
                        entity_text = " ".join([word for word, tag in entity.leaves()])
                        if entity.label() == 'ORGANIZATION':
                            if entity_text not in named_entities["organizations"]:
                                named_entities["organizations"].append(entity_text)
                        elif entity.label() == 'PERSON':
                            if entity_text not in named_entities["persons"]:
                                named_entities["persons"].append(entity_text)
            else:
                return {"error": f"Failed to fetch URL: HTTP status code {response.status}"}
    except Exception as e:
        return {"error": f"An error occurred: {e}"}
    return {
        "url": url,
        "score": round(score, 2),
        "indicators_found": indicators_found,
        "named_entities": named_entities
    }
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>
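A hedged sketch of what those three refactorings look like applied to the fetch step; the `fetch_text` helper name is invented for illustration:

```python
import re
import urllib.request

def fetch_text(url: str):
    """Fetch a URL and return its lower-cased text, or an error dict."""
    if not url.startswith(("http://", "https://")):
        url = f"http://{url}"  # use-fstring-for-concatenation
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        # swap-if-else: guard on the failure case first...
        if response.status != 200:
            return {"error": f"Failed to fetch URL: HTTP status code {response.status}"}
        # ...remove-unnecessary-else: the happy path continues without nesting.
        html_content = response.read().decode("utf-8", errors="ignore")
        return re.sub(r"<[^>]+>", "", html_content).lower()
```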
### Comment 5
<location> `social_media_analyzer/test_scam_detector.py:10-15` </location>
<code_context>
def test_sentiment_analysis(self):
    # Test case for negative sentiment
    text_negative = "This is a terrible, awful, no good, very bad message."
    result_negative = analyze_text_for_scams(text_negative)
    self.assertIn("Strong negative sentiment detected in text.", [indicator for indicator in result_negative["indicators_found"]])
    # Test case for positive sentiment
    text_positive = "This is a wonderful, amazing, great message."
    result_positive = analyze_text_for_scams(text_positive)
    self.assertNotIn("Strong negative sentiment detected in text.", [indicator for indicator in result_positive["indicators_found"]])
</code_context>
<issue_to_address>
**issue (code-quality):** Replace identity comprehension with call to collection constructor [×2] ([`identity-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/identity-comprehension/))
<br/><details><summary>Explanation</summary>Convert list/set/tuple comprehensions that do not change the input elements into a direct call to the collection constructor.
#### Before
```python
# List comprehensions
[item for item in coll]
[item for item in friends.names()]
# Dict comprehensions
{k: v for k, v in coll}
{k: v for k, v in coll.items()} # Only if we know coll is a `dict`
# Unneeded call to `.items()`
dict(coll.items()) # Only if we know coll is a `dict`
# Set comprehensions
{item for item in coll}
```
#### After
```python
# List comprehensions
list(iter(coll))
list(iter(friends.names()))
# Dict comprehensions
dict(coll)
dict(coll)
# Unneeded call to `.items()`
dict(coll)
# Set comprehensions
set(coll)
```
All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
</details>
</issue_to_address>
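Concretely, the comprehension only copies the list, so the assertion can search it directly; a minimal sketch (the module path is assumed from the test file's location):

```python
import unittest

from social_media_analyzer.scam_detector import analyze_text_for_scams

class TestSentiment(unittest.TestCase):
    def test_negative_sentiment(self):
        result = analyze_text_for_scams(
            "This is a terrible, awful, no good, very bad message."
        )
        # The identity comprehension is gone: assert against the list itself.
        self.assertIn(
            "Strong negative sentiment detected in text.",
            result["indicators_found"],
        )
```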
### Comment 6
<location> `social_media_analyzer/test_scam_detector.py:19-26` </location>
<code_context>
def test_keyword_matching(self):
    # Test case for urgency keyword
    text_urgency = "URGENT: Your account has been compromised."
    result_urgency = analyze_text_for_scams(text_urgency)
    self.assertIn("Presence of 'Urgency' keyword: 'urgent'", [indicator for indicator in result_urgency["indicators_found"]])
    # Test case for stemming
    text_stemming = "I need you to verify your account immediately."
    result_stemming = analyze_text_for_scams(text_stemming)
    self.assertIn("Presence of 'Sensitive Info' keyword: 'verify your account'", [indicator for indicator in result_stemming["indicators_found"]])
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Extract duplicate code into method ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Replace identity comprehension with call to collection constructor [×2] ([`identity-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/identity-comprehension/))
<br/><details><summary>Explanation</summary>
Convert list/set/tuple comprehensions that do not change the input elements into a direct call to the collection constructor.
#### Before
```python
# List comprehensions
[item for item in coll]
[item for item in friends.names()]
# Dict comprehensions
{k: v for k, v in coll}
{k: v for k, v in coll.items()} # Only if we know coll is a `dict`
# Unneeded call to `.items()`
dict(coll.items()) # Only if we know coll is a `dict`
# Set comprehensions
{item for item in coll}
```
#### After
```python
# List comprehensions
list(iter(coll))
list(iter(friends.names()))
# Dict comprehensions
dict(coll)
dict(coll)
# Unneeded call to `.items()`
dict(coll)
# Set comprehensions
set(coll)
```
All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
</details>
</issue_to_address>
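A sketch of the duplicate-extraction suggestion; the `_assert_indicator` helper name is hypothetical, not from the PR:

```python
import unittest

from social_media_analyzer.scam_detector import analyze_text_for_scams

class TestKeywordMatching(unittest.TestCase):
    def _assert_indicator(self, text, expected_indicator):
        # Hypothetical helper consolidating the duplicated analyze-then-assert steps.
        result = analyze_text_for_scams(text)
        self.assertIn(expected_indicator, result["indicators_found"])

    def test_keyword_matching(self):
        self._assert_indicator(
            "URGENT: Your account has been compromised.",
            "Presence of 'Urgency' keyword: 'urgent'",
        )
        self._assert_indicator(
            "I need you to verify your account immediately.",
            "Presence of 'Sensitive Info' keyword: 'verify your account'",
        )
```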
Summary by Sourcery
Add NLP-driven fraud and misinformation detection and provide a React-based web interface powered by a Flask API.
New Features:
- `/analyze/scam` and `/analyze/fake-news` endpoints exposed via a Flask API.

Enhancements:
Build:
Tests: