feat: Add NLP techniques and GUI to Social Media Analyzer #22
Conversation
This commit introduces two major enhancements to the Social Media Analyzer:
- Integration of NLP techniques for more sophisticated scam and fake news detection.
- A new web-based GUI built with React to replace the command-line interface.

Backend changes:
- Added `nltk` and `textblob` for NLP tasks.
- Integrated sentiment analysis into the scam detector to identify messages with strong negative sentiment.
- Enhanced the fake news detector with Named Entity Recognition (NER) to identify organizations and people mentioned in articles.
- Created a Flask API to expose the analyzer's functionality to the frontend.

Frontend changes:
- Created a new React application with components for:
  - Scam Analyzer
  - Fake News Analyzer
- The GUI allows users to analyze text and URLs in a user-friendly interface.
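For context on the sentiment integration described above, here is a minimal sketch of what a TextBlob-based check can look like; the threshold value is an assumption, though the indicator string matches the one asserted in the test suite below.

```python
from textblob import TextBlob

def detect_negative_sentiment(text, threshold=-0.5):
    """Flag text whose sentiment polarity falls below an (assumed) threshold."""
    polarity = TextBlob(text).sentiment.polarity  # ranges from -1.0 to 1.0
    if polarity < threshold:
        return "Strong negative sentiment detected in text."
    return None
```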
Reviewer's Guide

This pull request augments the Social Media Analyzer by transforming the CLI into a Flask-based API, integrating NLP techniques in both scam and fake-news detectors, and layering a React GUI on top for interactive analysis.

Class diagram for updated scam and fake news detectors

```mermaid
classDiagram
class ScamDetector {
    +analyze_text_for_scams(text_content, platform, api_key)
    -Sentiment Analysis (TextBlob)
    -Keyword-based checks
    -Regex-based checks
    -Financial Identifiers
    -Phone Numbers
    score: float
    indicators_found: list
    urls_analyzed_details: list
}
class FakeNewsDetector {
    +analyze_url_for_fake_news(url)
    -Named Entity Recognition (NLTK)
    -Fake news domain check
    -Clickbait pattern check
    score: float
    indicators_found: list
    named_entities: dict
}
ScamDetector <.. FlaskAPI
FakeNewsDetector <.. FlaskAPI
```
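The `<..` dependencies in the diagram correspond to the Flask layer delegating to the two detector modules. A hedged sketch of that wiring, assuming the module paths from the repository layout and using the `/analyze/scam` and `/analyze/fake-news` endpoints named in the Sourcery summary:

```python
from flask import Flask, jsonify, request

# Module paths are assumptions based on the file locations seen in this PR.
from social_media_analyzer.scam_detector import analyze_text_for_scams
from social_media_analyzer.fake_news_detector import analyze_url_for_fake_news

app = Flask(__name__)

@app.route("/analyze/scam", methods=["POST"])
def analyze_scam():
    # Delegate straight to the detector; the API layer adds no analysis logic.
    payload = request.get_json(force=True)
    return jsonify(analyze_text_for_scams(payload.get("text", "")))

@app.route("/analyze/fake-news", methods=["POST"])
def analyze_fake_news():
    payload = request.get_json(force=True)
    return jsonify(analyze_url_for_fake_news(payload.get("url", "")))
```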
File-Level Changes
Insecure Configuration (1)
More info on how to fix Insecure Configuration in Python.
Vulnerable Libraries (1)
More info on how to fix Vulnerable Libraries in Python.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Detected Flask app with debug=True. Do not deploy to production with this flag enabled as it will leak sensitive information. Instead, consider using Flask configuration variables or setting 'debug' using system environment variables. (link)
General comments:
- You should configure CORS (e.g., via flask-cors) on the Flask API so the React frontend can reliably call `/analyze/*` endpoints without cross-origin errors.
- Consider handling missing NLTK data and the GOOGLE_API_KEY at startup (or auto-downloading required corpora) to avoid runtime failures when the environment isn't preconfigured.
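A minimal sketch of how both suggestions could land in the Flask entry point, assuming flask-cors is added as a dependency; the resource paths mirror the NLTK data listed in the detector's docstring:

```python
import os

import nltk
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # lets the React frontend call /analyze/* across origins

# Resource paths mirror the NLTK data listed in analyze_url_for_fake_news.
REQUIRED_NLTK_DATA = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "words": "corpora/words",
}

def ensure_environment():
    """Fail fast (or self-heal) before serving the first request."""
    for package, path in REQUIRED_NLTK_DATA.items():
        try:
            nltk.data.find(path)
        except LookupError:
            nltk.download(package)  # auto-download the missing corpus
    if not os.environ.get("GOOGLE_API_KEY"):
        raise RuntimeError("GOOGLE_API_KEY is not set; URL analysis will fail.")

if __name__ == "__main__":
    ensure_environment()
    app.run(host="0.0.0.0", port=8080)
```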
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- You should configure CORS (e.g., via flask-cors) on the Flask API so the React frontend can reliably call `/analyze/*` endpoints without cross-origin errors.
- Consider handling missing NLTK data and the GOOGLE_API_KEY at startup (or auto-downloading required corpora) to avoid runtime failures when the environment isn't preconfigured.
## Individual Comments
### Comment 1
<location> `text_message_analyzer/app.py:36` </location>
<code_context>
-if __name__ == "__main__":
- app.run(host="0.0.0.0", port=8080)
+if __name__ == '__main__':
+ app.run(debug=True)
</code_context>
<issue_to_address>
**🚨 issue (security):** Enabling debug mode in production can expose sensitive information.
Ensure debug mode is disabled in production, ideally by controlling it with an environment variable.
</issue_to_address>
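One hedged way to do that, reusing the host and port from the pre-existing entry point; `FLASK_DEBUG` is a conventional variable name, not something the PR defines:

```python
import os
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # Debug stays off unless explicitly enabled in the environment,
    # so a production deployment cannot leak the interactive debugger.
    debug = os.environ.get("FLASK_DEBUG", "0") == "1"
    app.run(host="0.0.0.0", port=8080, debug=debug)
```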
### Comment 2
<location> `social_media_analyzer/fake_news_detector.py:29-32` </location>
<code_context>
  score = 0.0
  indicators_found = []
+ named_entities = {
+     "organizations": [],
+     "persons": [],
+ }
  # 1. Check against known fake news domains
</code_context>
<issue_to_address>
**issue:** Named entity extraction does not handle cases where NLTK data is missing.
Catch NLTK exceptions and provide a user-friendly error message or guidance on downloading missing data.
</issue_to_address>
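A sketch of the suggested guard, using a hypothetical `extract_entities_safely` helper; `LookupError` is what NLTK raises when a required corpus or model has not been downloaded:

```python
import nltk

def extract_entities_safely(text_content):
    """Return an NE chunk tree, or an error dict if NLTK data is missing."""
    try:
        tokens = nltk.word_tokenize(text_content)
        tagged = nltk.pos_tag(tokens)
        return nltk.ne_chunk(tagged)
    except LookupError as exc:  # raised when a corpus/model is absent
        return {
            "error": (
                "Missing NLTK data; run nltk.download() for 'punkt', "
                "'averaged_perceptron_tagger', 'maxent_ne_chunker' and 'words'."
            ),
            "detail": str(exc),
        }
```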
### Comment 3
<location> `text_message_analyzer/app.py:36` </location>
<code_context>
app.run(debug=True)
</code_context>
<issue_to_address>
**security (python.flask.security.audit.debug-enabled):** Detected Flask app with debug=True. Do not deploy to production with this flag enabled as it will leak sensitive information. Instead, consider using Flask configuration variables or setting 'debug' using system environment variables.
*Source: opengrep*
</issue_to_address>
### Comment 4
<location> `social_media_analyzer/fake_news_detector.py:23` </location>
<code_context>
def analyze_url_for_fake_news(url):
    """
    Analyzes a URL for indicators of fake news.
    NOTE: This function requires the following NLTK data to be downloaded:
    - 'punkt'
    - 'averaged_perceptron_tagger'
    - 'maxent_ne_chunker'
    - 'words'
    """
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    domain = urlparse(url).netloc.lower()
    score = 0.0
    indicators_found = []
    named_entities = {
        "organizations": [],
        "persons": [],
    }
    # 1. Check against known fake news domains
    if domain in FAKE_NEWS_DOMAINS:
        score += HEURISTIC_WEIGHTS.get("KNOWN_FAKE_NEWS_DOMAIN", 5.0)
        indicators_found.append(f"Domain '{domain}' is a known source of fake news.")
        return {
            "url": url,
            "score": round(score, 2),
            "indicators_found": indicators_found
        }
    # 2. Fetch and analyze content
    try:
        headers = {'User-Agent': 'Mozilla/5.0'}
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=10) as response:
            if response.status == 200:
                html_content = response.read().decode('utf-8', errors='ignore')
                text_content = re.sub(r'<[^>]+>', '', html_content).lower()
                # 3. Analyze text for sensationalist keywords
                for keyword in SENSATIONALIST_KEYWORDS:
                    if keyword in text_content:
                        score += HEURISTIC_WEIGHTS.get("SENSATIONALIST_KEYWORD", 1.0)
                        indicators_found.append(f"Found sensationalist keyword: '{keyword}'")
                # 4. Analyze text for clickbait patterns
                for pattern in CLICKBAIT_PATTERNS:
                    if re.search(pattern, text_content, re.IGNORECASE):
                        score += HEURISTIC_WEIGHTS.get("CLICKBAIT_PATTERN", 1.5)
                        indicators_found.append(f"Found clickbait pattern: '{pattern}'")
                # 5. Named Entity Recognition
                tokens = nltk.word_tokenize(text_content)
                tagged = nltk.pos_tag(tokens)
                entities = nltk.ne_chunk(tagged)
                for entity in entities:
                    if isinstance(entity, nltk.Tree):
                        entity_text = " ".join([word for word, tag in entity.leaves()])
                        if entity.label() == 'ORGANIZATION':
                            if entity_text not in named_entities["organizations"]:
                                named_entities["organizations"].append(entity_text)
                        elif entity.label() == 'PERSON':
                            if entity_text not in named_entities["persons"]:
                                named_entities["persons"].append(entity_text)
            else:
                return {"error": f"Failed to fetch URL: HTTP status code {response.status}"}
    except Exception as e:
        return {"error": f"An error occurred: {e}"}
    return {
        "url": url,
        "score": round(score, 2),
        "indicators_found": indicators_found,
        "named_entities": named_entities
    }
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>
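A hedged sketch of what those three refactorings look like applied to the fetch step; the `fetch_text` helper name is invented for illustration:

```python
import re
import urllib.request

def fetch_text(url: str):
    """Fetch a URL and return its lower-cased text, or an error dict."""
    if not url.startswith(("http://", "https://")):
        url = f"http://{url}"  # use-fstring-for-concatenation
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        # swap-if-else: guard on the failure case first...
        if response.status != 200:
            return {"error": f"Failed to fetch URL: HTTP status code {response.status}"}
        # ...remove-unnecessary-else: the happy path continues without nesting.
        html_content = response.read().decode("utf-8", errors="ignore")
        return re.sub(r"<[^>]+>", "", html_content).lower()
```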
### Comment 5
<location> `social_media_analyzer/test_scam_detector.py:10-15` </location>
<code_context>
def test_sentiment_analysis(self):
    # Test case for negative sentiment
    text_negative = "This is a terrible, awful, no good, very bad message."
    result_negative = analyze_text_for_scams(text_negative)
    self.assertIn("Strong negative sentiment detected in text.", [indicator for indicator in result_negative["indicators_found"]])
    # Test case for positive sentiment
    text_positive = "This is a wonderful, amazing, great message."
    result_positive = analyze_text_for_scams(text_positive)
    self.assertNotIn("Strong negative sentiment detected in text.", [indicator for indicator in result_positive["indicators_found"]])
</code_context>
<issue_to_address>
**issue (code-quality):** Replace identity comprehension with call to collection constructor [×2] ([`identity-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/identity-comprehension/))
<br/><details><summary>Explanation</summary>Convert list/set/tuple comprehensions that do not change the input elements into a direct call to the collection constructor.
#### Before
```python
# List comprehensions
[item for item in coll]
[item for item in friends.names()]
# Dict comprehensions
{k: v for k, v in coll}
{k: v for k, v in coll.items()} # Only if we know coll is a `dict`
# Unneeded call to `.items()`
dict(coll.items()) # Only if we know coll is a `dict`
# Set comprehensions
{item for item in coll}
```
#### After
```python
# List comprehensions
list(iter(coll))
list(iter(friends.names()))
# Dict comprehensions
dict(coll)
dict(coll)
# Unneeded call to `.items()`
dict(coll)
# Set comprehensions
set(coll)
```
All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
</details>
</issue_to_address>
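Concretely, the comprehension only copies the list, so the assertion can search it directly; a minimal sketch (the module path is assumed from the test file's location):

```python
import unittest

from social_media_analyzer.scam_detector import analyze_text_for_scams

class TestSentiment(unittest.TestCase):
    def test_negative_sentiment(self):
        result = analyze_text_for_scams(
            "This is a terrible, awful, no good, very bad message."
        )
        # The identity comprehension is gone: assert against the list itself.
        self.assertIn(
            "Strong negative sentiment detected in text.",
            result["indicators_found"],
        )
```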
### Comment 6
<location> `social_media_analyzer/test_scam_detector.py:19-26` </location>
<code_context>
def test_keyword_matching(self):
    # Test case for urgency keyword
    text_urgency = "URGENT: Your account has been compromised."
    result_urgency = analyze_text_for_scams(text_urgency)
    self.assertIn("Presence of 'Urgency' keyword: 'urgent'", [indicator for indicator in result_urgency["indicators_found"]])
    # Test case for stemming
    text_stemming = "I need you to verify your account immediately."
    result_stemming = analyze_text_for_scams(text_stemming)
    self.assertIn("Presence of 'Sensitive Info' keyword: 'verify your account'", [indicator for indicator in result_stemming["indicators_found"]])
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Extract duplicate code into method ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Replace identity comprehension with call to collection constructor [×2] ([`identity-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/identity-comprehension/))
<br/><details><summary>Explanation</summary>
Convert list/set/tuple comprehensions that do not change the input elements into a direct call to the collection constructor.
#### Before
```python
# List comprehensions
[item for item in coll]
[item for item in friends.names()]
# Dict comprehensions
{k: v for k, v in coll}
{k: v for k, v in coll.items()} # Only if we know coll is a `dict`
# Unneeded call to `.items()`
dict(coll.items()) # Only if we know coll is a `dict`
# Set comprehensions
{item for item in coll}
```
#### After
```python
# List comprehensions
list(iter(coll))
list(iter(friends.names()))
# Dict comprehensions
dict(coll)
dict(coll)
# Unneeded call to `.items()`
dict(coll)
# Set comprehensions
set(coll)
```
All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
</details>
</issue_to_address>
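A sketch of the duplicate-extraction suggestion; the `_assert_indicator` helper name is hypothetical, not from the PR:

```python
import unittest

from social_media_analyzer.scam_detector import analyze_text_for_scams

class TestKeywordMatching(unittest.TestCase):
    def _assert_indicator(self, text, expected_indicator):
        # Hypothetical helper consolidating the duplicated analyze-then-assert steps.
        result = analyze_text_for_scams(text)
        self.assertIn(expected_indicator, result["indicators_found"])

    def test_keyword_matching(self):
        self._assert_indicator(
            "URGENT: Your account has been compromised.",
            "Presence of 'Urgency' keyword: 'urgent'",
        )
        self._assert_indicator(
            "I need you to verify your account immediately.",
            "Presence of 'Sensitive Info' keyword: 'verify your account'",
        )
```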
Summary by Sourcery
Add NLP-driven fraud and misinformation detection and provide a React-based web interface powered by a Flask API.
New Features:
- `/analyze/scam` and `/analyze/fake-news` endpoints exposed via a Flask API.

Enhancements:
Build:
Tests: