Conversation

@GYFX35 GYFX35 commented Sep 23, 2025

This commit introduces two major enhancements to the Social Media Analyzer:

  • Integration of NLP techniques for more sophisticated scam and fake news detection.
  • A new web-based GUI built with React to replace the command-line interface.

Backend changes:

  • Added nltk and textblob for NLP tasks.
  • Integrated sentiment analysis into the scam detector to identify messages with strong negative sentiment.
  • Enhanced the fake news detector with Named Entity Recognition (NER) to identify organizations and people mentioned in articles.
  • Created a Flask API to expose the analyzer's functionality to the frontend.
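
A minimal sketch of how that API layer might look (the endpoint paths and analyzer function names are from this PR; the request/response handling details are illustrative assumptions):

```python
from flask import Flask, jsonify, request

from social_media_analyzer.scam_detector import analyze_text_for_scams
from social_media_analyzer.fake_news_detector import analyze_url_for_fake_news

app = Flask(__name__)

@app.route('/analyze/scam', methods=['POST'])
def analyze_scam():
    # Expects a JSON body like {"text": "..."}.
    data = request.get_json(silent=True) or {}
    text = data.get('text')
    if not text:
        return jsonify({"error": "Missing 'text' field"}), 400
    return jsonify(analyze_text_for_scams(text))

@app.route('/analyze/fake-news', methods=['POST'])
def analyze_fake_news():
    # Expects a JSON body like {"url": "..."}.
    data = request.get_json(silent=True) or {}
    url = data.get('url')
    if not url:
        return jsonify({"error": "Missing 'url' field"}), 400
    return jsonify(analyze_url_for_fake_news(url))
```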

Frontend changes:

  • Created a new React application with components for:
    • Scam Analyzer
    • Fake News Analyzer
  • The GUI lets users analyze text and URLs interactively.

Summary by Sourcery

Add NLP-driven fraud and misinformation detection and provide a React-based web interface powered by a Flask API.

New Features:

  • Integrate sentiment analysis into the scam detector to flag strong negative sentiment.
  • Incorporate Named Entity Recognition in the fake news detector to extract organizations and persons from content.
  • Expose separate /analyze/scam and /analyze/fake-news endpoints via a Flask API.
  • Introduce a React frontend with ScamAnalyzer and FakeNewsAnalyzer components for user-friendly analysis.

Enhancements:

  • Load Google API key from environment variables and improve input validation and error messages.
  • Update server launch configuration to enable debug mode.

Build:

  • Add nltk and textblob to project dependencies.

Tests:

  • Add unit tests for sentiment analysis and keyword matching in the scam detector.

sourcery-ai bot commented Sep 23, 2025

Reviewer's Guide

This pull request augments the Social Media Analyzer by transforming the CLI into a Flask-based API, integrating NLP techniques in both scam and fake-news detectors, and layering a React GUI on top for interactive analysis.

Class diagram for updated scam and fake news detectors

classDiagram
    class ScamDetector {
        +analyze_text_for_scams(text_content, platform, api_key)
        -Sentiment Analysis (TextBlob)
        -Keyword-based checks
        -Regex-based checks
        -Financial Identifiers
        -Phone Numbers
        score: float
        indicators_found: list
        urls_analyzed_details: list
    }
    class FakeNewsDetector {
        +analyze_url_for_fake_news(url)
        -Named Entity Recognition (NLTK)
        -Fake news domain check
        -Clickbait pattern check
        score: float
        indicators_found: list
        named_entities: dict
    }
    ScamDetector <.. FlaskAPI
    FakeNewsDetector <.. FlaskAPI

File-Level Changes

Expose analysis via Flask API with distinct endpoints
  • Replaced root route and unified analyze logic
  • Added get_api_key helper for environment configuration
  • Introduced /analyze/scam and /analyze/fake-news POST endpoints
Files: text_message_analyzer/app.py
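
A plausible shape for that helper (only the name get_api_key and the GOOGLE_API_KEY variable are confirmed by this PR; the body is an assumption):

```python
import os

def get_api_key():
    # Reads the Google API key from the environment; None if unset.
    return os.environ.get("GOOGLE_API_KEY")
```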
Incorporate sentiment analysis into scam detection
  • Inserted TextBlob polarity check to flag strong negative sentiment
  • Updated heuristic ordering and scoring weights
  • Added nltk and textblob to requirements
Files: social_media_analyzer/scam_detector.py, social_media_analyzer/requirements.txt
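
A minimal sketch of the kind of polarity check described above (the threshold value is an assumption; TextBlob reports polarity in [-1.0, 1.0]):

```python
from textblob import TextBlob

def has_strong_negative_sentiment(text, threshold=-0.5):
    # Polarity below the (assumed) threshold counts as strongly negative.
    return TextBlob(text).sentiment.polarity < threshold
```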
Augment fake-news detection with Named Entity Recognition
  • Initialized named_entities dict for organizations and persons
  • Tokenized, POS-tagged, and NE-chunked content to extract entities
  • Expanded API response to include named_entities and documented NLTK data prerequisites
Files: social_media_analyzer/fake_news_detector.py
Build a React-based GUI for analysis workflows
  • Revamped App.jsx to include navigation and stateful view switching
  • Created ScamAnalyzer component to POST text and render results
  • Created FakeNewsAnalyzer component to POST URL and display indicators and entities
Files: src/App.jsx, src/ScamAnalyzer.jsx, src/FakeNewsAnalyzer.jsx
Add unit tests for scam detection enhancements
  • Implemented tests for sentiment-based flagging
  • Covered keyword matching scenarios with urgency and stemming cases
Files: social_media_analyzer/test_scam_detector.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


guardrails bot commented Sep 23, 2025

⚠️ We detected 2 security issues in this pull request:

Insecure Configuration (1)
  • Severity: Critical
  • Title: Debugging Enabled (Flask)

More info on how to fix Insecure Configuration in Python.

Vulnerable Libraries (1)
  • Severity: N/A
  • Package: pkg:pypi/nltk@0.0.0
  • Upgrade to: 1405aad979c6b8080dbbc8e0858f89b2e3690341, 3.6.5

More info on how to fix Vulnerable Libraries in Python.


👉 Go to the dashboard for detailed results.


@GYFX35 GYFX35 merged commit 7b864ba into main Sep 23, 2025
1 of 7 checks passed
@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Detected Flask app with debug=True. Do not deploy to production with this flag enabled as it will leak sensitive information. Instead, consider using Flask configuration variables or setting 'debug' using system environment variables. (link)

General comments:

  • You should configure CORS (e.g., via flask-cors) on the Flask API so the React frontend can reliably call /analyze/* endpoints without cross-origin errors.
  • Consider handling missing NLTK data and the GOOGLE_API_KEY at startup (or auto-download required corpora) to avoid runtime failures when the environment isn't preconfigured.
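
For example, a flask-cors setup scoped to the analysis endpoints could look like this (the allowed origin is a placeholder for wherever the React app is served):

```python
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
# Allow only the React frontend (placeholder origin) to call /analyze/*.
CORS(app, resources={r"/analyze/*": {"origins": "http://localhost:3000"}})
```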
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- You should configure CORS (e.g., via flask-cors) on the Flask API so the React frontend can reliably call `/analyze/*` endpoints without cross-origin errors.
- Consider handling missing NLTK data and the GOOGLE_API_KEY at startup (or auto-download required corpora) to avoid runtime failures when the environment isn't preconfigured.

## Individual Comments

### Comment 1
<location> `text_message_analyzer/app.py:36` </location>
<code_context>
-if __name__ == "__main__":
-    app.run(host="0.0.0.0", port=8080)
+if __name__ == '__main__':
+    app.run(debug=True)
</code_context>

<issue_to_address>
**🚨 issue (security):** Enabling debug mode in production can expose sensitive information.

Ensure debug mode is disabled in production, ideally by controlling it with an environment variable.
</issue_to_address>
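
One common pattern, sketched here (the FLASK_DEBUG variable name is a convention, not from the diff):

```python
import os

if __name__ == '__main__':
    # Enable debug only when explicitly requested via the environment.
    debug = os.environ.get('FLASK_DEBUG', '').lower() in ('1', 'true')
    app.run(debug=debug)
```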

### Comment 2
<location> `social_media_analyzer/fake_news_detector.py:29-32` </location>
<code_context>

     score = 0.0
     indicators_found = []
+    named_entities = {
+        "organizations": [],
+        "persons": [],
+    }

     # 1. Check against known fake news domains
</code_context>

<issue_to_address>
**issue:** Named entity extraction does not handle cases where NLTK data is missing.

Catch NLTK exceptions and provide a user-friendly error message or guidance on downloading missing data.
</issue_to_address>
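
A sketch of one way to guard the NER step (NLTK raises LookupError when required data is missing):

```python
try:
    tokens = nltk.word_tokenize(text_content)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.ne_chunk(tagged)
except LookupError as e:
    return {
        "error": "Required NLTK data is missing; run nltk.download() for "
                 "'punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', "
                 f"and 'words'. Details: {e}"
    }
```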

### Comment 3
<location> `text_message_analyzer/app.py:36` </location>
<code_context>
    app.run(debug=True)
</code_context>

<issue_to_address>
**security (python.flask.security.audit.debug-enabled):** Detected Flask app with debug=True. Do not deploy to production with this flag enabled as it will leak sensitive information. Instead, consider using Flask configuration variables or setting 'debug' using system environment variables.

*Source: opengrep*
</issue_to_address>

### Comment 4
<location> `social_media_analyzer/fake_news_detector.py:23` </location>
<code_context>
def analyze_url_for_fake_news(url):
    """
    Analyzes a URL for indicators of fake news.

    NOTE: This function requires the following NLTK data to be downloaded:
    - 'punkt'
    - 'averaged_perceptron_tagger'
    - 'maxent_ne_chunker'
    - 'words'
    """
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url

    domain = urlparse(url).netloc.lower()

    score = 0.0
    indicators_found = []
    named_entities = {
        "organizations": [],
        "persons": [],
    }

    # 1. Check against known fake news domains
    if domain in FAKE_NEWS_DOMAINS:
        score += HEURISTIC_WEIGHTS.get("KNOWN_FAKE_NEWS_DOMAIN", 5.0)
        indicators_found.append(f"Domain '{domain}' is a known source of fake news.")
        return {
            "url": url,
            "score": round(score, 2),
            "indicators_found": indicators_found
        }

    # 2. Fetch and analyze content
    try:
        headers = {'User-Agent': 'Mozilla/5.0'}
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=10) as response:
            if response.status == 200:
                html_content = response.read().decode('utf-8', errors='ignore')
                text_content = re.sub(r'<[^>]+>', '', html_content).lower()

                # 3. Analyze text for sensationalist keywords
                for keyword in SENSATIONALIST_KEYWORDS:
                    if keyword in text_content:
                        score += HEURISTIC_WEIGHTS.get("SENSATIONALIST_KEYWORD", 1.0)
                        indicators_found.append(f"Found sensationalist keyword: '{keyword}'")

                # 4. Analyze text for clickbait patterns
                for pattern in CLICKBAIT_PATTERNS:
                    if re.search(pattern, text_content, re.IGNORECASE):
                        score += HEURISTIC_WEIGHTS.get("CLICKBAIT_PATTERN", 1.5)
                        indicators_found.append(f"Found clickbait pattern: '{pattern}'")

                # 5. Named Entity Recognition
                tokens = nltk.word_tokenize(text_content)
                tagged = nltk.pos_tag(tokens)
                entities = nltk.ne_chunk(tagged)

                for entity in entities:
                    if isinstance(entity, nltk.Tree):
                        entity_text = " ".join([word for word, tag in entity.leaves()])
                        if entity.label() == 'ORGANIZATION':
                            if entity_text not in named_entities["organizations"]:
                                named_entities["organizations"].append(entity_text)
                        elif entity.label() == 'PERSON':
                            if entity_text not in named_entities["persons"]:
                                named_entities["persons"].append(entity_text)

            else:
                return {"error": f"Failed to fetch URL: HTTP status code {response.status}"}
    except Exception as e:
        return {"error": f"An error occurred: {e}"}

    return {
        "url": url,
        "score": round(score, 2),
        "indicators_found": indicators_found,
        "named_entities": named_entities
    }

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>
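
Applied to the snippet above, those suggestions would look roughly like this:

```python
# Use an f-string instead of concatenation.
if not url.startswith(('http://', 'https://')):
    url = f'http://{url}'

# Swap the branches and drop the unnecessary else: fail fast on a non-200
# status, then continue with the happy path unindented.
if response.status != 200:
    return {"error": f"Failed to fetch URL: HTTP status code {response.status}"}
html_content = response.read().decode('utf-8', errors='ignore')
```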

### Comment 5
<location> `social_media_analyzer/test_scam_detector.py:10-15` </location>
<code_context>
    def test_sentiment_analysis(self):
        # Test case for negative sentiment
        text_negative = "This is a terrible, awful, no good, very bad message."
        result_negative = analyze_text_for_scams(text_negative)
        self.assertIn("Strong negative sentiment detected in text.", [indicator for indicator in result_negative["indicators_found"]])

        # Test case for positive sentiment
        text_positive = "This is a wonderful, amazing, great message."
        result_positive = analyze_text_for_scams(text_positive)
        self.assertNotIn("Strong negative sentiment detected in text.", [indicator for indicator in result_positive["indicators_found"]])

</code_context>

<issue_to_address>
**issue (code-quality):** Replace identity comprehension with call to collection constructor [×2] ([`identity-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/identity-comprehension/))

<br/><details><summary>Explanation</summary>Convert list/set/tuple comprehensions that do not change the input elements into calls to the collection constructor.

#### Before

```python
# List comprehensions
[item for item in coll]
[item for item in friends.names()]

# Dict comprehensions
{k: v for k, v in coll}
{k: v for k, v in coll.items()}  # Only if we know coll is a `dict`

# Unneeded call to `.items()`
dict(coll.items())  # Only if we know coll is a `dict`

# Set comprehensions
{item for item in coll}
```

#### After

```python
# List comprehensions
list(iter(coll))
list(iter(friends.names()))

# Dict comprehensions
dict(coll)
dict(coll)

# Unneeded call to `.items()`
dict(coll)

# Set comprehensions
set(coll)
```

All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
</details>
</issue_to_address>
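
In these tests the comprehension can simply be dropped, since assertIn accepts the list directly:

```python
self.assertIn(
    "Strong negative sentiment detected in text.",
    result_negative["indicators_found"],
)
```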

### Comment 6
<location> `social_media_analyzer/test_scam_detector.py:19-26` </location>
<code_context>
    def test_keyword_matching(self):
        # Test case for urgency keyword
        text_urgency = "URGENT: Your account has been compromised."
        result_urgency = analyze_text_for_scams(text_urgency)
        self.assertIn("Presence of 'Urgency' keyword: 'urgent'", [indicator for indicator in result_urgency["indicators_found"]])

        # Test case for stemming
        text_stemming = "I need you to verify your account immediately."
        result_stemming = analyze_text_for_scams(text_stemming)
        self.assertIn("Presence of 'Sensitive Info' keyword: 'verify your account'", [indicator for indicator in result_stemming["indicators_found"]])

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Extract duplicate code into method ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Replace identity comprehension with call to collection constructor [×2] ([`identity-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/identity-comprehension/))

<br/><details><summary>Explanation</summary>
Convert list/set/tuple comprehensions that do not change the input elements into calls to the collection constructor.

#### Before

```python
# List comprehensions
[item for item in coll]
[item for item in friends.names()]

# Dict comprehensions
{k: v for k, v in coll}
{k: v for k, v in coll.items()}  # Only if we know coll is a `dict`

# Unneeded call to `.items()`
dict(coll.items())  # Only if we know coll is a `dict`

# Set comprehensions
{item for item in coll}
```

#### After

```python
# List comprehensions
list(iter(coll))
list(iter(friends.names()))

# Dict comprehensions
dict(coll)
dict(coll)

# Unneeded call to `.items()`
dict(coll)

# Set comprehensions
set(coll)
```

All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
</details>
</issue_to_address>
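
A sketch of the suggested extraction (the helper name is hypothetical):

```python
def assert_has_indicator(self, text, indicator):
    # Hypothetical helper: run the analyzer and check for a single indicator.
    result = analyze_text_for_scams(text)
    self.assertIn(indicator, result["indicators_found"])
```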

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

