
Conversation

@GYFX35 GYFX35 (Owner) commented Sep 29, 2025

This commit introduces a new set of tools focused on protecting teenagers from online threats. The new features include detection of:

  • Cyberbullying
  • Inappropriate content
  • Privacy risks (oversharing)

A new module, `teen_protection.py`, has been added to house the analysis logic. The main application has been updated to include a new menu for these tools. Heuristics have been expanded with relevant keywords and weights.
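
For orientation, the core of the module follows the shape below. This is a minimal sketch reconstructed from the review excerpts later in this thread; the keyword list and weight shown are illustrative excerpts, not the full sets in `heuristics.py`.

```python
# Minimal sketch of the analysis flow in teen_protection.py; the real
# keyword lists and weights live in social_media_analyzer/heuristics.py
# (the values below are illustrative excerpts).
CYBERBULLYING_KEYWORDS = ["loser", "idiot"]
HEURISTIC_WEIGHTS = {"CYBERBULLYING": 2.5}

def analyze_text_for_teen_risks(text, analysis_type):
    keyword_map = {
        "cyberbullying": ("CYBERBULLYING", CYBERBULLYING_KEYWORDS),
        # "inappropriate_content" and "privacy_risk" map the same way
    }
    if analysis_type not in keyword_map:
        return {"error": f"Unknown analysis type: {analysis_type}"}

    category, keywords = keyword_map[analysis_type]
    weight = HEURISTIC_WEIGHTS.get(category.upper(), 1.0)
    text_lower = text.lower()
    score, indicators_found = 0.0, []
    for keyword in keywords:
        if keyword in text_lower:
            message = f"Detected potential {category.replace('_', ' ').lower()} keyword: '{keyword}'"
            if message not in indicators_found:
                indicators_found.append(message)
                score += weight
    return {"score": score, "indicators_found": indicators_found}
```

The specific wrappers (`analyze_for_cyberbullying` and friends) simply call this dispatcher with a fixed `analysis_type`.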

Unit tests for the new functionality have been added and integrated into the existing test suite, and all tests are passing.

Summary by Sourcery

Integrate new teenager protection tools into the social media analyzer by adding a dedicated module for risk detection, expanding heuristics, updating the CLI menu, and including comprehensive unit tests.

New Features:

  • Add teen_protection module to detect cyberbullying, inappropriate content, and privacy risks
  • Integrate teenager protection option into the main CLI menu

Enhancements:

  • Expand heuristics with keywords and weights for teen risk categories
  • Adjust test runner and menu indexing to include the new teen protection tools

Tests:

  • Add unit tests for teen protection analysis covering all risk types

@sourcery-ai sourcery-ai bot commented Sep 29, 2025

Reviewer's Guide

This PR integrates a new teenager protection toolkit into the social media analyzer. It introduces a dedicated teen_protection module that applies heuristic keyword detection for cyberbullying, inappropriate content, and privacy risks; updates the CLI to expose these analyses; extends the heuristics with new keyword lists and weights; and adds comprehensive unit tests to the existing test suite.

Entity relationship diagram for new heuristic keyword lists and weights

```mermaid
erDiagram
    CYBERBULLYING_KEYWORDS {
        string keyword
    }
    INAPPROPRIATE_CONTENT_KEYWORDS {
        string keyword
    }
    PRIVACY_RISK_KEYWORDS {
        string keyword
    }
    HEURISTIC_WEIGHTS {
        string category
        float weight
    }
    CYBERBULLYING_KEYWORDS ||--o| HEURISTIC_WEIGHTS : "uses category CYBERBULLYING"
    INAPPROPRIATE_CONTENT_KEYWORDS ||--o| HEURISTIC_WEIGHTS : "uses category INAPPROPRIATE_CONTENT"
    PRIVACY_RISK_KEYWORDS ||--o| HEURISTIC_WEIGHTS : "uses category PRIVACY_RISK"
```

Class diagram for the new teen_protection module

```mermaid
classDiagram
    class TeenProtection {
        +analyze_text_for_teen_risks(text, analysis_type)
        +analyze_for_cyberbullying(text)
        +analyze_for_inappropriate_content(text)
        +analyze_for_privacy_risks(text)
    }
    class Heuristics {
        +CYBERBULLYING_KEYWORDS
        +INAPPROPRIATE_CONTENT_KEYWORDS
        +PRIVACY_RISK_KEYWORDS
        +HEURISTIC_WEIGHTS
    }
    TeenProtection ..> Heuristics : uses
```

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Introduce teen protection analysis module | Implement `analyze_text_for_teen_risks` with keyword mapping and scoring; provide specific wrappers for cyberbullying, inappropriate content, and privacy risks | `social_media_analyzer/teen_protection.py` |
| Integrate teen protection into CLI | Import `teen_protection` in `main.py`; add `analyze_for_teen_risks` handler and new menu option; update menu numbering to include teen protection | `social_media_analyzer/main.py` |
| Extend heuristics with teenager protection keywords and weights | Add keyword lists for cyberbullying, inappropriate content, and privacy risks; define corresponding heuristic weights | `social_media_analyzer/heuristics.py` |
| Add and integrate unit tests for teenager protection | Create `TestTeenProtection` suite covering all analysis types; integrate it into `test_runner` and adjust imports | `social_media_analyzer/test_teen_protection.py`, `social_media_analyzer/test_runner.py` |
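
For reference, the CLI wiring in `main.py` roughly follows the shape below. This is a sketch assembled from the review excerpt in Comment 1 further down; the prompt string and the single privacy-risk branch shown here stand in for the full menu dispatch.

```python
# Sketch of the main.py handler; the prompt text is illustrative, and the
# real handler branches on the selected analysis type rather than always
# running the privacy-risk check.
from social_media_analyzer import teen_protection

def analyze_for_teen_risks():
    text_to_analyze = input("Enter the text to analyze: ")

    print("\n--- Analyzing for Privacy Risks ---")
    result = teen_protection.analyze_for_privacy_risks(text_to_analyze)

    print(f"Score: {result['score']} (Higher is more suspicious)")
    if result['indicators_found']:
        print("Indicators Found:")
        for indicator in result['indicators_found']:
            print(f"  - {indicator}")
```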


@GYFX35 GYFX35 merged commit 4f21e9c into main Sep 29, 2025
0 of 6 checks passed
@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Consider refactoring analyze_for_teen_risks to separate user I/O from the core analysis logic so it can be reused and more easily tested without interactive input calls.
  • Instead of manually constructing the TestSuite in test_runner, leverage unittest discovery to automatically pick up all test modules and avoid forgetting new tests (see the sketch after this list).
  • Simple substring matching for heuristics may lead to false positives/negatives—consider normalizing the text and using word-boundary or regex-based checks to improve accuracy.
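
For the test-discovery point, a minimal discovery-based runner might look like the sketch below; the start directory and file pattern are assumptions about this repo's layout.

```python
# Sketch: let unittest discover all test_*.py modules instead of
# hand-building a TestSuite, so new test files are picked up automatically.
# start_dir and pattern are assumptions about the repo layout.
import unittest

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.discover(
        start_dir="social_media_analyzer", pattern="test_*.py"
    )
    unittest.TextTestRunner(verbosity=2).run(suite)
```
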
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider refactoring analyze_for_teen_risks to separate user I/O from the core analysis logic so it can be reused and more easily tested without interactive input calls.
- Instead of manually constructing the TestSuite in test_runner, leverage unittest discovery to automatically pick up all test modules and avoid forgetting new tests.
- Simple substring matching for heuristics may lead to false positives/negatives—consider normalizing the text and using word-boundary or regex-based checks to improve accuracy.

## Individual Comments

### Comment 1
<location> `social_media_analyzer/main.py:167` </location>
<code_context>
+        print("\n--- Analyzing for Privacy Risks ---")
+        result = teen_protection.analyze_for_privacy_risks(text_to_analyze)
+
+    print(f"Score: {result['score']} (Higher is more suspicious)")
+    if result['indicators_found']:
+        print("Indicators Found:")
</code_context>

<issue_to_address>
**issue (bug_risk):** Accessing 'score' and 'indicators_found' without error handling may cause issues if the result contains an error.

Check for an 'error' key in the result before accessing 'score' or 'indicators_found' to avoid KeyError exceptions.
</issue_to_address>
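
A minimal guard along those lines, as a fragment of the handler rather than the exact patch:

```python
# Sketch: inside the analyze_for_teen_risks handler, bail out early if the
# analysis returned an error dictionary instead of a result.
if 'error' in result:
    print(f"Analysis failed: {result['error']}")
    return

print(f"Score: {result['score']} (Higher is more suspicious)")
```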

### Comment 2
<location> `social_media_analyzer/teen_protection.py:33-41` </location>
<code_context>
+    category, keywords = keyword_map[analysis_type]
+    weight = HEURISTIC_WEIGHTS.get(category.upper(), 1.0)
+
+    for keyword in keywords:
+        if keyword in text_lower:
+            message = f"Detected potential {category.replace('_', ' ').lower()} keyword: '{keyword}'"
</code_context>

<issue_to_address>
**suggestion:** Simple substring matching may lead to false positives.

Consider using regular expressions with word boundaries to avoid matching keywords within other words and reduce false positives.

```suggestion
    import re

    category, keywords = keyword_map[analysis_type]
    weight = HEURISTIC_WEIGHTS.get(category.upper(), 1.0)

    for keyword in keywords:
        # Use regex with word boundaries to avoid matching substrings within other words
        pattern = r'\b' + re.escape(keyword) + r'\b'
        if re.search(pattern, text_lower):
            message = f"Detected potential {category.replace('_', ' ').lower()} keyword: '{keyword}'"
            if message not in indicators_found:
                indicators_found.append(message)
                score += weight
```
</issue_to_address>

### Comment 3
<location> `social_media_analyzer/heuristics.py:273-278` </location>
<code_context>
     "SUSPICIOUS_URL_PATTERN": 3.0, # High weight for matching a suspicious URL pattern
     "GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match
+    # Teenager Protection Weights
+    "CYBERBULLYING": 2.5,
+    "INAPPROPRIATE_CONTENT": 3.0,
+    "PRIVACY_RISK": 3.5,
 }

</code_context>

<issue_to_address>
**suggestion:** Heuristic weights for teen protection categories may need calibration.

These weights appear arbitrary; please validate them against expert input or data to ensure appropriate risk prioritization.

```suggestion
    "GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match
    # Teenager Protection Weights
    # NOTE: The following weights for teen protection categories are provisional.
    # TODO: Validate these weights against expert input or empirical data to ensure appropriate risk prioritization.
    "CYBERBULLYING": 2.5,
    "INAPPROPRIATE_CONTENT": 3.0,
    "PRIVACY_RISK": 3.5,
 }
```
</issue_to_address>

### Comment 4
<location> `social_media_analyzer/test_teen_protection.py:8` </location>
<code_context>
+    analyze_for_privacy_risks
+)
+
+class TestTeenProtection(unittest.TestCase):
+
+    def test_cyberbullying(self):
</code_context>

<issue_to_address>
**suggestion (testing):** Missing tests for edge cases: partial keyword matches and case sensitivity.

Add tests for partial keyword matches and case sensitivity to ensure accurate and robust detection.
</issue_to_address>
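
A sketch of such tests follows. It assumes 'loser' is in the cyberbullying keyword list, as in the suite's existing cases; the second test encodes the word-boundary behavior proposed in Comment 2 and would fail under plain substring matching.

```python
import unittest
# Inside the package this would be: from .teen_protection import analyze_for_cyberbullying
from social_media_analyzer.teen_protection import analyze_for_cyberbullying

class TestTeenProtectionEdgeCases(unittest.TestCase):
    def test_case_insensitivity(self):
        """Mixed-case input should still trigger detection."""
        result = analyze_for_cyberbullying("You are such a LOSER.")
        self.assertGreater(result['score'], 0)

    def test_no_partial_keyword_match(self):
        """'closer' contains 'loser' as a substring; with word-boundary
        matching (see Comment 2) this should not fire."""
        result = analyze_for_cyberbullying("The closer we get, the better.")
        self.assertEqual(result['score'], 0)
```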

### Comment 5
<location> `social_media_analyzer/test_teen_protection.py:10` </location>
<code_context>
+
+class TestTeenProtection(unittest.TestCase):
+
+    def test_cyberbullying(self):
+        """Test the cyberbullying detection."""
+        # Test case with bullying keywords
</code_context>

<issue_to_address>
**suggestion (testing):** No tests for invalid analysis type or error handling.

Add a test to verify that an error dictionary is returned for invalid analysis types, ensuring error handling is covered.

Suggested implementation:

```python
class TestTeenProtection(unittest.TestCase):

    def test_cyberbullying(self):
        """Test the cyberbullying detection."""
        # Test case with bullying keywords
        text1 = "You are such a loser and an idiot."
        result1 = analyze_for_cyberbullying(text1)
        self.assertGreater(result1['score'], 0)
        self.assertIn("Detected potential cyberbullying keyword: 'loser'", result1['indicators_found'])
        self.assertIn("Detected potential cyberbullying keyword: 'idiot'", result1['indicators_found'])

        # Test case with no bullying keywords
        text2 = "Have a great day!"
        result2 = analyze_for_cyberbullying(text2)
        self.assertEqual(result2['score'], 0)

    def test_invalid_analysis_type(self):
        """Test error handling for invalid analysis type."""
        from .teen_protection import analyze_text_for_teen_risks
        text = "This is a test message."
        result = analyze_text_for_teen_risks(text, analysis_type="unknown_type")
        self.assertIsInstance(result, dict)
        self.assertIn("error", result)
        self.assertIn("Invalid analysis type", result["error"])

```

The dispatcher is assumed here to be `analyze_text_for_teen_risks`, matching the module's documented API; adjust the import and call if the actual name or signature differs. Also, ensure that the error dictionary returned by the function contains an "error" key with a message about the invalid type.
</issue_to_address>

### Comment 6
<location> `social_media_analyzer/test_teen_protection.py:25` </location>
<code_context>
+        self.assertEqual(result2['score'], 0)
+        self.assertEqual(len(result2['indicators_found']), 0)
+
+    def test_inappropriate_content(self):
+        """Test the inappropriate content detection."""
+        # Test case with inappropriate keywords
</code_context>

<issue_to_address>
**suggestion (testing):** Missing tests for multiple keyword occurrences in a single text.

Add a test where a keyword appears multiple times to ensure correct score accumulation and indicator handling.
</issue_to_address>
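
A sketch of such tests, assuming the dedup-by-message behavior shown in Comment 2's excerpt (a repeated keyword is counted once) and the same 'loser'/'idiot' keywords used in the existing cases:

```python
import unittest
# Inside the package this would be: from .teen_protection import analyze_for_cyberbullying
from social_media_analyzer.teen_protection import analyze_for_cyberbullying

class TestTeenProtectionOccurrences(unittest.TestCase):
    def test_repeated_keyword_counted_once(self):
        """Per the dedup check, one keyword repeated in a text should yield
        a single indicator and a single score increment."""
        result = analyze_for_cyberbullying("loser, loser, loser")
        self.assertEqual(len(result['indicators_found']), 1)

    def test_distinct_keywords_accumulate(self):
        """Distinct keywords should each contribute an indicator."""
        result = analyze_for_cyberbullying("You loser, you idiot.")
        self.assertEqual(len(result['indicators_found']), 2)
        self.assertGreater(result['score'], 0)
```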


