Integrate teenager protection tools into social media analyzer. #23
Conversation
This commit introduces a new set of tools focused on protecting teenagers from online threats. The new features include detection for:

- Cyberbullying
- Inappropriate content
- Privacy risks (oversharing)

A new module, `teen_protection.py`, has been added to house the analysis logic. The main application has been updated to include a new menu for these tools. Heuristics have been expanded with relevant keywords and weights. Unit tests for the new functionality have been added and integrated into the existing test suite, and all tests are passing.
Reviewer's Guide

This PR integrates a new teenager protection toolkit into the social media analyzer by introducing a dedicated teen_protection module that applies heuristic keyword detection for cyberbullying, inappropriate content, and privacy risks, updating the CLI to expose these analyses, extending heuristics with new keyword lists and weights, and adding comprehensive unit tests to the existing test suite.

Entity relationship diagram for new heuristic keyword lists and weights

```mermaid
erDiagram
    CYBERBULLYING_KEYWORDS {
        string keyword
    }
    INAPPROPRIATE_CONTENT_KEYWORDS {
        string keyword
    }
    PRIVACY_RISK_KEYWORDS {
        string keyword
    }
    HEURISTIC_WEIGHTS {
        string category
        float weight
    }
    CYBERBULLYING_KEYWORDS ||--o| HEURISTIC_WEIGHTS : "uses category CYBERBULLYING"
    INAPPROPRIATE_CONTENT_KEYWORDS ||--o| HEURISTIC_WEIGHTS : "uses category INAPPROPRIATE_CONTENT"
    PRIVACY_RISK_KEYWORDS ||--o| HEURISTIC_WEIGHTS : "uses category PRIVACY_RISK"
```
Class diagram for the new teen_protection module

```mermaid
classDiagram
    class TeenProtection {
        +analyze_text_for_teen_risks(text, analysis_type)
        +analyze_for_cyberbullying(text)
        +analyze_for_inappropriate_content(text)
        +analyze_for_privacy_risks(text)
    }
    class Heuristics {
        +CYBERBULLYING_KEYWORDS
        +INAPPROPRIATE_CONTENT_KEYWORDS
        +PRIVACY_RISK_KEYWORDS
        +HEURISTIC_WEIGHTS
    }
    TeenProtection ..> Heuristics : uses
```
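For orientation, here is a minimal sketch of how the dispatcher described by this diagram could be wired up. It reuses only the names visible in the diagram and in the review snippets below (`analyze_text_for_teen_risks`, the keyword lists, `HEURISTIC_WEIGHTS`, the `score`/`indicators_found` result keys); the dispatch keys, import path, and error shape are assumptions, not the PR's actual code.

```python
# Illustrative sketch only -- the real teen_protection.py in this PR may differ.
# The import path and dispatch keys below are assumptions.
from heuristics import (
    CYBERBULLYING_KEYWORDS,
    INAPPROPRIATE_CONTENT_KEYWORDS,
    PRIVACY_RISK_KEYWORDS,
    HEURISTIC_WEIGHTS,
)

def analyze_text_for_teen_risks(text, analysis_type):
    """Run one keyword heuristic over the text and return a score plus indicators."""
    keyword_map = {
        "cyberbullying": ("CYBERBULLYING", CYBERBULLYING_KEYWORDS),
        "inappropriate_content": ("INAPPROPRIATE_CONTENT", INAPPROPRIATE_CONTENT_KEYWORDS),
        "privacy_risk": ("PRIVACY_RISK", PRIVACY_RISK_KEYWORDS),
    }
    if analysis_type not in keyword_map:
        return {"error": f"Invalid analysis type: {analysis_type}"}

    text_lower = text.lower()
    score = 0.0
    indicators_found = []

    category, keywords = keyword_map[analysis_type]
    weight = HEURISTIC_WEIGHTS.get(category.upper(), 1.0)

    for keyword in keywords:
        if keyword in text_lower:  # plain substring check; see Comment 2 for a word-boundary variant
            message = f"Detected potential {category.replace('_', ' ').lower()} keyword: '{keyword}'"
            if message not in indicators_found:
                indicators_found.append(message)
                score += weight

    return {"score": score, "indicators_found": indicators_found}

def analyze_for_cyberbullying(text):
    return analyze_text_for_teen_risks(text, "cyberbullying")

def analyze_for_inappropriate_content(text):
    return analyze_text_for_teen_risks(text, "inappropriate_content")

def analyze_for_privacy_risks(text):
    return analyze_text_for_teen_risks(text, "privacy_risk")
```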
Hey there - I've reviewed your changes - here's some feedback:
- Consider refactoring analyze_for_teen_risks to separate user I/O from the core analysis logic so it can be reused and more easily tested without interactive input calls.
- Instead of manually constructing the TestSuite in test_runner, leverage unittest discovery to automatically pick up all test modules and avoid forgetting new tests (a discovery sketch follows this list).
- Simple substring matching for heuristics may lead to false positives/negatives—consider normalizing the text and using word-boundary or regex-based checks to improve accuracy.
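For the test-runner point above, a minimal sketch of a discovery-based runner (the `social_media_analyzer` start directory and the default `test*.py` pattern are assumptions about the project layout):

```python
# test_runner.py -- illustrative sketch; adjust start_dir to the actual layout.
import sys
import unittest

if __name__ == "__main__":
    # Discover every module matching test*.py instead of listing tests by hand,
    # so new files such as test_teen_protection.py are picked up automatically.
    suite = unittest.defaultTestLoader.discover(
        start_dir="social_media_analyzer", pattern="test*.py"
    )
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    sys.exit(0 if result.wasSuccessful() else 1)
```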
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring analyze_for_teen_risks to separate user I/O from the core analysis logic so it can be reused and more easily tested without interactive input calls.
- Instead of manually constructing the TestSuite in test_runner, leverage unittest discovery to automatically pick up all test modules and avoid forgetting new tests.
- Simple substring matching for heuristics may lead to false positives/negatives—consider normalizing the text and using word-boundary or regex-based checks to improve accuracy.
## Individual Comments
### Comment 1
<location> `social_media_analyzer/main.py:167` </location>
<code_context>
+ print("\n--- Analyzing for Privacy Risks ---")
+ result = teen_protection.analyze_for_privacy_risks(text_to_analyze)
+
+ print(f"Score: {result['score']} (Higher is more suspicious)")
+ if result['indicators_found']:
+ print("Indicators Found:")
</code_context>
<issue_to_address>
**issue (bug_risk):** Accessing 'score' and 'indicators_found' without error handling may cause issues if the result contains an error.
Check for an 'error' key in the result before accessing 'score' or 'indicators_found' to avoid KeyError exceptions.
</issue_to_address>
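A possible shape for that guard, sketched against the snippet above (the printed messages and the loop over indicators are illustrative, not the file's actual code):

```python
result = teen_protection.analyze_for_privacy_risks(text_to_analyze)

# Handle an error result before reading 'score' or 'indicators_found'
if "error" in result:
    print(f"Analysis failed: {result['error']}")
else:
    print(f"Score: {result['score']} (Higher is more suspicious)")
    if result["indicators_found"]:
        print("Indicators Found:")
        for indicator in result["indicators_found"]:
            print(f"  - {indicator}")
```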
### Comment 2
<location> `social_media_analyzer/teen_protection.py:33-41` </location>
<code_context>
+ category, keywords = keyword_map[analysis_type]
+ weight = HEURISTIC_WEIGHTS.get(category.upper(), 1.0)
+
+ for keyword in keywords:
+ if keyword in text_lower:
+ message = f"Detected potential {category.replace('_', ' ').lower()} keyword: '{keyword}'"
</code_context>
<issue_to_address>
**suggestion:** Simple substring matching may lead to false positives.
Consider using regular expressions with word boundaries to avoid matching keywords within other words and reduce false positives.
```suggestion
import re

category, keywords = keyword_map[analysis_type]
weight = HEURISTIC_WEIGHTS.get(category.upper(), 1.0)

for keyword in keywords:
    # Use regex with word boundaries to avoid matching substrings within other words
    pattern = r'\b' + re.escape(keyword) + r'\b'
    if re.search(pattern, text_lower):
        message = f"Detected potential {category.replace('_', ' ').lower()} keyword: '{keyword}'"
        if message not in indicators_found:
            indicators_found.append(message)
            score += weight
```
</issue_to_address>
### Comment 3
<location> `social_media_analyzer/heuristics.py:273-278` </location>
<code_context>
"SUSPICIOUS_URL_PATTERN": 3.0, # High weight for matching a suspicious URL pattern
"GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match
+ # Teenager Protection Weights
+ "CYBERBULLYING": 2.5,
+ "INAPPROPRIATE_CONTENT": 3.0,
+ "PRIVACY_RISK": 3.5,
}
</code_context>
<issue_to_address>
**suggestion:** Heuristic weights for teen protection categories may need calibration.
These weights appear arbitrary; please validate them against expert input or data to ensure appropriate risk prioritization.
```suggestion
"GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match
# Teenager Protection Weights
# NOTE: The following weights for teen protection categories are provisional.
# TODO: Validate these weights against expert input or empirical data to ensure appropriate risk prioritization.
"CYBERBULLYING": 2.5,
"INAPPROPRIATE_CONTENT": 3.0,
"PRIVACY_RISK": 3.5,
}
```
</issue_to_address>
### Comment 4
<location> `social_media_analyzer/test_teen_protection.py:8` </location>
<code_context>
+ analyze_for_privacy_risks
+)
+
+class TestTeenProtection(unittest.TestCase):
+
+ def test_cyberbullying(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for edge cases: partial keyword matches and case sensitivity.
Add tests for partial keyword matches and case sensitivity to ensure accurate and robust detection.
</issue_to_address>
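A hedged sketch of what those tests could look like (the keywords 'idiot' and 'loser' come from the suggested test below; the partial-match expectation assumes the word-boundary fix from Comment 2 is adopted, and the import path may differ):

```python
import unittest
from teen_protection import analyze_for_cyberbullying  # adjust import to the project layout

class TestTeenProtectionEdgeCases(unittest.TestCase):

    def test_partial_keyword_not_matched(self):
        """A keyword embedded in a longer word should not be flagged
        (assumes word-boundary matching is in place)."""
        result = analyze_for_cyberbullying("That plot twist was idiotic but fun.")
        self.assertEqual(result['score'], 0)
        self.assertEqual(len(result['indicators_found']), 0)

    def test_case_insensitive_match(self):
        """Keyword detection should ignore casing."""
        result = analyze_for_cyberbullying("You are such a LOSER.")
        self.assertGreater(result['score'], 0)
        self.assertIn("Detected potential cyberbullying keyword: 'loser'",
                      result['indicators_found'])

if __name__ == "__main__":
    unittest.main()
```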
### Comment 5
<location> `social_media_analyzer/test_teen_protection.py:10` </location>
<code_context>
+
+class TestTeenProtection(unittest.TestCase):
+
+ def test_cyberbullying(self):
+ """Test the cyberbullying detection."""
+ # Test case with bullying keywords
</code_context>
<issue_to_address>
**suggestion (testing):** No tests for invalid analysis type or error handling.
Add a test to verify that an error dictionary is returned for invalid analysis types, ensuring error handling is covered.
Suggested implementation:
```python
class TestTeenProtection(unittest.TestCase):

    def test_cyberbullying(self):
        """Test the cyberbullying detection."""
        # Test case with bullying keywords
        text1 = "You are such a loser and an idiot."
        result1 = analyze_for_cyberbullying(text1)
        self.assertGreater(result1['score'], 0)
        self.assertIn("Detected potential cyberbullying keyword: 'loser'", result1['indicators_found'])
        self.assertIn("Detected potential cyberbullying keyword: 'idiot'", result1['indicators_found'])

        # Test case with no bullying keywords
        text2 = "Have a great day!"
        result2 = analyze_for_cyberbullying(text2)
        self.assertEqual(result2['score'], 0)

    def test_invalid_analysis_type(self):
        """Test error handling for invalid analysis type."""
        from .teen_protection import analyze_text
        text = "This is a test message."
        result = analyze_text(text, analysis_type="unknown_type")
        self.assertIsInstance(result, dict)
        self.assertIn("error", result)
        self.assertIn("Invalid analysis type", result["error"])
```
If the dispatcher function is not named `analyze_text`, or its signature differs, you should adjust the import and call accordingly. Also, ensure that the error dictionary returned by the function contains an "error" key with a message about the invalid type.
</issue_to_address>
### Comment 6
<location> `social_media_analyzer/test_teen_protection.py:25` </location>
<code_context>
+ self.assertEqual(result2['score'], 0)
+ self.assertEqual(len(result2['indicators_found']), 0)
+
+ def test_inappropriate_content(self):
+ """Test the inappropriate content detection."""
+ # Test case with inappropriate keywords
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for multiple keyword occurrences in a single text.
Add a test where a keyword appears multiple times to ensure correct score accumulation and indicator handling.
</issue_to_address>
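One way this could be covered, assuming the deduplication behaviour implied by Comment 2's snippet (a repeated keyword yields a single indicator and is scored once; the import path is an assumption):

```python
import unittest
from teen_protection import analyze_for_cyberbullying  # adjust import to the project layout

class TestRepeatedKeywords(unittest.TestCase):

    def test_repeated_keyword_reported_once(self):
        """A keyword appearing twice should produce one indicator entry;
        the scoring expectation is an assumption based on the review snippets."""
        result = analyze_for_cyberbullying("You loser. Seriously, what a loser.")
        matches = [m for m in result['indicators_found'] if "'loser'" in m]
        self.assertEqual(len(matches), 1)
        self.assertGreater(result['score'], 0)

if __name__ == "__main__":
    unittest.main()
```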
| "GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match | ||
| # Teenager Protection Weights | ||
| "CYBERBULLYING": 2.5, | ||
| "INAPPROPRIATE_CONTENT": 3.0, | ||
| "PRIVACY_RISK": 3.5, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Heuristic weights for teen protection categories may need calibration.
These weights appear arbitrary; please validate them against expert input or data to ensure appropriate risk prioritization.
| "GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match | |
| # Teenager Protection Weights | |
| "CYBERBULLYING": 2.5, | |
| "INAPPROPRIATE_CONTENT": 3.0, | |
| "PRIVACY_RISK": 3.5, | |
| } | |
| "GOOGLE_SAFE_BROWSING_HIT": 10.0, # Very high weight for a positive Google Safe Browsing match | |
| # Teenager Protection Weights | |
| # NOTE: The following weights for teen protection categories are provisional. | |
| # TODO: Validate these weights against expert input or empirical data to ensure appropriate risk prioritization. | |
| "CYBERBULLYING": 2.5, | |
| "INAPPROPRIATE_CONTENT": 3.0, | |
| "PRIVACY_RISK": 3.5, | |
| } |
| self.assertEqual(result2['score'], 0) | ||
| self.assertEqual(len(result2['indicators_found']), 0) | ||
|
|
||
| def test_inappropriate_content(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing tests for multiple keyword occurrences in a single text.
Add a test where a keyword appears multiple times to ensure correct score accumulation and indicator handling.
Summary by Sourcery
Integrate new teenager protection tools into the social media analyzer by adding a dedicated module for risk detection, expanding heuristics, updating the CLI menu, and including comprehensive unit tests.