58 changes: 58 additions & 0 deletions dev_edition_trial_center/.gitignore
@@ -0,0 +1,58 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
Pipfile.lock

# Virtual environments
venv/
env/
ENV/
env.bak/
venv.bak/
.venv/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Streamlit
.streamlit/secrets.toml

# Output files
output/
*.txt.protected
*.txt.redacted

# Logs
*.log

# Temporary files
*.tmp
*.bak
90 changes: 90 additions & 0 deletions dev_edition_trial_center/ARCHITECTURE.md
@@ -0,0 +1,90 @@
# Dev Edition Trial Center Architecture

## Overview

Dev Edition Trial Center wraps Protegrity Developer Edition services into a guided sandbox that shows how semantic guardrails, data discovery, protection, and redaction cooperate to prepare GenAI prompts. It consists of a reusable Python pipeline, a comprehensive launch script, and a Streamlit UI that presents the pipeline as an interactive trial experience with multiple execution modes.

## Component map

- **Launch Script (`launch_trial_center.sh`)** – Comprehensive bash launcher that validates prerequisites (Docker, Python environment, services), manages Docker Compose lifecycle, performs health checks, and launches the Streamlit UI. Provides clear feedback about missing credentials and configuration status.
- **Streamlit UI (`app.py`)** – Interactive web interface that collects prompts, displays guardrail scores, previews protected/redacted outputs, and exposes a Run log tab that streams pipeline diagnostics. Features:
  - **Sample prompts** – Pre-loaded examples (Approved, Data Leakage, Malicious, Off-Topic) demonstrating different guardrail outcomes
  - **Execution modes** – Five pipeline configurations:
    - Full Pipeline: All steps with sequential numbering (Steps 1-5)
    - Semantic Guardrail: Guardrail scoring only
    - Discover Sensitive Data: Entity discovery only
    - Find, Protect & Unprotect: Discovery → Protection → Unprotection (Steps 1-3)
    - Find & Redact: Discovery → Redaction (Steps 1-2)
  - **Dynamic step numbering** – Each mode shows appropriate step numbers for its workflow
  - **Error handling** – Displays clear error messages when protection fails without showing sensitive data
  - **Themed UI** – Custom CSS for dropdown menus matching the Streamlit dark theme
- **Pipeline core (`trial_center_pipeline.py`)** – Provides `SemanticGuardrailClient`, `PromptSanitizer`, and helper utilities that the UI and CLI reuse. Key features:
  - **Silent failure detection** – Identifies when protection doesn't modify text (indicating credential or authentication issues)
  - **No fallback logic** – Removed automatic fallback from protection to redaction; instead surfaces clear errors
  - **Structured results** – `SanitizationResult` includes `sanitize_error` field for tracking protection failures
- **CLI (`run_trial_center.py`)** – Batch-friendly entry point for processing files via the same pipeline and persisting reports to disk.
- **Developer Edition containers** – Docker Compose brings up Semantic Guardrail (port 8581) and Data Discovery/Classification services (port 8580). The pipeline communicates with these services via REST (guardrail) and the `protegrity_developer_python` SDK (discovery/protection/redaction).
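
The health checks mentioned for the launch script can be approximated from Python with a plain TCP reachability probe. This is a minimal sketch, not the launcher's actual logic: the port numbers come from the component map above, while the `localhost` host, the `SERVICE_PORTS` name, and both helper functions are assumptions made for illustration. A successful TCP connection only shows that something is listening; it does not confirm the service is healthy.

```python
import socket

# Ports come from the component map; the dictionary name is hypothetical.
SERVICE_PORTS = {"semantic_guardrail": 8581, "data_discovery": 8580}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services()` after `docker compose up -d` returns a dictionary keyed by service name, which a caller could use to decide whether to start the UI.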

## Data flow

```
User Prompt
    │
    ▼
Streamlit UI ──► Pipeline Mode Selection ──► Execution Path
    │                                        │
    │                                        ├─► Full Pipeline
    │                                        ├─► Semantic Guardrail Only
    │                                        ├─► Discover Only
    │                                        ├─► Find, Protect & Unprotect
    │                                        └─► Find & Redact
    ▼
SemanticGuardrailClient ──► Semantic Guardrail service (REST)
    │                       │
    │                       └─► GuardrailResult (score/outcome/explanation)
    ├─► Data Discovery via SDK ──► Discovery entities
    ├─► PromptSanitizer (protect) ──► Protection attempt
    │   │                             │
    │   │                             ├─► Success: Protected tokens
    │   │                             └─► Failure: Error displayed (no data shown)
    │   │
    │   └─► find_and_unprotect via SDK (only if protection succeeded)
    │       │
    │       ├─► Success: Original text restored
    │       └─► Failure: Error with credential tips
    ├─► PromptSanitizer (redact) ──► Redacted output (always succeeds)
    └─► Results rendered with dynamic step numbering and error handling
```

1. The user selects a sample prompt or writes their own in the trial UI.
2. The user chooses an execution mode from the dropdown menu.
3. Based on the selected mode, the pipeline executes only the relevant steps with appropriate step numbering.
4. `SemanticGuardrailClient` posts the prompt to the local Semantic Guardrail service and surfaces the outcome exactly as returned.
5. `PromptSanitizer` executes `find_and_protect`. If the text remains unchanged (indicating credential failure), `sanitize_error` is set and no data is displayed.
6. When protection succeeds, `find_and_unprotect` is attempted to verify reversibility.
7. A dedicated `PromptSanitizer` instance always performs redaction in Full Pipeline and Find & Redact modes.
8. The UI renders each stage with mode-appropriate step numbers and comprehensive error handling.
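
Steps 5 and 6 above (protect, detect a silent failure, then verify reversibility) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code in `trial_center_pipeline.py`: `ProtectOutcome` and `protect_and_verify` are hypothetical names, and the `protect` and `unprotect` arguments are plain callables standing in for the SDK's `find_and_protect` and `find_and_unprotect`, whose real signatures are not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProtectOutcome:
    """Hypothetical stand-in for the pipeline's structured result."""
    protected_text: Optional[str] = None
    restored_text: Optional[str] = None
    sanitize_error: Optional[str] = None


def protect_and_verify(
    prompt: str,
    protect: Callable[[str], str],
    unprotect: Callable[[str], str],
) -> ProtectOutcome:
    """Protect a prompt, flag silent failures, then attempt to unprotect."""
    try:
        protected = protect(prompt)
    except Exception as exc:
        return ProtectOutcome(sanitize_error=f"Protection failed: {exc}")

    # Silent failure: the call returned without error but changed nothing.
    if protected == prompt:
        return ProtectOutcome(
            sanitize_error="Protection returned the input unchanged; check credentials."
        )

    outcome = ProtectOutcome(protected_text=protected)
    try:
        # Only attempted once protection has succeeded.
        outcome.restored_text = unprotect(protected)
    except Exception as exc:
        outcome.sanitize_error = f"Unprotect failed: {exc}"
    return outcome
```

When `sanitize_error` is set and `protected_text` is `None`, a caller can render the error without echoing the original prompt. One caveat of the equality check: a prompt containing no detectable entities would also come back unchanged, so a fuller implementation might consult the discovery results before treating an unchanged result as a credential problem.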

## Configuration & extensibility

- **Guardrail settings** – Configured for the customer-support vertical; adjustable in `GuardrailConfig`.
- **Environment variables** – `DEV_EDITION_EMAIL`, `DEV_EDITION_PASSWORD`, and `DEV_EDITION_API_KEY` enable reversible protection. The launch script detects missing credentials and provides clear warnings.
- **Caching** – The UI caches service client construction so repeated runs stay responsive.
- **Line-wise sanitisation** – `PromptSanitizer` processes multi-line prompts one line at a time, matching the sample CLI behaviour and yielding predictable redaction/protection output while preserving blank lines.
- **Modular rendering** – UI functions (`_render_protection`, `_render_unprotect`, etc.) accept dynamic step numbers for flexible display across execution modes.
- **Sample prompts** – Easily extensible by adding new files to the `samples/` directory and updating the `SAMPLE_PROMPTS` dictionary.
- **Next steps** – Teams can extend the architecture with conversation-level guardrails, policy presets stored under `configs/`, or alternative UIs (Gradio, FastAPI) that reuse the same pipeline module.
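
The line-wise sanitisation behaviour described in this section can be illustrated with a small helper. This is a sketch of the general technique, not the `PromptSanitizer` implementation: `sanitize_linewise` is a hypothetical name, and `transform` is any callable standing in for a per-line protect or redact call.

```python
from typing import Callable


def sanitize_linewise(text: str, transform: Callable[[str], str]) -> str:
    """Apply transform to each non-blank line, leaving blank lines untouched."""
    out = []
    for line in text.split("\n"):
        # Blank (or whitespace-only) lines are passed through as-is.
        out.append(transform(line) if line.strip() else line)
    return "\n".join(out)
```

Because blank lines bypass `transform`, the paragraph structure of a multi-line prompt is preserved in the output.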

## Error handling philosophy

The Trial Center follows a transparent error handling approach:
- **No automatic fallbacks** – When protection fails, errors are surfaced clearly rather than silently switching to redaction
- **Security-first display** – Sensitive data is never displayed when protection fails
- **Clear guidance** – Error messages include actionable tips (e.g., setting environment variables)
- **Silent failure detection** – Detects when SDK operations complete without errors but don't modify data (indicating authentication issues)
- **Comprehensive feedback** – Each step provides success/failure indication with specific error details
173 changes: 173 additions & 0 deletions dev_edition_trial_center/README.md
@@ -0,0 +1,173 @@
# Dev Edition Trial Center

An interactive Streamlit application demonstrating how to integrate Protegrity Developer Edition services for privacy-aware GenAI workflows. The Trial Center provides a hands-on environment to explore semantic guardrails, data discovery, protection, and redaction capabilities through an easy-to-use web interface.

## Features

- **Interactive UI** – Web-based interface with sample prompts and multiple execution modes
- **Semantic Guardrail** – Validates prompts for topic relevance and risk detection
- **Data Discovery** – Identifies and classifies sensitive data (PII, credentials, etc.)
- **Reversible Protection** – Tokenizes sensitive data with ability to restore original values
- **Irreversible Redaction** – Permanently masks sensitive information
- **Pipeline Flexibility** – Five execution modes to test different combinations of services
- **Comprehensive Logging** – Built-in run log to observe service interactions
- **Developer-Friendly** – Includes CLI, Python package, unit tests, and automated launcher

## Prerequisites

Before running the Trial Center, ensure you have:

- **Docker Desktop** – macOS, Linux, or Windows with Docker Desktop (or equivalent Docker engine) installed and running. At least 4 GB RAM available for containers.
- **Python 3.11+** – The repository includes a virtual environment at `.venv/` with all dependencies.
- **Protegrity credentials (optional)** – Set `DEV_EDITION_EMAIL`, `DEV_EDITION_PASSWORD`, and `DEV_EDITION_API_KEY` environment variables to enable reversible protection. Without credentials, protection operations will display error messages, but semantic guardrail, discovery, and redaction features remain fully functional.
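
To see which of the credential variables above are missing before starting a run, a check along these lines may help. It is a minimal sketch: the variable names come from this README, while `REQUIRED_CREDENTIALS` and `missing_credentials` are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping

# Variable names as listed in the prerequisites above.
REQUIRED_CREDENTIALS = (
    "DEV_EDITION_EMAIL",
    "DEV_EDITION_PASSWORD",
    "DEV_EDITION_API_KEY",
)


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the credential variables that are unset or blank in env."""
    return [name for name in REQUIRED_CREDENTIALS if not env.get(name, "").strip()]
```

An empty list means all three variables are set; otherwise the returned names can be shown in a warning while the guardrail, discovery, and redaction features continue to work.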


## Quick Start

### Option 1: Using the Launch Script (Recommended)

The easiest way to run the Trial Center is using the automated launch script:

```bash
cd dev_edition_trial_center
./launch_trial_center.sh
```

The launch script automatically handles everything:
- ✅ Validates Docker installation and running status
- ✅ Checks Python virtual environment
- ✅ Starts Developer Edition services (`docker compose up -d`)
- ✅ Performs health checks on all services (Semantic Guardrail, Data Discovery)
- ✅ Detects and displays credential configuration status
- ✅ Sets up the output directory
- ✅ Launches the Streamlit UI at `http://localhost:8501`

**Note:** If credentials are missing, the script will warn you but still launch. Protection operations will show error messages; discovery and redaction will work normally.

### Option 2: Manual Setup

If you prefer to run each step manually or troubleshoot issues:

1. Ensure the Developer Edition services are running (from repository root):
   ```bash
   docker compose up -d
   ```
2. Activate the project virtual environment:
   ```bash
   source .venv/bin/activate
   ```
3. Install optional UI dependencies:
   ```bash
   pip install streamlit
   ```
4. Run the CLI with the provided test prompt:
   ```bash
   python -m dev_edition_trial_center.run_trial_center \
     dev_edition_trial_center/samples/input_test.txt \
     --output-dir dev_edition_trial_center/output
   ```
   - If protection credentials are unavailable, the pipeline will report clear errors.
   - Use `--method redact` to force redaction.
5. Launch the Streamlit UI:
   ```bash
   streamlit run dev_edition_trial_center/app.py
   ```
   Point your browser to the provided local URL (typically `http://localhost:8501`).

6. (Optional) Run the lightweight unit tests:
   ```bash
   python -m pytest dev_edition_trial_center/tests
   ```

## Using the Trial Center UI

### Sample Prompts

The UI includes four pre-loaded sample prompts demonstrating different guardrail scenarios:
- **Approved** – Customer support query that passes semantic guardrail validation
- **Data Leakage** – Prompt containing extensive PII that should be detected and protected
- **Malicious** – Prompt attempting harmful or inappropriate requests
- **Off-Topic** – Prompt outside the customer-support domain

Click any sample button to load the prompt into the text area.

### Execution Modes

Choose from five execution modes to explore different product combinations:

1. **Full Pipeline** – Complete workflow with all steps:
   - Step 1: Semantic Guardrail
   - Step 2: Discovery
   - Step 3: Protection
   - Step 4: Unprotection
   - Step 5: Redaction

2. **Semantic Guardrail** – Guardrail scoring only

3. **Discover Sensitive Data** – Entity discovery only (Step 1)

4. **Find, Protect & Unprotect** – Three-step workflow:
   - Step 1: Discovery
   - Step 2: Protection
   - Step 3: Unprotection

5. **Find & Redact** – Two-step workflow:
   - Step 1: Discovery
   - Step 2: Redaction

Each mode displays only the relevant steps with appropriate numbering.
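
The mode-dependent numbering can be pictured as a lookup from mode name to an ordered list of steps, with the numbers derived from each step's position. The sketch below mirrors the five modes listed above; `MODE_STEPS` and `numbered_steps` are hypothetical names, and `app.py` may organise this differently.

```python
# Mode names and step order follow the list above; the structure is assumed.
MODE_STEPS = {
    "Full Pipeline": [
        "Semantic Guardrail",
        "Discovery",
        "Protection",
        "Unprotection",
        "Redaction",
    ],
    "Semantic Guardrail": ["Semantic Guardrail"],
    "Discover Sensitive Data": ["Discovery"],
    "Find, Protect & Unprotect": ["Discovery", "Protection", "Unprotection"],
    "Find & Redact": ["Discovery", "Redaction"],
}


def numbered_steps(mode: str) -> list[tuple[int, str]]:
    """Return (step_number, step_name) pairs for the selected mode."""
    return list(enumerate(MODE_STEPS[mode], start=1))
```

Deriving the number from position means Redaction is Step 5 in Full Pipeline but Step 2 in Find & Redact, without any per-mode numbering being hard-coded.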

### Run Log

Switch to the **Run log** tab to observe the guardrail and sanitization calls executed behind the scenes. INFO-level logs and SDK traces are captured automatically, showing:
- Service endpoints being called
- Entity detection details
- Protection/redaction operations
- Any warnings or errors
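
One common way to feed a log tab like this is a `logging.Handler` that collects formatted records in memory. The following is a sketch of that general pattern using only the standard library; `ListHandler`, `attach_run_log`, and the `trial_center` logger name are assumptions, not identifiers taken from `app.py`.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log records in a list for later display."""

    def __init__(self, level: int = logging.INFO) -> None:
        super().__init__(level)
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def attach_run_log(logger_name: str = "trial_center") -> ListHandler:
    """Attach an in-memory handler to the named logger and return it."""
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)  # DEBUG records are dropped by the logger
    logger.addHandler(handler)
    return handler
```

After a run, `handler.lines` holds the INFO-and-above messages in order, ready to be rendered as text.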

## Blueprint internals

1. **Semantic Guardrail** – Scores the prompt for topic relevance and risk. Trained on the customer-support vertical using open-source datasets. Displays the outcome with score and explanation.
2. **Data Discovery** – Enumerates detected entities (PII, sensitive data types) for audit trails.
3. **Protection** – Runs reversible tokenization with `find_and_protect`. **Requires credentials** to function. If protection fails (no credentials or authentication errors):
   - Displays clear error message
   - Shows credential setup instructions
   - Does NOT display sensitive data
4. **Unprotect** – Verifies reversibility with `find_and_unprotect`, confirming that authorized services can reconstruct the original prompt. Only runs if protection succeeded.
5. **Redaction** – Provides irreversible masking with `find_and_redact`. Always available, works without credentials.
6. **Error Handling** – Transparent approach:
   - No automatic fallbacks that mask failures
   - Clear error messages with actionable guidance
   - Security-first: never displays sensitive data when protection fails
   - Detects silent failures (when SDK completes but doesn't modify data)

## Generated artifacts

The CLI generates:
- `dev_edition_trial_center/output/input_test_sanitized.txt`
- `dev_edition_trial_center/output/input_test_report.json`
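
The two file names above suggest that outputs are named after the input file's stem. Assuming that convention holds (it is inferred from the example paths, not confirmed against `run_trial_center.py`), the mapping can be sketched with `pathlib`; `artifact_paths` is a hypothetical helper.

```python
from pathlib import Path


def artifact_paths(input_path: str, output_dir: str) -> tuple[Path, Path]:
    """Derive the sanitized-text and report paths from the input file name."""
    stem = Path(input_path).stem  # "input_test.txt" -> "input_test"
    out = Path(output_dir)
    return out / f"{stem}_sanitized.txt", out / f"{stem}_report.json"
```

Under this assumption, an input of `samples/input_test.txt` with `--output-dir dev_edition_trial_center/output` yields the two paths listed above.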

The UI provides download buttons for:
- Protected prompts (when protection succeeds)
- Redacted prompts

## Extending the prototype

- Add multi-turn conversations by supplying a JSON conversation to `metadata` or by extending `trial_center_pipeline.py`.
- Create additional sample prompts by adding files to `dev_edition_trial_center/samples/` and updating the `SAMPLE_PROMPTS` dictionary in `app.py`.
- Customize the UI theme by modifying the CSS in the Streamlit markdown section.
- Add new execution modes by extending the pipeline mode logic in `app.py`.
- Build policy templates per business unit by storing configuration presets in `dev_edition_trial_center/configs/`.

## Validation checklist

- ✅ Runs entirely on Developer Edition modules.
- ✅ Demonstrates creative integration with GenAI safety workflows.
- ✅ Ships in a reusable form factor (package + CLI + samples + launcher).
- ✅ Comprehensive launch script with prerequisite validation and health checks.
- ✅ Interactive UI with sample prompts and multiple execution modes.
- ✅ Transparent error handling with security-first approach.
- ✅ Dynamic step numbering adapts to selected execution mode.
- ✅ Ready for iterative prototyping with modular architecture.
17 changes: 17 additions & 0 deletions dev_edition_trial_center/__init__.py
@@ -0,0 +1,17 @@
"""Dev Edition Trial Center package for privacy-aware GenAI prompt workflows."""

from .trial_center_pipeline import (
    TrialCenterReport,
    GuardrailConfig,
    TrialCenterPipeline,
    SanitizationConfig,
    process_from_file,
)

__all__ = [
    "TrialCenterPipeline",
    "GuardrailConfig",
    "SanitizationConfig",
    "TrialCenterReport",
    "process_from_file",
]