Commit a0de0bd

dosoanand-presidio authored and committed
feat: Add CorMatrix integration for AI code origin tracking (#169)
* feat: Add CorMatrix integration for AI code origin tracking

  Implements optional code retention analysis with privacy-first design. Only cryptographic hashes are transmitted; source code stays local. Includes comprehensive documentation and workspace configuration support.

* fix: Improve CorMatrix code tracking and diff calculation

  - Corrected iteration in CorMatrixService.track to use 'change' instead of 'diff' for accurate code origin tracking.
  - Enhanced DiffViewProvider to normalize line endings before diffing, ensuring accurate change detection across different OSs. This fixes potential issues with incorrect diffs due to inconsistent line endings.
1 parent 57b3d66 commit a0de0bd

10 files changed, +571 -24 lines changed

.changeset/tall-rockets-greet.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
---
2+
"hai-build-code-generator": minor
3+
---
4+
5+
Add optional CorMatrix integration for AI code origin tracking
6+
7+
Introduces privacy-first code retention analysis that tracks AI-generated code patterns through cryptographic hashes. Includes comprehensive documentation, workspace configuration support, and zero-impact background processing.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -32,3 +32,4 @@ src/core/controller/*/index.ts
3232
src/core/controller/grpc-service-config.ts
3333
webview-ui/src/services/grpc-client.ts
3434
src/standalone/server-setup.ts
35+
.hai.config

README.md

Lines changed: 13 additions & 0 deletions
@@ -27,6 +27,7 @@
2727
- [📝 HAI Tasks](#-hai-tasks) : Integrate AI-generated user stories and tasks seamlessly into your workflow
2828
- [🔍 File Identification](#-file-identification) : Discover and contextualize code files with intelligent indexing
2929
- [⚙️ Settings Interface](#-settings-interface) : Easily configure LLMs and embedding models for tailored performance
30+
- [📊 CorMatrix Integration](#-cormatrix-integration) : Track AI code retention patterns and analyze code origin over time
3031

3132
<br>
3233

@@ -170,6 +171,18 @@ Customize and seamlessly integrate advanced language and embedding models into y
170171

171172
---
172173

174+
### 📊 CorMatrix Integration
175+
Track AI code retention patterns and analyze how much AI-generated code remains in your codebase over time.
176+
177+
- **Code Origin Tracking**: Monitor AI-generated code longevity and evolution patterns
178+
- **Privacy-First**: Only cryptographic hashes are transmitted; your code stays local
179+
- **Optional Integration**: Activate through workspace configuration when needed
180+
- **Zero Performance Impact**: Background processing with graceful degradation
181+
182+
For detailed setup and configuration, see our [CorMatrix Integration Guide](docs/features/cormatrix-integration.md).
183+
184+
---
185+
173186
### 📊 Telemetry
174187
Configure external telemetry settings to monitor and analyze your AI-powered development workflows with environment-specific customization capabilities.
175188

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
# CorMatrix Integration
2+
3+
The HAI Code Generator includes built-in integration with CorMatrix, a Code Origin Ratio tracking system that helps you understand how much AI-generated code is retained over time.
4+
5+
## What is CorMatrix?
6+
7+
CorMatrix is a Node.js SDK and CLI that analyzes AI code retention patterns by tracking how much AI-generated code remains in your codebase versus how much gets modified or removed over time. The HAI Code Generator automatically tracks file operations performed by the AI assistant and sends this data to CorMatrix for analysis.
8+
9+
This provides valuable insights into:
10+
11+
- **AI Code Longevity**: Whether AI-generated code tends to be temporary scaffolding or permanent solutions
12+
- **Code Evolution**: How developers iterate on AI-generated code
13+
- **Retention Rates**: What percentage of AI-generated code survives in the final codebase
14+
- **Usage Patterns**: Understanding the real-world effectiveness of AI coding assistance
15+
16+
For detailed information about CorMatrix SDK and CLI, see the [official documentation](https://www.npmjs.com/package/@presidio-dev/cor-matrix).
17+
18+
## How It Works
19+
20+
The HAI Code Generator conditionally tracks file operations through the `CorMatrixService` only when **all** conditions are met:
21+
22+
1. The AI assistant performs a file modification or creation
23+
2. The operation contains valid file content with line-level changes
24+
3. Required CorMatrix configuration is present in your workspace
25+
4. The CorMatrix service is available and properly configured
26+
27+
> **Important Note**: Your actual source code **never leaves your system**. Only cryptographic hash signatures are generated locally and sent to CorMatrix for analysis. All data is encrypted in transit and at rest. Tracking runs in the background with batch processing, ensuring **zero impact** on AI assistant performance.
28+
29+
## Privacy & Security
30+
31+
CorMatrix integration is designed with privacy and security in mind:
32+
33+
- **Your Code Stays Local**: Your actual source code **never leaves your development environment**
34+
- **Hash-Only Transmission**: Only cryptographic hash signatures are generated locally and sent to CorMatrix
35+
- **Encryption**: All transmitted data is encrypted in transit and at rest
36+
- **Selective Tracking**: Only AI-generated code additions are monitored (deletions are ignored)
37+
- **Background Processing**: Tracking uses batching and background processing for zero performance impact
38+
39+
## Configuration
40+
41+
CorMatrix integration is **completely optional** and activates only when configured.
42+
43+
### Workspace Configuration
44+
45+
Create a `.hai.config` file in your workspace root with the following CorMatrix settings:
46+
47+
```
48+
# CorMatrix Configuration
49+
cormatrix.baseURL=https://your-cormatrix-instance.com
50+
cormatrix.token=your-api-token
51+
cormatrix.workspaceId=your-workspace-id
52+
```
53+
54+
### Configuration Parameters
55+
56+
- **`baseURL`**: Your CorMatrix server endpoint
57+
- **`token`**: Authentication token for CorMatrix API
58+
- **`workspaceId`**: Unique identifier for your workspace
59+
60+
HAI Code Generator runs without any of these parameters, but the integration activates only when all required parameters are provided.
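
As a rough illustration of the activation rule, the sketch below parses `key=value` lines from a `.hai.config` file and returns a configuration only when the CorMatrix keys are present. The function names `parseHaiConfig` and `readCorMatrixConfig` are hypothetical, and treating all three keys as required is an assumption; the actual `CorMatrixService` may load its settings differently.

```typescript
// Hypothetical helpers for illustration; not the extension's actual loader.
interface CorMatrixConfig {
  baseURL: string;
  token: string;
  workspaceId: string;
}

// Parse `key=value` lines, skipping blank lines and `#` comments.
function parseHaiConfig(text: string): Record<string, string> {
  const entries: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // ignore lines without a key=value shape
    entries[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return entries;
}

// Assumption: all three keys must be non-empty for the integration to activate.
function readCorMatrixConfig(text: string): CorMatrixConfig | undefined {
  const entries = parseHaiConfig(text);
  const baseURL = entries["cormatrix.baseURL"];
  const token = entries["cormatrix.token"];
  const workspaceId = entries["cormatrix.workspaceId"];
  if (!baseURL || !token || !workspaceId) return undefined;
  return { baseURL, token, workspaceId };
}
```

A caller could treat an `undefined` result as "integration disabled" and skip tracking entirely.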
61+
62+
### Configuration File Security
63+
64+
> **Important**: The `.hai.config` file is not git-excluded by default. Add it to your workspace's `.gitignore` so that sensitive tokens are not committed to your repository unintentionally.
65+
66+
## Optional Integration
67+
68+
CorMatrix integration is designed to stay out of the way:
69+
70+
- **Default Behavior**: HAI Code Generator operates normally without CorMatrix configuration
71+
- **Silent Activation**: Integration only activates when required configuration is present
72+
- **Graceful Degradation**: If CorMatrix service is unavailable, the AI assistant continues working unaffected
73+
- **Zero Performance Impact**: All tracking happens in the background without affecting your development workflow
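
One way to picture graceful degradation is a fire-and-forget wrapper that never lets a tracking failure reach the caller. The sketch below is only an illustration under that assumption; `SendFn` and `trackInBackground` are hypothetical names, not part of the extension or the CorMatrix SDK.

```typescript
// Hypothetical signature for whatever actually sends signatures to CorMatrix.
type SendFn = (signatures: string[]) => Promise<void>;

// Fire-and-forget: the caller does not await tracking and never sees its errors.
function trackInBackground(
  send: SendFn | undefined,
  signatures: string[],
  onError: (err: unknown) => void = () => {}
): void {
  // Integration not configured, or nothing to send: do nothing.
  if (!send || signatures.length === 0) return;
  // Starting from a resolved promise means a synchronous throw in `send` is caught too.
  Promise.resolve()
    .then(() => send(signatures))
    .catch(onError);
}
```

Because the function returns immediately, an unreachable CorMatrix instance only results in a call to `onError` (for example, a log entry) rather than a failed file operation.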
74+
75+
## How Tracking Works
76+
77+
The integration automatically:
78+
79+
1. **Monitors File Operations**: Tracks when the AI assistant modifies or creates files
80+
2. **Captures Line Diffs**: Records line-by-line changes made by the AI
81+
3. **Processes Added Code**: Only tracks newly added code (deletions are ignored)
82+
4. **Generates Hashes**: Creates cryptographic signatures of the added code locally
83+
5. **Transmits Safely**: Sends only hash signatures and metadata to CorMatrix
84+
6. **Associates with Files**: Links generated code signatures to specific file paths
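
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `CorMatrixService` code: the `LineChange` shape, the choice of SHA-256, hashing per non-empty line, and the payload layout are all assumptions. Line endings are normalized first so the same code produces the same hash on every OS.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one diff entry; the real type may differ.
interface LineChange {
  value: string;
  added?: boolean;
  removed?: boolean;
}

// Convert CRLF and lone CR to LF before hashing.
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Hash each non-empty added line locally; removed and unchanged entries are skipped.
function hashAddedLines(changes: LineChange[]): string[] {
  const hashes: string[] = [];
  for (const change of changes) {
    if (!change.added) continue;
    for (const line of normalizeLineEndings(change.value).split("\n")) {
      if (line.trim() === "") continue;
      hashes.push(createHash("sha256").update(line, "utf8").digest("hex"));
    }
  }
  return hashes;
}

// Only the file path and hashes go into the payload, never the source text.
function buildPayload(filePath: string, changes: LineChange[]) {
  return { filePath, signatures: hashAddedLines(changes) };
}
```

Hashing per line is just one possible granularity; hashing whole added blocks would follow the same pattern.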
85+
86+
## Troubleshooting
87+
88+
### Integration Not Working
89+
90+
If CorMatrix integration isn't tracking changes:
91+
92+
1. **Check Configuration**: Ensure all required parameters are set in `.hai.config`
93+
2. **Verify Connectivity**: Test connection to your CorMatrix instance
94+
3. **Review Logs**: Check HAI Code Generator logs for CorMatrix-related errors
95+
4. **Validate Credentials**: Confirm your token and workspace ID are correct
96+
97+
### Performance Concerns
98+
99+
CorMatrix integration is designed for zero performance impact:
100+
101+
- All processing happens in the background
102+
- Batch processing minimizes network requests
103+
- Local hash generation is computationally lightweight
104+
- Graceful degradation prevents blocking operations
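
To show how batching can reduce network requests, here is a small hypothetical queue that collects signatures and hands them off in groups. The class name `SignatureBatcher`, the default batch size of 50, and the callback shape are assumptions for illustration; the extension's real batching logic is not shown in this guide.

```typescript
// Hypothetical batching queue: one flush call per group instead of one per signature.
class SignatureBatcher {
  private pending: string[] = [];
  private readonly flushFn: (batch: string[]) => void;
  private readonly maxBatchSize: number;

  constructor(flushFn: (batch: string[]) => void, maxBatchSize: number = 50) {
    this.flushFn = flushFn;
    this.maxBatchSize = maxBatchSize;
  }

  // Queue one signature; flush automatically once the batch is full.
  add(signature: string): void {
    this.pending.push(signature);
    if (this.pending.length >= this.maxBatchSize) this.flush();
  }

  // Hand all queued signatures to flushFn as a single batch and reset the queue.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.flushFn(batch);
  }
}
```

A caller would typically also call `flush()` on a timer or at shutdown so that a partially filled batch is not left unsent.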
105+
106+
### Privacy Questions
107+
108+
**Q: What data is sent to CorMatrix?**
109+
A: Only cryptographic hash signatures of added code and associated file paths. Your actual source code never leaves your system.
110+
111+
**Q: Can CorMatrix reconstruct my code from hashes?**
112+
A: No. Cryptographic hash functions are one-way: a hash cannot be reversed to reveal the original code.
113+
114+
**Q: Is tracking mandatory?**
115+
A: No. CorMatrix integration is completely optional and only activates when explicitly configured.
