Commit 2d225f9

Added support for processing .xz and .gz files
1 parent 00c7b9f commit 2d225f9

File tree

4 files changed, 36 additions and 6 deletions

README.md

Lines changed: 22 additions & 2 deletions
@@ -38,7 +38,13 @@ A powerful CLI tool to extract, deduplicate, and analyze SQL logs for **Cockroac
 
 ## 📦 Installation
 
-### Option A: Local Dev Install
+### Option A: Quick Install from PyPI
+
+```bash
+pip install crdb-sql-audit
+```
+
+### Option B: Local Dev Install
 ```bash
 git clone https://github.com/your-org/crdb-sql-audit.git
 cd crdb-sql-audit
@@ -47,7 +53,7 @@ source venv/bin/activate
 pip install .
 ```
 
-### Option B: Build via `pyproject.toml`
+### Option C: Build via `pyproject.toml`
 ```bash
 python -m build
 pip install dist/crdb_sql_audit-0.2.0-py3-none-any.whl
@@ -139,6 +145,20 @@ split -b 50M sql_only.log chunks/sql_chunk_
 crdb-sql-audit --dir chunks --terms execute,pg_ --out output/report
 ```
 
+### 🗜 Supported Log Formats
+
+This tool automatically supports reading:
+
+* ✅ Regular `.log` or `.txt` files
+* ✅ Compressed files: `.gz`, `.xz`
+* ✅ Folders with mixed log formats
+
+You can pass these directly using `--file` or `--dir`:
+
+```bash
+crdb-sql-audit --file logs/app.log.gz --out output/report_from_gz
+```
+
 ## 📚 Rule Engine Format
 
 Rules are written in YAML and matched against each SQL line. Example:

crdb_sql_audit/audit.py

Lines changed: 11 additions & 1 deletion
@@ -2,12 +2,22 @@
 import re
 import shutil
 import logging
+import gzip
+import lzma
 import importlib.resources as pkg_resources
 import pandas as pd
 import matplotlib.pyplot as plt
 
 from .rules_engine import load_rules, apply_rules
 
+def open_log_file(path):
+    if path.endswith(".gz"):
+        return gzip.open(path, "rt", encoding="utf-8", errors="ignore")
+    elif path.endswith(".xz"):
+        return lzma.open(path, "rt", encoding="utf-8", errors="ignore")
+    else:
+        return open(path, "r", encoding="utf-8", errors="ignore")
+
 def extract_sql(logs_path, search_terms, raw_mode=False):
     seen_sql = set()
 
@@ -20,7 +30,7 @@ def extract_sql(logs_path, search_terms, raw_mode=False):
 
     for path in paths:
         if os.path.isfile(path):
-            with open(path, "r", encoding="utf-8", errors="ignore") as f:
+            with open_log_file(path) as f:
                 for line in f:
                     if any(term in line for term in search_terms):
                         if raw_mode:
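
To illustrate the effect of this change, the following is a minimal, standalone sketch of the suffix-based opener from the hunk above, plus a round-trip check. `open_log_file` is copied from the diff; `demo_round_trip`, the file names, and the sample log line are hypothetical and exist only for this illustration. They are not part of the commit.

```python
import gzip
import lzma
import os
import tempfile


def open_log_file(path):
    """Open a plain, gzip, or xz log file as UTF-8 text (as in the diff)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="ignore")
    elif path.endswith(".xz"):
        return lzma.open(path, "rt", encoding="utf-8", errors="ignore")
    else:
        return open(path, "r", encoding="utf-8", errors="ignore")


def demo_round_trip():
    """Hypothetical helper: write one line in three formats, read each back."""
    line = "execute stmt: SELECT 1\n"  # made-up sample log line
    writers = (
        ("app.log", open),
        ("app.log.gz", gzip.open),
        ("app.log.xz", lzma.open),
    )
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, writer in writers:
            path = os.path.join(tmp, name)
            # Write the sample line using the matching compressor.
            with writer(path, "wt", encoding="utf-8") as f:
                f.write(line)
            # Read it back through the suffix-dispatching opener.
            with open_log_file(path) as f:
                results[name] = f.read()
    return results
```

Because all three branches return a text-mode file object, the `for line in f:` loop in `extract_sql` should not need to know which format it is reading. Note that dispatch is by file name only, so a compressed file without a `.gz` or `.xz` suffix would be read as plain text.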

crdb_sql_audit/main.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 import sys
 from crdb_sql_audit.audit import extract_sql, analyze_compatibility, generate_reports
 
-__version__ = "0.2.6"
+__version__ = "0.2.7"
 
 logging.basicConfig(
     level=logging.INFO,

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "crdb-sql-audit"
-version = "0.2.6"
-description = "Analyze PostgreSQL SQL logs for CockroachDB compatibility"
+version = "0.2.7"
+description = "Analyze SQL logs for CockroachDB compatibility"
 authors = [
     { name = "Virag Tripathi", email = "virag.tripathi@gmail.com" }
 ]
