
Commit 8e9f2d8 (parent a723d33)

[fix]: minor changes of mistakes and update to new model version in hf

7 files changed: +15, -13 lines


README.md (5 additions, 3 deletions)

```diff
@@ -19,8 +19,8 @@ To integrate the detector with your project on the SuperAnnotate platform, pleas
 The Generated Text Detection model is built on a fine-tuned RoBERTa Large architecture. It has been extensively trained on a diverse dataset that includes internal generations and a subset of the RAID train dataset, enabling it to accurately classify text as either generated (synthetic) or human-written. \
 This model is optimized for robust detection, offering two configurations based on specific needs:
 
-- **Optimized for Low False Positive Rate (FPR):** [AI Detector](https://huggingface.co/SuperAnnotate/ai-detector)
-- **Optimized for High Overall Prediction Accuracy:** [LLM Content Detector V2](https://huggingface.co/SuperAnnotate/roberta-large-llm-content-detector-V2)
+- **Optimized for Low False Positive Rate (FPR):** [AI Detector Low FPR](https://huggingface.co/SuperAnnotate/ai-detector-low-fpr)
+- **Optimized for High Overall Prediction Accuracy:** [AI Detector](https://huggingface.co/SuperAnnotate/ai-detector)
 
 For more details and access to the model weights, please refer to the links above on the Hugging Face Model Hub.
@@ -52,7 +52,9 @@ Hardware requirements will depend on your deployment type. Recommended e
 ### As python file ###
 
 1. Install requirements: `pip install -r generated_text_detector/requirements.txt`
-2. Set the Python path variable: `export PYTHONPATH="."`
+2. Set the environment variables:
+   - `export PYTHONPATH="."`
+   - `export DETECTOR_CONFIG_PATH="etc/configs/detector_config.json"`
 3. Run the API: `uvicorn --host 0.0.0.0 --port 8080 --ssl-keyfile=./key.pem --ssl-certfile=./cert.pem generated_text_detector.fastapi_app:app`
 
 ### As docker containers ###
```

etc/configs/detector_config.json (1 addition, 1 deletion)

```diff
@@ -1,4 +1,4 @@
 {
-    "text_detector_model": "SuperAnnotate/roberta-large-llm-content-detector",
+    "text_detector_model": "SuperAnnotate/ai-detector",
     "code_default_probability": 0.5
 }
```
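The README's new `DETECTOR_CONFIG_PATH` step and the default path in `fastapi_app.py` suggest the config is resolved from an environment variable with a repo-relative fallback. A minimal sketch of that lookup, with the helper name and fallback behavior assumed for illustration:

```python
import json
import os

# Default path matches the corrected value in this commit.
DEFAULT_DETECTOR_CONFIG_PATH = "etc/configs/detector_config.json"


def load_detector_config(default_path: str = DEFAULT_DETECTOR_CONFIG_PATH) -> dict:
    """Hypothetical helper: prefer DETECTOR_CONFIG_PATH, else the default."""
    path = os.environ.get("DETECTOR_CONFIG_PATH", default_path)
    with open(path) as f:
        return json.load(f)
```

With the shipped config this would yield a dict holding `text_detector_model` and `code_default_probability`.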

generated_text_detector/fastapi_app.py (1 addition, 1 deletion)

```diff
@@ -38,7 +38,7 @@
 def parse_args():
     DEFAULT_HOST = "0.0.0.0"
     DEFAULT_PORT = "8080"
-    DEFAULT_DETECTOR_CONFIG_PATH = "etc/detector_config.json"
+    DEFAULT_DETECTOR_CONFIG_PATH = "etc/configs/detector_config.json"
     DEFAULT_DEVICE = "cuda:0"
 
     parser = argparse.ArgumentParser(
```
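The defaults above can be exercised with a small `argparse` sketch. Only the default values come from the diff; the flag names (`--host`, `--port`, `--detector_config_path`, `--device`) are assumptions for illustration:

```python
import argparse

# Defaults as shown in the corrected parse_args() diff.
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = "8080"
DEFAULT_DETECTOR_CONFIG_PATH = "etc/configs/detector_config.json"
DEFAULT_DEVICE = "cuda:0"


def parse_args(argv=None):
    """Sketch of the CLI; flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="Generated text detector API")
    parser.add_argument("--host", default=DEFAULT_HOST)
    parser.add_argument("--port", default=DEFAULT_PORT)
    parser.add_argument("--detector_config_path", default=DEFAULT_DETECTOR_CONFIG_PATH)
    parser.add_argument("--device", default=DEFAULT_DEVICE)
    return parser.parse_args(argv)
```

Running with no arguments now picks up the corrected `etc/configs/detector_config.json` path instead of the stale `etc/detector_config.json`.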

generated_text_detector/requirements.txt (1 addition, 0 deletions)

```diff
@@ -1,4 +1,5 @@
 fastapi==0.110.0
+numpy==1.25.2
 nltk==3.8.1
 starlette==0.36.3
 torch==2.2.1
```

generated_text_detector/utils/aggregated_detector.py (4 additions, 4 deletions)

````diff
@@ -26,7 +26,7 @@ def __init__(
         self.code_block_pattern = re.compile(r"```(\w+)?\s*([\s\S]*?)\s*```")
 
 
-    def __split_text_and_code(self, text: str) -> tuple[str, str, str]:
+    def __split_text_and_code(self, text: str) -> tuple[str, str]:
         """Split input text into text and code blocks.
 
         :param text: Input text
@@ -44,13 +44,13 @@ def __split_text_and_code(self, text: str) -> tuple[str, str, str]:
         return text, code
 
 
-    def detect_report(self, text: str) -> list[tuple[str, float]]:
+    def detect_report(self, text: str) -> dict:
         """Detects if text is generated and prepares a report.
 
         :param text: Input text
         :type text: str
         :return: Text chunks with generated scores
-        :rtype: list[tuple[str, float]]
+        :rtype: dict with keys: 'generated_score' and 'author'
         """
         text, code = self.__split_text_and_code(text)
 
@@ -124,7 +124,7 @@ def __determine_author(generated_score: float) -> Author:
 if __name__ == "__main__":
     import json
 
-    with open("etc/detector_config.json") as f:
+    with open("etc/configs/detector_config.json") as f:
         detector_config = json.load(f)
 
     detector = AggregatedDetector(
````
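The `code_block_pattern` in the first hunk is what makes the corrected two-element return type natural: one prose string, one code string. A runnable sketch of how `__split_text_and_code` plausibly uses it (the function body here is an assumption; only the regex is verbatim from the diff):

````python
import re

# Verbatim from the diff: matches fenced code blocks, capturing the
# optional language tag (group 1) and the block body (group 2).
CODE_BLOCK_PATTERN = re.compile(r"```(\w+)?\s*([\s\S]*?)\s*```")


def split_text_and_code(text: str) -> tuple[str, str]:
    """Sketch of the corrected tuple[str, str] signature: (prose, code)."""
    code = "\n".join(m.group(2) for m in CODE_BLOCK_PATTERN.finditer(text))
    prose = CODE_BLOCK_PATTERN.sub("", text).strip()
    return prose, code
````

This makes the three-element `tuple[str, str, str]` annotation removed in this commit look like a leftover, since only prose and code ever come back.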

generated_text_detector/utils/model/roberta_classifier.py (1 addition, 1 deletion)

```diff
@@ -99,5 +99,5 @@ def forward(
 
 
 if __name__ == "__main__":
-    model = RobertaClassifier.from_pretrained("SuperAnnotate/roberta-large-llm-content-detector")
+    model = RobertaClassifier.from_pretrained("SuperAnnotate/ai-detector")
     print(model)
```

generated_text_detector/utils/text_detector.py (2 additions, 3 deletions)

```diff
@@ -3,7 +3,6 @@
 from nltk.tokenize import sent_tokenize
 from transformers import RobertaTokenizer
 
-from generated_text_detector.controllers.schemas_type import Author
 from generated_text_detector.utils.model.roberta_classifier import RobertaClassifier
 
 
@@ -123,10 +122,10 @@ def detect(self, text: str) -> list[tuple[str, float]]:
 
 if __name__ == "__main__":
     detector = GeneratedTextDetector(
-        "SuperAnnotate/roberta-large-llm-content-detector",
+        "SuperAnnotate/ai-detector",
         "cuda:0"
     )
 
-    res = detector.detect_report("Hello, world!")
+    res = detector.detect("Hello, world!")
 
     print(res)
```
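The `__main__` fix above matters because the two entry points now return different shapes: `detect` is annotated `list[tuple[str, float]]`, while `detect_report` (on `AggregatedDetector`) returns a dict with `generated_score` and `author`. A stub illustration, with the detector internals and the author labels faked; only the annotated types come from the diffs:

```python
class StubDetector:
    """Hypothetical stand-in showing the two return shapes in this commit."""

    def detect(self, text: str) -> list[tuple[str, float]]:
        # Per-chunk (text, generated_score) pairs; the score is a placeholder.
        return [(text, 0.12)]

    def detect_report(self, text: str) -> dict:
        # Aggregated report with the keys named in the corrected docstring.
        # The author labels here are assumptions, not the repo's Author enum.
        score = max(s for _, s in self.detect(text))
        return {"generated_score": score, "author": "Human" if score < 0.5 else "LLM"}
```

Calling `detect_report` where a `list[tuple[str, float]]` was expected (or vice versa) is exactly the kind of mismatch the corrected `__main__` avoids.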
