Commit 3732b0a

webhook enrichment samples

1 parent 8d56271 commit 3732b0a


43 files changed, 2358 additions, 0 deletions
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
FROM --platform=linux/amd64 python:3-alpine

WORKDIR /app

COPY requirements.txt main.py /app/

RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm requirements.txt

CMD ["python", "main.py"]
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
web: python main.py
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Entity Extraction using a foundation model of [watsonx.ai](https://www.ibm.com/products/watsonx-ai)

In this tutorial, we extract entities from an email by using a Granite foundation model on watsonx.ai.

## Requirements
- An instance of Watson Discovery on the Plus or Enterprise plan on IBM Cloud.
- An instance of [Watson Machine Learning](https://cloud.ibm.com/catalog/services/watson-machine-learning).
- An IBM Cloud API key. To learn how to create and manage API keys, see the [IBM Cloud documentation](https://cloud.ibm.com/docs/account?topic=account-manapikey).

## Setup Instructions

### Deploy the webhook enrichment app to Code Engine
In this tutorial, we use [IBM Cloud Code Engine](https://www.ibm.com/cloud/code-engine) to host the webhook enrichment application. You can also deploy the application in any other environment that Discovery can reach.

1. [Create a project](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project#create-a-project) in Code Engine.
2. [Create a secret](https://cloud.ibm.com/docs/codeengine?topic=codeengine-secret#secret-create) in the project that contains the following key-value pairs:
   - `WD_API_URL`: The API endpoint URL of your Discovery instance.
   - `WD_API_KEY`: The API key of your Discovery instance.
   - `WEBHOOK_SECRET`: A shared secret used to authenticate webhook requests from Discovery. The application uses it to verify the HS256-signed JWT in the `Authorization` header of each request. For example, `purple unicorn`.
   - `IBM_CLOUD_API_KEY`: Your IBM Cloud API key. The application uses it to get an IAM token for the Watson Machine Learning API.
   - `WML_ENDPOINT_URL`: The API endpoint URL of your Watson Machine Learning instance, for example `https://us-south.ml.cloud.ibm.com` (the default in `main.py`). See [the documentation](https://cloud.ibm.com/apidocs/machine-learning).
   - `WML_INSTANCE_CRN`: The CRN of your Watson Machine Learning instance. You can find your instance and its CRN with the `ibmcloud` CLI: `ibmcloud resources`
3. [Deploy the application](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-source-code) from this repository's source code.
   - In **Create application**, click **Specify build details** and enter the following:
     - Source
       - Code repo URL: **TODO: public URL of this repository. https://github.com/watson-developer-cloud/discovery-webhook-enrichment ...?**
       - Code repo access: `None`
       - Branch name: `main`
       - Context directory: `granite`
     - Strategy
       - Strategy: `Dockerfile`
     - Output
       - Enter your container image registry information.
   - Open **Environment variables (optional)** and add an environment variable:
     - Define as: `Reference to full secret`
     - Secret: The name of the secret you created in Step 2.
   - We recommend setting **Min number of instances** to `1`.
4. Confirm that the application status changes to **Ready**.
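
Once the application is **Ready**, you can check it before you configure Discovery. Send it a `ping` event the way Discovery would. The `webhook` function in `main.py` expects an `Authorization: Bearer {token}` header that carries a JWT signed with `WEBHOOK_SECRET` using HS256. The following is a minimal sketch with PyJWT and `requests`. The `ping_request` helper, the `iat` claim, and the example URL are assumptions; `main.py` only verifies the signature.

```python
import time

import jwt
import requests


def ping_request(app_url, webhook_secret):
    """Hypothetical helper: prepare (but do not send) a signed 'ping' webhook call."""
    # Assumed claim: main.py only checks the HS256 signature, not specific claims.
    token = jwt.encode({'iat': int(time.time())}, webhook_secret, algorithm='HS256')
    if isinstance(token, bytes):  # PyJWT 1.x returns bytes, 2.x returns str
        token = token.decode('ascii')
    return requests.Request(
        'POST',
        f'{app_url}/webhook',
        headers={'Authorization': f'Bearer {token}'},
        json={'event': 'ping'},
    ).prepare()
```

To send it, call `requests.Session().send(ping_request(...))` with your app's domain and secret. Based on `main.py`, a healthy app answers `200` with `{"status": "ok"}`, and a wrong secret yields `401`.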

### Configure Discovery webhook enrichment
1. Create a Discovery project.
2. Create a webhook enrichment by using the Discovery API.
   ```bash
   curl -X POST {auth} \
     --header 'Content-Type: multipart/form-data' \
     --form 'enrichment={"name":"my-first-webhook-enrichment",
       "type":"webhook",
       "options":{"url":"{your_code_engine_app_domain}/webhook",
         "secret":"{your_webhook_secret}",
         "location_encoding":"utf-32"}}' \
     '{url}/v2/projects/{project_id}/enrichments?version=2023-03-31'
   ```
3. Create a collection in the project and apply the webhook enrichment to it.
   ```bash
   curl -X POST {auth} \
     --header 'Content-Type: application/json' \
     --data '{"name":"my-collection",
       "enrichments":[{"enrichment_id":"{enrichment_id}",
         "fields":["text"]}]}' \
     '{url}/v2/projects/{project_id}/collections?version=2023-03-31'
   ```
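
If you prefer Python to curl, the two calls above can be sketched with `requests`. This sketch assumes API-key authentication through HTTP basic auth with the user name `apikey`, which is the scheme `main.py` uses for Discovery. The helper names and example values are hypothetical. The requests are only prepared here, not sent.

```python
import json

import requests

VERSION = '2023-03-31'


def enrichment_request(discovery_url, project_id, api_key, app_domain, webhook_secret):
    """Hypothetical helper: prepare the 'create webhook enrichment' call (step 2)."""
    enrichment = {
        'name': 'my-first-webhook-enrichment',
        'type': 'webhook',
        'options': {
            'url': f'{app_domain}/webhook',
            'secret': webhook_secret,
            'location_encoding': 'utf-32',
        },
    }
    # (None, value) sends a plain multipart form field rather than a file upload.
    files = {'enrichment': (None, json.dumps(enrichment))}
    return requests.Request(
        'POST',
        f'{discovery_url}/v2/projects/{project_id}/enrichments',
        params={'version': VERSION},
        files=files,
        auth=('apikey', api_key),
    ).prepare()


def collection_request(discovery_url, project_id, api_key, enrichment_id):
    """Hypothetical helper: prepare the 'create collection' call (step 3) for the 'text' field."""
    body = {
        'name': 'my-collection',
        'enrichments': [{'enrichment_id': enrichment_id, 'fields': ['text']}],
    }
    return requests.Request(
        'POST',
        f'{discovery_url}/v2/projects/{project_id}/collections',
        params={'version': VERSION},
        json=body,
        auth=('apikey', api_key),
    ).prepare()
```

To run the calls, send each prepared request with `requests.Session().send(...)`. Pass the enrichment ID returned by the first call to `collection_request`.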

### Ingest documents to Discovery
1. Upload [email.txt](data/email.txt) to the collection.
2. After document processing is complete, preview the query results to see the entities that the webhook enrichment added.
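
As an alternative to uploading through the tooling, you can add the file through the API. The following minimal sketch prepares a Discovery v2 "add a document" request. It assumes the `POST /v2/projects/{project_id}/collections/{collection_id}/documents` endpoint with a multipart `file` field and the same `apikey` basic auth that `main.py` uses. `upload_request` is a hypothetical helper.

```python
import requests

VERSION = '2023-03-31'


def upload_request(discovery_url, project_id, collection_id, api_key, filename, content):
    """Hypothetical helper: prepare (but do not send) an upload of one local file."""
    return requests.Request(
        'POST',
        f'{discovery_url}/v2/projects/{project_id}/collections/{collection_id}/documents',
        params={'version': VERSION},
        # Multipart field 'file': (file name, raw bytes, content type)
        files={'file': (filename, content, 'text/plain')},
        auth=('apikey', api_key),
    ).prepare()
```

For example, read `data/email.txt` in binary mode, pass its bytes as `content`, and send the result with `requests.Session().send(...)`.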
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
Dear team,

I hope this email finds you well. I wanted to share some outstanding achievements from our recent sales efforts.

We now have a multimillion dollar contract with Golden Retail, where we are providing cutting-edge software solution that streamlines inventory management for retail businesses. The customer was struggling with manual inventory tracking, leading to inefficiencies and errors. We have great testimonials from John Doe who is our contact at Golden Retail.

Best regards,

Sarah
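
For this email, `main.py` expects the model to answer in the same `text: type, text: type` format as the one-shot example in its prompt. It then locates each entity in the field text to build annotation offsets. The following sketch isolates those two steps. The model output string is a hypothetical answer, not a recorded result, and `parse_entities` and `locate` are illustrative helpers.

```python
import re

# One sentence from data/email.txt.
text = 'We have great testimonials from John Doe who is our contact at Golden Retail.'
# Hypothetical model output in the format the prompt asks for.
llm_output = 'John Doe: person, Golden Retail: company'


def parse_entities(result):
    """Split 'A: type, B: type' into dicts, mirroring extract_entities in main.py."""
    result = result.strip()
    if result == 'None':
        return []
    entities = []
    for pair in re.split(r',\s*', result):
        parts = re.split(r':\s*', pair, maxsplit=1)
        if len(parts) == 2 and parts[0]:  # skip fragments without a type
            entities.append({'text': parts[0], 'type': parts[1]})
    return entities


def locate(entities, text, begin=0):
    """Find every occurrence of each entity, shifted by the field's start offset."""
    spans = []
    for entity in entities:
        for m in re.finditer(re.escape(entity['text']), text):
            spans.append((m.start() + begin, m.end() + begin, entity['type']))
    return spans


for start, end, entity_type in locate(parse_entities(llm_output), text):
    print(start, end, entity_type, text[start:end])
```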
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
import flask
import gzip
import json
import jwt
import logging
import os
import queue
import re
import requests
import threading
import time

WD_API_URL = os.getenv('WD_API_URL')
WD_API_KEY = os.getenv('WD_API_KEY')
WEBHOOK_SECRET = os.getenv('WEBHOOK_SECRET')
IBM_CLOUD_API_KEY = os.getenv('IBM_CLOUD_API_KEY')
WML_ENDPOINT_URL = os.getenv('WML_ENDPOINT_URL', 'https://us-south.ml.cloud.ibm.com')
WML_INSTANCE_CRN = os.getenv('WML_INSTANCE_CRN')

# Enrichment task queue
q = queue.Queue()

app = flask.Flask(__name__)
app.logger.setLevel(logging.INFO)
app.logger.handlers[0].setFormatter(logging.Formatter('[%(asctime)s] %(levelname)s in %(module)s: %(message)s (%(filename)s:%(lineno)d)'))

def get_iam_token():
    # Exchange the IBM Cloud API key for an IAM access token
    data = {'grant_type': 'urn:ibm:params:oauth:grant-type:apikey', 'apikey': IBM_CLOUD_API_KEY}
    response = requests.post('https://iam.cloud.ibm.com/identity/token', data=data)
    if response.status_code == 200:
        return response.json()['access_token']
    else:
        raise Exception('Failed to get IAM token.')

IAM_TOKEN = None

def extract_entities(text, retry=True):
    global IAM_TOKEN
    if IAM_TOKEN is None:
        IAM_TOKEN = get_iam_token()
    # Prompt
    payload = {
        'model_id': 'ibm/granite-13b-instruct-v1',
        'input': f'''Act as a webmaster who must extract structured information from emails. Read the below email and extract and categorize each entity. If no entity is found, output "None".

Input:
"Golden Bank is a competitor of Silver Bank in the US" said John Doe.

Named Entities:
Golden Bank: company, Silver Bank: company, US: country, John Doe: person

Input:
{text}

Named Entities:
''',
        'parameters': {
            'decoding_method': 'greedy',
            'max_new_tokens': 50,
            'min_new_tokens': 1,
            'stop_sequences': [],
            'repetition_penalty': 1
        },
        'wml_instance_crn': WML_INSTANCE_CRN
    }
    params = {'version': '2023-05-29'}
    headers = {'Authorization': f'Bearer {IAM_TOKEN}'}
    response = requests.post(f'{WML_ENDPOINT_URL}/ml/v1-beta/generation/text', json=payload, params=params, headers=headers)
    if response.status_code == 200:
        result = response.json()['results'][0]['generated_text'].strip()
        app.logger.info('LLM result: %s', result)
        entities = []
        if result == 'None':
            # No entity found
            return entities
        for pair in re.split(r',\s*', result):
            # Each pair should look like "text: type"; skip malformed fragments
            text_type = re.split(r':\s*', pair, maxsplit=1)
            if len(text_type) == 2 and text_type[0]:
                entities.append({'text': text_type[0], 'type': text_type[1]})
        return entities
    elif response.status_code == 401 and retry:
        # Token expired. Regenerate it and retry once.
        IAM_TOKEN = get_iam_token()
        return extract_entities(text, retry=False)
    else:
        raise Exception(f'Failed to generate: {response.status_code} {response.text}')

def enrich(doc):
    app.logger.info('doc: %s', doc)
    features_to_send = []
    for feature in doc['features']:
        # Target 'text' field
        if feature['properties']['field_name'] != 'text':
            continue
        location = feature['location']
        begin = location['begin']
        end = location['end']
        text = doc['artifact'][begin:end]
        try:
            # Entity extraction example
            results = extract_entities(text)
            app.logger.info('entities: %s', results)
            for entity in results:
                entity_text = entity['text']
                entity_type = entity['type']
                for matched in re.finditer(re.escape(entity_text), text):
                    features_to_send.append(
                        {
                            'type': 'annotation',
                            'location': {
                                'begin': matched.start() + begin,
                                'end': matched.end() + begin,
                            },
                            'properties': {
                                'type': 'entities',
                                'confidence': 1.0,
                                'entity_type': entity_type,
                                'entity_text': matched.group(0),
                            },
                        }
                    )
        except Exception as e:
            # Notice example
            features_to_send.append(
                {
                    'type': 'notice',
                    'properties': {
                        'description': str(e),
                        'created': round(time.time() * 1000),
                    },
                }
            )
    app.logger.info('features_to_send: %s', features_to_send)
    return {'document_id': doc['document_id'], 'features': features_to_send}

def enrichment_worker():
    while True:
        item = q.get()
        version = item['version']
        data = item['data']
        project_id = data['project_id']
        collection_id = data['collection_id']
        batch_id = data['batch_id']
        batch_api = f'{WD_API_URL}/v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}'
        params = {'version': version}
        auth = ('apikey', WD_API_KEY)
        headers = {'Accept-Encoding': 'gzip'}
        try:
            # Get documents from WD
            response = requests.get(batch_api, params=params, auth=auth, headers=headers, stream=True)
            status_code = response.status_code
            app.logger.info('Pulled a batch: %s, status: %d', batch_id, status_code)
            if status_code == 200:
                # Annotate documents (one JSON document per line; skip blank lines)
                enriched_docs = [enrich(json.loads(line)) for line in response.iter_lines() if line]
                files = {
                    'file': (
                        'data.ndjson.gz',
                        gzip.compress(
                            '\n'.join(
                                [json.dumps(enriched_doc) for enriched_doc in enriched_docs]
                            ).encode('utf-8')
                        ),
                        'application/x-ndjson'
                    )
                }
                # Upload annotated documents
                response = requests.post(batch_api, params=params, files=files, auth=auth)
                status_code = response.status_code
                app.logger.info('Pushed a batch: %s, status: %d', batch_id, status_code)
        except Exception as e:
            app.logger.error('An error occurred: %s', e, exc_info=True)
            # Retry
            q.put(item)

# Turn on the enrichment worker thread
threading.Thread(target=enrichment_worker, daemon=True).start()

# Webhook endpoint
@app.route('/webhook', methods=['POST'])
def webhook():
    # Verify JWT token from the "Authorization: Bearer <token>" header
    header = flask.request.headers.get('Authorization', '')
    parts = header.split()
    if len(parts) != 2:
        app.logger.error('Missing or malformed Authorization header')
        return {'status': 'unauthorized'}, 401
    _, token = parts
    try:
        jwt.decode(token, WEBHOOK_SECRET, algorithms=['HS256'])
    except jwt.PyJWTError as e:
        app.logger.error('Invalid token: %s', e)
        return {'status': 'unauthorized'}, 401
    # Process webhook event
    data = flask.json.loads(flask.request.data)
    app.logger.info('Received event: %s', data)
    event = data.get('event')
    if event == 'ping':
        # Receive this event when a webhook enrichment is created
        code = 200
        status = 'ok'
    elif event == 'enrichment.batch.created':
        # Receive this event when a batch of documents is ready
        code = 202
        status = 'accepted'
        # Put an enrichment request into the queue
        q.put(data)
    else:
        # Unknown event type
        code = 400
        status = 'bad request'
    return {'status': status}, code

PORT = os.getenv('PORT', '8080')
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=int(PORT))
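
`enrichment_worker` posts the enriched documents back as a gzip-compressed NDJSON file, with one JSON object per line. The following minimal sketch shows that encoding and its inverse, which is handy for checking a payload locally before sending it. The document shape follows `enrich`, but the values are made up and `encode_batch`/`decode_batch` are hypothetical helpers.

```python
import gzip
import json


def encode_batch(docs):
    """Serialize documents as gzip-compressed NDJSON, as enrichment_worker does."""
    return gzip.compress('\n'.join(json.dumps(doc) for doc in docs).encode('utf-8'))


def decode_batch(payload):
    """Inverse of encode_batch: decompress and parse one JSON object per non-empty line."""
    lines = gzip.decompress(payload).decode('utf-8').splitlines()
    return [json.loads(line) for line in lines if line.strip()]


# Made-up example documents in the shape enrich() returns.
docs = [
    {'document_id': 'doc-1', 'features': [{
        'type': 'annotation',
        'location': {'begin': 32, 'end': 40},
        'properties': {'type': 'entities', 'confidence': 1.0,
                       'entity_type': 'person', 'entity_text': 'John Doe'},
    }]},
    {'document_id': 'doc-2', 'features': []},
]
payload = encode_batch(docs)
```

Because `json.dumps` escapes any newline characters inside string values, each document is guaranteed to stay on a single line of the NDJSON output.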
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
Flask
pyjwt
requests
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
FROM --platform=linux/amd64 python:3-alpine

WORKDIR /app

COPY requirements.txt main.py /app/

RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm requirements.txt

CMD ["python", "main.py"]
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
web: python main.py
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Entity Extraction, Document Classification and Sentence Classification using regular expressions

## Requirements
- An instance of Watson Discovery on the Plus or Enterprise plan on IBM Cloud.

## Setup Instructions

### Deploy the webhook enrichment app to Code Engine
In this tutorial, we use [IBM Cloud Code Engine](https://www.ibm.com/cloud/code-engine) to host the webhook enrichment application. You can also deploy the application in any other environment that Discovery can reach.

1. [Create a project](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project#create-a-project) in Code Engine.
2. [Create a secret](https://cloud.ibm.com/docs/codeengine?topic=codeengine-secret#secret-create) in the project that contains the following key-value pairs:
   - `WD_API_URL`: The API endpoint URL of your Discovery instance.
   - `WD_API_KEY`: The API key of your Discovery instance.
   - `WEBHOOK_SECRET`: A shared secret that Discovery includes with each webhook request so that the application can authenticate it. For example, `purple unicorn`.
3. [Deploy the application](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-source-code) from this repository's source code.
   - In **Create application**, click **Specify build details** and enter the following:
     - Source
       - Code repo URL: **TODO: public URL of this repository. https://github.com/watson-developer-cloud/discovery-webhook-enrichment ...?**
       - Code repo access: `None`
       - Branch name: `main`
       - Context directory: `regex`
     - Strategy
       - Strategy: `Dockerfile`
     - Output
       - Enter your container image registry information.
   - Open **Environment variables (optional)** and add an environment variable:
     - Define as: `Reference to full secret`
     - Secret: The name of the secret you created in Step 2.
   - We recommend setting **Min number of instances** to `1`.
4. Confirm that the application status changes to **Ready**.
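
The source of this sample's `main.py` is not shown in this commit view. If it reads these values with `os.getenv`, as the Granite sample does, a missing key only surfaces later when a request fails. The following minimal sketch is a fail-fast startup check for the three keys above. `missing_vars` and `load_config` are hypothetical helpers, not part of the repository.

```python
import os

# The three keys from the secret in Step 2.
REQUIRED_VARS = ['WD_API_URL', 'WD_API_KEY', 'WEBHOOK_SECRET']


def missing_vars(environ):
    """Return the required keys that are unset or empty in `environ`."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def load_config(environ=os.environ):
    """Hypothetical startup check: return the settings or fail with a clear message."""
    missing = missing_vars(environ)
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_config()` once at import time makes a misconfigured Code Engine secret show up immediately in the application logs, instead of on the first webhook call.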

### Configure Discovery webhook enrichment
1. Create a Discovery project.
2. Create a webhook enrichment by using the Discovery API.
   ```bash
   curl -X POST {auth} \
     --header 'Content-Type: multipart/form-data' \
     --form 'enrichment={"name":"my-first-webhook-enrichment",
       "type":"webhook",
       "options":{"url":"{your_code_engine_app_domain}/webhook",
         "secret":"{your_webhook_secret}",
         "location_encoding":"utf-32"}}' \
     '{url}/v2/projects/{project_id}/enrichments?version=2023-03-31'
   ```
3. Create a collection in the project and apply the webhook enrichment to it.
   ```bash
   curl -X POST {auth} \
     --header 'Content-Type: application/json' \
     --data '{"name":"my-collection",
       "enrichments":[{"enrichment_id":"{enrichment_id}",
         "fields":["text"]}]}' \
     '{url}/v2/projects/{project_id}/collections?version=2023-03-31'
   ```
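
The enrichment is created with `"location_encoding":"utf-32"`. Presumably this is so that the `begin`/`end` offsets Discovery exchanges with the app count Unicode code points, which is how Python indexes `str` objects; the Granite sample slices `doc['artifact'][begin:end]` directly. The following small illustration shows where a UTF-16 offset would diverge. `utf16_offset` is a hypothetical helper for comparison only. `nhtsa.csv` is plain ASCII, so both counts agree for this data set.

```python
def utf16_offset(text, index):
    """Hypothetical helper: UTF-16 code units before code-point position `index`."""
    return len(text[:index].encode('utf-16-le')) // 2


# U+1F697 (an automobile emoji) lies outside the Basic Multilingual Plane.
text = 'Airbag \U0001F697 recall'
begin = text.index('recall')  # Python offsets count code points, like UTF-32 code units
end = begin + len('recall')
print(begin, utf16_offset(text, begin))  # prints: 9 10
```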

### Ingest documents to Discovery
1. Upload [nhtsa.csv](data/nhtsa.csv) to the collection.
2. After document processing is complete, preview the query results to see the results that the webhook enrichment added.
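
To inspect the results outside the tooling, you could query the collection through the Discovery v2 query API. The following minimal sketch only prepares the request. The endpoint and body fields (`collection_ids`, `count`) reflect our understanding of the Discovery v2 API and should be checked against its reference. This commit does not show where the webhook annotations appear in each result.

```python
import requests

VERSION = '2023-03-31'


def query_request(discovery_url, project_id, collection_id, api_key, count=10):
    """Hypothetical helper: prepare a query returning up to `count` documents from one collection."""
    return requests.Request(
        'POST',
        f'{discovery_url}/v2/projects/{project_id}/query',
        params={'version': VERSION},
        json={'collection_ids': [collection_id], 'count': count},
        auth=('apikey', api_key),
    ).prepare()
```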
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
id,text
1,ENGINE SPEED CONTROL.
2,2002 FORD EXPLORER DOOR LOCKS WILL NOT FUNCTION PROPERLY IN FREEZING WEATHER.
3,WE WERE IN MY WIFE'S 2005 FORD FREESTAR DRIVING HOME FROM MY FAMILY FOR THE HOLIDAYS WHEN WE IN A SNOWSTORM. THE CAR VIOLENTLY JERKED DURING DRIVING SEVERAL TIMES.
4,TRANSMISSION "SLIPS" THEN ENGAGES HARD. HAS PROGRESSIVELY GOTTEN WORSE. HAD TRANSMISSION SERVICE JUST ABOUT A YEAR AGO.
5,FRONT LUG NUTS LOOSEN ON 2002 JEEP LIBERTY. THIRD TIME ON THIS CAR.
6,KEY WON'T TURN IN IGNITION.
7,I HAVE A 2003 TOYOTA RAV-4. THE TRANSMISSION STARTED TO SLIP.
8,AIR BAG DIDN'T DEPLOY WHEN I WAS IN A CRASH.
9,2005 CHEVY COBALT 32 587 MILES. DRIVING ON ICY ROADS AFTER SHOPPING FOR A NEW COUCH.
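
The regex sample's `main.py` is hidden in this commit view, so its actual rules are unknown. As a hypothetical illustration only, a regular-expression extractor for these complaints could look for a four-digit model year followed by a make. It would then emit annotation features in the same shape the Granite sample's `enrich` produces. The pattern, the `MAKES` list, and the `vehicle` entity type are assumptions, not the author's rules.

```python
import csv
import io
import re

# Hypothetical rule: a model year (19xx/20xx) followed by one of the makes seen in nhtsa.csv.
MAKES = ['FORD', 'JEEP', 'TOYOTA', 'CHEVY']
VEHICLE = re.compile(r'\b(?:19|20)\d{2} (?:' + '|'.join(MAKES) + r')\b')


def annotate(text, begin=0):
    """Return annotation features for every year+make match, shifted by `begin`."""
    return [
        {
            'type': 'annotation',
            'location': {'begin': m.start() + begin, 'end': m.end() + begin},
            'properties': {
                'type': 'entities',
                'confidence': 1.0,
                'entity_type': 'vehicle',
                'entity_text': m.group(0),
            },
        }
        for m in VEHICLE.finditer(text)
    ]


# Three rows copied from nhtsa.csv.
SAMPLE = '''id,text
2,2002 FORD EXPLORER DOOR LOCKS WILL NOT FUNCTION PROPERLY IN FREEZING WEATHER.
5,FRONT LUG NUTS LOOSEN ON 2002 JEEP LIBERTY. THIRD TIME ON THIS CAR.
6,KEY WON'T TURN IN IGNITION.
'''

for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(row['id'], [f['properties']['entity_text'] for f in annotate(row['text'])])
```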
