watson-developer-cloud
diff --git a/discovery-data/webhook-enrichment-sample/granite/Dockerfile b/discovery-data/webhook-enrichment-sample/granite/Dockerfile
Lines changed: 11 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/granite/Procfile b/discovery-data/webhook-enrichment-sample/granite/Procfile
Lines changed: 1 addition & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/granite/README.md b/discovery-data/webhook-enrichment-sample/granite/README.md
Lines changed: 65 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/granite/data/email.txt b/discovery-data/webhook-enrichment-sample/granite/data/email.txt
Lines changed: 9 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/granite/main.py b/discovery-data/webhook-enrichment-sample/granite/main.py
Lines changed: 211 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/granite/requirements.txt b/discovery-data/webhook-enrichment-sample/granite/requirements.txt
Lines changed: 3 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/regex/Dockerfile b/discovery-data/webhook-enrichment-sample/regex/Dockerfile
Lines changed: 11 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/regex/Procfile b/discovery-data/webhook-enrichment-sample/regex/Procfile
Lines changed: 1 addition & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/regex/README.md b/discovery-data/webhook-enrichment-sample/regex/README.md
Lines changed: 58 additions & 0 deletions
diff --git a/discovery-data/webhook-enrichment-sample/regex/data/nhtsa.csv b/discovery-data/webhook-enrichment-sample/regex/data/nhtsa.csv
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,11 @@
+FROM --platform=linux/amd64 python:3-alpine
+
+WORKDIR /app
+
+COPY requirements.txt main.py /app/
+
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt && \
+    rm requirements.txt
+
+CMD ["python", "main.py"]
@@ -0,0 +1 @@
+web: python main.py
@@ -0,0 +1,65 @@
+# Entity Extraction using a foundation model of [watsonx.ai](https://www.ibm.com/products/watsonx-ai)
+
+In this tutorial, we will extract entities from an email using the watsonx.ai Granite model.
+
+## Requirements
+- A Watson Discovery instance on the Plus or Enterprise plan on IBM Cloud.
+- A [Watson Machine Learning](https://cloud.ibm.com/catalog/services/watson-machine-learning) instance.
+- An IBM Cloud API key. To learn how to manage API keys, see [the documentation](https://cloud.ibm.com/docs/account?topic=account-manapikey).
+
+## Setup Instructions
+
+### Deploy the webhook enrichment app to Code Engine
+In this tutorial, we will use [IBM Cloud Code Engine](https://www.ibm.com/cloud/code-engine) to host the webhook enrichment application. You can deploy the application in any other environment you like.
+
+1. [Create a project](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project#create-a-project) in Code Engine.
+2. [Create a secret](https://cloud.ibm.com/docs/codeengine?topic=codeengine-secret#secret-create) in the project. This secret contains the following key-value pairs:
+   - `WD_API_URL`: The API endpoint URL of your Discovery instance
+   - `WD_API_KEY`: The API key of your Discovery instance
+   - `WEBHOOK_SECRET`: A key to pass with the request that can be used to authenticate with the application. e.g. `purple unicorn`
+   - `IBM_CLOUD_API_KEY`: Your IBM Cloud API key. It is used to access the Watson Machine Learning API.
+   - `WML_ENDPOINT_URL`: The API endpoint URL of your Watson Machine Learning instance. See [the documentation](https://cloud.ibm.com/apidocs/machine-learning).
+   - `WML_INSTANCE_CRN`: The CRN of your Watson Machine Learning instance. You can find your instance and CRN using `ibmcloud` command: `ibmcloud resources`
+3. [Deploy the application](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-source-code) from the source code in this repository.
+   - In **Create application**, click **Specify build details** and enter the following:
+      - Source
+         - Code repo URL: **TODO: public URL of this repository. https://github.com/watson-developer-cloud/discovery-webhook-enrichment ...?**
+         - Code repo access: `None`
+         - Branch name: `main`
+         - Context directory: `granite`
+      - Strategy
+         - Strategy: `Dockerfile`
+      - Output
+         - Enter your container image registry information.
+   - Open **Environment variables (optional)**, and add environment variables.
+      - Define as: `Reference to full secret`
+      - Secret: The name of the secret you created in Step 2.
+   - We recommend setting **Min number of instances** to `1`.
+4. Confirm that the application status changes to **Ready**.
+
+### Configure Discovery webhook enrichment
+1. Create a Discovery project.
+2. Create a webhook enrichment using the Discovery API.
+   ```bash
+   curl -X POST {auth} \
+   --header 'Content-Type: multipart/form-data' \
+   --form 'enrichment={"name":"my-first-webhook-enrichment",
+     "type":"webhook",
+     "options":{"url":"{your_code_engine_app_domain}/webhook",
+       "secret":"{your_webhook_secret}",
+       "location_encoding":"utf-32"}}' \
+   '{url}/v2/projects/{project_id}/enrichments?version=2023-03-31'
+   ```
+3. Create a collection in the project and apply the webhook enrichment to the collection.
+   ```bash
+   curl -X POST {auth} \
+   --header 'Content-Type: application/json' \
+   --data '{"name":"my-collection",
+     "enrichments":[{"enrichment_id":"{enrichment_id}",
+       "fields":["text"]}]}' \
+   '{url}/v2/projects/{project_id}/collections?version=2023-03-31'
+   ```
+
+### Ingest documents to Discovery
+1. Upload [email.txt](data/email.txt) to the collection.
+2. After document processing is complete, preview your query results to see the enrichments added by the webhook.
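The `WEBHOOK_SECRET` configured in the setup steps is what the application uses to check the `Authorization` header on each webhook call. As a rough illustration of what that check involves, the following standard-library sketch signs and verifies a compact JWT with HMAC-SHA256. It assumes the token uses the HS256 algorithm, which is what the `jwt.decode(..., algorithms=['HS256'])` call in `main.py` implies. The helper names `sign_hs256` and `verify_hs256` are hypothetical, claims such as `exp` are not validated here, and the sample itself relies on PyJWT rather than this code.

```python
import base64
import hashlib
import hmac
import json


def _b64url_encode(raw: bytes) -> str:
    # JWT uses URL-safe base64 without '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode('ascii')


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    padding = '=' * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def sign_hs256(payload: dict, secret: str) -> str:
    """Build a compact JWT signed with HMAC-SHA256 (hypothetical helper)."""
    header = {'alg': 'HS256', 'typ': 'JWT'}
    signing_input = '.'.join(
        _b64url_encode(json.dumps(part, separators=(',', ':')).encode('utf-8'))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode('utf-8'), signing_input.encode('ascii'), hashlib.sha256
    ).digest()
    return f'{signing_input}.{_b64url_encode(signature)}'


def verify_hs256(token: str, secret: str) -> dict:
    """Return the payload if the signature matches, else raise ValueError."""
    segments = token.split('.')
    if len(segments) != 3:
        raise ValueError('token must have three segments')
    header_b64, payload_b64, signature_b64 = segments
    header = json.loads(_b64url_decode(header_b64))
    if header.get('alg') != 'HS256':
        raise ValueError('unexpected algorithm')
    expected = hmac.new(
        secret.encode('utf-8'),
        f'{header_b64}.{payload_b64}'.encode('ascii'),
        hashlib.sha256,
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError('signature mismatch')
    return json.loads(_b64url_decode(payload_b64))
```

A token signed with one secret verifies only against that same secret, so a request signed with anything other than the configured `WEBHOOK_SECRET` is rejected.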
@@ -0,0 +1,9 @@
+Dear team,
+
+I hope this email finds you well. I wanted to share some outstanding achievements from our recent sales efforts.
+
+We now have a multimillion dollar contract with Golden Retail, where we are providing cutting-edge software solution that streamlines inventory management for retail businesses. The customer was struggling with manual inventory tracking, leading to inefficiencies and errors. We have great testimonials from John Doe who is our contact at Golden Retail.
+
+Best regards,
+
+Sarah
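The sample email is a convenient way to sanity-check how entity mentions map to character offsets. The enrichment is created with `"location_encoding":"utf-32"`, so offsets count Unicode code points, which lines up with Python string indices. The sketch below, with the hypothetical helper `locate_mentions`, finds every occurrence of an entity string in a field and shifts the offsets by the position at which the field starts in the document; the `field_begin=100` value is an arbitrary illustration.

```python
import re


def locate_mentions(field_text, entity_text, field_begin=0):
    """Return (begin, end) document offsets for each occurrence of entity_text."""
    if not entity_text:
        # An empty pattern would match at every position, so skip it.
        return []
    return [
        (m.start() + field_begin, m.end() + field_begin)
        for m in re.finditer(re.escape(entity_text), field_text)
    ]


sentence = 'We have great testimonials from John Doe who is our contact at Golden Retail.'
spans = locate_mentions(sentence, 'Golden Retail', field_begin=100)
for begin, end in spans:
    # Subtracting field_begin recovers the mention from the field text.
    print(begin, end, sentence[begin - 100:end - 100])
```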
@@ -0,0 +1,211 @@
+import flask
+import gzip
+import json
+import jwt
+import logging
+import os
+import queue
+import re
+import requests
+import threading
+import time
+
+WD_API_URL = os.getenv('WD_API_URL')
+WD_API_KEY = os.getenv('WD_API_KEY')
+WEBHOOK_SECRET = os.getenv('WEBHOOK_SECRET')
+IBM_CLOUD_API_KEY = os.getenv('IBM_CLOUD_API_KEY')
+WML_ENDPOINT_URL = os.getenv('WML_ENDPOINT_URL', 'https://us-south.ml.cloud.ibm.com')
+WML_INSTANCE_CRN = os.getenv('WML_INSTANCE_CRN')
+
+# Enrichment task queue
+q = queue.Queue()
+
+app = flask.Flask(__name__)
+app.logger.setLevel(logging.INFO)
+app.logger.handlers[0].setFormatter(logging.Formatter('[%(asctime)s] %(levelname)s in %(module)s: %(message)s (%(filename)s:%(lineno)d)'))
+
+def get_iam_token():
+    data = {'grant_type': 'urn:ibm:params:oauth:grant-type:apikey', 'apikey': IBM_CLOUD_API_KEY}
+    response = requests.post('https://iam.cloud.ibm.com/identity/token', data=data)
+    if response.status_code == 200:
+        return response.json()['access_token']
+    else:
+        raise Exception('Failed to get IAM token.')
+
+IAM_TOKEN = None
+
+def extract_entities(text):
+    global IAM_TOKEN
+    if IAM_TOKEN is None:
+        IAM_TOKEN = get_iam_token()
+    # Prompt
+    payload = {
+        'model_id': 'ibm/granite-13b-instruct-v1',
+        'input': f'''Act as a webmaster who must extract structured information from emails. Read the below email and extract and categorize each entity. If no entity is found, output "None".
+
+Input:
+"Golden Bank is a competitor of Silver Bank in the US" said John Doe.
+
+Named Entities:
+Golden Bank: company, Silver Bank: company, US: country, John Doe: person
+
+Input:
+{text}
+
+Named Entities:
+''',
+        'parameters': {
+            'decoding_method': 'greedy',
+            'max_new_tokens': 50,
+            'min_new_tokens': 1,
+            'stop_sequences': [],
+            'repetition_penalty': 1
+        },
+        'wml_instance_crn': WML_INSTANCE_CRN
+    }
+    params = {'version': '2023-05-29'}
+    headers = {'Authorization': f'Bearer {IAM_TOKEN}'}
+    response = requests.post(f'{WML_ENDPOINT_URL}/ml/v1-beta/generation/text', json=payload, params=params, headers=headers)
+    if response.status_code == 200:
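+        result = response.json()['results'][0]['generated_text'].strip()
+        app.logger.info('LLM result: %s', result)
+        entities = []
+        if result == 'None':
+            # No entity found
+            return entities
+        for pair in re.split(r',\s*', result):
+            # Skip fragments that are not in the "text: type" form,
+            # e.g. a final pair cut off by max_new_tokens
+            text_type = re.split(r':\s*', pair, maxsplit=1)
+            if len(text_type) != 2 or not text_type[0] or not text_type[1]:
+                continue
+            entities.append({'text': text_type[0], 'type': text_type[1]})
+        return entities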
+    elif response.status_code == 401:
+        # Token expired. Re-generate it.
+        IAM_TOKEN = get_iam_token()
+        return extract_entities(text)
+    else:
+        raise Exception(f'Failed to generate: {response.text}')
+
+def enrich(doc):
+    app.logger.info('doc: %s', doc)
+    features_to_send = []
+    for feature in doc['features']:
+        # Target 'text' field
+        if feature['properties']['field_name'] != 'text':
+            continue
+        location = feature['location']
+        begin = location['begin']
+        end = location['end']
+        text = doc['artifact'][begin:end]
+        try:
+            # Entity extraction example
+            results = extract_entities(text)
+            app.logger.info('entities: %s', results)
+            for entity in results:
+                entity_text = entity['text']
+                entity_type = entity['type']
+                for matched in re.finditer(re.escape(entity_text), text):
+                    features_to_send.append(
+                        {
+                            'type': 'annotation',
+                            'location': {
+                                'begin': matched.start() + begin,
+                                'end': matched.end() + begin,
+                            },
+                            'properties': {
+                                'type': 'entities',
+                                'confidence': 1.0,
+                                'entity_type': entity_type,
+                                'entity_text': matched.group(0),
+                            },
+                        }
+                    )
+        except Exception as e:
+            # Notice example
+            features_to_send.append(
+                {
+                    'type': 'notice',
+                    'properties': {
+                        'description': str(e),
+                        'created': round(time.time() * 1000),
+                    },
+                }
+            )
+    app.logger.info('features_to_send: %s', features_to_send)
+    return {'document_id': doc['document_id'], 'features': features_to_send}
+
+def enrichment_worker():
+    while True:
+        item = q.get()
+        version = item['version']
+        data = item['data']
+        project_id = data['project_id']
+        collection_id = data['collection_id']
+        batch_id = data['batch_id']
+        batch_api = f'{WD_API_URL}/v2/projects/{project_id}/collections/{collection_id}/batches/{batch_id}'
+        params = {'version': version}
+        auth = ('apikey', WD_API_KEY)
+        headers = {'Accept-Encoding': 'gzip'}
+        try:
+            # Get documents from WD
+            response = requests.get(batch_api, params=params, auth=auth, headers=headers, stream=True)
+            status_code = response.status_code
+            app.logger.info('Pulled a batch: %s, status: %d', batch_id, status_code)
+            if status_code == 200:
+                # Annotate documents
+                enriched_docs = [enrich(json.loads(line)) for line in response.iter_lines() if line]
+                files = {
+                    'file': (
+                        'data.ndjson.gz',
+                        gzip.compress(
+                            '\n'.join(
+                                [json.dumps(enriched_doc) for enriched_doc in enriched_docs]
+                            ).encode('utf-8')
+                        ),
+                        'application/x-ndjson'
+                    )
+                }
+                # Upload annotated documents
+                response = requests.post(batch_api, params=params, files=files, auth=auth)
+                status_code = response.status_code
+                app.logger.info('Pushed a batch: %s, status: %d', batch_id, status_code)
+        except Exception as e:
+            app.logger.error('An error occurred: %s', e, exc_info=True)
+            # Retry
+            q.put(item)
+
+# Turn on the enrichment worker thread
+threading.Thread(target=enrichment_worker, daemon=True).start()
+
+# Webhook endpoint
+@app.route('/webhook', methods=['POST'])
+def webhook():
+    # Verify JWT token
+    # A missing or malformed header yields an empty token, which fails decoding below
+    header = flask.request.headers.get('Authorization', '')
+    _, _, token = header.partition(' ')
+    try:
+        jwt.decode(token.strip(), WEBHOOK_SECRET, algorithms=['HS256'])
+    except jwt.PyJWTError as e:
+        app.logger.error('Invalid token: %s', e)
+        return {'status': 'unauthorized'}, 401
+    # Process webhook event
+    data = flask.json.loads(flask.request.data)
+    app.logger.info('Received event: %s', data)
+    event = data['event']
+    if event == 'ping':
+        # Receive this event when a webhook enrichment is created
+        code = 200
+        status = 'ok'
+    elif event == 'enrichment.batch.created':
+        # Receive this event when a batch of the documents gets ready
+        code = 202
+        status = 'accepted'
+        # Put an enrichment request into the queue
+        q.put(data)
+    else:
+        # Unknown event type
+        code = 400
+        status = 'bad request'
+    return {'status': status}, code
+
+PORT = os.getenv('PORT', '8080')
+if __name__ == '__main__':
+    app.run(host='0.0.0.0', port=int(PORT))
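The worker in `main.py` pushes enriched documents back as gzip-compressed, newline-delimited JSON. The following sketch isolates that packaging step and adds the reverse operation so the format can be checked locally. The helper names `pack_ndjson` and `unpack_ndjson` are hypothetical and are not part of the sample or of any Discovery client library.

```python
import gzip
import json


def pack_ndjson(docs):
    """Serialize each document as one JSON line, then gzip the result."""
    body = '\n'.join(json.dumps(doc) for doc in docs)
    return gzip.compress(body.encode('utf-8'))


def unpack_ndjson(blob):
    """Reverse of pack_ndjson: gunzip and parse one JSON object per line."""
    text = gzip.decompress(blob).decode('utf-8')
    return [json.loads(line) for line in text.splitlines() if line.strip()]


docs = [
    {'document_id': 'doc-1', 'features': []},
    {'document_id': 'doc-2', 'features': [{'type': 'notice', 'properties': {'description': 'example'}}]},
]
blob = pack_ndjson(docs)
restored = unpack_ndjson(blob)
```

Because `json.dumps` escapes newlines inside strings by default, each document stays on a single line and the round trip returns the original list.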
@@ -0,0 +1,3 @@
+Flask
+pyjwt
+requests
@@ -0,0 +1,11 @@
+FROM --platform=linux/amd64 python:3-alpine
+
+WORKDIR /app
+
+COPY requirements.txt main.py /app/
+
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt && \
+    rm requirements.txt
+
+CMD ["python", "main.py"]
@@ -0,0 +1 @@
+web: python main.py
@@ -0,0 +1,58 @@
+# Entity Extraction, Document Classification and Sentence Classification using regular expressions
+
+## Requirements
+- A Watson Discovery instance on the Plus or Enterprise plan on IBM Cloud.
+
+## Setup Instructions
+
+### Deploy the webhook enrichment app to Code Engine
+In this tutorial, we will use [IBM Cloud Code Engine](https://www.ibm.com/cloud/code-engine) to host the webhook enrichment application. You can deploy the application in any other environment you like.
+
+1. [Create a project](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project#create-a-project) in Code Engine.
+2. [Create a secret](https://cloud.ibm.com/docs/codeengine?topic=codeengine-secret#secret-create) in the project. This secret contains the following key-value pairs:
+   - `WD_API_URL`: The API endpoint URL of your Discovery instance
+   - `WD_API_KEY`: The API key of your Discovery instance
+   - `WEBHOOK_SECRET`: A key to pass with the request that can be used to authenticate with the application. e.g. `purple unicorn`
+3. [Deploy the application](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-source-code) from the source code in this repository.
+   - In **Create application**, click **Specify build details** and enter the following:
+      - Source
+         - Code repo URL: **TODO: public URL of this repository. https://github.com/watson-developer-cloud/discovery-webhook-enrichment ...?**
+         - Code repo access: `None`
+         - Branch name: `main`
+         - Context directory: `regex`
+      - Strategy
+         - Strategy: `Dockerfile`
+      - Output
+         - Enter your container image registry information.
+   - Open **Environment variables (optional)**, and add environment variables.
+      - Define as: `Reference to full secret`
+      - Secret: The name of the secret you created in Step 2.
+   - We recommend setting **Min number of instances** to `1`.
+4. Confirm that the application status changes to **Ready**.
+
+### Configure Discovery webhook enrichment
+1. Create a Discovery project.
+2. Create a webhook enrichment using the Discovery API.
+   ```bash
+   curl -X POST {auth} \
+   --header 'Content-Type: multipart/form-data' \
+   --form 'enrichment={"name":"my-first-webhook-enrichment",
+     "type":"webhook",
+     "options":{"url":"{your_code_engine_app_domain}/webhook",
+       "secret":"{your_webhook_secret}",
+       "location_encoding":"utf-32"}}' \
+   '{url}/v2/projects/{project_id}/enrichments?version=2023-03-31'
+   ```
+3. Create a collection in the project and apply the webhook enrichment to the collection.
+   ```bash
+   curl -X POST {auth} \
+   --header 'Content-Type: application/json' \
+   --data '{"name":"my-collection",
+     "enrichments":[{"enrichment_id":"{enrichment_id}",
+       "fields":["text"]}]}' \
+   '{url}/v2/projects/{project_id}/collections?version=2023-03-31'
+   ```
+
+### Ingest documents to Discovery
+1. Upload [nhtsa.csv](data/nhtsa.csv) to the collection.
+2. After document processing is complete, preview your query results to see the enrichments added by the webhook.
@@ -0,0 +1,10 @@
+id,text
+1,ENGINE SPEED CONTROL.
+2,2002 FORD EXPLORER DOOR LOCKS WILL NOT FUNCTION PROPERLY IN FREEZING WEATHER.
+3,WE WERE IN MY WIFE'S 2005 FORD FREESTAR DRIVING HOME FROM MY FAMILY FOR THE HOLIDAYS WHEN WE IN A SNOWSTORM. THE CAR VIOLENTLY JERKED DURING DRIVING SEVERAL TIMES.
+4,TRANSMISSION "SLIPS" THEN ENGAGES HARD. HAS PROGRESSIVELY GOTTEN WORSE. HAD TRANSMISSION SERVICE JUST ABOUT A YEAR AGO.
+5,FRONT LUG NUTS LOOSEN ON 2002 JEEP LIBERTY. THIRD TIME ON THIS CAR.
+6,KEY WON'T TURN IN IGNITION.
+7,I HAVE A 2003 TOYOTA RAV-4. THE TRANSMISSION STARTED TO SLIP.
+8,AIR BAG DIDN'T DEPLOY WHEN I WAS IN A CRASH.
+9,2005 CHEVY COBALT 32 587 MILES. DRIVING ON ICY ROADS AFTER SHOPPING FOR A NEW COUCH.
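The regex sample's `main.py` is not shown in this part of the diff, so the following is only a sketch of what regex-based entity extraction over rows like these might look like. It emits annotation features in the same shape that the Granite sample's `enrich` function produces. The pattern, the list of makes, the `annotate` helper, and the `model_year` and `make` entity type names are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a four-digit model year followed by a vehicle make.
VEHICLE_PATTERN = re.compile(
    r'\b(?P<year>(?:19|20)\d{2}) (?P<make>FORD|JEEP|TOYOTA|CHEVY)\b'
)


def annotate(text, begin=0):
    """Return annotation features for each year/make pair found in text."""
    features = []
    for m in VEHICLE_PATTERN.finditer(text):
        for group, entity_type in (('year', 'model_year'), ('make', 'make')):
            features.append({
                'type': 'annotation',
                'location': {
                    # Offsets are shifted by the field's start in the document.
                    'begin': m.start(group) + begin,
                    'end': m.end(group) + begin,
                },
                'properties': {
                    'type': 'entities',
                    'confidence': 1.0,
                    'entity_type': entity_type,
                    'entity_text': m.group(group),
                },
            })
    return features


row = 'FRONT LUG NUTS LOOSEN ON 2002 JEEP LIBERTY. THIRD TIME ON THIS CAR.'
features = annotate(row)
```

Rows without a year and make pair, such as `ENGINE SPEED CONTROL.`, produce an empty feature list under this pattern.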