README.md: 49 additions & 44 deletions
@@ -16,12 +16,14 @@ Last updated: 2024-11-26
> This example is based on a `public network site and is intended for demonstration purposes only`. It showcases how several Azure resources can work together to achieve the desired result. Consider the section below about [Important Considerations for Production Environment](#important-considerations-for-production-environment). Please note that `these demos are intended as a guide and are based on my personal experiences. For official guidance, support, or more detailed information, please refer to Microsoft's official documentation or contact Microsoft directly`: [Microsoft Sales and Support](https://support.microsoft.com/contactus?ContactUsExperienceEntryPointAssetId=S.HP.SMC-HOME)
> How to parse PDFs from an Azure Storage Account, process them using Azure Document Intelligence, and store the results in Cosmos DB. <br/> <br/>
+>
> 1. Upload your PDFs to an Azure Blob Storage container. <br/>
> 2. An Azure Function is triggered by the upload, which calls the Azure Document Intelligence API to analyze the PDFs. <br/>
> 3. The extracted data is parsed and subsequently stored in a Cosmos DB database, ensuring a seamless and automated workflow from document upload to data storage.
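
The three steps above can be sketched as one small orchestration function. This is a minimal illustration under stated assumptions, not the repository's `function_app.py`: `process_pdf`, `analyze`, and `store` are hypothetical names, and the Document Intelligence and Cosmos DB calls are passed in as plain callables so the flow can be read and run without any Azure SDK installed.

```python
from typing import Any, Callable, Dict


def process_pdf(
    blob_name: str,
    pdf_bytes: bytes,
    analyze: Callable[[bytes], Dict[str, Any]],
    store: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Run one uploaded PDF through analyze -> parse -> store."""
    if not pdf_bytes:
        raise ValueError(f"Blob {blob_name!r} is empty")

    # Step 2: hand the raw bytes to the analysis service (injected here).
    fields = analyze(pdf_bytes)

    # Step 3: shape the result as a document. Using the blob name as the
    # document id is an assumption made for this sketch only.
    item = {"id": blob_name, "source_blob": blob_name, "fields": fields}
    store(item)
    return item


if __name__ == "__main__":
    saved = []  # stands in for the database in this example

    def fake_analyze(data: bytes) -> Dict[str, Any]:
        # Hypothetical stand-in for the Document Intelligence call.
        return {"InvoiceId": "INV-001", "size_bytes": len(data)}

    result = process_pdf("invoice.pdf", b"%PDF-1.7", fake_analyze, saved.append)
    print(result["id"], len(saved))
```

Injecting the two service calls keeps the trigger code thin and makes the parsing logic testable in isolation; in a real deployment the callables would wrap the actual service clients.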
> [!NOTE]
> Advantages of Document Intelligence for organizations handling large volumes of documents: <br/>
+>
> - Utilizes natural language processing, computer vision, deep learning, and machine learning. <br/>
> - Handles structured, semi-structured, and unstructured documents. <br/>
> - Automates the extraction and transformation of data into usable formats like JSON or CSV
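
As a rough illustration of the last point, the sketch below serializes a flat set of extracted fields to JSON and to CSV using only the Python standard library. The field names (`InvoiceId`, `Total`) and the helper names `to_json` and `to_csv` are hypothetical; the real shape of the extracted data depends on the model used.

```python
import csv
import io
import json
from typing import Dict


def to_json(fields: Dict[str, str]) -> str:
    """Serialize extracted fields as pretty-printed JSON with stable key order."""
    return json.dumps(fields, indent=2, sort_keys=True)


def to_csv(fields: Dict[str, str]) -> str:
    """Serialize extracted fields as a one-row CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(fields), lineterminator="\n")
    writer.writeheader()
    writer.writerow(fields)
    return buffer.getvalue()


if __name__ == "__main__":
    extracted = {"Total": "110.00", "InvoiceId": "INV-001"}  # made-up sample values
    print(to_json(extracted))
    print(to_csv(extracted))
```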
@@ -68,8 +70,8 @@ Last updated: 2024-11-26
- [Step 4: Set Up Azure Document Intelligence](#step-4-set-up-azure-document-intelligence)
- [Step 5: Set Up Azure Functions for Document Ingestion and Processing](#step-5-set-up-azure-functions-for-document-ingestion-and-processing)
- [Create a Function App](#create-a-function-app)
- [Configure/Validate the Environment variables](#configurevalidate-the-environment-variables)
@@ -163,7 +165,7 @@ Last updated: 2024-11-26
## Step 2: Set Up Azure Blob Storage for PDF Ingestion
-### Create a Storage Account:
+### Create a Storage Account
> An `Azure Storage Account` provides a `unique namespace in Azure for your data, allowing you to store and manage various types of data such as blobs, files, queues, and tables`. It serves as the foundation for all Azure Storage services, ensuring high availability, scalability, and security for your data. <br/> <br/>
@@ -216,9 +218,9 @@ Within the Storage Account, create a Blob Container to store your PDFs.
## Step 3: Set Up Azure Cosmos DB
-### Create a Cosmos DB Account:
+### Create a Cosmos DB Account
-> `Azure Cosmos DB` is a globally distributed, `multi-model database service provided by Microsoft Azure`. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a `NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types`. `It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.` <br/> <br/>
+> `Azure Cosmos DB` is a globally distributed, `multi-model database service provided by Microsoft Azure`. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a `NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types`. `It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.` <br/> <br/>
- In the Azure portal, navigate to your **Resource Group**.
- Click **+ Create**.
@@ -268,31 +270,32 @@ Within the Storage Account, create a Blob Container to store your PDFs.
- Go to the Azure Portal.
- **Create a New Resource**:
-- Click on `Create a resource` and search for `document intelligence`.
-- Select `Document Intelligence` and click `Create`.
+- Click on `Create a resource` and search for `document intelligence`.
+- Select `Document Intelligence` and click `Create`.
-- Collect a set of sample documents similar to your PDF example.
-- Label the fields you want to extract using the [Form Recognizer Labeling Tool](https://fott-2-1.azurewebsites.net/). Click [here for more information about how to use it](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/try-sample-label-tool?view=doc-intel-2.1.0#prerequisites-for-training-a-custom-form-model).
+- Collect a set of sample documents similar to your PDF example.
+- Label the fields you want to extract using the [Form Recognizer Labeling Tool](https://fott-2-1.azurewebsites.net/). Click [here for more information about how to use it](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/try-sample-label-tool?view=doc-intel-2.1.0#prerequisites-for-training-a-custom-form-model).
-- For this example, we'll use the system-assigned identity to do that. Under `Identity` within your `Document Intelligence Account`, change the status to `On`, and click on `Save`:
+- For this example, we'll use the system-assigned identity to do that. Under `Identity` within your `Document Intelligence Account`, change the status to `On`, and click on `Save`:
> A system assigned managed identity is restricted to `one per resource and is tied to the lifecycle of this resource`. `You can grant permissions to the managed identity by using Azure role-based access control (Azure RBAC). The managed identity is authenticated with Microsoft Entra ID, so you don’t have to store any credentials in code`.
-- Search for `Storage Blob Data Reader` and click `Next`. Then click on `Select members` and search for your `Document Intelligence identity`. Finally, click on `Review + assign`:
+- Search for `Storage Blob Data Reader` and click `Next`. Then click on `Select members` and search for your `Document Intelligence identity`. Finally, click on `Review + assign`:
-- Verify that the model correctly extracts the desired fields.
+- Upload a new document to test the custom model.
+- Verify that the model correctly extracts the desired fields.
## Step 5: Set Up Azure Functions for Document Ingestion and Processing
> An `Azure Function App` is a `container for hosting individual Azure Functions`. It provides the execution context for your functions, allowing you to manage, deploy, and scale them together. `Each function app can host multiple functions, which are small pieces of code that run in response to various triggers or events, such as HTTP requests, timers, or messages from other Azure services`. <br/> <br/>
> Azure Functions are designed to be lightweight and event-driven, enabling you to build scalable and serverless applications. `You only pay for the resources your functions consume while they are running, making it a cost-effective solution for many scenarios`.
-### Create a Function App
+### Create a Function App
+
- In the Azure portal, go to your **Resource Group**.
- Click **+ Create**.
@@ -437,7 +442,6 @@ Within the Storage Account, create a Blob Container to store your PDFs.
3. **Get Cosmos DB Account ID**: Run this command to get the ID of your Cosmos DB account. Record the value of the `id` property as it is required for the next step.
```powershell
# ... (command truncated in this diff hunk)
```
@@ -489,14 +493,14 @@ Within the Storage Account, create a Blob Container to store your PDFs.
- Under `Settings`, go to `Environment variables`, then `+ Add` the following variables:
-- `COSMOS_DB_ENDPOINT`: Your Cosmos DB account endpoint.
-- `COSMOS_DB_KEY`: Your Cosmos DB account key.
-- `COSMOS_DB_CONNECTION_STRING`: Your Cosmos DB connection string.
-- `invoicecontosostorage_STORAGE`: Your Storage Account connection string.
-- `FORM_RECOGNIZER_ENDPOINT`: For example: `https://<your-form-recognizer-endpoint>.cognitiveservices.azure.com/`
-- `FORM_RECOGNIZER_KEY`: Your Document Intelligence key (Form Recognizer).
-- `FUNCTIONS_EXTENSION_VERSION`: ~4 (Check that this setting exists; if it does not, create it.)
-- `FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR`: true (This setting ensures that all entry point errors are visible in your Application Insights logs).
+- `COSMOS_DB_ENDPOINT`: Your Cosmos DB account endpoint.
+- `COSMOS_DB_KEY`: Your Cosmos DB account key.
+- `COSMOS_DB_CONNECTION_STRING`: Your Cosmos DB connection string.
+- `invoicecontosostorage_STORAGE`: Your Storage Account connection string.
+- `FORM_RECOGNIZER_ENDPOINT`: For example: `https://<your-form-recognizer-endpoint>.cognitiveservices.azure.com/`
+- `FORM_RECOGNIZER_KEY`: Your Document Intelligence key (Form Recognizer).
+- `FUNCTIONS_EXTENSION_VERSION`: ~4 (Check that this setting exists; if it does not, create it.)
+- `FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR`: true (This setting ensures that all entry point errors are visible in your Application Insights logs).
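
Function App settings are exposed to Python code as environment variables, so a function can check them once at startup and fail with a clear message instead of a late `KeyError`. The sketch below is one possible way to do that; `REQUIRED_SETTINGS`, `missing_settings`, and `load_settings` are hypothetical helper names, and the list of names simply mirrors the user-supplied variables above.

```python
import os
from typing import Dict, List, Mapping

# Names taken from the list above; adjust if your storage setting is named differently.
REQUIRED_SETTINGS = (
    "COSMOS_DB_ENDPOINT",
    "COSMOS_DB_KEY",
    "COSMOS_DB_CONNECTION_STRING",
    "invoicecontosostorage_STORAGE",
    "FORM_RECOGNIZER_ENDPOINT",
    "FORM_RECOGNIZER_KEY",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required setting names that are absent or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any of them is missing."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing app settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


if __name__ == "__main__":
    # Prints the names that still need to be configured in this environment.
    print(missing_settings(os.environ))
```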
@@ -586,7 +590,7 @@ Within the Storage Account, create a Blob Container to store your PDFs.
> - It ensures the database and container exist, then inserts the extracted data. <br/>
> 8. **Logging (process and errors)**: Throughout the process, the function logs various steps and any errors encountered for debugging and monitoring purposes.
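
The store-and-log behavior described in the last steps could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in `function_app.py`: `store_with_logging` is a hypothetical helper, the logger name is arbitrary, and the database write is injected as a plain callable (`upsert`) so the example runs without the Cosmos DB SDK.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("pdf_pipeline")  # arbitrary name chosen for this sketch


def store_with_logging(
    item: Dict[str, Any],
    upsert: Callable[[Dict[str, Any]], Any],
) -> bool:
    """Write one item via upsert, logging each step; return True on success."""
    # Cosmos DB identifies items by an `id` property, so check it up front.
    if "id" not in item:
        logger.error("Item has no 'id'; skipping write")
        return False

    logger.info("Writing item %s", item["id"])
    try:
        upsert(item)
    except Exception:
        # logger.exception records the stack trace along with the message.
        logger.exception("Failed to write item %s", item["id"])
        return False

    logger.info("Stored item %s", item["id"])
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    written = []  # stands in for the Cosmos DB container
    store_with_logging({"id": "invoice-1", "total": "110.00"}, written.append)
```

Returning a boolean instead of re-raising is a design choice for the sketch; a production function may prefer to re-raise so the platform can retry the trigger.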
-- Update the function_app.py:
+- Update the function_app.py:
| Template Blob Trigger | Function Code updated |
| --- | --- |
@@ -759,7 +763,7 @@ Within the Storage Account, create a Blob Container to store your PDFs.
@@ -772,18 +776,19 @@ Within the Storage Account, create a Blob Container to store your PDFs.
```
azure-cosmos==4.3.0
azure-identity==1.7.0
```
-- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test it first, you can run your function locally.
-- Click on the `Azure` icon.
-- Under `workspace`, click on the `Function App` icon.
-- Click on `Deploy to Azure`.
+
+- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test it first, you can run your function locally.
+- Click on the `Azure` icon.
+- Under `workspace`, click on the `Function App` icon.