Commit 89e3312

Fix Markdown syntax issues
1 parent 0ac9e8f commit 89e3312

1 file changed, +49 -44 lines changed

README.md

@@ -16,12 +16,14 @@ Last updated: 2024-11-26
 > This example is based on a `public network site and is intended for demonstration purposes only`. It showcases how several Azure resources can work together to achieve the desired result. Consider the section below about [Important Considerations for Production Environment](#important-considerations-for-production-environment). Please note that `these demos are intended as a guide and are based on my personal experiences. For official guidance, support, or more detailed information, please refer to Microsoft's official documentation or contact Microsoft directly`: [Microsoft Sales and Support](https://support.microsoft.com/contactus?ContactUsExperienceEntryPointAssetId=S.HP.SMC-HOME)

 > How to parse PDFs from an Azure Storage Account, process them using Azure Document Intelligence, and store the results in Cosmos DB. <br/> <br/>
+>
 > 1. Upload your PDFs to an Azure Blob Storage container. <br/>
 > 2. An Azure Function is triggered by the upload, which calls the Azure Document Intelligence API to analyze the PDFs. <br/>
 > 3. The extracted data is parsed and subsequently stored in a Cosmos DB database, ensuring a seamless and automated workflow from document upload to data storage.

 > [!NOTE]
 > Advantages of Document Intelligence for organizations handling with large volumes of documents: <br/>
+>
 > - Utilizes natural language processing, computer vision, deep learning, and machine learning. <br/>
 > - Handles structured, semi-structured, and unstructured documents. <br/>
 > - Automates the extraction and transformation of data into usable formats like JSON or CSV
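The hunk above inserts a bare `>` line between a quoted paragraph and the list that follows it, both in the overview and in the `[!NOTE]` callout. Below is a minimal sketch of applying that fix mechanically. It assumes single-level `>` blockquotes and `-`, `*`, `+`, or `N.` list markers. `separate_quoted_lists` is a hypothetical helper and is not part of the repository.

```python
import re

# A quoted list item such as "> - item" or "> 1. item".
QUOTED_LIST_ITEM = re.compile(r"^>\s*(?:[-*+]|\d+\.)\s")


def separate_quoted_lists(lines: list[str]) -> list[str]:
    """Insert a bare '>' line before a quoted list that directly follows quoted text.

    Hypothetical helper: only single-level '>' blockquotes are handled.
    """
    out: list[str] = []
    for line in lines:
        if out and QUOTED_LIST_ITEM.match(line):
            prev = out[-1]
            # Previous line is quoted prose: not an empty '>' line and not another list item.
            if prev.startswith(">") and prev.strip() != ">" and not QUOTED_LIST_ITEM.match(prev):
                out.append(">")
        out.append(line)
    return out
```

Running the helper twice gives the same result as running it once. On the second pass, each list item already follows a bare `>` line, so nothing new is inserted.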
@@ -68,8 +70,8 @@ Last updated: 2024-11-26
 - [Step 4: Set Up Azure Document Intelligence](#step-4-set-up-azure-document-intelligence)
 - [Create Document Intelligence Resource](#create-document-intelligence-resource)
 - [Configure Models](#configure-models)
-- [Using Prebuilt Models](#using-prebuilt-models)
-- [Training Custom Models](#training-custom-models-optionalif-needed) (optional/if needed)
+- [Using Prebuilt Models](#using-prebuilt-models)
+- [Training Custom Models](#training-custom-models-optionalif-needed) (optional/if needed)
 - [Step 5: Set Up Azure Functions for Document Ingestion and Processing](#step-5-set-up-azure-functions-for-document-ingestion-and-processing)
 - [Create a Function App](#create-a-function-app)
 - [Configure/Validate the Environment variables](#configurevalidate-the-environment-variables)
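The table-of-contents entries re-indented above link to heading anchors such as `#training-custom-models-optionalif-needed`. The sketch below approximates how GitHub derives those anchors so the links can be checked offline: it lowercases the text, drops punctuation other than hyphens, and turns spaces into hyphens. This is an assumption-based approximation, not GitHub's implementation, and it ignores the numeric suffixes added to duplicate headings. `heading_anchor` is a hypothetical name.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate a GitHub-style anchor for a Markdown heading's text (hypothetical helper)."""
    text = heading.strip().lower()
    # Keep word characters, hyphens, and spaces; drop ':', '/', '(', ')' and similar punctuation.
    text = re.sub(r"[^\w\- ]", "", text)
    # Each remaining space becomes a hyphen.
    return text.replace(" ", "-")
```

Under this approximation, `Training Custom Models (optional/if needed)` produces the same anchor with or without a trailing colon. The colon removal elsewhere in this commit therefore leaves that link target unchanged.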
@@ -163,7 +165,7 @@ Last updated: 2024-11-26

 ## Step 2: Set Up Azure Blob Storage for PDF Ingestion

-### Create a Storage Account:
+### Create a Storage Account

 > An `Azure Storage Account` provides a `unique namespace in Azure for your data, allowing you to store and manage various types of data such as blobs, files, queues, and tables`. It serves as the foundation for all Azure Storage services, ensuring high availability, scalability, and security for your data. <br/> <br/>

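This hunk drops the trailing colon from a heading, and so do the Cosmos DB and custom-model hunks that follow. Below is a minimal sketch of that rule for ATX headings, using a hypothetical `strip_heading_colon` helper. Colons inside the heading text, as in `## Step 2: ...`, are left alone, and so are non-heading lines such as `- **Create a New Resource**:`.

```python
import re

# An ATX heading (1-6 '#', a space, then text) whose last non-space character is ':'.
ATX_HEADING_WITH_COLON = re.compile(r"^(#{1,6} .*?)\s*:\s*$")


def strip_heading_colon(line: str) -> str:
    """Drop a trailing ':' from an ATX heading; any other line passes through unchanged."""
    match = ATX_HEADING_WITH_COLON.match(line)
    return match.group(1) if match else line
```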
@@ -216,9 +218,9 @@ Within the Storage Account, create a Blob Container to store your PDFs.

 ## Step 3: Set Up Azure Cosmos DB

-### Create a Cosmos DB Account:
+### Create a Cosmos DB Account

-> `Azure Cosmos DB` is a globally distributed,` multi-model database service provided by Microsoft Azure`. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a `NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types`. `It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.` <br/> <br/>
+> `Azure Cosmos DB` is a globally distributed,`multi-model database service provided by Microsoft Azure`. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a `NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types`. `It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.` <br/> <br/>

 - In the Azure portal, navigate to your **Resource Group**.
 - Click **+ Create**.
@@ -268,31 +270,32 @@ Within the Storage Account, create a Blob Container to store your PDFs.

 - Go to the Azure Portal.
 - **Create a New Resource**:
-- Click on `Create a resource` and search for `document intelligence`.
-- Select `Document Intelligence` and click `Create`.
+- Click on `Create a resource` and search for `document intelligence`.
+- Select `Document Intelligence` and click `Create`.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/e8783321-9bf3-42e2-83af-4d1c555205e3">

 - **Configure the Resource**:
-- **Subscription**: Select your Azure subscription.
-- **Resource Group**: Choose an existing resource group or create a new one.
-- **Region**: Select the region closest to your location.
-- **Name**: Provide a unique name for your Form Recognizer resource.
-- **Pricing Tier**: Choose the pricing tier that fits your needs (e.g., Standard S0).
+- **Subscription**: Select your Azure subscription.
+- **Resource Group**: Choose an existing resource group or create a new one.
+- **Region**: Select the region closest to your location.
+- **Name**: Provide a unique name for your Form Recognizer resource.
+- **Pricing Tier**: Choose the pricing tier that fits your needs (e.g., Standard S0).
 - Review your settings and click `Create` to deploy the resource.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/08335330-e9f5-455b-be22-6b938b979d99">

 ### Configure Models

 #### Using Prebuilt Models
+
 - **Access Form Recognizer Studio**:
-- Navigate to your Form Recognizer resource in the Azure Portal.
-- Check your `Resource Group` if needed:
+- Navigate to your Form Recognizer resource in the Azure Portal.
+- Check your `Resource Group` if needed:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/d3559dc5-dbcb-44e6-b56d-d097d1719576">

-- Under `Overview`, click on `Go to Document Intelligence Studio`:
+- Under `Overview`, click on `Go to Document Intelligence Studio`:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/286545a3-574d-48d4-80de-66a58e5b5405">

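This hunk also adds an empty line after `#### Using Prebuilt Models`, and a later hunk does the same for `### Create a Function App`. Below is a sketch of enforcing that spacing. It assumes ATX headings only. It also tracks fenced code blocks by toggling on lines that start with three backticks, so that `#` comments inside code are not treated as headings. `ensure_blank_after_headings` is a hypothetical helper.

```python
import re

# An ATX heading: 1-6 '#' characters followed by a space.
ATX_HEADING = re.compile(r"^#{1,6} ")


def ensure_blank_after_headings(lines: list[str]) -> list[str]:
    """Ensure every ATX heading outside a fenced code block is followed by a blank line."""
    out: list[str] = []
    in_fence = False
    for i, line in enumerate(lines):
        # A line starting with three backticks opens or closes a fenced block.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        out.append(line)
        is_heading = not in_fence and ATX_HEADING.match(line)
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        # Only insert when the heading is immediately followed by non-blank text.
        if is_heading and next_line is not None and next_line.strip() != "":
            out.append("")
    return out
```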
@@ -305,22 +308,23 @@ Within the Storage Account, create a Blob Container to store your PDFs.
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/f88bce37-d7f3-4312-9053-e06f0743cdb3">

 - **Analyze Document**:
-- Upload your PDF document to the Form Recognizer Studio.
+- Upload your PDF document to the Form Recognizer Studio.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/575cb5d1-8e3b-4855-8f15-246ee1ea13b8">

-- Click on `Run analysis`, the prebuilt model will automatically extract fields such as invoice ID, date, vendor information, line items, and totals.
+- Click on `Run analysis`, the prebuilt model will automatically extract fields such as invoice ID, date, vendor information, line items, and totals.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/483ff4a5-73d3-4dcd-b35d-766f34a648b2">

-- Validate your results:
+- Validate your results:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/a945bd72-ea1c-4d33-9699-f9257a2ceffa">

-#### Training Custom Models (optional/if needed):
+#### Training Custom Models (optional/if needed)
+
 - **Prepare Training Data**:
-- Collect a set of sample documents similar to your PDF example.
-- Label the fields you want to extract using the [Form Recognizer Labeling Tool](https://fott-2-1.azurewebsites.net/). Click [here for more information about to use it](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/try-sample-label-tool?view=doc-intel-2.1.0#prerequisites-for-training-a-custom-form-model).
+- Collect a set of sample documents similar to your PDF example.
+- Label the fields you want to extract using the [Form Recognizer Labeling Tool](https://fott-2-1.azurewebsites.net/). Click [here for more information about to use it](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/try-sample-label-tool?view=doc-intel-2.1.0#prerequisites-for-training-a-custom-form-model).

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/94fca855-ec1b-444c-91f0-e05de13600df">

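The `Run analysis` step above lists the fields the prebuilt invoice model extracts: invoice ID, date, vendor information, line items, and totals. The overview says the parsed data ends up in Cosmos DB. Below is a minimal sketch of shaping such fields into a Cosmos DB item. It assumes the analysis result has already been reduced to a plain dict keyed by prebuilt-invoice field names (`InvoiceId`, `InvoiceDate`, `VendorName`, `Items`, `InvoiceTotal`). `to_cosmos_item` and the `source_blob` property are hypothetical and do not come from the repository's `function_app.py`.

```python
import uuid
from typing import Any


def to_cosmos_item(fields: dict[str, Any], source_blob: str) -> dict[str, Any]:
    """Shape extracted invoice fields into a JSON-ready document for a Cosmos DB container.

    Assumption: `fields` is a plain dict standing in for the analysis result.
    Cosmos DB requires every item to carry a string `id` property.
    """
    invoice_id = fields.get("InvoiceId")
    return {
        # Use the invoice number when present; otherwise fall back to a random UUID.
        "id": str(invoice_id) if invoice_id else str(uuid.uuid4()),
        "source_blob": source_blob,  # hypothetical property recording which PDF was analyzed
        "invoice_date": fields.get("InvoiceDate"),
        "vendor_name": fields.get("VendorName"),
        "line_items": fields.get("Items", []),
        "invoice_total": fields.get("InvoiceTotal"),
    }
```

Cosmos DB enforces `id` uniqueness within a logical partition. Using the invoice number as `id` means a re-upload of the same invoice collides with the earlier item in that partition (or replaces it, if you upsert). The random fallback always creates a new item.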
@@ -329,17 +333,17 @@ Within the Storage Account, create a Blob Container to store your PDFs.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/16feb31b-2a0e-4060-8e57-c870240a5109">

-- For this example we'll be using the system assigned identity to do that. Under `Identy` within your `Document Intelligence Account`, change the status to `On`, and click on `Save`:
+- For this example we'll be using the system assigned identity to do that. Under `Identy` within your `Document Intelligence Account`, change the status to `On`, and click on `Save`:

 > A system assigned managed identity is restricted to `one per resource and is tied to the lifecycle of this resource`. `You can grant permissions to the managed identity by using Azure role-based access control (Azure RBAC). The managed identity is authenticated with Microsoft Entra ID, so you don’t have to store any credentials in code`.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/4be26e42-b9d4-4f04-ae5e-e8e6babd9366">

-- Go to your `Storage Account`, under `Access Control (IAM)` click on `+ Add`, and then `Add role assigment`:
+- Go to your `Storage Account`, under `Access Control (IAM)` click on `+ Add`, and then `Add role assigment`:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/59881d40-eb4c-4276-b3d3-d5e7dd877af0">

-- Search for `Storage Blob Data Reader`, click `Next`. Then, click on `select members` and search for your `Document intelligence identity`. Finally click on `Review + assign`:
+- Search for `Storage Blob Data Reader`, click `Next`. Then, click on `select members` and search for your `Document intelligence identity`. Finally click on `Review + assign`:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/e8bbe706-8ecc-41bd-a189-846e82ccef01">

@@ -364,15 +368,16 @@ Within the Storage Account, create a Blob Container to store your PDFs.
 <img width="550" alt="image" src="https://github.com/user-attachments/assets/8552060b-f241-4d06-9a51-98b3b2171c08">

 - **Test the Model**:
-- Upload a new document to test the custom model.
-- Verify that the model correctly extracts the desired fields.
+- Upload a new document to test the custom model.
+- Verify that the model correctly extracts the desired fields.

 ## Step 5: Set Up Azure Functions for Document Ingestion and Processing

 > An `Azure Function App` is a `container for hosting individual Azure Functions`. It provides the execution context for your functions, allowing you to manage, deploy, and scale them together. `Each function app can host multiple functions, which are small pieces of code that run in response to various triggers or events, such as HTTP requests, timers, or messages from other Azure services`. <br/> <br/>
 > Azure Functions are designed to be lightweight and event-driven, enabling you to build scalable and serverless applications. `You only pay for the resources your functions consume while they are running, making it a cost-effective solution for many scenarios`.

-### Create a Function App
+### Create a Function App
+
 - In the Azure portal, go to your **Resource Group**.
 - Click **+ Create**.

@@ -437,7 +442,6 @@ Within the Storage Account, create a Blob Container to store your PDFs.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/4c19d70e-d525-4c15-bb0e-518f50f61b37">

-
 3. **Get Cosmos DB Account ID**: Run this command to get the ID of your Cosmos DB account. Record the value of the `id` property as it is required for the next step.

 ```powershell
@@ -489,14 +493,14 @@ Within the Storage Account, create a Blob Container to store your PDFs.

 - Under `Settings`, go to `Environment variables`. And `+ Add` the following variables:

-- `COSMOS_DB_ENDPOINT`: Your Cosmos DB account endpoint.
-- `COSMOS_DB_KEY`: Your Cosmos DB account key.
-- `COSMOS_DB_CONNECTION_STRING`: Your Cosmos DB connection string.
-- `invoicecontosostorage_STORAGE`: Your Storage Account connection string.
-- `FORM_RECOGNIZER_ENDPOINT`: For example: `https://<your-form-recognizer-endpoint>.cognitiveservices.azure.com/`
-- `FORM_RECOGNIZER_KEY`: Your Documment Intelligence Key (Form Recognizer).
-- `FUNCTIONS_EXTENSION_VERSION`: ~4 (Review the existence of this, if not create it)
-- `FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR`: true (This setting ensures that all entry point errors are visible in your application insights logs).
+- `COSMOS_DB_ENDPOINT`: Your Cosmos DB account endpoint.
+- `COSMOS_DB_KEY`: Your Cosmos DB account key.
+- `COSMOS_DB_CONNECTION_STRING`: Your Cosmos DB connection string.
+- `invoicecontosostorage_STORAGE`: Your Storage Account connection string.
+- `FORM_RECOGNIZER_ENDPOINT`: For example: `https://<your-form-recognizer-endpoint>.cognitiveservices.azure.com/`
+- `FORM_RECOGNIZER_KEY`: Your Documment Intelligence Key (Form Recognizer).
+- `FUNCTIONS_EXTENSION_VERSION`: ~4 (Review the existence of this, if not create it)
+- `FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR`: true (This setting ensures that all entry point errors are visible in your application insights logs).

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/31d813e7-38ba-46ff-9e4b-d091ae02706a">

@@ -586,7 +590,7 @@ Within the Storage Account, create a Blob Container to store your PDFs.
 > - It ensures the database and container exist, then inserts the extracted data. <br/>
 > 8. **Logging (process and errors)**: Throughout the process, the function logs various steps and any errors encountered for debugging and monitoring purposes.

-- Update the function_app.py:
+- Update the function_app.py:

 | Template Blob Trigger | Function Code updated |
 | --- | --- |
@@ -759,7 +763,7 @@ Within the Storage Account, create a Blob Container to store your PDFs.

 </details>

-- Now, let's update the `requirements.txt`:
+- Now, let's update the `requirements.txt`:

 | Template `requirements.txt` | Updated `requirements.txt` |
 | --- | --- |
@@ -772,18 +776,19 @@ Within the Storage Account, create a Blob Container to store your PDFs.
 azure-cosmos==4.3.0
 azure-identity==1.7.0
 ```
-- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test, you can use run your function locally for testing.
-- Click on the `Azure` icon.
-- Under `workspace`, click on the `Function App` icon.
-- Click on `Deploy to Azure`.
+
+- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test, you can use run your function locally for testing.
+- Click on the `Azure` icon.
+- Under `workspace`, click on the `Function App` icon.
+- Click on `Deploy to Azure`.

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/12405c04-fa43-4f09-817d-f6879fbff035">

-- Select your `subscription`, your `function app`, and accept the prompt to overwrite:
+- Select your `subscription`, your `function app`, and accept the prompt to overwrite:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/1882e777-6ba0-4e18-9d7b-5937204c7217">

-- After completing, you see the status in your terminal:
+- After completing, you see the status in your terminal:

 <img width="550" alt="image" src="https://github.com/user-attachments/assets/aa090cfc-f5b3-4ef2-9c2d-6be4f00b83b8">

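The `Environment variables` step in the `@@ -489,14 +493,14 @@` hunk lists the app settings the function relies on. Azure Functions exposes app settings to Python code as environment variables. A sketch like the one below can therefore fail fast and report every missing name at once, instead of erroring on the first lookup. `load_settings` is a hypothetical helper, not the repository's code. `FUNCTIONS_EXTENSION_VERSION` and `FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR` are left out because they configure the runtime rather than being read by the function code.

```python
import os
from collections.abc import Mapping

# App settings named in the README's "Environment variables" step that the function code reads.
REQUIRED_SETTINGS = (
    "COSMOS_DB_ENDPOINT",
    "COSMOS_DB_KEY",
    "COSMOS_DB_CONNECTION_STRING",
    "invoicecontosostorage_STORAGE",
    "FORM_RECOGNIZER_ENDPOINT",
    "FORM_RECOGNIZER_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise one error listing every missing or empty name."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing app settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}
```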