diff --git a/README.md b/README.md
index ebe2549..d3ff8d7 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,6 @@ Last updated: 2025-05-20
----------
-
List of References (Click to expand)
@@ -30,21 +29,21 @@ Last updated: 2025-05-20
- [Where to start?](#where-to-start)
- [Important Considerations for Production Environment](#important-considerations-for-production-environment)
- [Overview](#overview)
- - [Function App Hosting Options](#function-app-hosting-options)
+ - [Function App Hosting Options](#function-app-hosting-options)
- [Step 1: Set Up Your Azure Environment](#step-1-set-up-your-azure-environment)
- [Step 2: Set Up Azure Blob Storage for PDF Ingestion](#step-2-set-up-azure-blob-storage-for-pdf-ingestion)
- [Step 3: Set Up Azure Cosmos DB](#step-3-set-up-azure-cosmos-db)
- [Step 4: Set Up Azure Functions for Document Ingestion and Processing](#step-4-set-up-azure-functions-for-document-ingestion-and-processing)
- - [Create a Function App](#create-a-function-app)
- - [Configure/Validate the Environment variables](#configurevalidate-the-environment-variables)
- - [Develop the Function](#develop-the-function)
+ - [Create a Function App](#create-a-function-app)
+ - [Configure/Validate the Environment variables](#configurevalidate-the-environment-variables)
+ - [Develop the Function](#develop-the-function)
- [Step 5: Test the solution](#step-5-test-the-solution)
-
> [!NOTE]
> Limitations of this approach:
+>
> - Requires significant manual effort to structure and format extracted data.
> - Limited in handling complex layouts and non-text elements like images and charts.
@@ -107,8 +106,8 @@ Last updated: 2025-05-20
- An `Azure subscription is required`. All other resources, including instructions for creating a Resource Group, are provided in this workshop.
- `Contributor role assigned or any custom role that allows`: access to manage all resources, and the ability to deploy resources within the subscription.
- If you choose to use the Terraform approach, please ensure that:
- - [Terraform is installed on your local machine](https://developer.hashicorp.com/terraform/tutorials/azure-get-started/install-cli#install-terraform).
- - [Install the Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) to work with both Terraform and Azure commands.
+ - [Terraform is installed on your local machine](https://developer.hashicorp.com/terraform/tutorials/azure-get-started/install-cli#install-terraform).
+ - [Install the Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) to work with both Terraform and Azure commands.
## Where to start?
@@ -125,6 +124,7 @@ This is an introductory workshop on Microsoft Fabric. Please follow as described
## Overview
> Using Cosmos DB provides you with a flexible, scalable, and globally distributed database solution that can handle both structured and semi-structured data efficiently.
+>
> - `Azure Blob Storage`: Store the PDF invoices.
> - `Azure Functions`: Trigger on new PDF uploads, extract data, and process it.
> - `Azure SQL Database or Cosmos DB`: Store the extracted data for querying and analytics.
@@ -211,7 +211,7 @@ This is an introductory workshop on Microsoft Fabric. Please follow as described
## Step 3: Set Up Azure Cosmos DB
-> `Azure Cosmos DB` is a globally distributed,` multi-model database service provided by Microsoft Azure`. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a `NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types`. `It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.`
+> `Azure Cosmos DB` is a globally distributed, `multi-model database service provided by Microsoft Azure`. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a `NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types`. `It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.`
> An `Azure Cosmos DB container` is a `logical unit` within a Cosmos DB database where data is stored. `Containers are schema-agnostic, meaning they can store items with different structures. Each container is automatically partitioned to scale out across multiple servers, providing virtually unlimited throughput and storage`. Containers are the primary scalability unit in Cosmos DB, and they use a partition key to distribute data efficiently across partitions.
1. **Create a Cosmos DB Account**:
@@ -320,7 +320,6 @@ This is an introductory workshop on Microsoft Fabric. Please follow as described
-
3. **Get Cosmos DB Account ID**: Run this command to get the ID of your Cosmos DB account. Record the value of the `id` property as it is required for the next step.
```powershell
@@ -372,9 +371,9 @@ This is an introductory workshop on Microsoft Fabric. Please follow as described
- Under `Settings`, go to `Environment variables`, then click `+ Add` to create the following variables:
- - `COSMOS_DB_ENDPOINT`: Your Cosmos DB account endpoint.
- - `COSMOS_DB_KEY`: Your Cosmos DB account key.
- - `contosostorageaidemo_STORAGE`: Your Storage Account connection string.
+- `COSMOS_DB_ENDPOINT`: Your Cosmos DB account endpoint.
+- `COSMOS_DB_KEY`: Your Cosmos DB account key.
+- `contosostorageaidemo_STORAGE`: Your Storage Account connection string.
@@ -382,7 +381,7 @@ This is an introductory workshop on Microsoft Fabric. Please follow as described
- - Click on `Apply` to save your configuration.
+- Click on `Apply` to save your configuration.
### Develop the Function
@@ -448,9 +447,9 @@ This is an introductory workshop on Microsoft Fabric. Please follow as described
> 3. **Data Extraction**: The extracted text is processed to extract invoice data. The `generate_id` function generates a unique ID for each invoice.
> 4. **Data Storage**: The processed invoice data is saved to Azure Cosmos DB in the `ContosoAIDemo` database and `Invoices` container.
- > `pdfminer.six` is an open-source framework. It is a community-maintained fork of the original PDFMiner,` designed for extracting and analyzing text data from PDF documents`. The framework is built in a modular way, allowing each component to be easily replaced or extended for various purposes.
+ > `pdfminer.six` is an open-source framework. It is a community-maintained fork of the original PDFMiner, `designed for extracting and analyzing text data from PDF documents`. The framework is built in a modular way, allowing each component to be easily replaced or extended for various purposes.
- - Update the `function_app.py`:
+- Update the `function_app.py`:
| Template Blob Trigger | Function Code updated |
| --- | --- |
@@ -595,6 +594,7 @@ azure-functions
pdfminer.six
azure-cosmos==4.3.0
```
+
- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test it first, you can run your function locally.
- Click on the `Azure` icon.
- Under `workspace`, click on the `Function App` icon.
diff --git a/terraform-infrastructure/outputs.tf b/terraform-infrastructure/outputs.tf
index d96564f..ce5bffd 100644
--- a/terraform-infrastructure/outputs.tf
+++ b/terraform-infrastructure/outputs.tf
@@ -43,8 +43,7 @@ output "key_vault_name" {
value = azurerm_key_vault.keyvault.name
}
-
output "cosmosdb_account_name" {
description = "The name of the CosmosDB account."
value = azurerm_cosmosdb_account.cosmosdb.name
-}
\ No newline at end of file
+}
diff --git a/terraform-infrastructure/provider.tf b/terraform-infrastructure/provider.tf
index 2719636..71333b4 100644
--- a/terraform-infrastructure/provider.tf
+++ b/terraform-infrastructure/provider.tf
@@ -22,4 +22,4 @@ provider "azurerm" {
}
subscription_id = var.subscription_id # Use the subscription ID variable
-}
\ No newline at end of file
+}
diff --git a/terraform-infrastructure/variables.tf b/terraform-infrastructure/variables.tf
index 8e7eea4..7bf4738 100644
--- a/terraform-infrastructure/variables.tf
+++ b/terraform-infrastructure/variables.tf
@@ -13,7 +13,6 @@ variable "location" {
type = string
}
-
variable "storage_account_name" {
description = "The name of the storage account"
type = string