Improve Resiliency of Batch Operations in TableStorageScaleMetricsRepository #11492

@kshyju

Description

What problem would the feature you're requesting solve? Please describe.

The TableStorageScaleMetricsRepository encounters transient errors (for example, "The server is busy") when submitting batches to Azure Tables.

This write operation runs in the background and does not impact function invocations or the health of the host instance, but improving resiliency will reduce noise and improve reliability of metrics collection.

Describe the solution you'd like

1. Add or extend fault-tolerant retry logic for read and write operations. Some methods already have retry logic, but not all.
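
To make the request concrete, here is a minimal sketch of the kind of retry wrapper this could mean: retry only transient failures, back off exponentially, and add jitter so instances do not retry in lockstep. It is written in Python for brevity (the host itself is C#), and every name in it (`TransientStorageError`, `with_retry`, the attempt count and delay values) is hypothetical, not taken from the repository or the Azure SDK.

```python
import random
import time


class TransientStorageError(Exception):
    """Hypothetical stand-in for a retryable failure such as 'The server is busy'."""


def with_retry(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
               sleep=time.sleep):
    """Call operation(), retrying transient failures with capped exponential
    backoff and full jitter. Non-transient exceptions propagate immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientStorageError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Double the delay ceiling on each attempt, cap it, then sleep
            # for a random duration below that ceiling (full jitter).
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

A batch write or a read would then be wrapped as `with_retry(lambda: submit_batch(batch))`, where `submit_batch` is again a placeholder. Since the write runs in the background, a small attempt budget is probably enough; the values above are illustrative, not tuned.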

Additional context

FunctionsLogs
| where PreciseTimeStamp > ago(7d)
| where Level == long(2)
| where Source == "Microsoft.Azure.WebJobs.Script.WebHost.TableStorageScaleMetricsRepository"
| where Summary startswith "An unhandled storage exception occurred when reading/writing scale metrics: The server is busy"
| project PreciseTimeStamp, Summary, Level, Source, EventName, RoleInstance, AppName
| summarize ErrorCount = count() by AppName
| order by ErrorCount desc
| take 100
