Commit 7d2cc30

[SchemaRegistry] Avro perf tests (Azure#23582)
fixes: Azure#22670 fixes: Azure#22671 fixes: Azure#20831
1 parent c6e814c commit 7d2cc30

13 files changed, with 264 additions and 489 deletions

.vscode/cspell.json

Lines changed: 18 additions & 3 deletions
@@ -82,9 +82,6 @@
     "sdk/metricsadvisor/azure-ai-metricsadvisor/**",
     "sdk/purview/azure-purview-catalog/**",
     "sdk/remoterendering/azure-mixedreality-remoterendering/**",
-    "sdk/schemaregistry/ci.yml",
-    "sdk/schemaregistry/azure-schemaregistry/**",
-    "sdk/schemaregistry/azure-schemaregistry-avroencoder/**",
     "sdk/servicefabric/azure-servicefabric/**",
     "sdk/search/azure-search-documents/**",
     "sdk/storage/azure-storage-blob-changefeed/**",
@@ -119,6 +116,7 @@
     "amqp",
     "apim",
     "asyncio",
+    "avroencoder",
     "azcmagent",
     "azsdk",
     "azurecr",
@@ -160,6 +158,7 @@
     "fstat",
     "gbps",
     "GCCH",
+    "getsizeof",
     "graphrbac",
     "gmtime",
     "guids",
@@ -177,6 +176,7 @@
     "ipconfigurations",
     "iqmp",
     "iscoroutine",
+    "iscoroutinefunction",
     "iscsi",
     "ivar",
     "jwks",
@@ -390,6 +390,21 @@
         "ierr",
         "mymodel"
       ]
+    },
+    {
+      "filename": "sdk/schemaregistry/**/tests/**/*.py",
+      "words": [
+        "favo",
+        "randb"
+      ]
+    },
+    {
+      "filename": "sdk/schemaregistry/azure-schemaregistry-avroencoder/**/*.py",
+      "words": [
+        "currsize",
+        "unpartial",
+        "alru"
+      ]
     }
   ],
   "allowCompoundWords": true
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# SchemaRegistry AvroEncoder Performance Tests
+
+In order to run the performance tests, the `azure-devtools` package must be installed. This is done as part of the `dev_requirements`.
+Start by creating a new virtual environment for your perf tests. This will need to be a Python 3 environment, preferably >=3.7.
+
+### Setup for test resources
+
+These tests will run against a pre-configured SchemaRegistry. The following environment variables will need to be set for the tests to access the live resources:
+```
+SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE=<the fully qualified namespace of a Schema Registry.>
+SCHEMAREGISTRY_GROUP=<a schema group in a Schema Registry.>
+```
+
+### Setup for perf test runs
+
+```cmd
+(env) ~/azure-schemaregistry> pip install -r dev_requirements.txt
+(env) ~/azure-schemaregistry> pip install -e .
+```
+
+## Test commands
+
+When `azure-devtools` is installed, you will have access to the `perfstress` command line tool, which will scan the current module for runnable perf tests. Only a specific test can be run at a time (i.e. there is no "run all" feature).
+
+```cmd
+(env) ~/azure-schemaregistry-avroencoder> cd tests
+(env) ~/azure-schemaregistry-avroencoder/tests> perfstress
+```
+Using the `perfstress` command alone will list the available perf tests found.
+
+### Common perf command line options
+These options are available for all perf tests:
+- `--duration=10` Number of seconds to run as many operations (the "run" function) as possible. Default is 10.
+- `--iterations=1` Number of test iterations to run. Default is 1.
+- `--parallel=1` Number of tests to run in parallel. Default is 1.
+- `--warm-up=5` Number of seconds to spend warming up the connection before measuring begins. Default is 5.
+- `--sync` Whether to run the tests in sync or async. Default is False (async).
+- `--no-cleanup` Whether to keep newly created resources after test run. Default is False (resources will be deleted).
+
+### Schema Registry Avro command line options
+These options are available for all SR perf tests:
+- `--schema-size=150` Number of bytes each schema contains, rounded down to nearest multiple of 50. Default is 150.
+- `--num-values` Number of values to encode/decode with given schema. Default is 1.
+
+### Tests
+The tests currently written for the SDK:
+- `EncodeContentTest` Encodes `num-values` number of content with a single schema of size `schema-size` per run. The first encode call should take longer than the rest, as the schema ID is cached after the first call.
+- `DecodeContentTest` Decodes `num-values` number of encoded content with schema of size `schema-size` per run. The first decode call should take longer than the rest, as the schema is cached after the first call.
+
+## Example command
+```cmd
+(env) ~/azure-schemaregistry-avroencoder/tests> perfstress EncodeContentTest --parallel=2 --duration=10 --schema-size=500 --num-values=2
+```
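The `--schema-size` rounding described in the README can be sketched as follows. This is only an illustration, not code from the commit: the helper name `effective_schema` is hypothetical, and the constants 100 and 50 are assumptions taken from the comments in `_test_base.py` ("100 bytes" for a schema with no fields, 50 bytes per additional field). The clamp to one field reflects that the test code always appends the first field.

```python
import math
from typing import Tuple

BASE_SIZE = 100  # assumed size of a schema with no fields
FIELD_SIZE = 50  # assumed size added by each field


def effective_schema(requested_size: int) -> Tuple[int, int]:
    """Return (number of fields, resulting schema size) for a requested size."""
    num_fields = math.floor((requested_size - BASE_SIZE) / FIELD_SIZE)
    # Assumption: at least one field is always present in the generated schema.
    num_fields = max(num_fields, 1)
    return num_fields, BASE_SIZE + num_fields * FIELD_SIZE


print(effective_schema(500))  # (8, 500)
print(effective_schema(520))  # (8, 500), rounded down to a multiple of 50
print(effective_schema(150))  # (1, 150)
```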
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+import json
+import math
+import random, string
+import sys
+
+from azure_devtools.perfstress_tests import PerfStressTest
+from azure.identity import DefaultAzureCredential
+from azure.identity.aio import DefaultAzureCredential as AsyncDefaultAzureCredential
+from azure.schemaregistry import SchemaRegistryClient
+from azure.schemaregistry.encoder.avroencoder import AvroEncoder
+from azure.schemaregistry.aio import SchemaRegistryClient as AsyncSchemaRegistryClient
+from azure.schemaregistry.encoder.avroencoder.aio import AvroEncoder as AsyncAvroEncoder
+
+
+class _SchemaRegistryAvroTest(PerfStressTest):
+    def __init__(self, arguments):
+        super().__init__(arguments)
+
+        self.fully_qualified_namespace = self.get_from_env(
+            "SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE"
+        )
+        self.group_name = self.get_from_env("SCHEMAREGISTRY_GROUP")
+        self.definition, num_fields = self._create_schema_definition()
+        self.content = self._create_content(num_fields)
+
+    def _create_schema_definition(self):
+        schema_size = self.args.schema_size
+
+        # random string to avoid conflicting requests
+        letters = string.ascii_lowercase
+        randletters = ''.join(random.choice(letters) for i in range(10))
+
+        fields = []
+        schema = {
+            "type": "record",
+            "name": f"example.User{randletters}",
+            "fields": fields,
+        }
+
+        # 100 bytes
+        schema_no_fields_size = sys.getsizeof(json.dumps(schema, separators=(",", ":")))
+        fields.append({"name": "favor_number00000", "type": ["int", "null"]})
+        # each additional field is 50 bytes
+        schema_one_field_size = sys.getsizeof(json.dumps(schema, separators=(",", ":")))
+        field_size = schema_one_field_size - schema_no_fields_size
+
+        # calculate number of fields to add to get args.schema_size rounded down to nearest 50 multiple
+        num_fields = math.floor((schema_size - schema_no_fields_size) / field_size)
+
+        for i in range(1, num_fields):
+            num_idx = f"{i:05d}"
+            fields.append(
+                {"name": f"favo_number{num_idx}", "type": ["int", "null"]},
+            )
+        definition = json.dumps(schema, separators=(",", ":"))
+        return definition, num_fields
+
+    def _create_content(self, num_fields):
+        content = {"favor_number00000": 0}
+        for i in range(1, num_fields):
+            num_idx = f"{i:05d}"
+            content[f"favo_number{num_idx}"] = i
+        return content
+
+    @staticmethod
+    def add_arguments(parser):
+        super(_SchemaRegistryAvroTest, _SchemaRegistryAvroTest).add_arguments(parser)
+        parser.add_argument(
+            "--schema-size",
+            nargs="?",
+            type=int,
+            help="Size of a single schema. Max 1000000 bytes. Defaults to 150 bytes",
+            default=150,
+        )
+        parser.add_argument(
+            "--num-values",
+            nargs="?",
+            type=int,
+            help="Number of values to encode/decode with given schema. Default is 1.",
+            default=1,
+        )
+
+
+class _EncodeTest(_SchemaRegistryAvroTest):
+    def __init__(self, arguments):
+        super().__init__(arguments)
+        self.sync_credential = DefaultAzureCredential()
+        self.sync_client = SchemaRegistryClient(
+            fully_qualified_namespace=self.fully_qualified_namespace,
+            credential=self.sync_credential,
+        )
+        self.sync_encoder = AvroEncoder(
+            client=self.sync_client, group_name=self.group_name, auto_register_schemas=True
+        )
+        self.async_credential = AsyncDefaultAzureCredential()
+        self.async_client = AsyncSchemaRegistryClient(
+            fully_qualified_namespace=self.fully_qualified_namespace,
+            credential=self.async_credential,
+        )
+        self.async_encoder = AsyncAvroEncoder(
+            client=self.async_client, group_name=self.group_name, auto_register_schemas=True
+        )
+
+    async def global_setup(self):
+        await super().global_setup()
+
+    async def close(self):
+        self.sync_client.close()
+        self.sync_credential.close()
+        self.sync_encoder.close()
+        await self.async_client.close()
+        await self.async_credential.close()
+        await self.async_encoder.close()
+        await super().close()
+
+
+class _DecodeTest(_SchemaRegistryAvroTest):
+    def __init__(self, arguments):
+        super().__init__(arguments)
+        self.sync_credential = DefaultAzureCredential()
+        self.sync_client = SchemaRegistryClient(
+            fully_qualified_namespace=self.fully_qualified_namespace,
+            credential=self.sync_credential,
+        )
+        self.sync_encoder = AvroEncoder(
+            client=self.sync_client, group_name=self.group_name, auto_register_schemas=True
+        )
+        self.async_credential = AsyncDefaultAzureCredential()
+        self.async_client = AsyncSchemaRegistryClient(
+            fully_qualified_namespace=self.fully_qualified_namespace,
+            credential=self.async_credential,
+        )
+        self.async_encoder = AsyncAvroEncoder(
+            client=self.async_client, group_name=self.group_name, auto_register_schemas=True
+        )
+        self.encoded_content = self._encode_content()
+
+    def _encode_content(self):
+        with self.sync_encoder as encoder:
+            return encoder.encode(self.content, schema=self.definition)
+
+    async def global_setup(self):
+        await super().global_setup()
+
+    async def close(self):
+        self.sync_client.close()
+        self.sync_credential.close()
+        self.sync_encoder.close()
+        await self.async_client.close()
+        await self.async_credential.close()
+        await self.async_encoder.close()
+        await super().close()
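A note on the size arithmetic in `_create_schema_definition`: `sys.getsizeof` reports the in-memory size of the Python `str` object, which includes interpreter overhead, rather than the length of the JSON text. The overhead cancels out when two sizes are subtracted, so `field_size` is unaffected, but `schema_no_fields_size` is larger than the text itself. The sketch below, which uses a fixed suffix `abcdefghij` in place of the random letters, compares the two measures; it is an illustration and not part of the commit.

```python
import json
import sys


def compact(schema: dict) -> str:
    """Serialize the schema the same way the test does (no extra whitespace)."""
    return json.dumps(schema, separators=(",", ":"))


fields = []
schema = {"type": "record", "name": "example.Userabcdefghij", "fields": fields}

empty_len = len(compact(schema))
empty_sizeof = sys.getsizeof(compact(schema))

fields.append({"name": "favor_number00000", "type": ["int", "null"]})
one_field_len = len(compact(schema))

print(empty_len)                 # 61 characters of JSON text
print(one_field_len - empty_len)  # 50 characters added by the first field
print(empty_sizeof - empty_len)   # str object overhead; varies by interpreter
```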
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+from ._test_base import _DecodeTest
+
+
+class DecodeContentTest(_DecodeTest):
+    def run_sync(self):
+        for _ in range(self.args.num_values):
+            self.sync_encoder.decode(self.encoded_content)
+
+    async def run_async(self):
+        for _ in range(self.args.num_values):
+            await self.async_encoder.decode(self.encoded_content)
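`DecodeContentTest` decodes the same payload repeatedly, and the README states that the first call is slower because the schema is cached afterwards. The effect can be illustrated with `functools.lru_cache`. This is a sketch under assumptions: `fetch_schema` is a hypothetical stand-in for a registry lookup, and the encoder's real caching code is not shown in this diff.

```python
from functools import lru_cache

lookups = []  # records every simulated round trip to the registry


@lru_cache(maxsize=128)
def fetch_schema(schema_id: str) -> str:
    """Hypothetical stand-in for fetching a schema definition by ID."""
    lookups.append(schema_id)
    return '{"type":"record","name":"example.User","fields":[]}'


for _ in range(3):
    fetch_schema("schema-id-1")

print(len(lookups))                         # 1: only the first call hit the "registry"
print(fetch_schema.cache_info().hits)       # 2
print(fetch_schema.cache_info().currsize)   # 1
```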
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+from ._test_base import _EncodeTest
+
+
+class EncodeContentTest(_EncodeTest):
+    def run_sync(self):
+        for _ in range(self.args.num_values):
+            self.sync_encoder.encode(self.content, schema=self.definition)
+
+    async def run_async(self):
+        for _ in range(self.args.num_values):
+            await self.async_encoder.encode(self.content, schema=self.definition)
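Each call to `run_sync` or `run_async` counts as one operation, and the README describes `--duration` as the number of seconds in which to run as many operations as possible. A duration-bound loop of that kind might look like the sketch below. It is not the `perfstress` implementation: `count_operations` and its injectable `clock` parameter are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def count_operations(
    run: Callable[[], None],
    duration: float,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Call `run` repeatedly until `duration` seconds have elapsed; return the count."""
    deadline = clock() + duration
    completed = 0
    while clock() < deadline:
        run()
        completed += 1
    return completed


# Usage: count how many trivial operations fit into 0.05 seconds.
operations = count_operations(lambda: sum(range(100)), duration=0.05)
print(f"{operations / 0.05:.0f} operations per second")
```

Passing the clock as a parameter keeps the loop testable with a fake time source.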

sdk/schemaregistry/azure-schemaregistry/tests/async_tests/recordings/test_schema_registry_async.test_get_schema_errors.yaml

Lines changed: 0 additions & 28 deletions
This file was deleted.
