Commit 7628159

initial commit
1 parent 5d41707 commit 7628159

31 files changed

+1235 -6 lines changed

README.md

Lines changed: 14 additions & 6 deletions
@@ -1,11 +1,20 @@
-## My Project
+# Amazon OpenSearch Ingestion CDK Python project!
 
-TODO: Fill this README out!
+This repository contains a set of example projects for [Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ingestion.html).
 
-Be sure to:
+| Example | Architecture | Description |
+|---------|-------------|------|
+| [ingestion to opensearch domain](./opensearch) | ![osis-domain-pipeline](./opensearch/osis-domain-pipeline.svg) | Data ingestion into an OpenSearch domain using OpenSearch Ingestion pipelines |
+| [opensearch-serverless collection](./opensearch-serverless) | ![osis-collection-pipeline](./opensearch-serverless/osis-collection-pipeline.svg) | Data ingestion into an OpenSearch Serverless collection using OpenSearch Ingestion pipelines |
 
-* Change the title in this README
-* Edit your repository description on GitHub
+Enjoy!
+
+## References
+
+* [Amazon OpenSearch Ingestion Developer Guide](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ingestion.html)
+* [Tutorial: Ingesting data into a domain using Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-get-started.html)
+* [Tutorial: Ingesting data into a collection using Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-serverless-get-started.html)
+* [Data Prepper](https://opensearch.org/docs/latest/data-prepper/index/) - a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization.
 
 ## Security
 
@@ -14,4 +23,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more inform
 ## License
 
 This library is licensed under the MIT-0 License. See the LICENSE file.
-

opensearch-serverless/.gitignore

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
*.swp
package-lock.json
__pycache__
.pytest_cache
.venv
*.egg-info

# CDK asset staging directory
.cdk.staging
cdk.out

opensearch-serverless/README.md

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@

# Ingesting data into a collection using Amazon OpenSearch Ingestion

![osis-collection-pipeline](./osis-collection-pipeline.svg)

This is an Amazon OpenSearch Ingestion project for CDK development with Python.

This project builds on the following tutorial: [Ingesting data into a collection using Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-serverless-get-started.html).

This project shows you how to use Amazon OpenSearch Ingestion to configure a simple pipeline and ingest data into an Amazon OpenSearch Serverless collection.

The `cdk.json` file tells the CDK Toolkit how to execute your app.
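
A minimal sketch for inspecting that file: it loads `cdk.json` with Python's standard `json` module and reads back the `app` command and a feature flag from `context`. The helpers `load_cdk_settings` and `feature_flag` are hypothetical and only meant for inspection; the CDK Toolkit parses `cdk.json` on its own.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_cdk_settings(path: str = "cdk.json") -> Dict[str, Any]:
    """Parse cdk.json into a plain dict (hypothetical inspection helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def feature_flag(settings: Dict[str, Any], name: str, default: Any = False) -> Any:
    """Look up a value under "context", where the CDK feature flags live."""
    return settings.get("context", {}).get(name, default)
```

Run from this project's directory, `load_cdk_settings()["app"]` returns `"python3 app.py"` for the `cdk.json` in this commit, which is the command the Toolkit runs to synthesize the app.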

This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the `.venv`
directory. To create the virtualenv it assumes that there is a `python3`
(or `python` for Windows) executable in your path with access to the `venv`
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.

To manually create a virtualenv on macOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.

```
(.venv) $ pip install -r requirements.txt
```
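
If you prefer to script the virtualenv step, Python's standard-library `venv` module can create the same `.venv` directory. The sketch below is an illustration rather than part of this project; the helper names `create_project_venv` and `venv_is_active` are hypothetical.

```python
import os
import sys
import venv
from pathlib import Path


def create_project_venv(project_dir: str, with_pip: bool = True) -> Path:
    """Create <project_dir>/.venv, roughly what `python3 -m venv .venv` does."""
    env_dir = Path(project_dir) / ".venv"
    # Mirror the CLI default: symlink the interpreter on POSIX, copy it on Windows.
    builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
    builder.create(str(env_dir))
    return env_dir


def venv_is_active() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

For example, `create_project_venv(".")` creates `.venv` in the current directory; after activating it as shown above, `venv_is_active()` returns `True` from the activated environment's Python.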

At this point you can synthesize the CloudFormation template for this code. The region lookup below queries the EC2 instance metadata service, so it only works on an EC2 instance that allows token-less (IMDSv1) metadata requests; elsewhere, set `CDK_DEFAULT_REGION` to your region directly.

<pre>
(.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
(.venv) $ export CDK_DEFAULT_REGION=$(curl -s 169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
(.venv) $ cdk synth -c iam_user_name=<i>your-iam-user-name</i> --all
</pre>
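
`app.py` passes `os.getenv('CDK_DEFAULT_ACCOUNT')` and `os.getenv('CDK_DEFAULT_REGION')` straight to `cdk.Environment`, so a variable that is not set simply becomes `None`. If you would rather fail early, a check along these lines could run before `AWS_ENV` is built. `resolve_cdk_env` is a hypothetical helper, not part of this project; it also rejects empty values, which the `curl` lookup above can produce when the metadata service is unreachable.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_cdk_env(environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (account, region) from the variables app.py reads, failing early if unset."""
    env = os.environ if environ is None else environ
    account = env.get("CDK_DEFAULT_ACCOUNT", "")
    region = env.get("CDK_DEFAULT_REGION", "")
    # Collect every missing or empty variable so the error names all of them at once.
    missing = [name for name, value in (("CDK_DEFAULT_ACCOUNT", account),
                                        ("CDK_DEFAULT_REGION", region)) if not value]
    if missing:
        raise RuntimeError("set " + ", ".join(missing) + " before running cdk synth/deploy")
    return account, region
```

In `app.py` this would look like `account, region = resolve_cdk_env()` followed by `cdk.Environment(account=account, region=region)`.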

:warning: Amazon OpenSearch Serverless requires IAM permissions to access its resources.
You must add two IAM permissions for OpenSearch Serverless: **"aoss:APIAccessAll"** for data plane API access and **"aoss:DashboardsAccessAll"** for Dashboards access. Without these two permissions, requests fail with 403 errors starting on May 10th, 2023.

For sample policies, including a data-plane policy, see [Identity and Access Management for Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security_iam_id-based-policy-examples-data-plane.html):

- [Using OpenSearch Serverless in the console](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security_iam_serverless_id-based-policy-examples-console)
- [Administering OpenSearch Serverless collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security_iam_id-based-policy-examples-collection-admin)
- [Viewing OpenSearch Serverless collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security_iam_id-based-policy-examples-view-collections)
- [Using data-plane policies](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security_iam_id-based-policy-examples-data-plane)
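
As an illustration of the two permissions above, the following sketch builds an identity-based policy document that grants both actions. `"Resource": "*"` is an assumption kept for brevity; scope it to your collection in practice and compare against the sample policies linked above.

```python
import json


def aoss_access_policy(resource: str = "*") -> dict:
    """Identity-based policy granting the two OpenSearch Serverless permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # aoss:APIAccessAll -> data plane API access,
                # aoss:DashboardsAccessAll -> OpenSearch Dashboards access.
                "Action": ["aoss:APIAccessAll", "aoss:DashboardsAccessAll"],
                "Resource": resource,
            }
        ],
    }


policy_json = json.dumps(aoss_access_policy(), indent=2)
print(policy_json)
```

You could save the printed JSON as `policy.json` and attach it with `aws iam put-user-policy --user-name <your-iam-user-name> --policy-name aoss-access --policy-document file://policy.json`; the policy name `aoss-access` is arbitrary.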

Use the `cdk deploy` command to create the stacks shown above.

<pre>
(.venv) $ cdk deploy -c iam_user_name=<i>your-iam-user-name</i> --all
</pre>
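
`--all` covers the three stacks that `app.py` defines, and its `add_dependency()` calls fix the order in which they are created. The sketch below only illustrates that ordering with Python's `graphlib` module (Python 3.9+); it is not how the CDK CLI schedules deployments.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Stack -> stacks it depends on, mirroring the add_dependency() calls in app.py.
stack_dependencies = {
    "OpsCollectionPipelineRoleStack": set(),
    "OpsServerlessTSStack": {"OpsCollectionPipelineRoleStack"},
    "OpsServerlessIngestionStack": {"OpsServerlessTSStack"},
}

# static_order() yields each stack only after all of its dependencies.
deploy_order = list(TopologicalSorter(stack_dependencies).static_order())
# Tearing the stacks down has to go the other way round.
destroy_order = list(reversed(deploy_order))

print("deploy:", " -> ".join(deploy_order))
print("destroy:", " -> ".join(destroy_order))
```

Because the dependencies form a single chain, there is only one valid order: the IAM role stack first, then the collection stack, then the ingestion pipeline stack.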

To add additional dependencies, for example other CDK libraries, add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.

## Clean Up

Delete the CloudFormation stacks by running the following command.

<pre>
(.venv) $ cdk destroy -c iam_user_name=<i>your-iam-user-name</i> --force --all
</pre>

## Useful commands

* `cdk ls` lists all stacks in the app
* `cdk synth` emits the synthesized CloudFormation template
* `cdk deploy` deploys this stack to your default AWS account/region
* `cdk diff` compares the deployed stack with the current state
* `cdk docs` opens the CDK documentation

Enjoy!

## Run Tests

#### Step 1: Ingest some sample data

First, get the ingestion URL from the **Pipeline settings** page:

![osis-pipeline-settings](./assets/osis-pipeline-settings.png)

Then, ingest some sample data. The following sample request uses [awscurl](https://github.com/okigan/awscurl) to send a single log entry to the `my_logs` index:

<pre>
$ awscurl --service osis --region <i>us-east-1</i> \
    -X POST \
    -H "Content-Type: application/json" \
    -d '[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]' \
    https://<i>{pipeline-endpoint}.us-east-1</i>.osis.amazonaws.com/log-pipeline/test_ingestion_path
</pre>
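
The awscurl call above is an ordinary HTTPS POST of a JSON array. A sketch of assembling the same URL, body, and headers in Python follows; `build_ingestion_request` and the example host value are hypothetical, and the sketch deliberately stops before signing the request.

```python
import json
from typing import Dict, List, Tuple


def build_ingestion_request(
    pipeline_host: str,
    events: List[Dict[str, str]],
    path: str = "/log-pipeline/test_ingestion_path",
) -> Tuple[str, bytes, Dict[str, str]]:
    """Return (url, body, headers) matching the awscurl POST above, unsigned."""
    url = f"https://{pipeline_host}{path}"
    # The awscurl example sends a JSON array of events, even for a single event.
    body = json.dumps(events).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers


# The same event as in the awscurl example.
sample_event = {
    "time": "2014-08-11T11:40:13+00:00",
    "remote_addr": "122.226.223.69",
    "status": "404",
    "request": "GET http://www.k2proxy.com//hello.html HTTP/1.1",
    "http_user_agent": "Mozilla/4.0 (compatible; WOW64; SLCC2;)",
}
```

Before sending, the request still has to be signed with AWS Signature Version 4 for the `osis` service, which is what `awscurl --service osis` does; botocore's `SigV4Auth` is one way to do that in Python.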

You should see a `200 OK` response.

#### Step 2: Query the sample data

Now, query the `my_logs` index to confirm that the log entry was successfully ingested:

<pre>
$ awscurl --service aoss --region <i>us-east-1</i> \
    -X GET \
    https://<i>{collection-id}.us-east-1</i>.aoss.amazonaws.com/my_logs/_search | jq -r '.'
</pre>

**Sample response:**

<pre>
{
  "took": 367,
  "timed_out": false,
  "_shards": {
    "total": 0,
    "successful": 0,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_logs",
        "_id": "1%3A0%3ALkidTIgBbiu_ytx_zXnH",
        "_score": 1,
        "_source": {
          "time": "2014-08-11T11:40:13+00:00",
          "remote_addr": "122.226.223.69",
          "status": "404",
          "request": "GET http://www.k2proxy.com//hello.html HTTP/1.1",
          "http_user_agent": "Mozilla/4.0 (compatible; WOW64; SLCC2;)",
          "@timestamp": "2023-05-24T07:16:29.708Z"
        }
      }
    ]
  }
}
</pre>
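
To check the result from a script instead of by eye, a few lines of standard-library Python can pull the ingested documents out of a `_search` response. `ingested_sources` and `total_hits` are hypothetical helpers; feed them the awscurl output, for example after saving it to a file.

```python
import json
from typing import Any, Dict, List


def ingested_sources(response_text: str) -> List[Dict[str, Any]]:
    """Return the `_source` document of every hit in a `_search` response."""
    hits = json.loads(response_text)["hits"]["hits"]
    return [hit["_source"] for hit in hits]


def total_hits(response_text: str) -> int:
    """Return hits.total.value; "relation": "eq" means the count is exact."""
    return json.loads(response_text)["hits"]["total"]["value"]
```

For the sample response above, `total_hits` returns 1, and the single `_source` holds the fields that were sent plus an `@timestamp` field that was not in the original request.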

## References

* [Tutorial: Ingesting data into a collection using Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-serverless-get-started.html)
* [Amazon OpenSearch Ingestion Developer Guide](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ingestion.html)
* [Data Prepper](https://opensearch.org/docs/latest/data-prepper/index/) - a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization.
* [Top strategies for high volume tracing with Amazon OpenSearch Ingestion (2023-04-27)](https://aws.amazon.com/blogs/big-data/top-strategies-for-high-volume-tracing-with-amazon-opensearch-ingestion/)
* [Use cases for Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/use-cases-overview.html) - some common use cases for Amazon OpenSearch Ingestion.
* [Best practices for Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-best-practices.html)
* [Identity and Access Management for Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security_iam_id-based-policy-examples-data-plane.html)
* [Setting up roles and users in Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html)
* [AWS Signature Version 4 Signing Examples](https://github.com/aws-samples/sigv4a-signing-examples)
* [awscurl](https://github.com/okigan/awscurl) - curl-like tool with AWS Signature Version 4 request signing.

opensearch-serverless/app.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
#!/usr/bin/env python3
import os

from cdk_stacks import (
  OpsCollectionPipelineRoleStack,
  OpsServerlessTimeSeriesStack,
  OpsServerlessIngestionStack
)

import aws_cdk as cdk


AWS_ENV = cdk.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'),
  region=os.getenv('CDK_DEFAULT_REGION'))

app = cdk.App()

collection_pipeline_role = OpsCollectionPipelineRoleStack(app, 'OpsCollectionPipelineRoleStack')

ops_serverless_ts_stack = OpsServerlessTimeSeriesStack(app, "OpsServerlessTSStack",
  collection_pipeline_role.iam_role.role_arn,
  env=AWS_ENV)
ops_serverless_ts_stack.add_dependency(collection_pipeline_role)

ops_serverless_ingestion_stack = OpsServerlessIngestionStack(app, "OpsServerlessIngestionStack",
  collection_pipeline_role.iam_role.role_arn,
  ops_serverless_ts_stack.collection_endpoint,
  env=AWS_ENV)
ops_serverless_ingestion_stack.add_dependency(ops_serverless_ts_stack)

app.synth()

opensearch-serverless/cdk.json

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
{
  "app": "python3 app.py",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "requirements*.txt",
      "source.bat",
      "**/__init__.py",
      "python/__pycache__",
      "tests"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patters:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true
  }
}

opensearch-serverless/cdk_stacks/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
from .collection_pipeline_role import OpsCollectionPipelineRoleStack
from .opensearch_serverless_ts import OpsServerlessTimeSeriesStack
from .opensearch_serverless_ingestion import OpsServerlessIngestionStack

opensearch-serverless/cdk_stacks/collection_pipeline_role.py

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
import aws_cdk as cdk

from aws_cdk import (
  Stack,
  aws_iam
)
from constructs import Construct


class OpsCollectionPipelineRoleStack(Stack):

  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)

    collection_pipeline_policy_doc = aws_iam.PolicyDocument()

    collection_pipeline_policy_doc.add_statements(aws_iam.PolicyStatement(**{
      "effect": aws_iam.Effect.ALLOW,
      "resources": ["*"],
      "actions": [
        "aoss:BatchGetCollection"
      ]
    }))

    pipeline_role = aws_iam.Role(self, 'OpenSearchIngestionPipelineRole',
      role_name='OpenSearchCollectionPipelineRole',
      assumed_by=aws_iam.ServicePrincipal('osis-pipelines.amazonaws.com'),
      inline_policies={
        'collection-pipeline-policy': collection_pipeline_policy_doc
      }
    )
    self.iam_role = pipeline_role

    cdk.CfnOutput(self, f'{self.stack_name}_Role', value=self.iam_role.role_name)
    cdk.CfnOutput(self, f'{self.stack_name}_RoleArn', value=self.iam_role.role_arn)
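
For readers less familiar with CDK's IAM constructs, here is roughly what `OpsCollectionPipelineRoleStack` asks for, written as plain IAM JSON: a trust policy that lets the OpenSearch Ingestion service principal assume the role, and the inline `collection-pipeline-policy`. This is a hand-written approximation, not `cdk synth` output; the synthesized template may differ in details such as key order or whether a single action is rendered as a list.

```python
import json

# Service principal passed to aws_iam.ServicePrincipal(...) in the stack above.
OSIS_SERVICE_PRINCIPAL = "osis-pipelines.amazonaws.com"

# Trust policy: allows the OpenSearch Ingestion service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": OSIS_SERVICE_PRINCIPAL},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Inline policy 'collection-pipeline-policy': a single aoss:BatchGetCollection grant.
collection_pipeline_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "aoss:BatchGetCollection",
            "Resource": "*",
        }
    ],
}

print(json.dumps({"AssumeRolePolicyDocument": trust_policy,
                  "collection-pipeline-policy": collection_pipeline_policy}, indent=2))
```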
