# terraform-aws-recycle-eks
This repository provides a Terraform module to recycle EKS worker nodes. The high-level functionality is explained below; illustrative Python sketches of each step follow the list.
- Creates a Step Function consisting of four Lambdas. The Step Function handles passing inputs between the Lambda functions.
- The first Lambda takes an instance ID as input and puts that instance into standby, using the Auto Scaling API to automatically add a new instance to the group while the old instance moves to standby. The old instance enters the "Standby" state only once the new instance is fully "InService".
- Taints this "Standby" node in EKS, using the Kubernetes API from a Lambda, to prevent new pods from being scheduled onto it.
- Another Lambda periodically uses the Kubernetes API to check the status of "stateful" pods on that node, based on the label selector provided.
- Once all stateful pods on the node have completed, i.e. the number of running pods has reached 0, a Lambda shuts down the standby instance using the AWS SDK. We are not terminating the node, only shutting it down, just in case; future releases will start terminating the nodes.
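
Everything below this list is an illustrative sketch, not the module's shipped code. Kicking off a recycle run means starting the Step Function with an instance ID as input; the state machine ARN and input key here are assumptions:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical ARN and input shape -- the module's actual names may differ.
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:recycle-eks-node",
    input=json.dumps({"instance_id": "i-0123456789abcdef0"}),
)
print(response["executionArn"])
```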
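
The standby step maps to the Auto Scaling `EnterStandby` API. A minimal sketch, assuming the ASG name is looked up from the instance ID:

```python
import boto3

autoscaling = boto3.client("autoscaling")


def enter_standby(instance_id: str) -> None:
    """Move the instance to Standby; the ASG launches a replacement."""
    # Find the Auto Scaling group that owns this instance.
    desc = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    asg_name = desc["AutoScalingInstances"][0]["AutoScalingGroupName"]
    # Keeping desired capacity unchanged makes the ASG launch a replacement
    # while this instance moves to Standby.
    autoscaling.enter_standby(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ShouldDecrementDesiredCapacity=False,
    )
```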
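
Tainting the standby node is a patch on the node spec via the Kubernetes Python client. The taint key and value are assumptions, and note that this simple patch overwrites any existing taints:

```python
from kubernetes import client


def taint_node(api: client.CoreV1Api, node_name: str) -> None:
    """Add a NoSchedule taint so no new pods land on the node."""
    # `api` is assumed to be configured with the cluster endpoint and a
    # bearer token. The taint key/value below are illustrative.
    body = {
        "spec": {
            "taints": [
                {"key": "recycle-eks", "value": "standby", "effect": "NoSchedule"}
            ]
        }
    }
    api.patch_node(node_name, body)
```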
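
The periodic check is a pod list filtered by the provided label selector plus a field selector for the node. As the TODO below notes, this currently spans all namespaces:

```python
from kubernetes import client


def count_stateful_pods(api: client.CoreV1Api, node_name: str,
                        label_selector: str) -> int:
    """Count matching pods still running on the given node."""
    pods = api.list_pod_for_all_namespaces(
        label_selector=label_selector,                # e.g. "app=my-stateful-app"
        field_selector=f"spec.nodeName={node_name}",  # only pods on this node
    )
    return sum(1 for p in pods.items if p.status.phase == "Running")
```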
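
The final step stops, rather than terminates, the instance:

```python
import boto3

ec2 = boto3.client("ec2")


def shutdown_node(instance_id: str) -> None:
    """Stop (not terminate) the standby instance so it can still be inspected."""
    ec2.stop_instances(InstanceIds=[instance_id])
```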
## TODO:
- Check that the new node is in service before putting the existing node into standby. Right now we use a fixed 300-second sleep (see the polling sketch after this list).
- Refactor the access-token retrieval into a common module shared by all the Lambdas (see the token sketch after this list).
- Better logging and exception handling
- Make use of the namespace input when selecting pods. Currently it checks for pods in all namespaces.
- Find a Terraform way to edit the aws-auth ConfigMap; this step is still manual and is required for this module to work.
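
For the first item, the fixed sleep could become a poll of the ASG until a new instance reports `InService`. A sketch, assuming the caller captures the InService count before the swap:

```python
import time

import boto3

autoscaling = boto3.client("autoscaling")


def wait_for_in_service(asg_name: str, baseline: int, timeout: int = 600) -> None:
    """Poll until the ASG has more InService instances than `baseline`."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        groups = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[asg_name]
        )
        instances = groups["AutoScalingGroups"][0]["Instances"]
        in_service = [i for i in instances if i["LifecycleState"] == "InService"]
        if len(in_service) > baseline:
            return
        time.sleep(15)
    raise TimeoutError(f"no new InService instance in {asg_name} after {timeout}s")
```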
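
For the token item, one well-known approach is to presign an STS `GetCallerIdentity` call, the same scheme `aws eks get-token` uses. A sketch of what the shared helper could look like (not necessarily this module's current code):

```python
import base64

import boto3
from botocore.signers import RequestSigner


def get_bearer_token(cluster_name: str, region: str) -> str:
    """Build an EKS bearer token from a presigned STS GetCallerIdentity URL."""
    session = boto3.session.Session()
    sts = session.client("sts", region_name=region)
    signer = RequestSigner(
        sts.meta.service_model.service_id,
        region,
        "sts",
        "v4",
        session.get_credentials(),
        session.events,
    )
    params = {
        "method": "GET",
        "url": f"https://sts.{region}.amazonaws.com/"
               "?Action=GetCallerIdentity&Version=2011-06-15",
        "body": {},
        "headers": {"x-k8s-aws-id": cluster_name},
        "context": {},
    }
    signed_url = signer.generate_presigned_url(
        params, region_name=region, expires_in=60, operation_name=""
    )
    token = base64.urlsafe_b64encode(signed_url.encode()).decode()
    return "k8s-aws-v1." + token.rstrip("=")
```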
There are two main components:
## Usage
**Set up all supported AWS / Datadog integrations**