If needed, create a dedicated VPC for the cluster.
The VPC must have at least 2 private subnets for cluster placement (see "Amazon EKS VPC and subnet requirements and considerations").
Take note of the VPC ID and subnet IDs.
If needed, create a dedicated key pair for SSH access to the cluster nodes.
Create the cluster IAM role (https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html):
arn:aws:iam::828879644785:role/eksClusterRole
aws --profile mnemonica eks create-cluster --region eu-west-1 --name prod --kubernetes-version 1.28 --role-arn arn:aws:iam::828879644785:role/eksClusterRole --resources-vpc-config subnetIds=subnet-0362bfc23186dee00,subnet-027b808f39e226fca
aws --profile mnemonica eks update-cluster-config --region eu-west-1 --name prod --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="79.62.239.61/32,95.243.137.141/32,52.215.3.31/32",endpointPrivateAccess=true
Enable the following addons from the AWS console:
Add the new cluster to kubectl config:
aws --profile mnemonica eks update-kubeconfig --region eu-west-1 --name prod
kubectl config get-clusters
kubectl config use-context arn:aws:eks:eu-west-1:828879644785:cluster/prod
Create the node IAM role trust policy (the standard EC2 trust relationship from the Amazon EKS node IAM role documentation):

```
cat >node-role-trust-relationship.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```

Create the role and attach the required policies:

```
aws --profile mnemonica iam create-role \
  --role-name Dev-AmazonEKSNodeRole \
  --assume-role-policy-document file://"node-role-trust-relationship.json"

aws --profile mnemonica iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy \
  --role-name Dev-AmazonEKSNodeRole

aws --profile mnemonica iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \
  --role-name Dev-AmazonEKSNodeRole

aws --profile mnemonica iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
  --role-name Dev-AmazonEKSNodeRole
```

¶ General purpose Nodegroup
Tag the nodegroup at minimum with the following tags to enable autoscaling:
k8s.io/cluster-autoscaler/enabled
k8s.io/cluster-autoscaler/<cluster name>
(Optional) Allow remote (SSH) access from selected security groups.
First create a security group that allows access from your desired source IP to port 22, and apply it to an EC2 instance (bastion host). Access to the EKS nodes on port 22 will then be allowed only from this bastion host. Explanation here: https://docs.aws.amazon.com/vpc/latest/userguide/security-group-rules.html#security-group-referencing
==> It is good practice to create different security groups for different environments (dev, test, prod)
Create a nodegroup for GPU nodes.
Add the following K8s labels to the nodegroup:
capability: gpu (for pod nodeSelector)
k8s.amazonaws.com/accelerator: nvidia-tesla-t4 (for autoscaler scale-in functionality)
Add the following K8s taints to the nodegroup:
nvidia.com/gpu: NoSchedule
Tag the nodegroup at minimum with the following tags to enable autoscaling:
k8s.io/cluster-autoscaler/enabled
k8s.io/cluster-autoscaler/<cluster name>
Tag the nodegroup with the following tags to configure autoscaling:
k8s.io/cluster-autoscaler/node-template/autoscaling-options/scaledownunneededtime: 2m0s
Tag the nodegroup with the following tags to allow scaling out from 0 nodes (replicates the K8s labels and taints):
k8s.io/cluster-autoscaler/node-template/label/k8s.amazonaws.com/accelerator: nvidia-tesla-t4
k8s.io/cluster-autoscaler/node-template/taint/nvidia.com/gpu: true:NoSchedule
k8s.io/cluster-autoscaler/node-template/label/capability: gpu
Create a nodegroup for nodes that should be excluded from autoscaling.
Add the following K8s labels to the nodegroup:
asExcluded: true (for pod nodeSelector)
Add the following K8s taints:
asExcluded: NoSchedule
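The nodegroups can also be created from the CLI. A hedged sketch for the GPU nodegroup (instance type, AMI type, scaling limits and the tag values `true`/`owned` are assumptions; the subnet IDs reuse those from cluster creation, and the node role is the one created above):

```
aws --profile mnemonica eks create-nodegroup \
  --cluster-name prod \
  --nodegroup-name gpu \
  --node-role arn:aws:iam::828879644785:role/Dev-AmazonEKSNodeRole \
  --subnets subnet-0362bfc23186dee00 subnet-027b808f39e226fca \
  --instance-types g4dn.xlarge \
  --ami-type AL2_x86_64_GPU \
  --scaling-config minSize=0,maxSize=2,desiredSize=0 \
  --labels capability=gpu,k8s.amazonaws.com/accelerator=nvidia-tesla-t4 \
  --taints key=nvidia.com/gpu,value=true,effect=NO_SCHEDULE \
  --tags k8s.io/cluster-autoscaler/enabled=true,k8s.io/cluster-autoscaler/prod=owned
```

The remaining `node-template` tags from the list above can be appended to `--tags` in the same key=value form.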
```
cat > s3-juicefs-eks-mnemonica-prod-iam-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    },
    {
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::juicefs-eks-dev/*",
        "arn:aws:s3:::juicefs-eks-dev"
      ]
    }
  ]
}
EOF
```
aws --profile mnemonica iam create-policy --policy-name s3-juicefs-eks-mnemonica-prod-readwrite --policy-document file://s3-juicefs-eks-mnemonica-prod-iam-policy.json
aws --profile mnemonica iam create-user --user-name juicefs-eks-mnemonica-prod
aws --profile mnemonica iam create-access-key --user-name juicefs-eks-mnemonica-prod
Take note of the output:

```
{
  "AccessKey": {
    "UserName": "juicefs-eks-mnemonica-prod",
    "AccessKeyId": "xxx",
    "Status": "Active",
    "SecretAccessKey": "xxx",
    "CreateDate": "2023-10-04T15:06:52+00:00"
  }
}
```
Create the volume credentials Secret and apply it:

```
cat >juicefs-secret.yaml << EOF
...
EOF
kubectl apply -f juicefs-secret.yaml
```
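For reference, a JuiceFS CSI volume credentials Secret typically looks like the sketch below. The metadata URL, bucket URL and key values are placeholders; the key names follow the JuiceFS CSI driver documentation, and the access/secret keys come from the `create-access-key` output above:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: juicefs-secret
  namespace: dev
type: Opaque
stringData:
  name: ten-pb-fs                       # JuiceFS volume name (matches the PV label used later)
  metaurl: redis://example-redis:6379/1 # placeholder metadata engine URL
  storage: s3
  bucket: https://juicefs-eks-dev.s3.eu-west-1.amazonaws.com  # placeholder bucket URL
  access-key: xxx                       # AccessKeyId from the output above
  secret-key: xxx                       # SecretAccessKey from the output above
```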
```
helm repo add juicefs https://juicedata.github.io/charts/
helm repo update
helm fetch --untar juicefs/juicefs-csi-driver
cd juicefs-csi-driver
```
The installation configuration is included in values.yaml; review this file and modify it to your needs.
```yaml
node:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
    - key: asExcluded
      operator: Exists
```
If installing on ARM nodes, the driver has to be installed in sidecar mode and the sidecar images have to support the ARM architecture:
```yaml
sidecars:
  livenessProbeImage:
    repository: registry.k8s.io/sig-storage/livenessprobe
    tag: "v2.6.0"
  csiProvisionerImage:
    repository: registry.k8s.io/sig-storage/csi-provisioner
    tag: "v2.2.2"
  nodeDriverRegistrarImage:
    repository: registry.k8s.io/sig-storage/csi-node-driver-registrar
    tag: "v2.5.0"
  csiResizerImage:
    repository: registry.k8s.io/sig-storage/csi-resizer
    tag: "v1.8.0"
```
helm install juicefs-csi-driver juicefs/juicefs-csi-driver -n kube-system -f ./custom-values.yaml
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  # For now, JuiceFS CSI Driver doesn't support setting storage capacity for static PVs. Any valid string is fine.
  capacity:
    storage: 10Pi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    # A CSIDriver named csi.juicefs.com is created during installation
    driver: csi.juicefs.com
    # volumeHandle needs to be unique within the cluster; simply using the PV name is recommended
    volumeHandle: juicefs-pv
    fsType: juicefs
    # Reference the volume credentials (Secret) created in the previous step
    # To use different credentials, or even different JuiceFS volumes, create separate volume credentials
    nodePublishSecretRef:
      name: juicefs-secret
      namespace: dev
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: juicefs-pvc
  namespace: dev
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  # Must use an empty string as storageClassName,
  # meaning this PVC will not use any StorageClass and will instead bind to the PV matched by the selector
  storageClassName: ""
  # For now, JuiceFS CSI Driver doesn't support setting storage capacity for static PVs. Any valid value no greater than the PV capacity is fine.
  resources:
    requests:
      storage: 10Pi
  selector:
    matchLabels:
      juicefs-name: ten-pb-fs
```
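To verify that static provisioning works, a minimal pod mounting the claim can be used (pod name, image and mount path are arbitrary choices for this sketch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: juicefs-app
  namespace: dev
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo ok > /data/test && sleep 3600"]
      volumeMounts:
        - mountPath: /data   # JuiceFS volume mounted here via the claim
          name: juicefs
  volumes:
    - name: juicefs
      persistentVolumeClaim:
        claimName: juicefs-pvc
```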
Create an IAM policy named AWSLoadBalancerControllerIAMPolicy. Take note of the policy ARN that's returned.
==> It is good practice to further scope down this configuration based on the VPC ID or cluster name resource tag.

```
aws --profile mnemonica iam create-policy \
  --policy-name AWSLoadBalancerControllerIAMPolicy \
  --policy-document file://iam-policy.json
```
Get the cluster's OIDC provider ID and check whether an IAM OIDC provider is already associated with the cluster:

```
oidc_id=$(aws --profile mnemonica eks describe-cluster --name prod --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)
aws --profile mnemonica iam list-open-id-connect-providers | grep $oidc_id | cut -d "/" -f4
```

If not, associate one:

```
eksctl --profile mnemonica utils associate-iam-oidc-provider --cluster $cluster_name --approve
```

Create the Prod-load-balancer-role-trust-policy.json file. Replace 111122223333 with your account ID, region-code with the AWS Region that your cluster is in, and EXAMPLED539D4633E53DE1B71EXAMPLE with the output returned in the previous step.

```
cat >Prod-load-balancer-role-trust-policy.json << EOF
...
EOF
```

Create the IAM role:
==> It is a good practice to create different roles for different environments (dev, test, prod)

```
aws --profile mnemonica iam create-role \
  --role-name AmazonEKSLoadBalancerControllerRole-prod \
  --assume-role-policy-document file://"Prod-load-balancer-role-trust-policy.json"
```

Attach the IAM policy created earlier to the IAM role. Replace 111122223333 with your account ID.

```
aws --profile mnemonica iam attach-role-policy \
  --policy-arn arn:aws:iam::111122223333:policy/AWSLoadBalancerControllerIAMPolicy \
  --role-name AmazonEKSLoadBalancerControllerRole-prod
```

Use the following template to create the prod-aws-load-balancer-controller-service-account.yaml file. Replace 111122223333 with your account ID.
==> It is a good practice to create different service accounts for different environments (dev, test, prod)

```
cat >prod-aws-load-balancer-controller-service-account.yaml << EOF
...
EOF
```

Create the Kubernetes service account on your cluster:

kubectl apply -f prod-aws-load-balancer-controller-service-account.yaml
```
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=prod \
  --set serviceAccount.create=false \
  --set serviceAccount.name=prod-aws-load-balancer-controller
```
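To confirm the controller deployed correctly (verification step from the AWS Load Balancer Controller installation guide):

```
# Should show the aws-load-balancer-controller deployment with READY replicas
kubectl get deployment -n kube-system aws-load-balancer-controller
```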
aws --profile mnemonica elbv2 create-target-group --name eks-dev-frontend --protocol HTTP --port 30001 --target-type instance --vpc-id vpc-0a897214f58117ddd
aws --profile mnemonica elbv2 create-target-group --name eks-dev-tools --protocol HTTP --port 30002 --target-type instance --vpc-id vpc-0a897214f58117ddd
Create a security group for the load balancer and take note of the GroupId returned by the command:

aws --profile mnemonica ec2 create-security-group --group-name allow-all-http_s --description "allow HTTP/HTTPS traffic from internet" --vpc-id vpc-0a897214f58117ddd
aws --profile mnemonica ec2 authorize-security-group-ingress --group-id sg-0618f1c65238a20f6 --protocol tcp --port 80 --cidr 0.0.0.0/0
aws --profile mnemonica ec2 authorize-security-group-ingress --group-id sg-0618f1c65238a20f6 --protocol tcp --port 443 --cidr 0.0.0.0/0
aws --profile mnemonica elbv2 create-load-balancer --name eks-dev --subnets subnet-083540f377cf1a0a5 subnet-0f727df918c8957b3 --security-groups sg-0618f1c65238a20f6 --ip-address-type ipv4
aws --profile mnemonica elbv2 create-listener --load-balancer-arn arn:aws:elasticloadbalancing:eu-west-1:828879644785:loadbalancer/app/eks-dev/08cd7b14d8b5595c --port 443 --protocol HTTPS --certificates CertificateArn=arn:aws:acm:eu-west-1:828879644785:certificate/c7448d05-6ca8-493d-9eb3-1fcd6fa6cf50 --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:eu-west-1:828879644785:targetgroup/eks-dev-frontend/487231003336b9ca
Create the TargetGroupBinding and apply it:

```
cat >frontend-tgb.yaml << EOF
...
EOF
kubectl apply -f frontend-tgb.yaml
```

Reference: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/3099
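A TargetGroupBinding for the frontend target group would look roughly like the sketch below. The `targetGroupARN` is the one created above; the Service name and port are placeholders for whatever Service exposes the frontend on NodePort 30001:

```yaml
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: frontend-tgb
  namespace: dev
spec:
  serviceRef:
    name: frontend   # placeholder: the Service backing NodePort 30001
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:eu-west-1:828879644785:targetgroup/eks-dev-frontend/487231003336b9ca
  targetType: instance
```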
¶ GPU workers
In order to use GPU nodes it is necessary to apply the Nvidia device plugin for Kubernetes as a DaemonSet to the cluster:
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/vX.X.X/nvidia-device-plugin.yml

Reference: https://github.com/NVIDIA/k8s-device-plugin/releases
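Once the device plugin is running, GPU scheduling can be verified with a minimal pod. The nodeSelector and toleration match the GPU nodegroup labels/taints defined earlier; the CUDA image tag is an assumption and should match the driver version on the nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  nodeSelector:
    k8s.amazonaws.com/accelerator: nvidia-tesla-t4
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # placeholder tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # resource exposed by the device plugin
```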
It is important that pods on the GPU nodes have the correct drivers installed.
The creation date of the nodegroup determines the AMI release version (latest at that time). This can later be upgraded by creating a new nodegroup.
To determine the driver version installed on the AMI:
aws --profile mnemonica ec2 describe-images --filters "Name=name,Values=amazon-eks-gpu-node-1.28-v20231002"
```
aws --profile mnemonica ec2 run-instances --image-id ami-0d09ad178e7e780d1 \
  --count 1 \
  --instance-type t2.micro \
  --key-name eks-dev-01 \
  --security-group-ids sg-0c3ea3ee68f96c291 \
  --subnet-id subnet-54f5491c
```
ssh -i eks-dev-01.pem ec2-user@3.248.227.2
yum list installed | grep nvidia-driver
```
[...]
nvidia-driver-latest-dkms.x86_64              3:535.54.03-1.el7  @amzn2-nvidia
nvidia-driver-latest-dkms-NVML.x86_64         3:535.54.03-1.el7  @amzn2-nvidia
nvidia-driver-latest-dkms-NvFBCOpenGL.x86_64  3:535.54.03-1.el7  @amzn2-nvidia
nvidia-driver-latest-dkms-cuda.x86_64         3:535.54.03-1.el7  @amzn2-nvidia
nvidia-driver-latest-dkms-cuda-libs.x86_64    3:535.54.03-1.el7  @amzn2-nvidia
nvidia-driver-latest-dkms-devel.x86_64        3:535.54.03-1.el7  @amzn2-nvidia
nvidia-driver-latest-dkms-libs.x86_64         3:535.54.03-1.el7  @amzn2-nvidia
[...]
```
In this case, the version of the NVENC runtime libraries to install in the container image is 535.54.03:
libnvidia-encode libnvidia-decode libnvidia-compute
kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-password> --docker-email=<your-email>
# Checking the credentials
kubectl get secret regcred --output=yaml
kubectl get secret regcred --output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode
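Pods pulling from the private registry then reference the secret via `imagePullSecrets`; a minimal sketch (image name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-reg-pod
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # placeholder private image
  imagePullSecrets:
    - name: regcred   # the secret created above
```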
helm fetch --untar 1password/connect
Create a custom-values.yaml file:

```yaml
connect:
  serviceType: ClusterIP
  nodeSelector:
    "asExcluded": "true"
  tolerations:
    - key: asExcluded
      operator: Exists
```
helm install connect 1password/connect --set-file connect.credentials=./1password-credentials.json -f ./connect/custom-values.yaml
helm fetch --untar 1password/secrets-injector
Edit templates/deployment.yaml to add labels and selectors for Pod to Node assignment (https://github.com/1Password/connect-helm-charts/pull/175):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.injector.applicationName }}
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ .Values.injector.applicationName }}
    {{- with .Values.injector.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  {{- with .Values.injector.annotations }}
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "1"
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  selector:
    matchLabels:
      app: {{ .Values.injector.applicationName }}
  template:
    metadata:
      labels:
        app: {{ .Values.injector.applicationName }}
        {{- with .Values.injector.podLabels }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      {{- with .Values.injector.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
    spec:
      {{- with .Values.injector.nodeSelector }}
      nodeSelector:
        {{ toYaml . | indent 8 }}
      {{- end }}
      tolerations:
        {{ toYaml .Values.injector.tolerations | indent 8 }}
      serviceAccountName: {{ .Values.injector.applicationName }}
      containers:
        - name: {{ .Values.injector.applicationName }}
          image: {{ .Values.injector.imageRepository }}:{{ tpl .Values.injector.version . }}
          imagePullPolicy: {{ .Values.injector.imagePullPolicy }}
          args:
            - -service-name={{ .Values.injector.applicationName }}
            - -alsologtostderr
            - -v=4
            - 2>&1
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          lifecycle:
            preStop:
              exec:
                command: [ "/bin/sh", "-c", "/prestop.sh" ]
```
Create a custom-values.yaml file:

```yaml
injector:
  version: 1.0.2
  nodeSelector:
    "asExcluded": "true"
  tolerations:
    - key: asExcluded
      operator: Exists
```
helm install secrets-injector ./secrets-injector -f ./secrets-injector/custom-values.yaml
kubectl label namespaces default secrets-injection=enabled
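With injection enabled on the namespace, a workload opts in via the `operator.1password.io/inject` pod annotation. A hedged sketch (image, Connect service URL, token secret, and the `op://` vault/item path are all placeholders to adapt to your setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-example
spec:
  selector:
    matchLabels:
      app: app-example
  template:
    metadata:
      annotations:
        operator.1password.io/inject: "app"   # container name(s) to inject into
      labels:
        app: app-example
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest  # placeholder image
          env:
            - name: OP_CONNECT_HOST
              value: http://onepassword-connect:8080  # placeholder Connect service URL
            - name: OP_CONNECT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: connect-token   # placeholder secret holding the Connect token
                  key: token
            - name: DB_PASSWORD
              value: op://vault-name/item-name/password  # placeholder secret reference
```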