Design patterns to handle Amazon EMR on EKS workloads for Apache Spark



Amazon EMR on Amazon EKS allows you to submit Apache Spark jobs on demand on Amazon Elastic Kubernetes Service (Amazon EKS) with out provisioning clusters. With EMR on EKS, you possibly can consolidate analytical workloads together with your different Kubernetes-based purposes on the identical Amazon EKS cluster to enhance useful resource utilization and simplify infrastructure administration. Kubernetes makes use of namespaces to supply isolation between teams of assets inside a single Kubernetes cluster. Amazon EMR creates a digital cluster by registering Amazon EMR with a namespace on an EKS cluster. Amazon EMR can then run analytics workloads on that namespace.

In EMR on EKS, you possibly can submit your Spark jobs to Amazon EMR digital clusters utilizing the AWS Command Line Interface (AWS CLI), SDK, or Amazon EMR Studio. Amazon EMR requests the Kubernetes scheduler on Amazon EKS to schedule pods. For each job you run, EMR on EKS creates a container with an Amazon Linux 2 base picture, Apache Spark, and related dependencies. Every Spark job runs in a pod on Amazon EKS employee nodes. In case your Amazon EKS cluster has employee nodes in several Availability Zones, the Spark utility driver and executor pods can unfold throughout a number of Availability Zones. On this case, knowledge switch fees apply for cross-AZ communication and will increase knowledge processing latency. If you wish to scale back knowledge processing latency and keep away from cross-AZ knowledge switch prices, you need to configure Spark purposes to run solely inside a single Availability Zone.

On this publish, we share 4 design patterns to handle EMR on EKS workloads for Apache Spark. We then present the right way to use a pod template to schedule a job with EMR on EKS, and use Karpenter as our autoscaling instrument.

Sample 1: Handle Spark jobs by pod template

Clients usually consolidate a number of purposes on a shared Amazon EKS cluster to enhance utilization and save prices. Nevertheless, every utility might have totally different necessities. For instance, it’s possible you’ll wish to run performance-intensive workloads similar to machine studying mannequin coaching jobs on SSD-backed situations for higher efficiency, or fault-tolerant and versatile purposes on Amazon Elastic Compute Cloud (Amazon EC2) Spot Situations for decrease value. In EMR on EKS, there are a couple of methods to configure how your Spark job runs on Amazon EKS employee nodes. You possibly can make the most of the Spark configurations on Kubernetes with the EMR on EKS StartJobRun API, or you need to use Spark’s pod template characteristic. Pod templates are specs that decide the right way to run every pod in your EKS clusters. With pod templates, you’ve gotten extra flexibility and may use pod template information to outline Kubernetes pod configurations that Spark doesn’t help.

You need to use pod templates to attain the next advantages:

  • Scale back prices – You possibly can schedule Spark executor pods to run on EC2 Spot Situations whereas scheduling Spark driver pods to run on EC2 On-Demand Situations.
  • Enhance monitoring – You possibly can improve your Spark workload’s observability. For instance, you possibly can deploy a sidecar container by way of a pod template to your Spark job that may ahead logs to your centralized logging utility
  • Enhance useful resource utilization – You possibly can help a number of groups operating their Spark workloads on the identical shared Amazon EKS cluster

You possibly can implement these patterns utilizing pod templates and Kubernetes labels and selectors. Kubernetes labels are key-value pairs which can be connected to things, similar to Kubernetes employee nodes, to establish attributes which can be significant and related to customers. You possibly can then select the place Kubernetes schedules pods utilizing nodeSelector or Kubernetes affinity and anti-affinity in order that it may possibly solely run on particular employee nodes. nodeSelector is the only option to constrain pods to nodes with particular labels. Affinity and anti-affinity broaden the kinds of constraints you possibly can outline.

Autoscaling in Spark workload

Autoscaling is a operate that robotically scales your compute assets up or all the way down to modifications in demand. For Kubernetes auto scaling, Amazon EKS helps two auto scaling merchandise: the Kubernetes Cluster Autoscaler and the Karpenter open-source auto scaling undertaking. Kubernetes autoscaling ensures your cluster has sufficient nodes to schedule your pods with out losing assets. If some pods fail to schedule on present employee nodes because of inadequate assets, it will increase the dimensions of the cluster and provides extra nodes. It additionally makes an attempt to take away underutilized nodes when its pods can run elsewhere.

Sample 2: Activate Dynamic Useful resource Allocation (DRA) in Spark

Spark offers a mechanism known as Dynamic Useful resource Allocation (DRA), which dynamically adjusts the assets your utility occupies based mostly on the workload. With DRA, the Spark driver spawns the preliminary variety of executors after which scales up the quantity till the required most variety of executors is met to course of the pending duties. Idle executors are deleted when there aren’t any pending duties. It’s significantly helpful should you’re not sure what number of executors are wanted in your job processing.

You possibly can implement it in EMR on EKS by following the Dynamic Useful resource Allocation workshop.

Sample 3: Absolutely management cluster autoscaling by Cluster Autoscaler

Cluster Autoscaler makes use of the idea of node teams because the factor of capability management and scale. In AWS, node teams are applied by auto scaling teams. Cluster Autoscaler implements it by controlling the DesiredReplicas subject of your auto scaling teams.

To avoid wasting prices and enhance useful resource utilization, you need to use Cluster Autoscaler in your Amazon EKS cluster to robotically scale your Spark pods. The next are suggestions for autoscaling Spark jobs with Amazon EMR on EKS utilizing Cluster Autoscaler:

  • Create Availability Zone bounded auto scaling teams to verify Cluster Autoscaler solely provides employee nodes in the identical Availability Zone to keep away from cross-AZ knowledge switch fees and knowledge processing latency.
  • Create separate node teams for EC2 On-Demand and Spot Situations. By doing this, you possibly can add or shrink driver pods and executor pods independently.
  • In Cluster Autoscaler, every node in a node group must have equivalent scheduling properties. That features EC2 occasion sorts, which must be of comparable vCPU to reminiscence ratio to keep away from inconsistency and wastage of assets. To be taught extra about Cluster Autoscaler node teams greatest practices, seek advice from Configuring your Node Teams.
  • Adhere to Spot Occasion greatest practices and maximize diversification to take benefits of a number of Spot swimming pools. Create a number of node teams for Spark executor pods with totally different vCPU to reminiscence ratios. This significantly will increase the soundness and resiliency of your utility.
  • When you’ve gotten a number of node teams, use pod templates and Kubernetes labels and selectors to handle Spark pod deployment to particular Availability Zones and EC2 occasion sorts.

The next diagram illustrates Availability Zone bounded auto scaling teams.

AZ bounded CA ASG
As a number of node teams are created, Cluster Autoscaler has the idea of expanders, which give totally different methods for choosing which node group to scale. As of this writing, the next methods are supported: random, most-pods, least-waste, and precedence. With a number of node teams of EC2 On-Demand and Spot Situations, you need to use the precedence expander, which permits Cluster Autoscaler to pick out the node group that has the best precedence assigned by the person. For configuration particulars, seek advice from Precedence based mostly expander for Cluster Autoscaler.

Sample 4: Group-less autoscaling with Karpenter

Karpenter is an open-source, versatile, high-performance Kubernetes cluster auto scaler constructed with AWS. The general aim is similar of auto scaling Amazon EKS clusters to regulate un-schedulable pods; nevertheless, Karpenter takes a distinct strategy than Cluster Autoscaler, often called group-less provisioning. It observes the mixture useful resource requests of unscheduled pods and makes choices to launch minimal compute assets to suit the un-schedulable pods for environment friendly binpacking and lowering scheduling latency. It may additionally delete nodes to cut back infrastructure prices. Karpenter works instantly with the Amazon EC2 Fleet.

To configure Karpenter, you create provisioners that outline how Karpenter manages un-schedulable pods and expired nodes. It is best to make the most of the idea of layered constraints to handle scheduling constraints. To scale back EMR on EKS prices and enhance Amazon EKS cluster utilization, you need to use Karpenter with comparable constraints of Single-AZ, On-Demand Situations for Spark driver pods, and Spot Situations for executor pods with out creating a number of kinds of node teams. With its group-less strategy, Karpenter permits you to be extra versatile and diversify higher.

The next are suggestions for auto scaling EMR on EKS with Karpenter:

  • Configure Karpenter provisioners to launch nodes in a single Availability Zone to keep away from cross-AZ knowledge switch prices and scale back knowledge processing latency.
  • Create a provisioner for EC2 Spot Situations and EC2 On-Demand Situations. You possibly can scale back prices by scheduling Spark driver pods to run on EC2 On-Demand Situations and schedule Spark executor pods to run on EC2 Spot Situations.
  • Restrict the occasion sorts by offering an inventory of EC2 situations or let Karpenter select from all of the Spot swimming pools out there to it. This follows the Spot greatest practices of diversifying throughout a number of Spot swimming pools.
  • Use pod templates and Kubernetes labels and selectors to permit Karpenter to spin up right-sized nodes required for un-schedulable pods.

The next diagram illustrates how Karpenter works.

Karpenter How it Works

To summarize the design patterns we mentioned:

  1. Pod templates assist tailor your Spark workloads. You possibly can configure Spark pods in a single Availability Zone and make the most of EC2 Spot Situations for Spark executor pods, leading to higher price-performance.
  2. EMR on EKS helps the DRA characteristic in Spark. It’s helpful should you’re not acquainted what number of Spark executors are wanted in your job processing, and use DRA to dynamically regulate the assets your utility wants.
  3. Using Cluster Autoscaler allows you to absolutely management the right way to autoscale your Amazon EMR on EKS workloads. It improves your Spark utility availability and cluster effectivity by quickly launching right-sized compute assets.
  4. Karpenter simplifies autoscaling with its group-less provisioning of compute assets. The advantages embody lowered scheduling latency, and environment friendly bin-packing to cut back infrastructure prices.

Walkthrough overview

In our instance walkthrough, we are going to present the right way to use Pod template to schedule a job with EMR on EKS. We use Karpenter as our autoscaling instrument.

We full the next steps to implement the answer:

  1. Create an Amazon EKS cluster.
  2. Put together the cluster for EMR on EKS.
  3. Register the cluster with Amazon EMR.
  4. For Amazon EKS auto scaling, arrange Karpenter auto scaling in Amazon EKS.
  5. Submit a pattern Spark job utilizing pod templates to run in single Availability Zone and make the most of Spot for Spark executor pods.

The next diagram illustrates this structure.


To observe together with the walkthrough, guarantee that you’ve got the next prerequisite assets:

Create an Amazon EKS cluster

There are two methods to create an EKS cluster: you need to use AWS Administration Console and AWS CLI, or you possibly can set up all of the required assets for Amazon EKS utilizing eksctl, a easy command line utility for creating and managing Kubernetes clusters on EKS. For this publish, we use eksctl to create our cluster.

Let’s begin with putting in the instruments to arrange and handle your Kubernetes cluster.

  1. Set up the AWS CLI with the next command (Linux OS) and make sure it really works:
    curl "" -o ""
    sudo ./aws/set up
    aws --version

    For different working methods, see Putting in, updating, and uninstalling the AWS CLI model.

  2. Set up eksctl, the command line utility for creating and managing Kubernetes clusters on Amazon EKS:
    curl --silent --location "$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
    sudo mv -v /tmp/eksctl /usr/native/bin
    eksctl model

    eksctl is a instrument collectively developed by AWS and Weaveworks that automates a lot of the expertise of making EKS clusters.

  3. Set up the Kubernetes command-line instrument, kubectl, which lets you run instructions in opposition to Kubernetes clusters:
    curl -o kubectl
    chmod +x ./kubectl
    sudo mv ./kubectl /usr/native/bin

  4. Create a brand new file known as eks-create-cluster.yaml with the next:
    variety: ClusterConfig
      identify: emr-on-eks-blog-cluster
      area: us-west-2
    availabilityZones: ["us-west-2b", "us-west-2c", "us-west-2d"]
    managedNodeGroups:#On-demand nodegroups for spark job
    - identify: singleaz-ng-ondemand
      instanceType: m5.xlarge
      desiredCapacity: 1
      availabilityZones: ["us-west-2b"]

  5. Create an Amazon EKS cluster utilizing the eks-create-cluster.yaml file:
    eksctl create cluster -f eks-create-cluster.yaml

    On this Amazon EKS cluster, we create a single managed node group with a basic function m5.xlarge EC2 Occasion. Launching Amazon EKS cluster, its managed node teams, and all dependencies usually takes 10–quarter-hour.

  6. After you create the cluster, you possibly can run the next to verify all node teams have been created:
    eksctl get nodegroups --cluster emr-on-eks-blog-cluster

    Now you can use kubectl to work together with the created Amazon EKS cluster.

  7. After you create your Amazon EKS cluster, you will need to configure your kubeconfig file in your cluster utilizing the AWS CLI:
    aws eks --region us-west-2 update-kubeconfig --name emr-on-eks-blog-cluster
    kubectl cluster-info

Now you can use kubectl to hook up with your Kubernetes cluster.

Put together your Amazon EKS cluster for EMR on EKS

Now we put together our Amazon EKS cluster to combine it with EMR on EKS.

  1. Let’s create the namespace emr-on-eks-blog in our Amazon EKS cluster:
    kubectl create namespace emr-on-eks-blog

  2. We use the automation powered by eksctl to create role-based entry management permissions and so as to add the EMR on EKS service-linked function into the aws-auth configmap:
    eksctl create iamidentitymapping --cluster emr-on-eks-blog-cluster --namespace emr-on-eks-blog --service-name "emr-containers"

  3. The Amazon EKS cluster already has an OpenID Join supplier URL. You allow IAM roles for service accounts by associating IAM with the Amazon EKS cluster OIDC:
    eksctl utils associate-iam-oidc-provider --cluster emr-on-eks-blog-cluster —approve

    Now let’s create the IAM function that Amazon EMR makes use of to run Spark jobs.

  4. Create the file blog-emr-trust-policy.json:

    "Model": "2012-10-17",
    "Assertion": [
    "Effect": "Allow",
    "Principal": {
    "Service": ""
    "Action": "sts:AssumeRole"

    Arrange an IAM function:

    aws iam create-role --role-name blog-emrJobExecutionRole —assume-role-policy-document file://blog-emr-trust-policy.json

    This IAM function accommodates all permissions that the Spark job wants—as an example, we offer entry to S3 buckets and Amazon CloudWatch to entry needed information (pod templates) and share logs.

    Subsequent, we have to connect the required IAM insurance policies to the function so it may possibly write logs to Amazon S3 and CloudWatch.

  5. Create the file blog-emr-policy-document with the required IAM insurance policies. Exchange the bucket identify together with your S3 bucket ARN.

    "Model": "2012-10-17",
    "Assertion": [
    "Effect": "Allow",
    "Action": [
    "Useful resource": ["arn:aws:s3:::<bucket-name>"]
    "Impact": "Permit",
    "Motion": [
    "Useful resource": [

    Connect it to the IAM function created within the earlier step:

    aws iam put-role-policy --role-name blog-emrJobExecutionRole --policy-name blog-EMR-JobExecution-policy —policy-document file://blog-emr-policy-document.json

  6. Now we replace the belief relationship between the IAM function we simply created with the Amazon EMR service identification. The namespace offered right here within the belief coverage must be identical when registering the digital cluster in subsequent step:
    aws emr-containers update-role-trust-policy --cluster-name emr-on-eks-blog-cluster --namespace emr-on-eks-blog --role-name blog-emrJobExecutionRole --region us-west-2

Register the Amazon EKS cluster with Amazon EMR

Registering your Amazon EKS cluster is the ultimate step to arrange EMR on EKS to run workloads.

We create a digital cluster and map it to the Kubernetes namespace created earlier:

aws emr-containers create-virtual-cluster 
    --region us-west-2 
    --name emr-on-eks-blog-cluster 
    --container-provider '{
       "id": "emr-on-eks-blog-cluster",
       "kind": "EKS",
       "data": {
          "eksInfo": {
              "namespace": "emr-on-eks-blog"

After you register, you need to get affirmation that your EMR digital cluster is created:

"arn": "arn:aws:emr-containers:us-west-2:142939128734:/virtualclusters/lwpylp3kqj061ud7fvh6sjuyk",
"id": "lwpylp3kqj061ud7fvh6sjuyk",
"identify": "emr-on-eks-blog-cluster"

A digital cluster is an Amazon EMR idea that implies that Amazon EMR registered to a Kubernetes namespace and may run jobs in that namespace. If you happen to navigate to your Amazon EMR console, you possibly can see the digital cluster listed.

Arrange Karpenter in Amazon EKS

To get began with Karpenter, guarantee there’s some compute capability out there, and set up it utilizing the Helm charts offered within the public repository. Karpenter additionally requires permissions to provision compute assets. For extra info, seek advice from Getting Began.

Karpenter’s single duty is to provision compute in your Kubernetes clusters, which is configured by a customized useful resource known as a provisioner. As soon as put in in your cluster, the Karpenter provisioner observes incoming Kubernetes pods, which might’t be scheduled because of inadequate compute assets within the cluster, and robotically launches new assets to fulfill their scheduling and useful resource necessities.

For our use case, we provision two provisioners.

The primary is a Karpenter provisioner for Spark driver pods to run on EC2 On-Demand Situations:

variety: Provisioner
  identify: ondemand
  ttlSecondsUntilExpired: 2592000 

  ttlSecondsAfterEmpty: 30

  labels: on-demand

    - key: ""
      operator: In
      values: ["us-west-2b"]
    - key: ""
      operator: In
      values: ["arm64"]
    - key: ""
      operator: In
      values: ["on-demand"]

      cpu: "1000"
      reminiscence: 1000Gi

    subnetSelector: emr-on-eks-blog-cluster
    securityGroupSelector: emr-on-eks-blog-cluster

The second is a Karpenter provisioner for Spark executor pods to run on EC2 Spot Situations:

variety: Provisioner
  identify: default
  ttlSecondsUntilExpired: 2592000 

  ttlSecondsAfterEmpty: 30

  labels: spot

    - key: ""
      operator: In
      values: ["us-west-2b"]
    - key: ""
      operator: In
      values: ["arm64"]
    - key: ""
      operator: In
      values: ["spot"]

      cpu: "1000"
      reminiscence: 1000Gi

    subnetSelector: emr-on-eks-blog-cluster
    securityGroupSelector: emr-on-eks-blog-cluster

Notice the highlighted portion of the provisioner config. Within the necessities part, we use the well-known labels with Amazon EKS and Karpenter so as to add constraints for the way Karpenter launches nodes. We add constraints that if the pod is in search of a label spot, it makes use of this provisioner to launch an EC2 Spot Occasion solely in Availability Zone us-west-2b. Equally, we observe the identical constraint for the on-demand label. We can be extra granular and supply EC2 occasion sorts in our provisioner, and they are often of various vCPU and reminiscence ratios, providing you with extra flexibility and including resiliency to your utility. Karpenter launches nodes solely when each the provisioner’s and pod’s necessities are met. To be taught extra concerning the Karpenter provisioner API, seek advice from Provisioner API.

Within the subsequent step, we outline pod necessities and align them with what now we have outlined in Karpenter’s provisioner.

Submit Spark job utilizing Pod template

In Kubernetes, labels are key-value pairs which can be connected to things, similar to pods. Labels are meant for use to specify figuring out attributes of objects which can be significant and related to customers. You possibly can constrain a pod in order that it may possibly solely run on explicit set of nodes. There are a number of methods to do that, and the really useful approaches all use label selectors to facilitate the choice.

Starting with Amazon EMR variations 5.33.0 or 6.3.0, EMR on EKS helps Spark’s pod template characteristic. We use pod templates so as to add particular labels the place Spark driver and executor pods must be launched.

Create a pod template file for a Spark driver pod and save them in your S3 bucket:

apiVersion: v1
variety: Pod
  nodeSelector: on-demand
  - identify: spark-kubernetes-driver # This might be interpreted as Spark driver container

Create a pod template file for a Spark executor pod and save them in your S3 bucket:

apiVersion: v1
variety: Pod
  nodeSelector: spot
  - identify: spark-kubernetes-executor # This might be interpreted as Spark driver container

Pod templates present totally different fields to handle job scheduling. For added particulars, seek advice from Pod template fields. Notice the nodeSelector for the Spark driver pods and Spark executor pods, which match the labels we outlined with the Karpenter provisioner.

For a pattern Spark job, we use the next code, which creates a number of parallel threads and waits for a couple of seconds:

cat << EOF >
import sys
from time import sleep
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("threadsleep").getOrCreate()
def sleep_for_x_seconds(x):sleep(x*20)
sc.parallelize(vary(1,6), 5).foreach(sleep_for_x_seconds)

Copy the pattern Spark job into your S3 bucket:

aws s3 mb s3://<YourS3Bucket>
aws s3 cp s3://<YourS3Bucket>

Earlier than we submit the Spark job, let’s get the required values of the EMR digital cluster and Amazon EMR job execution function ARN:

export S3blogbucket= s3://<YourS3Bucket>
export VIRTUAL_CLUSTER_ID=$(aws emr-containers list-virtual-clusters --query "virtualClusters[?state=='RUNNING'].id" --region us-west-2 --output textual content)

export EMR_ROLE_ARN=$(aws iam get-role --role-name blog-emrJobExecutionRole --query Function.Arn --region us-west-2 --output textual content)

To allow the pod template characteristic with EMR on EKS, you need to use configuration-overrides to specify the Amazon S3 path to the pod template:

aws emr-containers start-job-run 
--virtual-cluster-id $VIRTUAL_CLUSTER_ID 
--name spark-threadsleep-single-az 
--execution-role-arn $EMR_ROLE_ARN 
--release-label emr-5.33.0-latest 
--region us-west-2 
--job-driver '{
    "sparkSubmitJobDriver": {
        "entryPoint": "'${S3blogbucket}'/",
        "sparkSubmitParameters": "--conf spark.executor.situations=6 --conf spark.executor.reminiscence=1G --conf spark.executor.cores=1 --conf spark.driver.cores=2"
--configuration-overrides '{
    "applicationConfiguration": [
        "classification": "spark-defaults", 
        "properties": {
"spark.kubernetes.driver.podTemplateFile":"'${S3blogbucket}'/spark_driver_podtemplate.yaml", "spark.kubernetes.executor.podTemplateFile":"'${S3blogbucket}'/spark_executor_podtemplate.yaml"
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-on-eks/emreksblog", 
        "logStreamNamePrefix": "threadsleep"
      "s3MonitoringConfiguration": {
        "logUri": "'"$S3blogbucket"'/logs/"

Within the Spark job, we’re requesting two cores for the Spark driver and one core every for Spark executor pod. As a result of we solely had a single EC2 occasion in our managed node group, Karpenter seems on the un-schedulable Spark driver pods and makes use of the on-demand provisioner to launch EC2 On-Demand Situations for Spark driver pods in us-west-2b. Equally, when the Spark executor pods are in pending state, as a result of there aren’t any Spot Situations, Karpenter launches Spot Situations in us-west-2b.

This fashion, Karpenter optimizes your prices by ranging from zero Spot and On-Demand Situations and solely creates them dynamically when required. Moreover, Karpenter batches pending pods after which binpacks them based mostly on CPU, reminiscence, and GPUs required, taking into consideration node overhead, VPC CNI assets required, and daemon units that might be packed when mentioning a brand new node. This makes certain you’re effectively using your assets with least wastage.

Clear up

Don’t neglect to scrub up the assets you created to keep away from any pointless fees.

  1. Delete all of the digital clusters that you just created:
    #Checklist all of the digital cluster ids
    aws emr-containers list-virtual-clusters#Delete digital cluster by passing digital cluster id
    aws emr-containers delete-virtual-cluster —id <virtual-cluster-id>

  2. Delete the Amazon EKS cluster:
    eksctl delete cluster emr-on-eks-blog-cluster

  3. Delete the EMR_EKS_Job_Execution_Role function and insurance policies.


On this publish, we noticed the right way to create an Amazon EKS cluster, configure Amazon EKS managed node teams, create an EMR digital cluster on Amazon EKS, and submit Spark jobs. Utilizing pod templates, we noticed how to make sure Spark workloads are scheduled in the identical Availability Zone and make the most of Spot with Karpenter auto scaling to cut back prices and optimize your Spark workloads.

To get began, check out the EMR on EKS workshop. For extra assets, seek advice from the next:

Concerning the writer

Jamal Arif is a Options Architect at AWS and a containers specialist. He helps AWS clients of their modernization journey to construct progressive, resilient, and cost-effective options. In his spare time, Jamal enjoys spending time outdoor together with his household mountaineering and mountain biking.



Please enter your comment!
Please enter your name here