Activation via AWS CLI

This page describes the steps to activate Xonai Accelerator via the AWS command-line interface.

Before creating the cluster, copy the following script, replace <key-pair-name> with the name of your personal key pair (the *.pem file name without the extension), and set BOOTSTRAP_ACTION_SCRIPT to the S3 location where xonai-activation.sh was uploaded.

export KEYNAME=<key-pair-name>
export CLUSTER_NAME=xonai-spark-cluster
export EMR_RELEASE_LABEL=emr-6.7.0
export INSTANCE_TYPE=m5.xlarge
export CONFIG_JSON_LOCATION=./xonai-configuration.json
export BOOTSTRAP_ACTION_SCRIPT=<xonai-activation-script-path>
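
Optionally, verify that the bootstrap script is actually present at the configured S3 path before launching the cluster. This sanity check is not part of the activation steps themselves; aws s3 ls simply lists the object if the upload succeeded:

# Lists the xonai-activation.sh object if it exists at the given S3 URI
aws s3 ls $BOOTSTRAP_ACTION_SCRIPT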

Create the EMR cluster with the following command. On successful creation it outputs JSON containing values such as the ID of the new cluster:

aws emr create-cluster \
  --name $CLUSTER_NAME \
  --release-label $EMR_RELEASE_LABEL \
  --service-role EMR_DefaultRole \
  --applications Name=Hadoop Name=Spark \
  --ec2-attributes KeyName=$KEYNAME,InstanceProfile=EMR_EC2_DefaultRole \
  --instance-type $INSTANCE_TYPE \
  --configurations file://$CONFIG_JSON_LOCATION \
  --bootstrap-actions Name='Xonai Accelerator activation',Path=$BOOTSTRAP_ACTION_SCRIPT
{
    "ClusterId": "j-3HWJEKDYQWKCU",

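The cluster ID from this output is needed by every command that follows. If you prefer not to copy it by hand, one possible way to capture it in a shell variable is to filter the active clusters by name (an illustrative convenience, not part of the activation steps):

# Stores the ID of the active cluster whose name matches $CLUSTER_NAME
export CLUSTER_ID=$(aws emr list-clusters --active \
  --query "Clusters[?Name=='$CLUSTER_NAME'].Id | [0]" --output text)
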
Note

It can take up to ~5 minutes to create a cluster.
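
If you would rather block until the cluster is ready than poll its status manually, the AWS CLI also provides a generic EMR waiter (not specific to Xonai) that returns once the cluster reaches the RUNNING or WAITING state:

# Polls describe-cluster until the cluster is ready (or the waiter times out)
aws emr wait cluster-running --cluster-id <cluster_id>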

Check that the cluster is waiting for steps to run with the following command, which should output JSON showing a state of “WAITING”.

aws emr describe-cluster --cluster-id <cluster_id>
{
    "Cluster": {
        "Id": "j-3HWJEKDYQWKCU",
        "Name": "xonai-spark-cluster",
        "Status": {
            "State": "WAITING",
            "StateChangeReason": {
                "Message": "Cluster ready to run steps."

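To read just the state without scanning the full JSON, a standard --query expression can be appended to the same command:

# Prints only the cluster state, e.g. WAITING
aws emr describe-cluster --cluster-id <cluster_id> \
  --query 'Cluster.Status.State' --output text
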
Submit a Spark Application to the EMR Cluster With Xonai

Once the cluster is in the “WAITING” state, SSH into its master node with the following command:

aws emr ssh --cluster-id <cluster_id> --key-pair-file ~/<my-key-pair.pem>
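
The aws emr ssh helper requires the master node's security group to allow inbound SSH from your machine. As an alternative, assuming the same network access, you can look up the master node's public DNS name and connect with plain ssh as the hadoop user (the default login user on EMR nodes):

# Resolve the master node's public DNS name, then connect as the hadoop user
export MASTER_DNS=$(aws emr describe-cluster --cluster-id <cluster_id> \
  --query 'Cluster.MasterPublicDnsName' --output text)
ssh -i ~/<my-key-pair.pem> hadoop@$MASTER_DNS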

Now you can submit a Xonai-accelerated Spark application just like any ordinary Spark application via spark-submit, for example:

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.memoryOverhead=4g \
  $SPARK_HOME/examples/jars/spark-examples.jar \
  1000

The console output should look like the following, indicating that the plugin component was initialized.

[Screenshot emr-console1.png: console output showing the Xonai plugin initialization message]

Not setting spark.executor.memoryOverhead or having insufficient memory overhead will result in an error message, for example:

[Screenshot emr-console2.png: error message shown when spark.executor.memoryOverhead is missing or insufficient]
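
As an alternative to running spark-submit over SSH, the same example can be submitted from your local machine as an EMR step. The sketch below assumes the Spark examples jar is installed at /usr/lib/spark/examples/jars/spark-examples.jar, its usual location on EMR 6.x; adjust the path if your release differs:

# Submits SparkPi as a Spark step; the cluster in WAITING state will pick it up
aws emr add-steps \
  --cluster-id <cluster_id> \
  --steps 'Type=Spark,Name=SparkPi,ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,--conf,spark.executor.memoryOverhead=4g,/usr/lib/spark/examples/jars/spark-examples.jar,1000]'

Step progress can then be followed with aws emr list-steps --cluster-id <cluster_id>.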

Cluster Termination

When you are done submitting Spark applications, do not forget to terminate the cluster, either via the “Terminate” button in the EMR console or via the CLI:

aws emr terminate-clusters --cluster-ids <cluster_id>
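
Termination is asynchronous, so the cluster may remain in a TERMINATING state for a short while. If you want to block until it has fully shut down, the generic EMR waiter can be used here as well:

# Polls describe-cluster until the cluster reaches the TERMINATED state
aws emr wait cluster-terminated --cluster-id <cluster_id>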

See the official Amazon EMR guide to learn more about launching EMR clusters.