Yarn plugin: Overview


The aim of the Yarn plugin is to handle the life cycle of Yarn services.

By 'Yarn services', we mean Yarn jobs running indefinitely, such as Spark Streaming jobs.

These are also known as 'Yarn Long Running Services', or 'Yarn Long Running Jobs'.

This plugin is NOT intended to manage Batch jobs.

How it works

All the operations (launching, killing and checking services) are performed on a specific node included in the cluster. This node is designated as the yarn_relay.


Note there are some requirements on the launching script.

Yarn services deployment.

The deployment of each service is NOT in the scope of this plugin itself. Typically, it consists of copying the application binaries (e.g. an uber jar) and the associated launching and killing scripts, at least on the Yarn relay node and possibly on one or several other nodes, for resiliency.

All these tasks can be achieved using the HADeploy folders and files specifications.

The templating mechanism and the Maven repository support built into the files plugin will be of great help here.

Actions start, stop and status

The Yarn plugin introduces three new actions:

hadeploy --src ..... --action start

will start all services described by the yarn_services list,

hadeploy --src ..... --action stop

will kill the same services, and

hadeploy --src ..... --action status

will display the current status of the services, in a rather primitive form.

Also, the Yarn plugin kills all running services as one of the first steps of the removal action (--action remove).

Of course, all this will occur only on services HADeploy is aware of (defined with yarn_services). Other services will not be impacted.

Services shutdown.

When HADeploy is instructed to halt all services (--action stop), by default it will use the ResourceManager REST API, setting the application to the 'KILLED' state. This is equivalent to a yarn application --kill command.
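For reference, this default shutdown path boils down to a single PUT on the ResourceManager's Cluster Application State API. Here is a minimal sketch of the equivalent call; the ResourceManager address rm1:8088 and the application ID are hypothetical placeholders:

```shell
#!/bin/sh
# Kill a YARN application through the ResourceManager REST API
# (Cluster Application State API); equivalent to `yarn application --kill`.
# rm1:8088 and the application ID below are hypothetical placeholders.
RM_URL="http://rm1:8088"
APPLICATION_ID="application_1518619814544_0042"

STATE_URL="${RM_URL}/ws/v1/cluster/apps/${APPLICATION_ID}/state"
echo "${STATE_URL}"

# The actual call (commented out here, as it needs a live cluster):
# curl -s -X PUT -H "Content-Type: application/json" \
#     -d '{"state": "KILLED"}' "${STATE_URL}"
```

Note that on a Kerberos-secured cluster, such a call would additionally need SPNEGO authentication (e.g. curl --negotiate -u :).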

An alternate way to shut down a Yarn job is to provide a script issuing the kill command, and to define such a script using the killing_cmd attribute. The kill.sh template shown later on this page is an example of this approach.

Notifications: Services restart

Let's say we now want to update the service's jar or one of the associated configuration files.

We can modify it and trigger a new deployment. HADeploy will notice the modification and push the new version on the target hosts. But, the running services will be unaffected.

We can restart it manually. But HADeploy provides a mechanism to automate this, by adding a notify attribute to the files definition. See the example below.

Ranger support.

Ranger handling on Yarn jobs is based on Yarn queue management. HADeploy allows you to define such permissions using yarn_ranger_policies.
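As a purely illustrative sketch (the attribute names below are assumptions; refer to the yarn_ranger_policies reference for the actual schema), such a policy could look like:

```yaml
yarn_ranger_policies:
- name: "datastep_queue_policy"    # hypothetical policy name
  queues: [ "datastep" ]           # the Yarn queue(s) the policy applies to
  permissions:
  - users: [ "dsrunner" ]
    accesses: [ "submit-app" ]     # Ranger's YARN access types are 'submit-app' and 'admin-queue'
```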


Here is a snippet describing the deployment of a simple Yarn service named 'datastep':

```yaml
vars:
  yarn_launcher_host: en1
  basedir: "/opt/datastep"
  user: dsrunner
  group: dsrunner
  datastep_version: "0.1.0-SNAPSHOT"

yarn_relay:
  host: ${yarn_launcher_host}

maven_repositories:
- name: myrepo
  snapshots_url: http://myrepo.mydomain.com/nexus/repository/maven-snapshots/
  releases_url: http://myrepo.mydomain.com/nexus/repository/maven-releases/

folders:
- { path: "${basedir}", scope: "${yarn_launcher_host}", owner: "${user}", group: "${group}", mode: "755" }

files:
- { scope: "${yarn_launcher_host}", src: "mvn://myrepo/com.mydomain/datastep/${datastep_version}/uber",
    notify: ['yarn://datastep'], dest_folder: "${basedir}", owner: "${user}", group: "${group}", mode: "0644" }

- { scope: "${yarn_launcher_host}", src: "tmpl://submit.sh", dest_folder: "${basedir}",
    notify: ['yarn://datastep'], owner: "${user}", group: "${group}", mode: "0744" }

- { scope: "${yarn_launcher_host}", src: "tmpl://kill.sh", dest_folder: "${basedir}",
    notify: ['yarn://datastep'], owner: "${user}", group: "${group}", mode: "0744" }

yarn_services:
- name: datastep
  launching_cmd: ./submit.sh
  killing_cmd: ./kill.sh
  launching_dir: ${basedir}
```
And here is what could be a simplistic submit script template:


```shell
#!/bin/sh

{% if kerberos is defined and kerberos %}
kinit -kt /etc/security/keytabs/{{user}}.keytab {{user}}
{% endif %}

# The uber jar is the primary resource of the application, so it is passed
# as the last (positional) argument of spark-submit.
spark-submit --name datastep --master yarn --deploy-mode cluster --class com.mydomain.datastep.Main \
    --conf "spark.yarn.submit.waitAppCompletion=false" \
    {{basedir}}/datastep-{{datastep_version}}-uber.jar

{% if kerberos is defined and kerberos %}
kdestroy
{% endif %}
```

And a killing script:


```shell
#!/bin/sh

{% if kerberos is defined and kerberos %}
kinit -kt /etc/security/keytabs/{{user}}.keytab {{user}}
{% endif %}

# Look up the application ID of the running 'datastep' job by its name.
APPLICATION_ID=$(yarn application --appStates RUNNING --list 2>/dev/null | awk "{ if (\$2==\"datastep\") print \$1 }")

if [ "$APPLICATION_ID" = "" ]
then
    echo "?? Not running"
else
    yarn application --kill ${APPLICATION_ID} 2>/dev/null
    echo "$APPLICATION_ID Killed!"
fi

{% if kerberos is defined and kerberos %}
kdestroy
{% endif %}
```

This is of course not complete, as it lacks at least the target cluster definition.

Please refer to yarn_relay and yarn_services for a complete description, and to files for the notify syntax.

Of course, before being able to launch the services (--action start), a deployment must first be performed (--action deploy).