hdfs_relay

Synopsis

Issuing commands to some specific subsystems, such as HDFS, requires a fairly complex client configuration.

To avoid this, HADeploy will not issue such commands directly, but will push them to one of the cluster nodes, called the 'relay node'. An edge node of the cluster would typically assume this function.

hdfs_relay defines which host will be used to relay HDFS operations, and how these operations will be performed.

There should be only one entry of this type in the HADeploy definition file.

Attributes

hdfs_relay is a map with the following attributes:

Name req? Description
host yes The host on which all HDFS commands will be pushed for execution.
cache_folder no A folder on this host, which will be used by HADeploy as cache storage. Mainly, all files targeted to HDFS will first be pushed into this cache, where they will remain, to optimize idempotency.
Default: {{ansible_user_dir}}/.hadeploy/files, where {{ansible_user_dir}} is substituted by the home folder of the ssh_user defined for this relay host.
user no The user account HADeploy will use to perform all HDFS-related operations. Must have sufficient rights to do so.
Must not be defined when using Kerberos authentication.
Default: The ssh_user defined for this relay host, or hdfs if this user is root.
principal no A Kerberos principal allowing all HDFS-related operations to be performed. See below.
local_keytab_path no A local path to the associated keytab file. This path is relative to the embedding file. See below.
relay_keytab_path no A path to the associated keytab file on the relay host. See below.
hadoop_conf_dir no Where HADeploy will look up the Hadoop configuration files.
Default: /etc/hadoop/conf
webhdfs_endpoint no HADeploy will perform several actions through the WebHDFS REST interface. You can specify the corresponding endpoint here if it is not defined in the usual configuration files.
Default: The value found in <hadoop_conf_dir>/hdfs-site.xml
when no Boolean. Allows conditional deployment of this item (see the sketch after this table).
Default: True
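
For instance, here is a minimal sketch combining a custom cache folder with conditional deployment. The cache path and the hdfs_enabled variable are illustrative assumptions, not predefined names, and the templated form of when assumes it accepts the same {{...}} substitution as other attributes:

hdfs_relay:
  host: en1
  # Illustrative cache location on the relay host
  cache_folder: /var/cache/hadeploy/files
  # Deploy this item only when the (hypothetical) hdfs_enabled variable is true
  when: "{{ hdfs_enabled }}"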

Hadoop configuration lookup

If this hdfs_relay host is properly configured as a Hadoop client, there should be no need to provide values for hadoop_conf_dir and/or webhdfs_endpoint, as HADeploy will be able to look up the WebHDFS URL using the default values.
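
Conversely, if the relay host is not a standard Hadoop client, both values can be provided explicitly. A minimal sketch, using hypothetical locations for illustration:

hdfs_relay:
  host: en1
  # Hypothetical, non-standard client configuration location
  hadoop_conf_dir: "/opt/mycluster/hadoop/conf"
  # Explicit WebHDFS endpoint, bypassing the hdfs-site.xml lookup
  webhdfs_endpoint: "namenode.mycluster.myinfra.com:50070"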

Kerberos authentication

When the principal and one of the ..._keytab_path variables are defined, Kerberos authentication will be activated for all HDFS folder, file, and tree operations. This means a kinit will be issued with the provided values before any HDFS access, and a kdestroy issued after. This has the following consequence: all HDFS operations will be performed on behalf of the user associated with the provided principal, so the user attribute must not be defined.

Regarding the keytab file, two cases may occur:

- The keytab file already exists on the relay host. In this case, relay_keytab_path must be set to its location on this host.
- The keytab file is local to the HADeploy node. In this case, local_keytab_path must be set, and HADeploy will push the keytab file to the relay host.

Example

The simplest case:

hdfs_relay:
  host: en1

The same, with default values set explicitly:

hdfs_relay:
  host: en1
  user: hdfs
  hadoop_conf_dir: "/etc/hadoop/conf"
  webhdfs_endpoint: "namenode.mycluster.myinfra.com:50070"

The simplest case with Kerberos activated:

hdfs_relay:
  host: en1
  principal: hdfs-mycluster
  relay_keytab_path: /etc/security/keytabs/hdfs.headless.keytab
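
And a Kerberos variant where the keytab file lives on the HADeploy node and is pushed to the relay host by HADeploy (the relative keytab path below is a hypothetical illustration):

hdfs_relay:
  host: en1
  principal: hdfs-mycluster
  # Keytab local to the HADeploy node, relative to the embedding file (hypothetical path)
  local_keytab_path: keytabs/hdfs.headless.keytab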