kafka_relay

Synopsis

Issuing commands to some specific subsystems, such as Apache Kafka, requires a rather complex client configuration.

To avoid this, HADeploy does not issue such commands directly, but pushes them to one of the cluster nodes, called the 'Relay node'.

kafka_relay defines which host is used to relay operations for Kafka, and how these operations are performed.

There should be only one entry of this type in the HADeploy definition file.

Attributes

kafka_relay is a map with the following attributes:

Name req? Description
host yes The host on which all Kafka commands will be pushed for execution. THIS HOST MUST HAVE KAFKA INSTALLED. This can be validated by trying to locate and use commands such as kafka-topics.sh.
zk_host_group yes The host_group representing the zookeeper quorum. This group must contain the hosts acting as zookeeper servers.
This group should have the force_setup flag set to yes.
kafka_version yes Specify the Kafka version. May be 0.10.0, 1.0.0, 1.1.1 or 2.0.0. If your current version does not strictly match one of these, you may try to select the immediately preceding version (e.g. use 0.10.0 for 0.11.0).
zk_port no The zookeeper client port.
Default: 2181
zk_path no The root path of Kafka in zookeeper.
Default: '/'
broker_id_map no With Kafka, each broker is identified by an id. When creating a topic, one can let Kafka distribute the partition replicas across the cluster. But we may also need to specify the distribution of replicas explicitly, with strict location rules.
In such a case, we need to specify brokers at topic creation, using broker_id. As these broker_id values are infrastructure-dependent, the application deployment description would then be tightly coupled to the target infrastructure.
To prevent this, we introduce a level of indirection: a map where each key is a virtual broker_id (used in the assignments of a topic definition) and each value is the effective one.
If this map is not defined, the virtual broker_id values are the same as the effective ones.
tools_folder no Folder used by HADeploy to install some tools for Kafka management.
Default: /tmp/hadeploy_<user>/ where user is the ssh_user defined for this relay host.
become_user no A user account under which all Kafka operations will be performed.
Note: The ssh_user defined for this relay host must have enough rights to switch to this become_user using the become_method below.
Default: No user switch, so the ssh_user defined for this relay host will be used.
become_method no The method used to switch to this user. Refer to the Ansible documentation on this parameter.
Default: Ansible default (sudo).
when no Boolean. Allow conditional deployment of this item.
Default: True
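
As an illustration only, the optional attributes above could be combined as follows. The host and group names are the ones used in the Example section; the zk_path and tools_folder values are hypothetical placeholders, not recommended settings.

```yaml
kafka_relay:
  host: br1
  zk_host_group: zookeepers
  kafka_version: "1.0.0"
  zk_port: 2181                      # same as the default
  zk_path: /kafka                    # hypothetical: Kafka rooted under /kafka in zookeeper
  tools_folder: /opt/hadeploy/tools  # hypothetical path, replaces /tmp/hadeploy_<user>/
  become_user: kafka                 # run Kafka operations as this account
  become_method: sudo                # same as the Ansible default
  when: True                         # same as the default
```

Attributes that keep their default value can simply be omitted.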

Kerberos authentication

HADeploy Kafka topic management needs access to Zookeeper. When Kerberos is activated on the target cluster, such access may be protected and forbidden to your deployment user.

In such a case, the solution is to act as the kafka user, by using the become_user attribute.

But keep in mind that the ssh_user defined for this relay host must have enough rights to switch to the kafka user account using the become_method.

Example

The simplest case:

kafka_relay:
  host: br1
  zk_host_group: zookeepers
  kafka_version: "1.0.0"

The simplest case when Kerberos is activated:

kafka_relay:
  host: br1
  zk_host_group: zookeepers
  kafka_version: "1.0.0"
  become_user: kafka

A more complex case, with a default value set explicitly and a broker_id mapping (typical of a Hortonworks Kafka deployment):

kafka_relay:
  host: br1
  zk_host_group: zookeepers
  kafka_version: "0.10.0"
  zk_port: 2181
  broker_id_map:
    1: 1001
    2: 1002
    3: 1003
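
To illustrate the indirection provided by broker_id_map, here is a sketch of what the same definition might look like for a second, hypothetical target cluster whose brokers are numbered from 0. The effective ids below are assumed for the sake of the example; only the right-hand values change, while topic definitions that refer to the virtual ids 1, 2 and 3 stay untouched.

```yaml
kafka_relay:
  host: br1
  zk_host_group: zookeepers
  kafka_version: "1.0.0"
  broker_id_map:
    # virtual broker_id: effective broker_id (assumed values)
    1: 0
    2: 1
    3: 2
```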

Tricks

kdescribe

To find the broker_id values, one may use the kdescribe tool.

AnsibleUndefinedVariable

If, when running HADeploy, you encounter an error like:

fatal: [dn1]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ansible_fqdn'"}

it is most likely that you have not set force_setup on the zk_host_group group.
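
A minimal sketch of a group definition with this flag set is shown below. It assumes the zookeeper group is declared in a host_groups list with name and hosts attributes, and the host names dn1, dn2 and dn3 are hypothetical; check the host_groups documentation for the exact syntax.

```yaml
host_groups:
- name: zookeepers      # the group referenced by zk_host_group
  force_setup: yes      # gather facts (such as ansible_fqdn) for these hosts
  hosts:                # hypothetical host names
  - dn1
  - dn2
  - dn3
```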