Under the hood
Overview
As stated in the installation instructions, HADeploy uses Ansible under the hood to perform operations on the remote hosts. As such, Ansible is defined as a dependency of the HADeploy installation.
The main steps of an HADeploy run are the following:
1. Load the application description by appending all --src files provided on the command line to build a single deployment file.
2. Check the syntax of this deployment file against a YAML schema.
3. Build an in-memory data model representing the content of this file.
4. For each object, evaluate its when: clause and remove the object if the clause is False.
5. Check this data model for consistency, enrich it, and transform some data to ease the next stages.
6. Generate a Jinja2 template by concatenating all template snippets provided by the plugins.
7. Render this template with the model to generate an Ansible playbook.
8. Launch Ansible on this playbook.
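The when: filtering step listed above can be pictured with a short Python sketch. This is not HADeploy's actual code: the function names keep and prune and the sample model are hypothetical, and the sketch assumes each when: clause has already been resolved to a boolean by the time the model is built.

```python
def keep(obj):
    """Return False only for an object whose 'when' clause is exactly False."""
    return not (isinstance(obj, dict) and obj.get("when", True) is False)


def prune(node):
    """Recursively rebuild the model without the objects rejected by keep()."""
    if isinstance(node, dict):
        return {key: prune(value) for key, value in node.items() if keep(value)}
    if isinstance(node, list):
        return [prune(item) for item in node if keep(item)]
    return node


# Hypothetical model fragment; real HADeploy objects carry many more attributes.
model = {
    "folders": [
        {"path": "/opt/myapp", "when": True},
        {"path": "/opt/myapp/debug", "when": False},
        {"path": "/var/log/myapp"},
    ]
}

pruned = prune(model)
print(pruned)
```

An object without a when: clause is treated as always enabled, which is the assumption this sketch makes about the default.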
Variables
When using HADeploy in an advanced way (i.e. using the ansible module, or developing your own plugin), you may be confused by the different variable notations.
There are in fact three kinds of variables involved in HADeploy:
${my_variable} or <<my_variable>>
This is the only variable notation you should be aware of for standard usage of HADeploy.
Such variables are resolved during step 1 (Building of the deployment file).
Refer to alternate notation for the motivation behind <<my_variable>>.
{{{my_variable}}}
This is the variable notation used when the template is rendered (step 7). It allows all snippets provided by the plugins to access variables of the model.
{{my_variable}}
This is the standard variable notation used by Ansible. HADeploy does not resolve such variables and passes them as is to the Ansible playbook, so they are resolved by Ansible when the playbook is launched (step 8).
This form generally needs to be quoted ("{{my_variable}}"). It must also be used for encrypted values.
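To make the three notations concrete, here is a minimal Python sketch of the stage at which each one is resolved. It is an illustration only: HADeploy relies on Jinja2 for the rendering stage, while this sketch substitutes plain regular expressions, and the function names, the sample variables and the sample snippet are all hypothetical.

```python
import re

# ${name} or <<name>>: user variables, resolved while the deployment file is built.
USER_VAR = re.compile(r"\$\{(\w+)\}|<<(\w+)>>")
# {{{a.b.c}}}: model variables, resolved while the playbook template is rendered.
MODEL_VAR = re.compile(r"\{\{\{\s*([\w.]+)\s*\}\}\}")


def resolve_user_vars(text, variables):
    """Replace ${name} and <<name>> with the matching user variable."""
    return USER_VAR.sub(lambda m: str(variables[m.group(1) or m.group(2)]), text)


def render_model_vars(text, model):
    """Replace {{{dotted.path}}} with the value found at that path in the model."""
    def lookup(match):
        value = model
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return MODEL_VAR.sub(lookup, text)


# First notation: a line of a source file (hypothetical content).
resolved = resolve_user_vars(
    "path: ${base_dir}/<<app_name>>", {"base_dir": "/opt", "app_name": "myapp"}
)
print(resolved)

# Second and third notations: a plugin snippet mixing model and Ansible variables.
snippet = '- debug: msg="{{{src.vars.app_name}}} on {{inventory_hostname}}"'
rendered = render_model_vars(snippet, {"src": {"vars": {"app_name": "myapp"}}})
print(rendered)
```

The double-brace {{inventory_hostname}} is deliberately left untouched by both functions, since only Ansible is expected to resolve it.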
Variable relationship
Although these variables act at different levels, there are mechanisms to propagate the user's values (the ones with the ${..} notation) to the lower levels.
- They are copied into the data model built in step 3, under the token src.vars. So the variable ${my_variable} can be accessed as {{{src.vars.my_variable}}} by a plugin playbook snippet.
- In the Ansible context, a group_vars/all file is generated, containing a line of the form my_variable: my_value for each user variable. So all user variables are directly accessible by Ansible when the playbook is launched (step 8).
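The two propagation paths above can be sketched in a few lines of Python. This is an assumption-laden illustration, not HADeploy's implementation: the propagate function is hypothetical, and values are JSON-encoded here only because JSON scalars are also valid YAML, which keeps the generated lines safe for strings containing special characters.

```python
import json


def propagate(user_vars):
    """Return (model, group_vars_all_text) built from the user variables."""
    # Path 1: copy the user variables under src.vars in the data model.
    model = {"src": {"vars": dict(user_vars)}}
    # Path 2: one "name: value" line per variable for the group_vars/all file.
    group_vars_all = "".join(
        f"{name}: {json.dumps(value)}\n" for name, value in user_vars.items()
    )
    return model, group_vars_all


model, group_vars_all = propagate({"my_variable": "my_value", "replicas": 3})
print(model["src"]["vars"]["my_variable"])
print(group_vars_all)
```

With this layout, a plugin snippet would read the value through {{{src.vars.my_variable}}}, while a task executed by Ansible would read it through {{my_variable}}.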
The working folder
In case of a problem, it can be useful to have a look at the generated Ansible playbook.
This playbook is generated in a temporary folder, called the working folder. The playbook is named after the action provided as parameter.
This folder is automatically created in /tmp, under a random name.
To ease debugging, one can force this working folder to a specific location, using the --workingFolder command line option.
Warning: In that case, the full content of the working folder is deleted on each run.
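The behaviour described above (a random folder by default, a forced folder that is wiped on each run) can be sketched as follows. The function name prepare_working_folder and the hadeploy_ prefix are hypothetical; the sketch only mirrors the documented behaviour using the Python standard library.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_working_folder(forced=None):
    """Return an empty working folder, random by default or at a forced location."""
    if forced is None:
        # Random name under the system temporary directory (usually /tmp).
        return Path(tempfile.mkdtemp(prefix="hadeploy_"))
    folder = Path(forced)
    if folder.exists():
        # A forced working folder is fully emptied on each run.
        shutil.rmtree(folder)
    folder.mkdir(parents=True)
    return folder


scratch = prepare_working_folder()            # random location
(scratch / "deploy.yml").write_text("# playbook from a previous run\n")
same = prepare_working_folder(scratch)        # forced location: content is wiped
print(list(same.iterdir()))
```

This is why the forced location should be a dedicated folder: anything else stored in it would be lost on the next run.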
The working folder contains more than the playbook. Here is a brief description of the files you may find in it:
Name | Description |
---|---|
ansible.cfg, inventory, group_vars folder | HADeploy creates a complete Ansible context in which to run Ansible. |
<action>.yml.jj2 | The Jinja2 source template, result of step 6 above, which is merged with the data model to produce the file below. |
<action>.yml | The playbook for the targeted action, generated in step 7, e.g. deploy.yml or remove.yml. |
model_src.json | The part of the data model built from the source files. |
model_data.json | The part of the data model where HADeploy stores some intermediate structures. |
model_helper.json | The part of the data model hosting some configuration information. |
schema.yml | The YAML schema used to validate the source input files. The pykwalify tool is used for validation. |
desc_xxxx.yml.j2 | Some helper files, specific to some plugins. |
Note: If you launch HADeploy with --action none, it will generate Ansible playbooks for all the actions it is aware of, but it will not launch Ansible.
This is intended to validate the description for all actions.
Plugins
A plugin is a component which may be involved in all the phases described at the beginning of this page. It is typically made of:
- A partial schema. The YAML schema parts of all plugins are merged to provide the overall schema against which the deployment file is validated.
- Some Python code called to check and enrich the model. This code must include a subclass of the Plugin class, to handle plugin properties and lifecycle.
- Several playbook snippets, typically one per supported action. For a given action, the playbook snippets of all plugins are concatenated to build the target playbook. These snippets are Jinja2 templates which are merged with the data model.
All of these are optional. A plugin can also host:
- Ansible roles or modules.
- One or more helpers. A helper is a specific program designed to manage services which offer only a Java API.
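The way plugin parts fit together can be sketched in Python. Everything below is hypothetical: the Plugin class is a stand-in written for this example (HADeploy's real Plugin base class and its method names are not shown on this page), and the snippets are plain comment lines rather than real playbook fragments.

```python
class Plugin:
    """Hypothetical stand-in for a plugin base class; not HADeploy's real API."""

    name = "base"
    snippets = {}  # action name -> playbook snippet text

    def enrich(self, model):
        """Check and enrich the model; does nothing by default."""

    def snippet(self, action):
        """Return this plugin's snippet for the action, or '' if unsupported."""
        return self.snippets.get(action, "")


class UsersPlugin(Plugin):
    name = "users"
    snippets = {"deploy": "# users: deploy tasks\n"}


class FilesPlugin(Plugin):
    name = "files"
    snippets = {"deploy": "# files: deploy tasks\n", "remove": "# files: remove tasks\n"}

    def enrich(self, model):
        # Store an intermediate value next to the source part of the model.
        model.setdefault("data", {})["folder_count"] = len(model["src"].get("folders", []))


def build_template(plugins, action, model):
    """Let every plugin enrich the model, then concatenate their snippets."""
    for plugin in plugins:
        plugin.enrich(model)
    return "".join(plugin.snippet(action) for plugin in plugins)


model = {"src": {"folders": [{"path": "/opt/myapp"}]}}
plugins = [UsersPlugin(), FilesPlugin()]
deploy_template = build_template(plugins, "deploy", model)
remove_template = build_template(plugins, "remove", model)
print(deploy_template)
```

A plugin that does not support an action simply contributes nothing to that action's playbook, which matches the statement that all plugin parts are optional.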
Currently, HADeploy is provided with the following internal plugins:
- inventory: for target hosts management.
- ansible_inventories: for direct integration of Ansible inventories.
- users: for management of local users and groups.
- files: for base file management.
- hdfs: extends the previous one for HDFS access.
- hbase
- hive
- kafka
- ansible
- ranger
Embedded Ansible roles
- For HDFS access, HADeploy embeds the hdfs_modules Ansible modules in the HDFS plugin.
- For Apache Ranger policy handling, HADeploy embeds the ranger_modules Ansible modules in the Ranger plugin.
- For Storm topologies lifecycle handling, HADeploy embeds the storm_modules Ansible modules in the Storm plugin.
- For Elasticsearch indices and templates management, HADeploy embeds the elastic_modules Ansible modules in the Elasticsearch plugin.
Embedded Helpers
Hive
- For Hive based deployment, HADeploy embeds the jdchive tool.
HBase
- For HBase based deployment, HADeploy embeds the jdchtable tool.
- For HBase dataset loading, HADeploy embeds the hbload tool.
Kafka
- For Kafka based deployment, HADeploy embeds the jdctopic tool.