ranger_relay
Synopsis
All Apache Ranger commands will be performed using Ranger HTTP/REST API. This definition will provide informations for HADeploy to use this interface.
There should be only one entry of this type in the HADeploy definition file.
Attributes
ranger_relay is a map with the following attributes:
Name | req? | Description |
---|---|---|
host | yes | From which host these commands will be launched. Typically, an edge node in the cluster. But may be a host outside the cluster. |
ranger_url | yes | The Ranger base URL to access Ranger API. Same host:port as the Ranger Admin GUI. Typically: http://myranger.server.com:6080 or https://myranger.server.com:6182 |
ranger_username | no | The user name to log on the Ranger Admin. Must have enough rights to manage policies |
ranger_password | no | The password associated with the admin_username. May be encrypted. Refer to encrypted variables |
principal | no | A Kerberos principal allowing all Ranger related operation to be performed. See below |
local_keytab_path | no | A local path to the associated keytab file. This path is relative to the embeding file. See below |
relay_keytab_path | no | A path to the associated keytab file on the relay host. See below |
tools_folder | no | Folder used by HADeploy to store keytab if needed. Default: /tmp/hadeploy_<user>/ where user is the ssh_user defined for this relay host. |
validate_certs | no | Useful if Ranger Admin connection is using SSL. If no, SSL certificates will not be validated. This should only be used on personally controlled sites using self-signed certificates. Default: yes |
ca_bundle_relay_file | no | Useful if Ranger Admin connection is using SSL. Allow to specify a CA_BUNDLE file, a file that contains root and intermediate certificates to validate the Ranger Admin certificate in .pem format. This file will be looked up on the relay host system, on which this module will be executed. |
ca_bundle_local_file | no | Same as above, except this file will be looked up locally, relative to the main file. It will be copied on the relay host at the location defined by ca_bundle_relay_file |
hdfs_service_name | no | In most cases, you should not need to set this parameter. It defines the Ranger Admin HDFS service, typically: <yourClusterName>_hadoop .It must be set if there are several such services defined in your Ranger Admin configuration, to select the one you intend to use. |
hbase_service_name | no | In most cases, you should not need to set this parameter. It defines the Ranger Admin HBase service, typically: <yourClusterName>_hbase .It must be set if there are several such services defined in your Ranger Admin configuration, to select the one you intend to use. |
kafka_service_name | no | In most cases, you should not need to set this parameter. It defines the Ranger Admin Kafka service, typically: <yourClusterName>_kafka .It must be set if there are several such services defined in your Ranger Admin configuration, to select the one you intend to use. |
hive_service_name | no | In most cases, you should not need to set this parameter. It defines the Ranger Admin Hive service, typically: <yourClusterName>_hive .It must be set if there are several such services defined in your Ranger Admin configuration, to select the one you intend to use. |
yarn_service_name | no | In most cases, you should not need to set this parameter. It defines the Ranger Admin Yarn service, typically: <yourClusterName>_yarn .It must be set if there are several such services defined in your Ranger Admin configuration, to select the one you intend to use. |
storm_service_name | no | In most cases, you should not need to set this parameter. It defines the Ranger Admin Storm service, typically: <yourClusterName>_storm .It must be set if there are several such services defined in your Ranger Admin configuration, to select the one you intend to use. |
policy_name_decorator | no | To distinguish Ranger policy managed by HADeploy, a naming convention is applied by default. The policy name, as it will appears in the GUI Ranger interface will be in the form HAD[<policyName>] , where <policyName> is the name of the policy as you provide it.This is achieved by wrapping the name with this pattern, where {0} is substituted with the policy name. For python aware reader, this is performed as: "HAD[{0}]".format(policyName) .If you just want to have raw policy name, simply define the parameter with {0} .Default: HAD[{0}] |
no_log | no | Boolean. Prevent some messages to be displayed, as they can contains sensitive information. This can make debugging somewhat more difficult, so temporary setting this swith to False may be useful to get more information (Including sensitive ones!).Default True |
when | no | Boolean. Allow conditional deployment of this item. Default True |
Kerberos authentication
When principal
and ..._keytab_path
variables are defined, Kerberos authentication will be activated for all Ranger operations. This means a kinit
will be issued with provided values before any Ranger access, and a kdestroy
issued after. This has the following consequences:
-
All Ranger operations will be performed on behalf of the user defined by the provided principal. This user must have enough rights to perform required policy manipulation. Granting this user with 'admin' role through Ranger UI is the simplest way to achieve this.
-
The
kinit
will be issued on the relay host with thessh_user
account. This means any previous ticket own by this user on this node will be lost.
Regarding the keytab file, two cases:
-
This keytab file already exists on the relay host. In such case, the
relay_keytab_path
must be set to the location of this file. And the relay host'sssh_user
must have read access on it. -
This keytab file is not present on the relay host. In such case the
local_keytab_path
must be set to its local location. HADeploy will take care of copying it on the remote relay host, in a location undertools_folder
. Note you can also modify this target location by setting also therelay_keytab_path
parameter. In this case, it must be the full path, including the keytab file name. And the containing folder must exists.
Example
A simple configuration:
ranger_relay:
host: en1
ranger_url: http://ranger.mycluster.mycompany.com:6080
ranger_username: admin
ranger_password: admin
A more secure configuration, with https, certificate validation and encrypted password.
encrypted_vars:
ranger_password: |
$ANSIBLE_VAULT;1.1;AES256
34396662613462623565323936616330623661623065343033646136643635653430636238613962
3537343131346462343138343064313937646366363435340a633532366162623838376436366362
61393033343932303636653066336130616132383463373934396265306364363562613565613165
6163613739303430650a356136353865623534643237646166393230613933396166663963633538
3664
ranger_relay:
host: en1
ranger_url: https://ranger.mycluster.mycompany.com:6182
ranger_username: admin
ranger_password: "{{ranger_password}}"
ca_bundle_local_file: cert/ranger_mycluster_cert.pem
ca_bundle_relay_file: /etc/security/certs/ranger_mycluster_cert.pem
When using such encryption feature, you will need to provide a password when launching HADeploy. Otherwise you will have an error like the following:
The offending line appears to be:
vars:
rangerPassword: !vault |
^ here
More detail on how to encrypt a value and providing a password on execution at encrypted variables
A configuration using Kerberos authentication:
ranger_relay:
host: en1
ranger_url: http://ranger.mycluster.mycompany.com:6080
principal: sa
local_keytab_path: ./sa-gate17.keytab
CA_BUNDLE
Internally, HADeploy use the python requests
API to access Ranger. The provided ca_bundle_relay_file
will be used as the verify
parameter of all HTTP requests. More info here.
In some cases, a CA_BUNDLE may be simply the certificate of the Ranger server, in PEM format.
To grab this certificate, you may use a tiny python program like the following:
import ssl
if __name__ == '__main__':
cert = ssl.get_server_certificate(("ranger.mycluster.corp.com", 6182), ssl_version=ssl.PROTOCOL_SSLv23)
print cert
f = open("cert.pem", "w")
f.write(cert)
f.close()