# Supervisor plugin: Overview
Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.
It shares some of the same goals as programs like systemd, but unlike them it is not meant to run as a substitute for init as “process id 1”. Instead, it is meant to control processes related to a project or a customer, and to start like any other program at boot time.
A supervisor instance can control several application programs. In practice, in most cases, one instance per application is enough.
This plugin allows both creating a supervisor instance and defining all the programs this instance will have to manage, through the `supervisor_programs` list.
Creating a supervisor instance using this plugin requires root access on the target host. But the instance is then delegated to a specific (typically non-privileged) user, or technical account, which will then be able to perform any admin operation (add/remove programs, start/stop, etc.).
This plugin also allows HADeploy to handle the start and stop of the associated programs.
## A simple case
Let's begin with a simple example: we want to deploy a daemon on the two edge nodes of a cluster. This daemon will run under the `tsup` technical account. We also want the supervisor to be manageable through this account.
Here is the main application file, saved as `app.yml`:
```yaml
hosts:
  - { name: en1, ssh_host: en1.mycluster.com, ssh_user: root, ssh_private_key_file: keys/root_id }
  - { name: en2, ssh_host: en2.mycluster.com, ssh_user: root, ssh_private_key_file: keys/root_id }

local_templates_folders:
  - ./templates

users:
  - login: tsup

folders:
  - { scope: all, path: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }

supervisors:
  - name: tsupv
    scope: all
    user: tsup
    group: tsup

files:
  - { scope: all, src: "tmpl://program1.sh", dest_folder: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }

supervisor_programs:
  - name: program1
    supervisor: tsupv
    command: "/bin/bash /opt/tsup/program1.sh"
```
In this file we:
- Describe the target cluster and how to connect to it.
- Define a local folder as our template repository.
- Create the `tsup` technical account.
- Create a folder `/opt/tsup` as a home folder for the application.
- Create a supervisor instance named `tsupv`.
- Copy `program1.sh` into the `/opt/tsup` folder.
- Declare this program to be managed by the supervisor instance we just defined.
Of course, we will need a test program to manage. Let's write a simple script:
```bash
#!/bin/bash
user=$(id -un)
prg=program1
while true
do
    echo "$(date -Iseconds) - - - Hello from ${user}:${prg} to stdout....."
    sleep 2
done
```
And save it as `templates/program1.sh`.
Our application is now ready to be deployed. On the deployment node (called `hadeployer` here):
```
[hadeployer]$ hadeploy --src app.yml --action deploy
```
Then, we can log on one edge node and ensure both the daemon and the supervisor run under the `tsup` account:
```
[en1]$ pgrep -au tsup
 4392 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord_tsupv.conf
 4621 /bin/bash /opt/tsup/program1.sh
 7983 sleep 2
```
And we can now play with the supervisor command line interface created for our instance (`supervisorctl_tsupv`), under the `tsup` account:
```
[en1]$ sudo -u tsup supervisorctl_tsupv
program1                         RUNNING   pid 4621, uptime 0:07:03
supervisor> help

default commands (type help <topic>):
=====================================
add    clear  fg        open  quit    remove  restart   start   stop  update
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version

supervisor> status program1
program1                         RUNNING   pid 4621, uptime 0:07:20
supervisor> tail program1
llo from tsup:program1 to stdout.....
2017-10-15T08:53:28+0000 - - - Hello from tsup:program1 to stdout.....
...
...
2017-10-15T08:54:09+0000 - - - Hello from tsup:program1 to stdout.....
2017-10-15T08:54:11+0000 - - - Hello from tsup:program1 to stdout.....
supervisor> stop program1
program1: stopped
supervisor> status
program1                         STOPPED   Oct 15 08:54 AM
supervisor> start program1
program1: started
supervisor> status
program1                         RUNNING   pid 8270, uptime 0:00:05
supervisor>
```
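As a side note, `supervisorctl` also accepts one-shot commands as arguments, which is handy for scripting. This assumes the generated `supervisorctl_tsupv` wrapper forwards its arguments like the standard `supervisorctl` does:

```bash
# non-interactive status query, usable from scripts or cron jobs
sudo -u tsup supervisorctl_tsupv status program1
```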
From the HADeploy node, we can stop both the program and the supervisor in one command:
```
[hadeployer]$ hadeploy --src app.yml --action stop
```
Check that there is no process running under the `tsup` account on the edge node:
```
[en1]$ pgrep -au tsup
```
Back on the HADeploy node, restart everything:
```
[hadeployer]$ hadeploy --src app.yml --action start
```
And check on the edge node:
```
[en1]$ pgrep -au tsup
 8887 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord_tsupv.conf
 8974 /bin/bash /opt/tsup/program1.sh
 9046 sleep 2
```
## Actions `stop`, `start` and `status`
As just mentioned, the Supervisor plugin introduces three new actions:
```
hadeploy --src ..... --action stop
```
will stop all `supervisor_programs` and then all the supervisord daemons described in the `supervisors` list that have the `managed` flag set to `true`. And:
```
hadeploy --src ..... --action start
```
will start the same supervisord daemons, then all `supervisor_programs`. And:
```
hadeploy --src ..... --action status
```
will display the current status in a rather primitive form.
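The `managed` flag defaults to `yes`, which is why our first example, which did not set it, was fully controlled by these actions. As a minimal, hypothetical sketch (reusing the definitions from this page), a supervisor excluded from these actions would look like:

```yaml
supervisors:
  - name: tsupv
    scope: all
    user: tsup
    group: tsup
    # managed defaults to yes; with no, the stop/start actions will not
    # touch the supervisord daemon itself, only the declared programs
    managed: no
```

We will make use of this flag in the non-root deployment pattern below.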
## Web interface
Each supervisor instance can expose a web interface for management. We can easily configure it by adding a line to our supervisor definition:
```yaml
supervisors:
  - name: tsupv
    scope: all
    user: tsup
    group: tsup
    http_server: { endpoint: "0.0.0.0:9001", username: admin, password: admin }
```
Then, we just need to trigger a new deployment:
```
[hadeployer]$ hadeploy --src app.yml --action deploy
```
If we look at the deployment messages, we can see that only two tasks reported a change:
```
TASK [Setup configuration file /etc/supervisord_tsupv.conf] ****************************************************************************************************
changed: [en1]

TASK [Setup supervisord_tsupv.service] *************************************************************************************************************************
ok: [en1]

TASK [Setup /usr/bin/supervisorctl_tsupv] **********************************************************************************************************************
ok: [en1]

RUNNING HANDLER [Restart supervisord_tsupv service] ************************************************************************************************************
changed: [en1]
.....
.....
PLAY RECAP *****************************************************************************************************************************************************
en1                        : ok=19   changed=2    unreachable=0    failed=0
```
- The supervisor configuration file `/etc/supervisord_tsupv.conf` has been regenerated, with the `http_server` configuration.
- The supervisor `supervisord_tsupv` has been restarted.
Of course, if we trigger a deployment again, no change will be performed anymore.
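A quick way to check this idempotency is simply to run the same deployment again; the recap should then report `changed=0`:

```
[hadeployer]$ hadeploy --src app.yml --action deploy
```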
We can now point a browser to `http://en1.mycluster.com:9001/` to see the supervisor web interface.
Of course, in real life, we would not use 'admin' as a password. And we would not provide it in clear text, but in its SHA-1 form instead. See Supervisors Web interface.
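As a sketch of how such a hashed password can be produced (this assumes supervisord's standard `{SHA}<hex digest>` password format; check the Supervisors Web interface documentation for the exact expected form):

```bash
# compute the SHA-1 form of a password for supervisord's http_server;
# 'admin' is only an example value
python -c "import hashlib; print('{SHA}' + hashlib.sha1(b'admin').hexdigest())"
```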
## Notifications: daemon restart
Let's say we now want to update our `program1.sh`. We can modify it and trigger a new deployment. HADeploy will notice the modification and push the new version to the target hosts. But the running daemon will be unaffected.
We could restart it manually, but HADeploy provides a mechanism to automate this: adding a `notify` attribute to the `files` entry:
```yaml
files:
  - { scope: all, notify: ["supervisor://tsupv/program1"], src: "tmpl://program1.sh", dest_folder: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }
```
This instructs HADeploy to restart the program `program1` of the supervisor instance `tsupv` whenever `program1.sh` is modified:
```
TASK [Copy template file 'program1.sh' to '/opt/tsup/program1.sh'] *********************************************************************************************
changed: [en1]

TASK [Get current state for supervisor_tsupv_program1 programs] ************************************************************************************************
ok: [en1]

RUNNING HANDLER [Restart supervisor_tsupv program program1] ****************************************************************************************************
changed: [en1]
...
...
PLAY RECAP *****************************************************************************************************************************************************
en1                        : ok=20   changed=2    unreachable=0    failed=0
```
## Non-root deployment
Creating a supervisor instance requires root access on the target nodes.
Requiring root access for every deployment can be a problem, as it may restrict the number of people able to deploy applications. Moreover, an error in the deployment file could corrupt or even destroy the whole target cluster.
A solution to this problem is to split the deployment into two phases:

- An initial deployment, performed with root access, creates dedicated folders, namespaces, databases, etc., thus defining a limited space inside which the application will be deployed. Such a space can be called a 'box', 'container' or 'corridor'.
- The application deployment, performed under a non-privileged account (typically the application technical account), is limited to operating inside the space defined previously.
In our use case, the supervisor instance creation will be performed during the initial stage, while `program1.sh` will be deployed under the application account.
This leads us to split our application file in two parts.
A first file, named `init.yml`:
```yaml
hosts:
  - { name: en1, ssh_host: en1.mycluster.com, ssh_user: root, ssh_private_key_file: keys/root_id }
  - { name: en2, ssh_host: en2.mycluster.com, ssh_user: root, ssh_private_key_file: keys/root_id }

users:
  - login: tsup
    authorized_keys:
      - keys/tsup_id.pub

folders:
  - { scope: all, path: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }

supervisors:
  - name: tsupv
    scope: all
    user: tsup
    group: tsup
    http_server: { endpoint: "0.0.0.0:9001", username: admin, password: admin }
```
Let's go through it:
- First, as before, we access the target nodes using the `root` credentials.
- As before, we create our application technical account, but we add an `authorized_keys` entry so we can log on directly under this account. Of course, we have previously generated a key pair.
- Then, we create the application main folder. Note that we need to be root to create a folder in `/opt`.
- And we create the supervisor instance as previously.
Now, we create the application deployment part, as `app.yml`:
```yaml
hosts:
  - { name: en1, ssh_host: en1.mycluster.com, ssh_user: tsup, ssh_private_key_file: keys/tsup_id }
  - { name: en2, ssh_host: en2.mycluster.com, ssh_user: tsup, ssh_private_key_file: keys/tsup_id }

local_templates_folders:
  - ./templates

files:
  - { scope: all, notify: ["supervisor://tsupv/program1"], src: "tmpl://program1.sh", dest_folder: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }

supervisor_programs:
  - name: program1
    supervisor: tsupv
    command: "/bin/bash /opt/tsup/program1.sh"

supervisors:
  - name: tsupv
    scope: all
    user: tsup
    group: tsup
    http_server: { endpoint: "0.0.0.0:9001", username: admin, password: admin }
    managed: no
```
Let's go through it:
- We now access the target hosts with the `tsup` credentials.
- As before, we define a local folder as our template repository.
- As before, we copy `program1.sh` into the `/opt/tsup` folder.
- And, as before, we declare this program to be managed by the supervisor instance we defined previously.
But there is a catch: to perform this last operation, HADeploy needs to know most of the supervisor's parameters (mostly its file layout). This is why we need to insert the supervisor definition in this file too, here at the end.
However, we only want to describe it, not manage it (that is the role of `init.yml`). This is why we add the `managed: no` attribute.
Don't be confused: managing a supervisor's programs is not the same as managing the supervisor itself.
You can now deploy the application container:
```
hadeploy --src init.yml --action deploy
```
And then the application itself:
```
hadeploy --src app.yml --action deploy
```
This pattern provides other benefits:

- Application (re)deployment is faster, as the number of tasks is reduced.
- We now have finer control over application stop and start: `hadeploy --src app.yml --action stop` will stop only the application programs, not the supervisor itself (see the sketch below).
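For example, a minimal stop/start cycle from the deployment node; the supervisord daemon keeps running throughout, since it is not managed from `app.yml`:

```bash
# stop only the application programs; the supervisor instance keeps running
hadeploy --src app.yml --action stop
# bring the programs back, still without touching the supervisor itself
hadeploy --src app.yml --action start
```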
## Root key protection
But how can we prevent a regular (non-root) user from launching the `init.yml` deployment?
A simple answer is to arrange for the root private key (`keys/root_id` in our case) to be readable only by the root user (or by a privileged users group) on the HADeploy node.
This is an easy way to bind privileged access on the target cluster to privileged access on the deployment node.
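A minimal sketch on the deployment node (assuming the key lives under `keys/`, as in our examples):

```bash
# make the root private key readable by root only on the HADeploy node
chown root:root keys/root_id
chmod 600 keys/root_id
```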
## Non-root deployment improved
If you look at the previous `init.yml` and `app.yml`, you will find some redundancies:

- The host inventory.
- The supervisor definition.
So, we need some refactoring to get something more maintainable.
First, we isolate our target cluster inventory in a separate file, `mycluster.yml`:
```yaml
hosts:
  - { name: en1, ssh_host: en1.mycluster.com }
  - { name: en2, ssh_host: en2.mycluster.com }
```
Note that we have removed all SSH connection information.
Then, we isolate the supervisor(s) definition in another file, `supervisors.yml`:
```yaml
supervisors:
  - name: tsupv
    scope: all
    user: tsup
    group: tsup
    http_server: { endpoint: "0.0.0.0:9001", username: admin, password: admin }
    managed: ${managing_supervisors}
```
Note that we can now set the `managed` attribute through a variable.
And here is the new version of the `init.yml` file:
```yaml
host_overrides:
  - name: all
    ssh_user: root
    ssh_private_key_file: keys/root_id

users:
  - login: tsup
    authorized_keys:
      - keys/tsup_id.pub

folders:
  - { scope: all, path: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }

vars:
  managing_supervisors: yes

include: supervisors.yml
```
And the new version of the `app.yml` file:
```yaml
host_overrides:
  - name: all
    ssh_user: tsup
    ssh_private_key_file: keys/tsup_id

vars:
  managing_supervisors: no

include: supervisors.yml

local_templates_folders:
  - ./templates

files:
  - { scope: all, notify: ["supervisor://tsupv/program1"], src: "tmpl://program1.sh", dest_folder: /opt/tsup, owner: tsup, group: tsup, mode: "0755" }

supervisor_programs:
  - name: program1
    supervisor: tsupv
    command: "/bin/bash /opt/tsup/program1.sh"
```
We can now proceed with the application container deployment:
```
hadeploy --src mycluster.yml --src init.yml --action deploy
```
And then the application deployment:
```
hadeploy --src mycluster.yml --src app.yml --action deploy
```
Note that we provide `mycluster.yml` on the command line. An alternative would be to include it in `init.yml` and `app.yml`, but if you have several target clusters, passing the inventory on the command line is far more flexible.
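For instance, deploying the same application on another cluster only requires another inventory file (`othercluster.yml` is hypothetical here):

```bash
# same init.yml and app.yml, different target cluster
hadeploy --src othercluster.yml --src init.yml --action deploy
hadeploy --src othercluster.yml --src app.yml --action deploy
```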