This was written as part of a job interview. This guide uses lxd since this is a test setup.
KVM setup
Since I lost my Fusion license along with alok@ideadevice.com, I decided to give KVM a spin on the Thinkpad. I downloaded Bionic Beaver (18.04 LTS) and set it up using virt-manager, but could not access the console:
Error connecting to graphical console: Error opening spice console, SpiceClientGtk missing
Some googling and installing some packages didn't help, so I moved on to virt-viewer. Remember to use --connect qemu:///system with all libvirt commands when running them as a regular user, or set LIBVIRT_DEFAULT_URI=qemu:///system.
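For example, either of the following works (just a sketch; list --all stands in for any libvirt command):
$ virsh --connect qemu:///system list --all
$ export LIBVIRT_DEFAULT_URI=qemu:///system
$ virsh list --all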
Once the VM is up, follow the setup guide to configure juju and lxd. Ignore the xenial-backports bit as we are using a newer lxd anyway.
While doing lxd init, one time it didn't ask about setting up a bridge but instead offered to set up a Fan overlay. I'm not sure why it did this; when I ran the command again, it asked about the bridge.
Routing
Once I had the lxd controller up, I realised that I should have configured the VM with a bridge instead of NAT as now I could not access the containers inside the Ubuntu VM from the Debian host. This was worked around by adding a route on the host.
First, find the VM IP from the host.
weyl:~$ virsh domifaddr ubuntu1804
Name MAC address Protocol Address
-------------------------------------------------------------------------------
vnet0 52:54:00:b8:11:b6 ipv4 192.168.122.236/24
Then, find the subnet for the lxd containers from the VM.
alok@ubuntu:~$ lxc network get lxdbr0 ipv4.address
10.171.182.1/24
Now add a route on the host.
weyl:~$ sudo ip r add 10.171.182.1/24 via 192.168.122.236 dev virbr0
The interface virbr0 was found by inspecting ip route to see which bridge the VM was using.
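For example, filtering on the VM's subnet from the virsh output above shows which device carries it:
weyl:~$ ip route | grep 192.168.122
The matching route names the bridge after dev, which is virbr0 here.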
Python
Remember to do an apt-get update on a fresh install; pip and other essentials are not available until this is done.
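That is:
$ sudo apt-get update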
Install required packages
$ sudo apt install build-essential python python-dev tox
Create a virtualenv
$ mkdir ~/venv
$ virtualenv ~/venv/py2
$ source ~/venv/py2/bin/activate
Now make test can be executed.
Dev setup
Install the charm binary via snap
$ sudo snap install charm
Charms can be obtained with charm pull, which places the charm in the cwd. It works largely like git.
$ charm pull rabbitmq-server
RabbitMQ
Followed the guide and it worked as advertised.
Initially I deployed a cluster of 2 but erroneously set min-cluster-size=3. This caused one of the units to fail. I tried to correct this by adding another machine (via the GUI) and running resolved on the error, but this did not work. Not sure why.
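For reference, min-cluster-size is plain charm config, so it can be inspected or changed from the CLI; something like this (juju 2.x syntax, not the exact commands from my session):
$ juju config rabbitmq-server min-cluster-size
$ juju config rabbitmq-server min-cluster-size=2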
Also, juju config rabbitmq-server management_plugin=True caused a problem with the charm, as it attempted to set up the management plugin individually on each unit. After multiple resolved attempts, this option was set back to False.
I did not enable SSL.
Theory of operation
RabbitMQ 3.5.7 (the version installed) is capable of clustering without needing any third party support like corosync and ceph.
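The cluster state can be checked from any unit with RabbitMQ's own tooling, e.g. (stock rabbitmqctl, nothing charm-specific):
$ juju ssh rabbitmq-server/0
$ sudo rabbitmqctl cluster_status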
Events
Charms operate on events. Every event has a hook on which an implementation can be hung to react to that event. For instance, the install event can be handled by installing the rabbitmq-server package (among others). Hook implementations reside in the hooks/ directory.
Hooks
In the charm, the top-level code is in hooks/rabbitmq_server_relations.py and the various event names are symlinked to it. Hooks are registered by decorating functions with hooks.hook(<event name>). By invoking hooks.execute(sys.argv) in the __main__ section, the appropriate function is called.
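As a minimal sketch of that dispatch pattern (using charmhelpers' Hooks helper, which is what the charm uses; the hook bodies here are made up for illustration):
#!/usr/bin/env python
# Minimal sketch of the hook-dispatch pattern described above.
import sys

from charmhelpers.core.hookenv import Hooks, log
from charmhelpers.fetch import apt_install

hooks = Hooks()


@hooks.hook('install')
def install():
    # Illustrative body: react to the install event by installing packages.
    apt_install(['rabbitmq-server'])


@hooks.hook('config-changed')
def config_changed():
    log('config-changed fired')


if __name__ == '__main__':
    # sys.argv[0] is the hook's symlink name (e.g. hooks/install), which
    # Hooks.execute() uses to dispatch to the decorated function.
    hooks.execute(sys.argv)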
RabbitMQ native clustering can be executed on any node. From the docs:
it is enough to provide one online node and the node will be clustered to the cluster that the specified node belongs to
Some dynamic data is stashed in peerstorage.
The rabbit_utils.cluster_with() method implements the native clustering.
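For reference, the native procedure this wraps looks roughly like the following when done by hand on a joining node (standard rabbitmqctl commands; the node name is illustrative):
$ sudo rabbitmqctl stop_app
$ sudo rabbitmqctl join_cluster rabbit@existing-node
$ sudo rabbitmqctl start_app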
Leader election is performed by the is_elected_leader() method. As per the doc:
It relies on two mechanisms to determine leadership:
- If juju is sufficiently new and leadership election is supported, the is_leader command will be used.
- If the charm is part of a corosync cluster, call corosync to determine leadership.
- If the charm is not part of a corosync cluster, the leader is determined as being “the alive unit with the lowest unit numer[sic]”. In other words, the oldest surviving unit.
So in this case it is going to be rabbitmq-server/0 that is elected the leader.
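As a rough sketch of that last fallback (my own paraphrase, not the charm's code), picking the alive unit with the lowest unit number amounts to:
# Sketch of the "alive unit with the lowest unit number" rule; not charm code.
def lowest_alive_unit(alive_units):
    # alive_units: unit names like 'rabbitmq-server/2'
    return min(alive_units, key=lambda unit: int(unit.split('/')[1]))

print(lowest_alive_unit(['rabbitmq-server/2', 'rabbitmq-server/0', 'rabbitmq-server/1']))
# prints: rabbitmq-server/0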
Actions
Actions are for running ad hoc operational tasks against units on demand, outside the normal event/hook flow.
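They are invoked from the juju CLI; for example (juju 2.x commands; I did not check which actions this charm actually defines, so the action name is a placeholder):
$ juju actions rabbitmq-server
$ juju run-action rabbitmq-server/0 <action-name>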
Debugging
To further investigate the management plugin problem,
$ juju debug-hooks rabbitmq-server/0
If the unit is in error, the hook that failed will be set up in the tmux session; be patient, it sometimes takes a minute.
Turns out that the following patch fixes the problem.
--- rabbitmq_server_relations.py.orig 2018-07-13 07:35:09.268064964 +0000
+++ rabbitmq_server_relations.py 2018-07-13 07:35:28.220164194 +0000
@@ -605,6 +605,10 @@
rsync(os.path.join(charm_dir(), 'scripts',
'check_rabbitmq_queues.py'),
os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_queues.py'))
+ if config('management_plugin'):
+ rsync(os.path.join(charm_dir(), 'scripts',
+ 'check_rabbitmq_cluster.py'),
+ os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_cluster.py'))
if config('stats_cron_schedule'):
script = os.path.join(SCRIPTS_DIR, 'collect_rabbitmq_stats.sh')
cronjob = CRONJOB_CMD.format(schedule=config('stats_cron_schedule'),
@@ -613,10 +617,6 @@
rsync(os.path.join(charm_dir(), 'scripts',
'collect_rabbitmq_stats.sh'), script)
write_file(STATS_CRONFILE, cronjob)
- if config('management_plugin'):
- rsync(os.path.join(charm_dir(), 'scripts',
- 'check_rabbitmq_cluster.py'),
- os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_cluster.py'))
elif os.path.isfile(STATS_CRONFILE):
os.remove(STATS_CRONFILE)
I made the changes in my local setup and deployed with
$ juju upgrade-charm --force-units --path ~/charms/rabbitmq-server rabbitmq-server
Now a resolved invocation fixed the problem. Thanks to Marco, who figured this method out.
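That is, something like the following (the unit number is whichever unit is in error):
$ juju resolved rabbitmq-server/0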
Open questions
- How to upgrade the charm itself on a machine?
- What information is stored in relations?
- How is information stored in relations?