July 13, 2018

Getting started with Juju

This was written as part of a job interview. This guide uses lxd as it is a test setup.

KVM setup

Since I lost my Fusion license along with alok@ideadevice.com, I decided to give KVM a spin on the Thinkpad. I downloaded Bionic Beaver (18.04) LTS and set it up using virt-manager, but could not access the console:

Error connecting to graphical console: Error opening spice console, SpiceClientGtk missing

Some googling and installing a few packages didn’t help, so I moved on to virt-viewer. When running libvirt commands as a regular user, remember to pass --connect qemu:///system or set LIBVIRT_DEFAULT_URI=qemu:///system.

Once the VM is up, follow the setup guide to configure juju and lxd. Ignore the xenial-backports bit as we are using a newer lxd anyway.

While running lxd init, one time it didn’t ask about setting up a bridge but offered to set up a fan network instead. I’m not sure why it did this; when I ran the command again, it asked about the bridge.

Routing

Once I had the lxd controller up, I realised that I should have configured the VM with a bridge instead of NAT, as I could not access the containers inside the Ubuntu VM from the Debian host. I worked around this by adding a route on the host.

First, find the VM IP from the host.

weyl:~$ virsh domifaddr ubuntu1804
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet0      52:54:00:b8:11:b6    ipv4         192.168.122.236/24

Then, find the subnet for the lxd containers from the VM.

alok@ubuntu:~$ lxc network get lxdbr0 ipv4.address
10.171.182.1/24

Now add a route on the host.

weyl:~$ sudo ip r add 10.171.182.0/24 via 192.168.122.236 dev virbr0

The interface virbr0 was found by inspecting ip route to see which bridge the VM was using.
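The destination of the route is the container subnet, not the bridge's own address, so the host bits reported by lxc network get have to be cleared. A minimal sketch of that derivation using Python's standard ipaddress module; the helper name route_command is made up for illustration, and the addresses are the ones shown above.

```python
import ipaddress


def route_command(bridge_cidr, gateway, dev):
    """Build the `ip route add` arguments for the subnet behind `gateway`."""
    # ip_interface accepts an address with host bits set, e.g. 10.171.182.1/24
    iface = ipaddress.ip_interface(bridge_cidr)
    # .network clears the host bits and yields the subnet prefix
    subnet = iface.network
    return ["ip", "route", "add", str(subnet), "via", gateway, "dev", dev]


# Values taken from the virsh and lxc output above
cmd = route_command("10.171.182.1/24", "192.168.122.236", "virbr0")
print("sudo " + " ".join(cmd))
```

The script only prints the command; it does not touch the routing table.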

Python

Remember to do an apt-get update on a fresh install. pip and other essentials are not available until this is done.

Install required packages

$ sudo apt install build-essential python python-dev tox 

Create a virtualenv

$ mkdir ~/venv
$ virtualenv ~/venv/py2
$ source ~/venv/py2/bin/activate

Now make test can be executed.

Dev setup

Install the charm binary via snap

$ sudo snap install charm

Charms can be obtained with charm pull, which places the charm in the current working directory, much like git clone.

$ charm pull rabbitmq-server

RabbitMQ

Followed the guide and it worked as advertised.

Initially I deployed a cluster of 2 but erroneously set min-cluster-size=3. This caused one of the units to fail. I tried to correct this by adding another machine (via the GUI) and running juju resolved on the failed unit, but this did not work. Not sure why.

Also, juju config rabbitmq-server management_plugin=True caused a problem with the charm as it attempted to set up the management plugin individually on each unit. After multiple juju resolved attempts, I set this option back to False.

I did not enable SSL.

Theory of operation

RabbitMQ 3.5.7 (the version installed) is capable of clustering on its own, without needing third-party support such as corosync or ceph.

Events

Charms operate on events. Each event has a hook, and the charm reacts to the event by providing an implementation for that hook. For instance, the install event can be handled by installing the rabbitmq-server package (among others). Hook implementations reside in the hooks/ directory.

Hooks

In the charm, the top-level code is in hooks/rabbitmq_server_relations.py and the various event names are symlinked to this. Hooks are registered by decorating functions with hooks.hook(<event name>). By invoking hooks.execute(sys.argv) in the __main__ section, the appropriate function is called.
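The symlink-and-decorator dispatch described above can be sketched roughly as follows. This is a toy registry written for illustration, not the charmhelpers class the charm actually uses; the handler bodies and return strings are invented.

```python
import os


class Hooks:
    """Toy hook registry; illustrative only, not the real charmhelpers code."""

    def __init__(self):
        self._hooks = {}

    def hook(self, *names):
        # Decorator factory: register the function under each event name
        def wrapper(func):
            for name in names:
                self._hooks[name] = func
            return func
        return wrapper

    def execute(self, argv):
        # Juju runs e.g. hooks/install, a symlink to the one real script,
        # so the basename of argv[0] is the event name.
        name = os.path.basename(argv[0])
        if name not in self._hooks:
            raise KeyError("no handler registered for %s" % name)
        return self._hooks[name]()


hooks = Hooks()


@hooks.hook("install")
def install():
    return "installing rabbitmq-server"


@hooks.hook("config-changed")
def config_changed():
    return "applying config"


# In a real hook script this would be hooks.execute(sys.argv)
print(hooks.execute(["hooks/install"]))
```

Because every event name is a symlink to the same file, one script can serve all hooks and pick the handler from the name it was invoked as.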

RabbitMQ native clustering can be initiated from any node. From the docs:

it is enough to provide one online node and the node will be clustered to the cluster that the specified node belongs to

Some dynamic data is stashed in peerstorage.

The rabbit_utils.cluster_with() method implements the native clustering.
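For orientation, joining a node to an existing cluster by hand takes three rabbitmqctl calls. The sketch below only builds and prints that sequence. It is not the body of cluster_with(), which I have not reproduced here, and the node name rabbit@juju-machine-0 is a made-up example.

```python
def join_cluster_commands(node):
    """Return the rabbitmqctl calls that join the local node to `node`'s cluster."""
    return [
        # Stop the RabbitMQ application but keep the Erlang VM running
        ["rabbitmqctl", "stop_app"],
        # Any one online member of the target cluster is enough
        ["rabbitmqctl", "join_cluster", node],
        # Start the application again as a cluster member
        ["rabbitmqctl", "start_app"],
    ]


# Hypothetical node name, for illustration only
for cmd in join_cluster_commands("rabbit@juju-machine-0"):
    print(" ".join(cmd))
```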

Leader election is performed by the is_elected_leader() method. As per the doc:

It relies on two mechanisms to determine leadership:

  1. If juju is sufficiently new and leadership election is supported, the is_leader command will be used.
  2. If the charm is part of a corosync cluster, call corosync to determine leadership.
  3. If the charm is not part of a corosync cluster, the leader is determined as being “the alive unit with the lowest unit numer[sic]”. In other words, the oldest surviving unit.

So in this case it is going to be rabbitmq-server/0 that is elected the leader.
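The third rule, "lowest unit number", can be sketched as below. This is an illustration of the rule as quoted, not the charmhelpers implementation; the function name oldest_unit and the unit list are assumptions.

```python
def oldest_unit(units):
    """Return the unit name with the lowest unit number."""
    # Unit names look like "rabbitmq-server/0"; compare the numeric suffix
    # so that unit 10 does not sort before unit 2.
    return min(units, key=lambda u: int(u.rsplit("/", 1)[1]))


# Hypothetical set of units that are currently alive
alive = ["rabbitmq-server/2", "rabbitmq-server/0", "rabbitmq-server/10"]
print(oldest_unit(alive))
```

If rabbitmq-server/0 went away, the same rule would pick rabbitmq-server/2 from the remaining units.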

Actions

Actions are for

Debugging

To further investigate the management plugin problem,

$ juju debug-hooks rabbitmq-server/0 

If the unit is in error, the hook that failed will be set up in the tmux session. Be patient, as it sometimes takes a minute.

Turns out that the following patch fixes the problem.

--- rabbitmq_server_relations.py.orig   2018-07-13 07:35:09.268064964 +0000
+++ rabbitmq_server_relations.py        2018-07-13 07:35:28.220164194 +0000
@@ -605,6 +605,10 @@
         rsync(os.path.join(charm_dir(), 'scripts',
                            'check_rabbitmq_queues.py'),
               os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_queues.py'))
+        if config('management_plugin'):
+            rsync(os.path.join(charm_dir(), 'scripts',
+                               'check_rabbitmq_cluster.py'),
+                  os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_cluster.py'))
     if config('stats_cron_schedule'):
         script = os.path.join(SCRIPTS_DIR, 'collect_rabbitmq_stats.sh')
         cronjob = CRONJOB_CMD.format(schedule=config('stats_cron_schedule'),
@@ -613,10 +617,6 @@
         rsync(os.path.join(charm_dir(), 'scripts',
                            'collect_rabbitmq_stats.sh'), script)
         write_file(STATS_CRONFILE, cronjob)
-    if config('management_plugin'):
-        rsync(os.path.join(charm_dir(), 'scripts',
-                           'check_rabbitmq_cluster.py'),
-              os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_cluster.py'))
     elif os.path.isfile(STATS_CRONFILE):
         os.remove(STATS_CRONFILE)
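Reading the diff, the patch changes two things: the check_rabbitmq_cluster.py copy moves into the more deeply indented block above it, and the trailing elif re-attaches to the stats_cron_schedule test instead of the management_plugin test. The sketch below models only the second effect with a simplified stand-in function; the name actions and the step strings are invented, and I have not confirmed that this is the reason the hook was failing.

```python
def actions(stats_schedule, management_plugin, cronfile_exists, patched):
    """Return the steps a simplified version of the hook would take."""
    steps = []
    if patched:
        # After the patch: elif pairs with the stats_cron_schedule test.
        # (The management_plugin copy lives in an earlier block, omitted here.)
        if stats_schedule:
            steps.append("write cronfile")
        elif cronfile_exists:
            steps.append("remove cronfile")
    else:
        # Before the patch: elif pairs with the management_plugin test
        if stats_schedule:
            steps.append("write cronfile")
            cronfile_exists = True
        if management_plugin:
            steps.append("rsync cluster check")
        elif cronfile_exists:
            steps.append("remove cronfile")
    return steps


# A schedule is set and the management plugin is off
print(actions("*/5 * * * *", False, False, patched=False))
print(actions("*/5 * * * *", False, False, patched=True))
```

In the unpatched ordering, a cron file that was just written is removed again whenever management_plugin is off.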

I made the changes in my local setup and deployed with

$ juju upgrade-charm --force-units --path ~/charms/rabbitmq-server rabbitmq-server

After this, a juju resolved invocation fixed the problem. Thanks to marco who figured this method out.

Open questions

  1. How to upgrade the charm itself on a machine?
  2. What information is stored in relations?
  3. How is information stored in relations?

References

  1. Understanding Juju charms
  2. Juju setup guide with LXD
  3. Juju and charm architecture overview
