Puppet, OpenStack, where to start? Ceph

Ten months into this job I still feel like an OpenStack novice, but at least it feels better than it did a couple of months ago. In fact, last week we had what felt like a big automation win: we took a Ceph OSD node from bare metal to joining the cluster without any ‘manual’ intervention. That automation needs more, well, automation, but at least it’s repeatable and consistent now. But I’m getting ahead of myself. This is a heavily abbreviated history of how we got here:

  • Luca deployed OpenStack with Fuel. Five short words which actually represent months of detailed work and a fair bit of complaining from his cubicle. Disk partitioning, network bonds, bridges, VLANs, GRE, VXLAN, MTU settings, bugs, confusing or missing or out of date documentation, people in the wrong timezone for proper conversations… oh my. I helped a bit.
  • I created an All-In-One (AIO) deployment with the puppet-openstack-integration (POI) project. I started comparing the Hiera data between it and the Fuel-deployed stack.
  • Using POI, I deployed a compute node almost to the point of working, but we managed to break our dev stack before we could iron out the final kinks.
  • Luca got us started with MAAS, which proved a little more intuitive than xCAT and, being built by Canonical, works well with Ubuntu. We customised the MAAS deployment process to suit our hardware and needs.
  • Ceph is less tightly integrated with OpenStack than the other components, so it was another good candidate for early deployment tooling, and we started with Puppet-Ceph. In the end we found spjmurray’s Ceph module more intuitive and reliable, and it supported the new long-term stable release, 12.x Luminous, almost as soon as it came out.

Here’s how we deploy a Ceph OSD node:

  • PXE boot the node:
$ ipmitool -I lanplus -H $IP -U $user -P $pass chassis power off
$ ipmitool -I lanplus -H $IP -U $user -P $pass chassis bootdev pxe
$ ipmitool -I lanplus -H $IP -U $user -P $pass chassis power on
  • Commission the node: Straightforward MAAS step from the documentation.
  • Customise the node: Network bridges, disk partitions, hostname. We have a hundred-line script to do this, and the main tools in use are the MAAS CLI and jq.
  • Prepare the curtin (curt installer) script: largely one-off work, although we continue to tweak it. Currently it just installs the Puppet agent.
  • Deploy the node: Straightforward MAAS step from the documentation.
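The three ipmitool calls in the PXE boot step can be wrapped in a small function so they are always issued in the same order. This is a sketch, not our actual tooling: the function name `pxe_boot` and the `IPMITOOL` override (handy for a dry run with `IPMITOOL=echo`) are hypothetical, and it assumes the BMC accepts IPMI-over-LAN (`lanplus`) connections.

```bash
#!/usr/bin/env bash
# Hypothetical helper: power-cycle a node into a PXE boot over IPMI.
# Assumes IPMI-over-LAN (lanplus) is enabled on the node's BMC.
# Set IPMITOOL=echo to print the commands instead of running them.
set -eu

pxe_boot() {
    local ip="$1" user="$2" pass="$3"
    local ipmi="${IPMITOOL:-ipmitool}"
    # Power off first so the node comes up fresh on the new boot device.
    "$ipmi" -I lanplus -H "$ip" -U "$user" -P "$pass" chassis power off
    # Ask the BMC to boot from the network on the next power-on.
    "$ipmi" -I lanplus -H "$ip" -U "$user" -P "$pass" chassis bootdev pxe
    "$ipmi" -I lanplus -H "$ip" -U "$user" -P "$pass" chassis power on
}
```

Calling `pxe_boot "$IP" "$user" "$pass"` issues the same three commands as above. Without `options=persistent`, `bootdev pxe` should only apply to the next boot, so the node falls back to its normal boot order afterwards.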
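The MAAS side of these steps can be driven from the MAAS CLI, with jq picking fields out of its JSON output. A rough sketch, assuming a MAAS 2.x CLI already logged in under some profile name (via `maas login`); the helper names and the `MAAS` override variable are hypothetical, and waiting for the machine to reach the Ready state between commissioning and deploying is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the MAAS 2.x CLI.
# PROFILE is the name used with `maas login`.
# Set MAAS=echo to print the arguments instead of calling MAAS.
set -eu

# Look up a machine's system_id by hostname (needs jq).
system_id_for() {
    local profile="$1" host="$2"
    "${MAAS:-maas}" "$profile" machines read \
        | jq -r --arg h "$host" '.[] | select(.hostname == $h) | .system_id'
}

# Start commissioning (hardware discovery) for one machine.
maas_commission() {
    local profile="$1" system_id="$2"
    "${MAAS:-maas}" "$profile" machine commission "$system_id"
}

# Deploy the OS (and the curtin customisation) to one machine.
maas_deploy() {
    local profile="$1" system_id="$2"
    "${MAAS:-maas}" "$profile" machine deploy "$system_id"
}
```

In use, the customisation script would run between `maas_commission` and `maas_deploy`, once the machine reports Ready.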

Once the node is deployed, Puppet and our modules (which in turn use the Ceph module) take over, and we have more OSDs in our cluster!

$ ceph osd df tree
 -1       51.97385        - 53220G   177G 53043G 0.33 1.00   - root default
-21       20.06506        - 20547G 70573M 20478G 0.34 1.01   -     host new-node
 12   hdd  1.82410  1.00000  1867G  6458M  1861G 0.34 1.02  90         osd.23
 13   hdd  1.82410  1.00000  1867G  6433M  1861G 0.34 1.01  95         osd.24
 25   hdd  1.82410  1.00000  1867G  6344M  1861G 0.33 1.00  71         osd.25
 26   hdd  1.82410  1.00000  1867G  6429M  1861G 0.34 1.01  74         osd.26
 27   hdd  1.82410  1.00000  1867G  6394M  1861G 0.33 1.00 103         osd.27
 28   hdd  1.82410  1.00000  1867G  6412M  1861G 0.34 1.01  94         osd.28
 29   hdd  1.82410  1.00000  1867G  6429M  1861G 0.34 1.01 102         osd.29
 30   hdd  1.82410  1.00000  1867G  6559M  1861G 0.34 1.03 104         osd.30
 31   hdd  1.82410  1.00000  1867G  6343M  1861G 0.33 1.00  76         osd.31
 32   hdd  1.82410  1.00000  1867G  6474M  1861G 0.34 1.02  98         osd.32
 33   hdd  1.82410  1.00000  1867G  6293M  1861G 0.33 0.99  69         osd.33
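A quick way to check that a new node is actually taking data is to pull the PG counts for its OSDs out of this listing. A minimal awk sketch, assuming the Luminous column layout shown here (OSD rows end in `osd.N` and PGS is the tenth field); `summarise_pgs` is a hypothetical name, and other Ceph releases may lay the columns out differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise PGs per OSD from `ceph osd df tree` on stdin.
# Assumes the Luminous layout: OSD rows end with "osd.N", PGS is field 10.
# Root and host rows end with a bucket name, so they are skipped.
set -eu

summarise_pgs() {
    awk '
        $NF ~ /^osd\./ {
            pg = $10 + 0
            if (n == 0 || pg < min) min = pg
            if (n == 0 || pg > max) max = pg
            sum += pg
            n++
        }
        END {
            if (n == 0) { print "no OSD rows found" > "/dev/stderr"; exit 1 }
            printf "osds=%d pgs_min=%d pgs_max=%d pgs_avg=%.1f\n", n, min, max, sum / n
        }
    '
}
```

Piping the rows shown above through `summarise_pgs` should print `osds=11 pgs_min=69 pgs_max=104 pgs_avg=88.7`.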