Puppet, OpenStack, where to start? Ceph

Ten months into this job, and I still feel like an OpenStack novice, but it feels better than a couple of months ago at least. In fact, last week we had what felt like a big automation win: we deployed a Ceph OSD node from bare metal all the way to joining the cluster without any ‘manual’ intervention. That automation needs more, well, automation, but at least it’s repeatable and consistent now. But I’ve leapt ahead of myself. This is a heavily abbreviated history of how we got here:

  • Luca deployed OpenStack with Fuel. Five short words which actually represent months of detailed work and a fair bit of complaining from his cubicle. Disk partitioning, network bonds, bridges, VLANs, GRE, VXLAN, MTU settings, bugs, confusing or missing or out of date documentation, people in the wrong timezone for proper conversations… oh my. I helped a bit.
  • I created an All-In-One (AIO) deployment with the puppet-openstack-integration (POI) project. I started comparing the (hiera) data between it and the Fuel-deployed stack.
  • Using POI I deployed a compute node almost to the point of working, but we managed to break our dev stack before we got to iron out the final kinks.
  • Luca got us started with MAAS, which proved a little more intuitive than xCAT and, being built by Canonical, works well with Ubuntu. We customised the MAAS deployment process to suit our hardware and needs.
  • Ceph is less tightly integrated with OpenStack than the other core components, so it was another good candidate for early deployment tooling, and we got started with Puppet-Ceph. In the end we found spjmurray’s Ceph module more intuitive and reliable, and it supported the new long-term stable release, 12.x Luminous, almost as soon as it came out.

Here’s how we deploy a Ceph OSD node:

  • PXE boot the node:
$ ipmitool -I lanplus -H $IP -U $user -P $pass chassis power off
$ ipmitool -I lanplus -H $IP -U $user -P $pass chassis bootdev pxe
$ ipmitool -I lanplus -H $IP -U $user -P $pass chassis power on
  • Commission the node: Straightforward MAAS step from the documentation.
  • Customise the node: Network bridges, disk partitions, hostname. We have a hundred-line script to do this, and the main tools in use are the MAAS CLI and jq.
  • Prepare the curtin (curt installer) script (largely one-off work, although we continue to tweak it). Currently this just installs the Puppet Agent.
  • Deploy the node: Straightforward MAAS step from the documentation.
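The three ipmitool calls in the first step can be wrapped in a small shell function so every node is PXE-booted the same way. This is a minimal sketch under assumptions that are not part of our setup: the BMC password comes from the `IPMI_PASSWORD` environment variable (which ipmitool's `-E` flag reads, keeping the password off the command line), and `DRY_RUN=1` is a hypothetical switch that only prints the commands.

```shell
#!/bin/sh
# Sketch: PXE-boot one node through its BMC, mirroring the three
# ipmitool commands above.
# Assumptions: the BMC password is exported as IPMI_PASSWORD (ipmitool -E
# reads it from there); DRY_RUN=1 is a hypothetical switch that prints the
# commands instead of running them.

ipmi() {
    # ipmi BMC_ADDRESS BMC_USER SUBCOMMAND...
    bmc=$1
    bmc_user=$2
    shift 2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ipmitool -I lanplus -H $bmc -U $bmc_user -E $*"
    else
        ipmitool -I lanplus -H "$bmc" -U "$bmc_user" -E "$@"
    fi
}

pxe_boot() {
    # pxe_boot BMC_ADDRESS BMC_USER
    # Power off, make the next boot come from the network, then power on.
    # Stops at the first step that fails.
    ipmi "$1" "$2" chassis power off &&
        ipmi "$1" "$2" chassis bootdev pxe &&
        ipmi "$1" "$2" chassis power on
}
```

Usage would be along the lines of `IPMI_PASSWORD=... pxe_boot "$IP" "$user"`, or `DRY_RUN=1 pxe_boot "$IP" "$user"` to preview the commands first.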

Once the node is deployed, Puppet and our modules (which in turn use the Ceph module) take over, and we have more OSDs in our cluster!

$ ceph osd df tree
ID  CLASS WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE VAR  PGS TYPE NAME
 -1       51.97385        - 53220G   177G 53043G 0.33 1.00   - root default
-21       20.06506        - 20547G 70573M 20478G 0.34 1.01   -     host new-node
 12   hdd  1.82410  1.00000  1867G  6458M  1861G 0.34 1.02  90         osd.23
 13   hdd  1.82410  1.00000  1867G  6433M  1861G 0.34 1.01  95         osd.24
 25   hdd  1.82410  1.00000  1867G  6344M  1861G 0.33 1.00  71         osd.25
 26   hdd  1.82410  1.00000  1867G  6429M  1861G 0.34 1.01  74         osd.26
 27   hdd  1.82410  1.00000  1867G  6394M  1861G 0.33 1.00 103         osd.27
 28   hdd  1.82410  1.00000  1867G  6412M  1861G 0.34 1.01  94         osd.28
 29   hdd  1.82410  1.00000  1867G  6429M  1861G 0.34 1.01 102         osd.29
 30   hdd  1.82410  1.00000  1867G  6559M  1861G 0.34 1.03 104         osd.30
 31   hdd  1.82410  1.00000  1867G  6343M  1861G 0.33 1.00  76         osd.31
 32   hdd  1.82410  1.00000  1867G  6474M  1861G 0.34 1.02  98         osd.32
 33   hdd  1.82410  1.00000  1867G  6293M  1861G 0.33 0.99  69         osd.33
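To check that every disk on a new node made it into the cluster, a small awk filter over that output can list the OSDs sitting under one host bucket. A sketch, assuming the plain-text layout shown above, where bucket rows (root, host and so on) have a negative ID in the first column and every row ends with the bucket or OSD name.

```shell
#!/bin/sh
# Sketch: list the OSDs under one CRUSH host in `ceph osd df tree` output.
# Assumes the plain-text layout shown above: bucket rows (root, host, ...)
# start with a negative ID, and the last column is the bucket or OSD name.

osds_on_host() {
    # osds_on_host HOSTNAME < ceph-osd-df-tree-output
    awk -v want="$1" '
        $1 ~ /^-[0-9]+$/ { bucket = $NF; next }          # bucket row: remember its name
        $NF ~ /^osd\./ && bucket == want { print $NF }   # OSD row under the wanted host
    '
}
```

For example, `ceph osd df tree | osds_on_host new-node | wc -l` should report 11 for the node above.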

CloudAtCost money grab

I was intrigued by the pay-once model of Cloud At Cost and spent some money there a few years ago. The machine is good enough for my modest purposes, but over the years I have seen a number of articles complaining about their network performance and poor service. I figured you get what you pay for: I would use it while it worked and dump it if I hit problems. Besides, I could leave it lying idle and use it later when they’d improved, right? My one-time payment gives me a VPS for life.

So I thought.

I stopped using my machines a while back, after a reboot went wrong and I couldn’t get any help to regain access. The reimaging process was buggy, and of course without any effective support, that left my machine dead. Oh well, see above about leaving it idle until later.

A year or more later I logged in to have another poke at it, and nothing in the interface has changed, including my inability to reimage my machines. I don’t think it’s worth submitting a support ticket, judging by the responses to other tickets and questions I can see in the system. On top of that, I now see an invoice for US$9 for an ‘annual service fee’, which is perhaps understandable given their description, but entirely unpalatable given my experience. Oh well, good luck to them I guess.

So long, and thanks for the very few fish, C@C.

Airline staff

Daddy, why is that gate so narrow?

So people don’t take their big bags onto the aeroplane themselves, and they check them in to go in the cargo hold.

So the pilot can put them in the aeroplane?

Yes, the pilot needs to put them in the place underneath. Well, actually, the pilot has just one special job: flying the plane. Other people do other special jobs, like that one. It takes a lot of people to get us and our things from one place to another by plane.

I don’t love flying, and I reckon few people do, but I am still grateful for the people who make it possible. I spend hours packing, getting the timing and the papers right, lining up, waiting and then waiting some more. For that, I get to be somewhere else much faster than any other way. They get a day’s pay, which I guess they consider a fair trade. I don’t know how many flights per week a cargo loader, air steward or pilot might do, how often they meet the tiny number of passengers who make things difficult, or what they have to deal with behind the scenes. So, thank you. I have time to sit and feel grateful, because the kids are settled.

(This is actually from November 2015, but the words haven’t dated. I’m still grateful to sit still when the kids are settled, and to anyone who helps that happen!)

Thank you, honest strangers

Life has been heavy going for the past year, more so than ever before. It’s had ups and downs, and yesterday was largely a down: I forgot too many things and failed to communicate properly with people, which only made things worse. Anyway, they were all very good about it and I’m lucky to have great caring people around me helping make it all work.

Today was better. I had time to get things done which will mercifully remain done. Washing, cleaning the chook cage, getting the kids to brush their teeth… have to be done again and again, but it’s good to make some progress on the backlog.

Anyway, to the title of the post: I dealt with two unfamiliar situations today, and both of them turned out more smoothly than I dared hope: selling a low-value car, and shipping a kitchen appliance across the country to a Gumtree buyer with just-in-time (significant) payment of postage. Both of these have scam potential, as evidenced by a couple of the messages I’d received from potential buyers (or scammers). I reckon the best response is to treat everyone as if they’re honest, but take protective steps along the way, like taking photos of goods, explicitly discussing how things will happen, and recording relevant information. There’s no sense in being paranoid and putting people offside.

To cut a long story short, one person handed me cash, let me photograph their driver’s licence and took away the car; another sent me the full agreed payment plus enough to cover relevant costs, and I no longer have these things cluttering up my life. To both of you, thank you. We drank champagne tonight for a few reasons, and you helped.

Enable Java Web Start on CentOS 7

Sometimes when you’re making changes to systems it feels wrong. Insecure, hacky, manual and frustrating. But then you move on, hoping you don’t have to do it again. Well, here’s how I got to use the IPMI (iLO, BMC, iDRAC, etc.) web interface of some old servers from my CentOS 7 server:

Access the IPMI web interface

ws $ ssh -X server
server $ firefox $some_ip

Log in, browse to the ‘remote control’ section (they’re all pretty similar) and click launch. Firefox then asks which application should open jviewer.jnlp.

Install and configure Java

I found a guide which says to install and configure Java; java-1.8.0-openjdk was already installed out of the box, so it was just a matter of configuring it:

server # update-alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
*  1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-
 + 2           /usr/java/jre1.8.0_121/bin/java

Enter to keep the current selection[+], or type selection number: 2
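The same choice can be made without the interactive prompt, which helps if this ever needs scripting. A sketch, assuming CentOS 7's `alternatives` command (which `update-alternatives` points to) and its `--set` option; the `ALTERNATIVES` variable is a hypothetical hook for previewing or testing the command.

```shell
#!/bin/sh
# Sketch: select the java alternative non-interactively on CentOS 7.
# ALTERNATIVES is a hypothetical override (e.g. ALTERNATIVES=echo to preview).

set_java() {
    # set_java /full/path/to/bin/java  (must already be a registered choice)
    if [ -z "$1" ]; then
        echo "usage: set_java /path/to/bin/java" >&2
        return 1
    fi
    ${ALTERNATIVES:-alternatives} --set java "$1"
}
```

Run as root, `set_java /usr/java/jre1.8.0_121/bin/java` should leave `java -version` reporting 1.8.0_121.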

Configure Firefox to launch .jnlp files with javaws

Firefox doesn’t know to open .jnlp files with javaws, so it needs to be told; following a guide, I did that by editing the profile’s mimeTypes.rdf:

server $ vim .mozilla/firefox/vgenq8rj.default/mimeTypes.rdf

Mangle Java security settings

Java (rightly) complains about security settings. It’s only for internal boxes on a particular network, but BeyondCorp thinking still makes me cringe. Open the Java Control Panel:

ws $ ssh -X server
server $ /usr/java/jre1.8.0_121/bin/ControlPanel

In Security, Exception Site List, I added the URLs of the servers I need to manage. It works. I feel dirty. I suspect I could install an older version of Java to skip this step, and feel just a bit dirtier.
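The Exception Site List is kept in a plain text file, one URL per line, so adding servers could be scripted rather than clicked through. A sketch, assuming Oracle Java's per-user file at `~/.java/deployment/security/exception.sites`; check where your Java version keeps it before relying on this.

```shell
#!/bin/sh
# Sketch: append URLs to the Java Exception Site List without the Control
# Panel. Assumes Oracle Java's per-user file location on Linux.

add_exception_sites() {
    # add_exception_sites URL...
    sites="$HOME/.java/deployment/security/exception.sites"
    mkdir -p "$(dirname "$sites")" || return 1
    touch "$sites" || return 1
    for url in "$@"; do
        # Skip URLs that are already listed (whole-line, literal match).
        if ! grep -qxF "$url" "$sites"; then
            printf '%s\n' "$url" >> "$sites"
        fi
    done
}
```

For example, `add_exception_sites https://192.0.2.10 https://192.0.2.11` (hypothetical BMC addresses), then reopen the Control Panel to confirm they appear.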

Taming a Mac, Lubuntu style

I’ve supported and administered Mac servers and desktops before, but never leapt on the tech worker bandwagon of actually using one for my own work. At the new office, I was handed a MacBook Pro, so I figured it was time to learn.

Learning when to use the various meta keys is a job for my fingers to keep practising – I’m getting better, and hopefully it won’t break my brain when I head home to my Lubuntu desktops.

Some web searching led me to fiddle with Automator to help fill some gaps, but the resulting keyboard shortcuts didn’t always respond, so I’ve settled on some programs to help:

  • Slate for launching, switching, resizing and placing programs in the desktop
  • Mission Control gives me multiple desktops and shortcuts to switch between them
  • HyperSwitch gives me a more comfortable alt-tab (programs within the current desktop) than the full-blown command-tab (all programs)
  • Quicksilver promises more features than Spotlight, and so far delivers, except for finding things inside System Preferences

The three programs I’ve installed have more features than I’m using, and I may explore them later.

I’m content with how it’s all working, and I am hesitant to throw it all away and install Lubuntu. That’s partly because of the time spent doing so, and partly because I’m not certain about how well it would work. Hardware compatibility, external screen resolution, power management… things seem to work just fine in *buntu, but I’m quite sure Apple have written things to work excellently in OS X. I want to spend time using the computer, not twiddling with it. I was initially dubious, but my OS X is now Good Enough(TM) for me to get on with some work.

VMware upgrade 5.1 -> 6.0 notes

In 2013, the fine folks at Interconnekt sold us a fine Dell PowerEdge T420 server to run VMware and a bunch of virtual machines. VMware came installed on a dual SD module, and this came in handy later. We quickly discovered that VMware ESXi 5.1 has a RAM limit of 32GB, so we removed the other 32GB and sailed on.

Three years on, we have bumped up against that limit, with our VMs exhausting the available RAM. We could pare down our use on some machines, but knowing there is an upgrade available and some DIMMs in the cupboard, I’m upgrading.

Dell recommend their customised version of VMware, so I downloaded that through the Drivers & Downloads part of their website. I was stumped trying to find the link for a while, but eventually found I had to ‘change OS’ from Windows to VMware, and found it under Enterprise Solutions. There is nothing to suggest skipping 5.5 is a bad idea, so I grabbed 6.0 and burned it to CD.

Back up the VMs

I installed ghettoVCB and did a test backup run of all machines:

$ wget -O ghettoVCB.vib https://github.com/lamw/ghettoVCB/blob/master/vghetto-ghettoVCB.vib?raw=true
$ scp ghettoVCB.vib root@server:
esxi # esxcli software vib install -v /ghettoVCB.vib
 VIB virtuallyGhetto_bootbank_ghettoVCB_1.0.0-0.0.0 violates extensibility rule checks: [u'(line 23: col 0) Element vib failed to validate content']
 VIB virtuallyGhetto_bootbank_ghettoVCB_1.0.0-0.0.0's acceptance level is community, which is not compliant with the ImageProfile acceptance level partner
 To change the host acceptance level, use the 'esxcli software acceptance set' command.
 Please refer to the log file for more details.
esxi # esxcli software vib install -v /ghettoVCB.vib -f
Installation Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed: virtuallyGhetto_bootbank_ghettoVCB_1.0.0-0.0.0
   VIBs Removed: 
   VIBs Skipped: 
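Rather than forcing the install with `-f`, the error message points at a cleaner route: lower the host acceptance level to CommunitySupported, then install normally. A sketch using `esxcli software acceptance set`, as the message suggests; the `ESXCLI` variable is a hypothetical hook for previewing the commands.

```shell
#!/bin/sh
# Sketch: install a community-supported VIB (such as ghettoVCB) by lowering
# the host acceptance level first, instead of using --force.
# ESXCLI is a hypothetical override (ESXCLI=echo prints the commands).

install_community_vib() {
    # install_community_vib /path/on/host/to/file.vib
    esxcli_cmd=${ESXCLI:-esxcli}
    $esxcli_cmd software acceptance set --level=CommunitySupported &&
        $esxcli_cmd software vib install -v "$1"
}
```

On the host this would be `install_community_vib /ghettoVCB.vib`. Either way the VIB is still CommunitySupported, so it has to come off before an upgrade; once it is removed, `esxcli software acceptance set --level=PartnerSupported` should bring the host back in line with the image profile's partner level.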

GhettoVCB creates a snapshot, does a backup of that snapshot, then deletes the snapshot. It fails if a snapshot already exists.

esxi # vim /etc/ghettovcb/ghettoVCB.conf
esxi # /opt/ghettovcb/bin/ghettoVCB.sh -a -g /etc/ghettovcb/ghettoVCB.conf -l /ghettoVCB-`date +%Y%m%d`.log
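For reference, ghettoVCB.conf is a list of shell variable assignments. A sketch of the settings most relevant here; the values are assumptions, not ours: the backup path is a hypothetical NAS datastore, and `ALLOW_VMS_WITH_SNAPSHOTS_TO_BE_BACKEDUP=0` appears to be the setting behind the refusal to back up VMs that already have snapshots. Check the defaults in the shipped file before copying anything.

```shell
# Sketch of /etc/ghettovcb/ghettoVCB.conf (plain shell variable assignments).
# Values are illustrative assumptions; check the shipped file's defaults.
VM_BACKUP_VOLUME=/vmfs/volumes/nas-backup/ghettovcb  # hypothetical NAS datastore path
DISK_BACKUP_FORMAT=thin                    # write backup VMDKs as thin-provisioned
VM_BACKUP_ROTATION_COUNT=2                 # backups to keep per VM
POWER_VM_DOWN_BEFORE_BACKUP=0              # 1 = power each VM down for its backup
ALLOW_VMS_WITH_SNAPSHOTS_TO_BE_BACKEDUP=0  # 0 = skip VMs that already have snapshots
```

Setting `POWER_VM_DOWN_BEFORE_BACKUP=1` might be an alternative to shutting every VM down by hand before the real backup.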

Some VMs had old, unused snapshots, and some were in an old format, probably from when they were copied across from previous versions of ESXi, so their backups failed. Right-clicking and choosing upgrade VM fixed the old-format ones, and I deleted the old snapshots. A successful test backup to our NAS took 24 hours: slow, but that’s okay in this case.

I didn’t want to manage changes in the event of failure, so I shut down all VMs before running the actual backup. 24 hours later, I was ready to upgrade ESXi and install more RAM.
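Shutting every VM down by hand is tedious, and from the ESXi shell it can be scripted with `vim-cmd`. A sketch, assuming `vim-cmd vmsvc/getallvms` prints a header row followed by one row per VM with the numeric Vmid in the first column, and that the guests run VMware Tools (which `vmsvc/power.shutdown` needs for a clean guest shutdown). It sticks to plain sh because the ESXi shell is BusyBox; `VIMCMD` is a hypothetical testing hook.

```shell
#!/bin/sh
# Sketch: cleanly shut down every registered VM from the ESXi shell.
# Assumes VMware Tools in each guest (power.shutdown asks the guest OS to
# shut down). VIMCMD is a hypothetical override for previewing/testing.

vm_ids() {
    # Read `vim-cmd vmsvc/getallvms` output on stdin; print each numeric Vmid.
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

shutdown_all_vms() {
    vimcmd=${VIMCMD:-vim-cmd}
    for id in $($vimcmd vmsvc/getallvms | vm_ids); do
        echo "Shutting down VM $id"
        $vimcmd vmsvc/power.shutdown "$id"
    done
}
```

Guest shutdown is asynchronous, so `vim-cmd vmsvc/power.getstate <vmid>` is worth checking for "Powered off" on each VM before starting the backup.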

Upgrade ESXi to 6.0.0 Update 2

I put ESXi in maintenance mode so it wouldn’t automatically start the VMs on the next boot; I first wanted to see that all was well. I shut down ESXi, physically removed all the HDDs and the network cable to the NAS for paranoia’s sake, and booted from the CD I’d burned earlier. I selected Upgrade and the dual SD module. Dammit, the upgrade failed because a CommunitySupported VIB was installed: ghettoVCB! Boot up, remove it:

esxi # esxcli software vib remove --vibname=ghettoVCB
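That surprise could be caught before booting the installer by listing any CommunitySupported VIBs first. A sketch, assuming the tabular `esxcli software vib list` output, where the first column is the VIB name and the acceptance level appears as a single word such as CommunitySupported; `ESXCLI` is a hypothetical testing hook.

```shell
#!/bin/sh
# Sketch: before an ESXi upgrade, print the names of installed VIBs whose
# acceptance level is CommunitySupported (the kind that blocked our upgrade).
# ESXCLI is a hypothetical override for testing.

community_vibs() {
    ${ESXCLI:-esxcli} software vib list |
        awk '{ for (i = 2; i <= NF; i++) if ($i == "CommunitySupported") { print $1; break } }'
}
```

Each name it prints can then be removed with `esxcli software vib remove --vibname=<name>`.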

Shut down and boot from the CD again. The upgrade was as simple as can be, and for interest I timed it: 8 minutes of processing time. ESXi boots, complains about the missing HDDs, and works just fine.

I’d heard of another benefit that I am now going to realise: I don’t have to boot my Windows VM to run the vSphere Client any more! I might still do so for some things, because web clients sometimes suck, but not having to is great.

Install 4 x 8GB 2Rx4 1333MHz RAM DIMMs

Now, on to the RAM upgrade. The machine is nicely laid out, with easy access to the DIMMs. Naturally I want to put two of my DIMMs into each bank of slots, but where? Two of the six slots in each bank are already in use. The information panel on the case was helpful but not conclusive. Over to the Owner’s Manual.

“Populate all sockets with white release tabs first and then black.”
“Advanced ECC mode extends SDDC from x4 DRAM based DIMMs to both x4 and x8 DRAMs”
“Memory Optimized (Independent Channel) Mode … supports SDDC only for memory modules that use x4 device width and does not impose any specific slot population requirements”

A bit of discussion led me to believe that I should just place them closest to the CPU, and that with x4 DIMMs it didn’t make a lot of difference. It worked. Sadly I left out the big plastic baffle from the case when I reassembled; not a huge problem, but I need to schedule another outage to reinstall it.

Job done. I’m happy.