Technical skills after almost 4 years at ICG

Posted on Thu 28 September 2017 in work

Please note that this post is intentionally written in past tense to avoid having to rewrite it completely in the future.

This post aims to be a summary of the technologies I learned to use during my time at the Institute for Computer Vision and Computer Graphics at TU Graz.

monitoring

While I was fortunate to avoid Nagios, I have quite a lot of experience with Sensu and its quirks. I sent several patches to its plugins and deployed a sizeable setup of checks and metrics, some of which were heavily customised to the ICG's needs. We deployed Sensu for Linux and Windows. I've written a post about how to set up Sensu with Puppet that covers part of that work. I was also active on the sensu-users mailing list.

Furthermore I deployed Logstash and Logstash-forwarder for collecting, analysing and structuring log files. This work included coming up with custom patterns for matching as well as defining configuration for ingesting logs of dpkg, syslog, apache & apache-error, nginx & nginx-error, seafile & seahub as well as fail2ban.

The collected data was available via Grafana for metrics, Uchiwa for results of Sensu and Kibana for logs, all protected behind an Apache reverse proxy with authentication via whitelisted LDAP accounts. I integrated custom URLs to easily go from Sensu to the corresponding results in Grafana and Kibana and built multiple custom Grafana dashboards. Those dashboards either displayed general information or were custom-tailored to solving particular problems in operations.

Before becoming more familiar with Sensu, I wrote our own script for monitoring the output of the CLIs of two hardware RAID vendors (storcli, tw_cli).

web servers

I have worked with both Apache and Nginx, mostly with Apache, setting up static websites, WSGI-based applications and reverse proxies with LDAP or password authentication. Part of the Apache work was done via Puppet modules.

source control

I was in charge of two GitLab instances - one of which I migrated from a source installation to the Omnibus package - and maintained an old Apache Subversion instance. I am a strong supporter of Git and if need be can adjust to using SVN again. I've helped several of the researchers to set up their projects for GitLab Continuous Integration and used the feature myself extensively for both development and administration projects - a topic very dear to me.

datastores

For monitoring purposes I have built a setup including Graphite for storing metrics data, Redis for keeping monitoring-related transient data for Sensu, and Elasticsearch for storing logs, with Elasticsearch Curator removing them after a defined retention period. Graphite and Redis were set up via Puppet modules.

I have limited knowledge of MySQL and PostgreSQL. I was part of a team developing an application using a Postgres backend. Further tasks included creating backups with pg_dump and editing a huge database dump by hand in an editor. The thought of this should give you nightmares. MySQL tasks were mostly creating backups.

I learned a few things about LDAP while I was modifying users, groups and configuration entries during everyday operations and operating system upgrades. Given my dislike of Java, I refused to install Java and by extension avoided using Apache Directory Studio, instead writing my .ldif files by hand using templates in my editor and applying them via ldapadd.
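
As a rough illustration of the template approach, the following sketch renders a minimal LDIF entry from a shell function. The function name `render_group_ldif`, the group name, the GID and the `dc=example,dc=org` base DN are all hypothetical placeholders and not the ICG's actual directory layout; the entry assumes the standard `posixGroup` object class.

```shell
# Hypothetical helper: print an LDIF entry for a new POSIX group.
# Arguments: group name, numeric GID, optional base DN (placeholder default).
render_group_ldif() {
    name="$1"
    gid="$2"
    base="${3:-dc=example,dc=org}"
    cat <<EOF
dn: cn=${name},ou=groups,${base}
objectClass: posixGroup
cn: ${name}
gidNumber: ${gid}
EOF
}

# Example usage (not executed here; bind DN and file name are assumptions):
#   render_group_ldif vision 5001 > group.ldif
#   ldapadd -x -D "cn=admin,dc=example,dc=org" -W -f group.ldif
```

Keeping the template in a function or editor snippet means only the values change between entries, which makes hand-written files less error-prone.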

virtualization and containerization

For various development processes I was using VirtualBox together with Vagrant for easy setup of throwaway machines. In production, Xen was used - I've written about some of that experience. Additionally, I built several custom Docker containers for the GitLab CI. We did not run any services in Docker containers in production. I have written about building a container for this very blog though.

configuration management

I've written the Puppet setup at the institute, which manages many, many services. Some hosts are entirely controlled by Puppet. The configuration is deployed from a Git repository to the Puppet master after being run through syntax tests and integration tests via GitLab and Docker.

For shorter, one-shot tasks on multiple hosts I've lately taken a liking to Ansible. Generally I find configuration management solutions that describe a desired state more intuitive than ones that specify the process of getting there.

security

I've cleaned up, simplified and improved readability of an existing Shorewall setup. The entire configuration is being dry-run in the GitLab CI before being deployed to production on success.

I've configured and deployed TLS for several services, including LDAP, web servers, IMAP/POP3 (Cyrus), SMTP (Postfix), RabbitMQ, Logstash and more. I've rolled out several versions of OpenSSH configurations and Fail2ban deployments via our own Puppet code. Generally I'm of the opinion that even traffic in your own datacenter should be encrypted - a lesson that stuck with me after reading that Google's internal lines were tapped a few years back.

I was in charge during some unfortunate events where security issues popped up and had to be investigated, or violations of our IT policies had to be dealt with. These policies were based on the Admin Team's decisions and put into text by me. I've submitted detailed written reports about these activities to my boss.

troubleshooting OSs

While my main focus has been on Linux servers - mostly Ubuntu with some Debian - I have been busy troubleshooting problems on Linux desktops too, including broken X, crashing LightDM, missing CUDA drivers and issues with Secure Boot (and the shim-signed package). I've also seen my fair share of macOS problems given that we had several Mac users, including myself. Among these were an inaccessible BootCamp and completely broken installations caused by users aborting upgrade processes. I have been mostly spared Windows issues thanks to my colleagues handling those. I have, however, written a tiny script that lets you easily boot into Windows from your Ubuntu installation, after realising how convenient such a solution is when using BootCamp.

upgrades and migrations

Over the years I've successfully upgraded many of our existing servers via do-release-upgrade or by changing the Debian repository, and fixed the issues that came up. I've also migrated a part of our infrastructure from DRBD 8 to DRBD 9 in order to replicate to more machines without layering.

Upgrading was made much easier by keeping all systems up to date, which I achieved by using Unattended Upgrades for many of our package sources. Reading changelogs and news about new features to improve our infrastructure has been very helpful in that regard. One achievement I am very proud of is having a patch accepted into Ubuntu (Precise and Trusty).

I usually work through a checklist after upgrading and manually merge or rewrite leftover configuration files, which I find using the following command. A look into the log files of various services is always a good idea too.

find /etc -name "*dpkg-*" -or -name "*ucf-*" -or -name "*.merged" | sort
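
To make working through that list a little easier, the command can be wrapped so that each leftover file is printed next to the live file it probably belongs to. This is only a sketch: the function name `list_leftover_configs` is hypothetical, it uses the POSIX `-o` spelling instead of `-or`, and it assumes that leftovers carry a `.dpkg-*`, `.ucf-*` or `.merged` suffix appended to the original file name.

```shell
# Hypothetical helper: list leftover config files under a directory
# (default /etc) together with the live file each one likely belongs to.
list_leftover_configs() {
    root="${1:-/etc}"
    find "$root" \( -name "*dpkg-*" -o -name "*ucf-*" -o -name "*.merged" \) | sort |
    while IFS= read -r leftover; do
        # Strip the suffix appended by dpkg/ucf to recover the live file name.
        live="${leftover%.dpkg-*}"
        live="${live%.ucf-*}"
        live="${live%.merged}"
        printf '%s -> %s\n' "$leftover" "$live"
    done
}

# Example usage (not executed here):
#   list_leftover_configs /etc
```

Each pair can then be compared with a diff tool of your choice before deciding whether to merge, keep or delete the leftover.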

Additionally, I oversaw and implemented the switch from manual configuration and firewall rule syncing to a Puppet-controlled setup that keeps two hosts synced and configured.

remainders

This is the grab-bag area. Most things in here didn't warrant a longer section.

I replaced a manual process of dealing with DHCP with a Puppet-controlled setup - that's thankfully very easy using ISC-DHCP-SERVER. I deployed multiple applications that use Shibboleth authentication and worked with the central university IT section on that. I configured and deployed Mattermost and Seafile, and made sure our network mounts, automounts, Samba shares and Mailman instance worked and NTP was synced.

writing

I've written extensive documentation for new users and the on-boarding process, documentation for the Admin Team as well as a policy section. Additionally, I published several posts about my work with permission from the ICG here on my personal website.

Furthermore, I meticulously took notes on all new issues and their solutions in our GitLab issue tracker, building up a knowledge base of previous experience instead of letting it evaporate.