An explanation of a handful of rsync parameters

Posted on Thu 05 October 2017 in work • Tagged with Institute for Computer Vision and Computer Graphics

If you look at rsync, one of the community's favourite syncing and file transfer tools, you will notice that it has quite a lot of parameters (rsync --help). Here's a short explanation of some of them and what they can be used for - taken from the .gitlab-ci.yml files of their respective projects.

example 1

This call is used in our automated deployment process for a web based project. It deploys the project into the correct directory, ensuring that it can still be read and executed after syncing. This is where the setting of group and user is important, since those are used for the creation of new files as well as reading of code by the interpreter. Since this is from a Git repository, it only makes sense to have a hidden file present to avoid syncing of specific files. Personally, I prefer to add --stats and --human-readable to every rsync that's used for a deployment since you can see what changed on site in the GitLab build logs.

# In the .gitlab-ci.yml the command is on one line to avoid errors
# for the sake of readability I have reformatted the call.
# The order of the optional parameters has been changed to be
# more coherent.
sudo rsync --recursive \
           --perms \
           --stats \
           --human-readable \
           --times \
           --group \
           --owner \
           --usermap=gitlab-runner:labelme,0:labelme \
           --groupmap=gitlab-runner:labelme,0:labelme \
           --super \
           --exclude-from=.rsyncExcludeFiles \
           . /home/labelme/LabelMeAnnotationTool-master/
  • Copy everything including all files and subfolders (--recursive) from the current location (.) to the target (/home/labelme/LabelMeAnnotationTool-master).
  • Ensure that the permissions are the same at the target as they were at the source (--perms).
  • Afterwards display detailed, human-readable statistics about the transfer (--human-readable, --stats).
  • Make sure the modification times of the files at the target location match the ones at the source location (--times).
  • Transfer the group and owner of the files (--group, --owner). Both options are required for the mappings in the next item to have any effect.
  • Map the owner and group from gitlab-runner and from the ID 0 to labelme (--usermap=FROM_USER:TO_USER,FROM_USERID:TO_USER, --groupmap=FROM_GROUP:TO_GROUP,FROM_GROUPID:TO_GROUP).
  • Explicitly try to use super-user operations (--super), e.g. changing owners. This will lead to errors if such operations are not permitted on the receiving side, indicating a lack of permissions or filesystem features, enabling you to detect if something went wrong.
  • Skip listed files in .rsyncExcludeFiles while syncing (--exclude-from=EXCLUSION_LIST_FILE).
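To make the exclusion list a bit more concrete, here is a minimal sketch of what such a file could look like and how to preview its effect. The entries, the file index.php and the temporary directories are assumptions for illustration; the project's real .rsyncExcludeFiles is not shown in this post. With --dry-run and --itemize-changes rsync only reports what it would copy.

```shell
#!/bin/sh
# Sketch only: all file names below are made up for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/src/.git" "$demo/dst"
touch "$demo/src/index.php" "$demo/src/.gitlab-ci.yml" "$demo/src/.git/config"

# One pattern per line; blank lines and lines starting with '#' or ';' are ignored.
cat > "$demo/src/.rsyncExcludeFiles" <<'EOF'
# hypothetical exclusion list
.git/
.gitlab-ci.yml
.rsyncExcludeFiles
EOF

# Preview the transfer without touching the target (--dry-run).
if command -v rsync >/dev/null 2>&1; then
    preview=$(cd "$demo/src" && rsync --recursive --dry-run --itemize-changes \
        --exclude-from=.rsyncExcludeFiles . "$demo/dst/")
    printf '%s\n' "$preview"
fi
```

Under these assumptions the preview should mention index.php but none of the excluded entries.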

example 2

This call is used when deploying our Puppet configuration from Git. It was here that I first needed the --*map features, since the files initially ended up being owned by gitlab-runner. This is fine when every file you deploy via Puppet to another machine is explicitly listed with its permissions and owner set in your codebase. If this is not the case, Puppet (3.x) will implicitly set the owner to the same UID that is used on the Puppetmaster - leading to all kinds of strange situations. To avoid this, I'm mapping owners and groups.

Additionally I'll chmod 640 all files and chmod 750 all directories that have been synced via the call to avoid having unsafe permissions on anything. All critical things should have their permissions set explicitly in our codebase anyway.
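The exact commands for this permission cleanup are not part of the post. One common way to do it is find with -exec, sketched below. TARGET is a hypothetical variable that defaults to a throwaway directory so the example is safe to try; the module path inside it is made up.

```shell
#!/bin/sh
# Sketch: normalise permissions below a target directory.
# TARGET is an assumption; it defaults to a scratch directory.
TARGET=${TARGET:-$(mktemp -d)}
mkdir -p "$TARGET/modules/example"
touch "$TARGET/modules/example/init.pp"

# Files: owner may read and write, group may read, others get nothing.
find "$TARGET" -type f -exec chmod 640 {} +
# Directories additionally need the execute bit to be traversable.
find "$TARGET" -type d -exec chmod 750 {} +
```

Splitting by -type keeps the execute bit off regular files while directories stay accessible to the owner and group.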

I arrived at this specific combination of options because I wanted the Deploy step to list only files that had really changed. Since the cache of Puppet modules is invalidated while downloading dependencies, all external files are marked as new every time. This can be circumvented via checksumming, but it still leaves the modification dates of directories changed (they are set when the downloaded archives are unpacked), therefore requiring --omit-dir-times. Now, with this combination and --verbose, the logs contain only files changed in our codebase and ones that changed due to changes in our dependencies. There are no longer hundreds of files marked as changed just because r10k needed to fetch a module again.

# in the .gitlab-ci.yml the command is on one line to avoid errors
# for the sake of readability I have reformatted the call.
# The order of the optional parameters has been changed to be
# more coherent.
sudo rsync --recursive \
           --times \
           --omit-dir-times \
           --checksum \
           --sparse \
           --force \
           --delete \
           --links \
           --exclude=.git* \
           --group \
           --owner \
           --usermap=gitlab-runner:root \
           --groupmap=gitlab-runner:puppet \
           --human-readable \
           --stats \
           --verbose \
           . /etc/puppet
  • Copy everything including all files and subfolders (--recursive) from the current location (.) to the target (/etc/puppet).
  • Make sure the modification times of the files at the target location match the ones at the source location (--times).
  • Do not set modification times on directories (--omit-dir-times), and decide which files to sync by comparing checksums (--checksum) instead of modification time and file size.
  • Try to intelligently handle sparse files (--sparse). I'm rather sure this ended up here without any actual cause, by picking parameters from a meta parameter.
  • Delete files at the destination that are not at the source (--delete) and allow a directory to be deleted even if it is not empty when it has to be replaced by something that is not a directory (--force).
  • Recreate symlinks at the destination if there are any at the source (--links).
  • Exclude Git specific files and folders (--exclude=.git*).
  • Transfer the group and owner of the files (--group, --owner), which is required for the following two mappings to have any effect.
  • Change the owner from gitlab-runner to root (--usermap=FROM_USER:TO_USER)
  • Change the group from gitlab-runner to puppet (--groupmap=FROM_GROUP:TO_GROUP)
  • Afterwards display detailed, human-readable statistics about the transfer (--human-readable, --stats).
  • Additionally display files transferred and a summary (--verbose).
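The effect of --checksum on freshly downloaded but otherwise unchanged files can be reproduced in a scratch directory. This is a minimal sketch under assumptions: the file init.pp and the directory layout are made up, and --dry-run together with --itemize-changes is used here only to count the files rsync would really transfer (lines starting with ">f").

```shell
#!/bin/sh
# Sketch: same content at source and target, but a different modification time.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
printf 'class example {}\n' > "$demo/src/init.pp"
cp "$demo/src/init.pp" "$demo/dst/init.pp"
# Pretend the copy at the target is older, as after a fresh module download.
touch -t 201701010000 "$demo/dst/init.pp"

# Count files rsync would transfer; extra options are passed through.
count_transfers() {
    rsync --recursive --times --omit-dir-times --dry-run --itemize-changes \
        "$@" "$demo/src/" "$demo/dst/" | grep -c '^>f' || true
}

if command -v rsync >/dev/null 2>&1; then
    quick_check=$(count_transfers)
    with_checksum=$(count_transfers --checksum)
    echo "transfers without --checksum: $quick_check"
    echo "transfers with --checksum:    $with_checksum"
fi
```

Without --checksum the differing modification time is enough for rsync to transfer the file again; with --checksum the identical content is detected and only the timestamp would be adjusted.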

Notes

If you are using sudo in combination with an automated system in which non-admin users can access sudo without a password for specific tasks, make sure you have appropriate whitelists in place. You could, for example, restrict the use of sudo to a specific user on a specific host with a specific command. Given that the syntax of sudoers is not as precise as regular expressions would be, you'll have to be quite specific about which command you want to allow and where to put wildcards, should you use any. Here's an example.

# No line breaks here, since it might confuse readers and they might end up with a damaged `sudoers` config.
your_user your_host.fully.qualified.domain= NOPASSWD: /full/path/to/command --a_parameter --another__parameter /the_source /the/target/location
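Since a broken sudoers file can lock you out, it may be worth syntax-checking such a rule before installing it. The following is a sketch only: it writes the placeholder rule from above to a scratch file and lets visudo check it, assuming visudo is available on the machine. Where the rule is finally installed is up to you and not shown here.

```shell
#!/bin/sh
# Sketch: write the rule to a scratch file and syntax-check it first.
# User, host, command and paths are the placeholders from the example above.
rule=$(mktemp)
cat > "$rule" <<'EOF'
your_user your_host.fully.qualified.domain= NOPASSWD: /full/path/to/command --a_parameter --another__parameter /the_source /the/target/location
EOF

# visudo -c only checks; -f points it at the scratch file instead of /etc/sudoers.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$rule" || echo "syntax check failed, do not install this file"
fi
```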

Technical skills after almost 4 years at ICG

Posted on Thu 28 September 2017 in work • Tagged with Institute for Computer Vision and Computer Graphics

Please note that this post is intentionally written in past tense to avoid having to rewrite it completely in the future.

This post aims to be a summary of technologies I've learned to use during my period at the Institute for Computer Vision and Computer Graphics at TU Graz.


monitoring

While I was fortunate to avoid Nagios, I have quite a lot of experience with Sensu and its quirks. I sent several patches to their plugins and deployed a sizeable setup of checks and metrics, some of which were heavily customised to the ICG's needs. We deployed Sensu for Linux and Windows. I've written a post about how to set up Sensu with Puppet that covers part of that work. I was also active on the sensu-users mailing list.

Furthermore I deployed Logstash and Logstash-forwarder for collecting, analysing and structuring log files. This work included coming up with custom patterns for matching as well as defining configuration for ingesting logs of dpkg, syslog, apache & apache-error, nginx & nginx-error, seafile & seahub as well as fail2ban.

The collected data was available via Grafana for metrics, Uchiwa for results of Sensu and Kibana for logs, all protected behind an Apache reverse proxy with authentication via whitelisted LDAP accounts. I integrated custom URLs to easily go from Sensu to the corresponding results in Grafana and Kibana and built multiple custom Grafana dashboards. Those dashboards either displayed general information or were custom-tailored to solving particular problems in operations.

Before becoming more intimate with Sensu, I wrote our own script for monitoring the output of two CLIs of hardware RAID vendors (storcli, tw_cli).

web servers

I have worked with Apache as well as Nginx, the majority of time with Apache, setting up static websites, WSGI based applications and reverse proxies with LDAP or password authentication. A part of the work with Apache was done via Puppet modules.

source control

I was in charge of two GitLab instances - one of which I migrated from a source installation to the Omnibus package - and maintained an old Apache Subversion instance. I am a strong supporter of Git and if need be can adjust to using SVN again. I've helped several of the researchers to set up their projects for GitLab Continuous Integration and used the feature myself extensively for both development and administration projects - a topic very dear to me.

datastores

For monitoring purposes I have built a setup including Graphite for storing metrics data, Redis for keeping monitoring related transient data for Sensu and Elasticsearch for storing logs with Elastic curator for removing them after a defined retention period. Setup of Graphite and Redis was done via Puppet modules.

I have limited knowledge of MySQL and PostgreSQL. I was part of a team developing an application using a Postgres backend. Further tasks included creating backups with pg_dump and editing a huge database dump by hand in an editor. The thought of this should give you nightmares. MySQL tasks were mostly creating backups.

I learned a few things about LDAP while I was modifying users, groups and configuration entries during everyday operations and operating system upgrades. Given my dislike of Java, I refused to install Java and by extension avoided using Apache Directory Studio, instead writing my .ldif files by hand using templates in my editor and applying them via ldapadd.

virtualization and containerization

For various development processes I was using Virtualbox together with Vagrant for easy setup of new machines that get thrown away. In production, Xen was used - I've written about some of that experience. Additionally, I built several custom Docker containers for the GitLab CI. We did not use any Docker containers hosting services in production. I have written about building a container for this very blog though.

configuration management

I've written the Puppet setup at the institute managing many, many services. Some hosts are entirely controlled by Puppet. The configuration is deployed from a git repository to the Puppetmaster after being run through syntax tests and integration tests via GitLab and Docker.

For shorter, one-shot tasks on multiple hosts I've lately taken a liking to Ansible. Generally I find configuration management solutions more intuitive than ones that specify processes.

security

I've cleaned up, simplified and improved readability of an existing Shorewall setup. The entire configuration is being dry-run in the GitLab CI before being deployed to production on success.

I've configured and deployed TLS for several services, including LDAP, web servers, IMAP/POP3 (Cyrus), SMTP (Postfix), RabbitMQ, Logstash and more. I've rolled out several versions of OpenSSH configurations and Fail2ban deployments via our own Puppet code. Generally I'm of the opinion that even traffic in your own datacenter should be encrypted - a lesson that stuck with me after reading that Google's internal lines were tapped a few years back.

I was in charge during some unfortunate events where security issues popped up and had to be investigated or violations to our IT policies had to be dealt with. These policies were based on the Admin Team's decisions and put into text by me. I've submitted detailed written reports about these activities to my boss.

troubleshooting OSs

While my main focus has been on Linux servers - mostly Ubuntu with some Debian - I have been busy troubleshooting problems on Linux desktops too, including broken X, crashing LightDM, missing CUDA drivers, issues with Secure Boot (and the shim-signed package). I've also seen my fair share of macOS problems given that we had several Mac users, including myself. Amongst the problems there were inaccessible BootCamp and completely broken installations due to users aborting upgrade processes. I have been mostly saved from Windows issues by my colleagues handling those. I have however written a tiny script allowing you to easily boot into Windows from your Ubuntu installation after realising the convenience of such a solution when using BootCamp.

upgrades and migrations

Over the years I've successfully upgraded many of our existing servers via do-release-upgrade or changing the Debian repository and fixed all occurring issues. I've also migrated a part of our infrastructure from DRBD 8 to DRBD 9 in order to replicate to more machines without layering.

Upgrading was made much easier by having all systems at a current state which I achieved by using Unattended Upgrades for many of our sources. Reading changelogs and news about new features to improve our infrastructure has been very helpful in that regard. One achievement I am very proud of is having a patch accepted into Ubuntu (Precise and Trusty).

Usually I'm working with a list of things that I check after upgrading and manually merge or rewrite configuration files which I find using the following command. A look into the logfiles of various services is always a good idea too.

find /etc -name "*dpkg-*" -or -name "*ucf-*" -or -name "*.merged" | sort

Additionally I oversaw and implemented the switch from manual configuration and firewall rule sync to a setup controlled by Puppet which is able to keep two hosts synced and configured.

remainders

This is the grab-bag area. Most things in here didn't warrant a longer section.

I replaced a manual process of dealing with DHCP with a Puppet-controlled setup - that's thankfully very easy using ISC-DHCP-SERVER. I deployed multiple applications that use Shibboleth authentication and worked with the central university IT section on that. I configured and deployed Mattermost and Seafile and made sure our network mounts, automounts, Samba shares and Mailman instance worked and NTP was synced.

writing

I've written extensive documentation for new users and the on-boarding process, documentation for the Admin Team as well as a policy section. Additionally, I published several posts about my work with permission from the ICG here on my personal website.

Furthermore I meticulously took notes on all new issues and their solutions in our GitLab issue tracker, so that a knowledge base containing previous experiences was created instead of letting all my experience evaporate.


Reading recommendations (2017-08-13)

Posted on Sun 13 August 2017 in reading recommendations

~Onatcer tells devastating things about the new exams required to study Computer Science in Vienna in Der TU-Aufnahme-Test: Pleiten für alle! (German).

Troy Hunt provides a new service where one can check passwords against a gigantic collection of millions of leaked passwords with Introducing 306 Million Freely Downloadable Pwned Passwords.

Currently Final Fantasy XIV's Moonfire Faire seasonal festival is running and I already played through the seasonal quests in order not to miss anything. ~Luxpheras from the Community Team put up the post Shaved Ice Ice Baby to promote the event with some pictures of the spoils.


Sidenotes.


How I publish this blog

Posted on Mon 07 August 2017 in development

It was 2015 when I finally decided to act upon my dissatisfaction with the WordPress publishing process and move to a different solution. I exported my posts and pages from its MySQL database and moved on to Pelican - a static site generator written in Python. Usually, when you hear "static site generator" you think of Jekyll. Jekyll is the static site generator people know of - the major reason for that being that it is used behind the scenes for GitHub Pages.

Jekyll is written in Ruby, however, and I have not put enough time into Ruby to be more familiar with it than exchanging some lines in existing code here and there. Python is my tool of choice and when a friend mentioned Pelican I was immediately hooked - even though it took me many months to finally put my plans into motion.

Back in the days: WordPress

WordPress had always struck me as being built for ease of use. It is heavyweight, can be deployed almost everywhere and its features are plentiful. There was one major pain point for me though: For a reason I have never figured out, none of the available native clients (e.g. Blogo, Marsedit) ever managed to show me more than my last few posts instead of a full view of all historical ones.

I frequently edit posts in the days after they are published. I fix typos, update the wording if I think it is bad after reading it again and sometimes add additional information. I consider publishing an article a bit like writing software or configuring a system. It often needs a little adjustment after it has been in use (or in testing) for some time. With WordPress that meant I had to go to the admin page every time to change something. The workflow was something akin to:

  • go to bookmarked login site
  • swear about login being insecure due to missing TLS deployment
  • log in
  • go to section "posts"
  • find the post in question
  • edit the post by copying the modified content from my local file to the website
  • preview the post on the site
  • save the post

I dislike the need to look for click targets, to scan for the relevant article in the list, the waiting between interactions on a slow connection. The setup screamed for some sort of automation but nothing seemed easy to set up at that point.

Uploading Pelican

Immediately after switching to Pelican for content generation, I found myself in the puzzling situation of having a blog but no easy way to publish it. A bit of investigation uncovered that Pelican ships with a Makefile that includes an ftp_upload target though. I configured this and added a ~/.netrc file so I didn't need to type my password every time an upload was performed. This worked fine for a while. I even wrote a little bash alias to run it.

source ~/.virtualenvironments/pelican/bin/activate \
  && cd ~/…/ghostlyrics-journal/Pelican \
  && make ftp_upload \
  && deactivate \
  && cd - \
  && terminal-notifier -message "GhostLyrics Journal published." -open "http://ghostlyrics.net"

It was in May 2016 that the lftp build for macOS broke. That means that after an upgrade of macOS I was left without a way of easily deploying changes to the blog. Pelican uses lftp because of some of its features like mirroring a local folder and updating only the differences instead of copying the whole folder recursively every time you kick it. I think I tried to publish with Transmit once or twice but it is simply not built for this task.

I was enormously frustrated and heartbroken. I didn't write anything for weeks, instead hoping a solution would surface that didn't require engineering effort on my part. However, the build remained broken and so did my FTP upload.

After being inspired I decided that the status quo wasn't acceptable and went on to build a way that allowed me to simply run publish in Terminal and have everything done for me - reproducibly and rock solid.

Up comes Vagrant

In October 2016 I came up with a Vagrantfile that allowed me to publish from an Ubuntu machine via Vagrant. This worked around the author of lftp seemingly having little interest in building for macOS.

Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"
  config.vm.synced_folder "/…/ghostlyrics-journal", "/pelican"

  config.vm.provision "file", source: "~/.netrc", run: "always", destination: ".netrc"

  config.vm.provision "shell", env:{"DEBIAN_FRONTEND" => "noninteractive"}, inline: <<-SHELL
    apt-get -qq update
    apt-get -qq -o=Dpkg::Use-Pty=0 install -y --no-install-recommends \
      make \
      python-markdown \
      python-typogrify \
      python-bs4 \
      python-pygments \
      pelican \
      lftp
  SHELL

  config.vm.provision "shell", privileged: false, run: "always", inline: <<-SHELL
    make -C /pelican/Pelican ftp_upload
  SHELL
end

In short: I use a bento Ubuntu box because I've had bad experiences on multiple occasions with the boxes in the Ubuntu namespace. I sync the folder my blog resides in to /pelican in the VM. I copy the .netrc file with the credentials. The VM gets the packages I need to run Pelican and then calls the ftp_upload make target. This also got a new bash alias.

cd ~/vagrant/xenial-pelican \
  && vagrant up \
  && vagrant destroy -f \
  && cd - \
  && tput bel

Now, if you only ever publish a few times, this works fine and is perfectly acceptable. If you intend to iterate, pushing out changes a few times within half an hour, you'll be stuck waiting more often than you'd like due to the VM booting and being provisioned from scratch on every run. Destroying the VM each time was necessary to avoid conflicts when working on different machines with the Vagrantfile being in my Dropbox.

Wrapping it up with Docker

Enter Docker. Now I know what you are thinking: "Docker is not the solution to all our problems" and I agree - it is not. It seems like the right kind of tool for this job though. Being built on xhyve and therefore Hypervisor.framework it is decidedly more lightweight than Virtualbox. When it is already running, firing up a container that builds the blog, uploads it and shuts the running container down again is very, very fast.

I built the following Dockerfile with the command docker build -t pelican . while in the directory containing the Dockerfile and .netrc.

FROM buildpack-deps:xenial
LABEL maintainer="Alexander Skiba"

VOLUME "/pelican"
WORKDIR /pelican
ENV DEBIAN_FRONTEND=noninteractive

COPY .netrc /root/.netrc

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      make \
      python3-pip \
      python3-setuptools \
      python3-wheel \
      lftp

RUN pip3 install \
      pelican \
      markdown \
      typogrify \
      bs4 \
      pygments

CMD ["make", "-C", "/pelican/Pelican", "ftp_upload"]

Again, I build on top of an Ubuntu Xenial image, work in /pelican, copy the .netrc file and install packages. This time, however, I install the Python packages via pip to get current versions. It is also of note that while building the image, one does not have access to files outside of the current directory and its subdirectories, which made a local copy of .netrc necessary. Furthermore, the host paths for Docker volumes cannot be defined in the Dockerfile by design. Because of that, the new bash alias is this:

docker run -v /…/ghostlyrics-journal/:/pelican pelican

This short command starts a container from the image called pelican with the given folder mounted as a volume at /pelican. Since I don't pass a command of my own, the CMD defined earlier is called and the blog is built and uploaded. Afterwards the container exits since the command itself exits. Quite an elegant solution, I think.
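If you prefer a shell function over an alias, the call could be wrapped roughly like this. It is a sketch, not my actual setup: BLOG_DIR is a hypothetical variable that has to point at the blog folder on the host, and --rm is an addition that removes the stopped container after each run.

```shell
#!/bin/sh
# Sketch: a 'publish' command wrapping the docker call.
# BLOG_DIR is a hypothetical variable; point it at the folder that
# should be mounted into /pelican.
publish() {
    if [ -z "${BLOG_DIR:-}" ]; then
        echo "publish: set BLOG_DIR to the blog folder first" >&2
        return 1
    fi
    # --rm removes the stopped container once the upload has finished.
    docker run --rm -v "$BLOG_DIR:/pelican" pelican
}
```

With the function in your shell profile and BLOG_DIR exported, running publish in a terminal would build and upload the blog in one step.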


Final Fantasy XIV: Stories about Fellowship

Posted on Sat 05 August 2017 in video games • Tagged with Stories

I started playing Final Fantasy XIV (FF) in February when my disappointment about the many quirks of Black Desert Online (BDO) reached an all-time high. After feeling that a lot of things were unpolished in BDO I wanted to try an MMO with monthly subscription - the assumption being that the extra money was used for a certain layer of polish and QA that I long for when playing a video game.

I was pleasantly surprised. All the GUIs were fine, not overloaded, with no text outside of its intended boxes or similar things showing neglect on behalf of the developer. While the beginning of combat is rather boring and depressingly slow, it gets better when you gain more skills. The world is built with attention to detail even though I felt that BDO's world was more alive, especially Altinova. I want to point out that the writing is superb. The jokes, pop culture references and times when the game doesn't take itself seriously are amazing.

When taking pictures in FF I am almost always capturing events and experiences, even characters, whereas in BDO my favorite motif was the environment.

Nadzeya looking at grilled food in Altinova

Another thing I realized early on is how the game is built to foster community and friendliness. There are systems in place that help new players (Novice Chat), that encourage players to play older content with others (dungeon bonuses, second chances for Khloe's Wondrous Tails) and that reward being generally helpful and cooperative while in an instance (player recommendations). All this is just so fundamentally different from the dog-eat-dog mentality in BDO where you can basically get stabbed outside safe zones with little to no repercussions for the murderer.

Let me tell you about Kakysha Saranictil, a rogue and ninja fighting for the good of the people of Eorzea. She is a hero to the common folk and to statesmen alike. Fighting for the right cause is reason enough for her to help everyone, be they a poor miner in an almost forsaken village or the ruler of a grand city-state.

An airship leaving Ul'dah

While she started her journey as a pugilist (read: martial artist) in Ul'dah, the prosperous desert nation, she soon discovered that her true calling is the shadows and so she became a member of The Dutiful Sisters of the Edelweiss in Limsa Lominsa where she studied under Captain Jacke. As her travels led her all over Eorzea she sadly realized that Jacke had little left to teach her. Luckily Oboro, a ninja hailing from Doma in the Far East, took her under his wing and taught her the ways of the ninja.

Kakysha sitting cloaked in Idyllshire

Now, while her comrades at the Scions of the Seventh Dawn certainly kept her busy defending this or that nation from both primal and Garlean threats, she certainly did spend her downtime well, building trust with a more conservative faction of Ul'dah's lizard people, the Amalj'aa. A proud folk of warriors, they came to respect her when she helped them uphold their traditions again and again against their religiously fanatical kin revering Ifrit, as well as when she defended their clanswoman.

Admittedly even a hero needs a little rest from time to time and what better use of said downtime would there be than finally having dinner with her close friend, Ser Aymeric de Borel.

Aymeric laughing about a joke Kakysha made

Kakysha smiling at Aymeric

But even between all those events, she found a sense of belonging, of fellowship. Kakysha joined a Free Company soon after starting her journey, but ultimately felt unfulfilled by both the people and their way of treating each other. After a long period of solitude she ultimately came across the Seraphs, Jenji and Syn Seraph, who invited her to join their Free Company, The Black Crown, where people were pleasant and all was well. While Crown has a considerable amount of adventurers who have failed to show up in recent times there is a core group of heroes who are there to help others, to talk and to have fun with.

Recently one might get the impression that Kakysha has become complacent, ignoring the plight of her fellow people. Nothing could be further from the truth - it is only that she needed to focus on solving the biggest issues first (namely, the liberation of Doma and Ala Mhigo) before tackling the smaller issues (leftover sidequests) now that a manner of peace has been established.

Kakysha watching the people in Quarrymill

Look out for her on the Phoenix, EU server - you'll know her by her bright red hair, her glasses and her completely sand-colored clothes, be they adorned with gems and jewels or more work-oriented with belts and pouches.