On changing hard disks

Posted on Fri 22 January 2016 in work • Tagged with ICG

Now, I might have mentioned in the past that despite me working as a system administrator, I dislike working with actual hardware and prefer to touch machines as soon as they are SSH-ready or at most as soon as an operating system can be installed.

  • This post has been updated once.

Well, let's assume for a moment that a disk needs changing and neither the senior admin in my current job nor my predecessor are available. Let's assume that this has happened twice already and led to rather amusing stories; both times.

First time's always fun

The first time I was to change a disk I had help from my colleague Daniel Brajko who accompanied me to the server room, but let's start at the beginning.

I noticed that something was up and a disk had an error when I wrote my script to check the output of the RAID controllers' status information to get notified automatically when something was wrong. I decided to tackle this task since it was one important piece of work that my senior admin had assigned me during his absence.

After checking the serial number and size of the faulty drive, we headed to the storage space and picked up several disks since we were not sure which one was to go into exactly that server. At the time we could not be sure anyway, because some of the disks were not labelled with their capacity (looking at you, Seagate). With the disks and more equipment in a backpack, we ventured to the server room which is conveniently located within walking distance of our office.

We only got as far as the server room door, though. Neither my employee card nor my colleague's was authorized to enter, even though he, at least, had been in this job for over a year. Great. Alright, the helpful thing was that authorization had not yet been transferred from my predecessor to me, and he still worked at our institute in a different position. He knew us and lent us his card so we could change the disks, as he clearly recognized the need for such maintenance. I had a bad feeling the whole time that someone would "catch" us and we'd have to explain, in an extremely awkward situation, why we were using this card.

With this card - impersonating our former colleague - we ventured into the server room, only to find that the machine in question was in our secondary server room - the one that is multiple blocks away. Alright, this wasn't going to be easy.

So we packed everything back up and walked to the secondary building. Daniel had only ever been there once, I had never been there. The building has two basement levels which are not particularly well lit nor particularly easy to find your way around in. I wouldn't necessarily call it a maze but it's certainly not far from that. After 15 minutes of running around without any clue we surrendered and went up to the ground floor to consult one of the university's information terminals to find our own server room. A glorious day, let me tell you.

After finding our server room and granting ourselves access with the borrowed card we entered, looked for our server cabinet (of course it was the only unlabelled one) and well… uhm. That was the point where Daniel pointed out that, yes, we did need the keychain that I had told him to leave behind because "I already have everything we need".

And back we went. *sigh*. After fetching the keychain we also borrowed my predecessor's bike as well as another one, rode back, went down into the basement, changed the drive - which was relatively painless once we realized we only had one disk of the correct capacity with us - and returned.

And that's how it took two sysadmins a whole afternoon to change a damaged disk. After that episode we phoned the person in charge and got ourselves assigned the server room access permissions. But...

Second time you're quickly done

Today this little e-mail arrived. That's the second time it did, and I always like it when my efforts pay off. :)

RAID status problematic.

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache AVrfy
------------------------------------------------------------------------------
u0    RAID-6    DEGRADED       -       -       256K    5587.88   RiW    ON     

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     931.51 GB   1953525168    [REDACTED]            
p1     OK               u0     931.51 GB   1953525168    [REDACTED]            
p2     DEVICE-ERROR     u0     931.51 GB   1953525168    [REDACTED]        
p3     OK               u0     931.51 GB   1953525168    [REDACTED]            
p4     OK               u0     931.51 GB   1953525168    [REDACTED]      
p5     OK               u0     931.51 GB   1953525168    [REDACTED]            
p6     OK               u0     931.51 GB   1953525168    [REDACTED]            
p7     OK               u0     931.51 GB   1953525168    [REDACTED]  
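
For the curious: this e-mail comes from nothing fancier than a small shell script, run regularly, that greps the controller's status output for unhealthy states. A minimal sketch of the idea - assuming a 3ware-style controller queried via tw_cli, with the controller ID and the mail recipient as placeholders for your own setup:

#!/bin/bash
# Sketch of a RAID status check that mails the full controller output
# whenever a unit or port is not healthy. Assumptions: tw_cli is installed,
# the controller is /c0 and local mail delivery to root works.
set -euo pipefail

status=$(tw_cli /c0 show)

# DEGRADED and DEVICE-ERROR are the states visible in the mail above;
# extend the pattern with whatever else your controller can report.
if echo "$status" | grep -E "DEGRADED|DEVICE-ERROR" > /dev/null; then
    echo "$status" | mail -s "RAID status problematic." root
fi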

Okay. So. Senior admin is absent again, disk fails again. This time Daniel is also not there. "Fine," I tell myself, it will be painless this time. I was so, so wrong.

After making a quick joke with the researchers that maybe they should go home early - because if I failed at replacing the disk, we wouldn't have any e-mail service - I grabbed the keys and a replacement disk. This time I again couldn't find one labelled with the right storage capacity, but I got smarter and made an educated guess based on 5 of 8 characters of the serial number matching. I headed to the next building, ran into the admin from the other institute and joked about whether they also had "those mean things lacking a storage capacity description". He helpfully stated that they use the same model and that theirs were 1 TB models, which gave me some relief. After opening our server racks and checking all devices in there I came to a terrible realization: of course I was in the wrong building. Again. (This time I made a list of all our devices in this building for our internal docs.)

Alright, back up from the basement, I notified the office that the keychain had not gone missing and that I was taking it to the other building. I walked through the cold winter air, entered the basement and found the server room on the first try. This is a thing that tends to happen: if I am ever required to find my way to a place by myself, I will keep finding the way there in the future. Anyway, I hold my card to the scanner and… nothing happens. I cursed, waited a bit and tried again. Again, nothing. There's an emergency contact on the door, so after returning to the ground floor in order to have cellphone reception I called it. We had a longer conversation, and obviously I hadn't received all the permissions I should have gotten when the issue arose the first time. Shall we say I was a little annoyed that not both permissions had been transferred from my predecessor directly to me?

Update: It turns out I am again to blame for something, as I did have the permissions. However, I didn't know that the card activation via sensor only works for the building whose sensor you last checked in at. So, believing my card was supposed to work after having just been at one sensor, I obviously didn't visit the sensor at the other building.

After managing emergency access I scoured the room for our server rack. I panicked a little when there was nothing where I remembered seeing it last time. I mean, yes, it had been a while, but my memory for locations is pretty accurate and I don't think anyone would've moved the machines without the admins noticing. Good thing no one else was in the room, since I must've looked like a burglar using my iPhone's flashlight to search the empty server cabinet where our machines were supposed to be. Then I noticed that there were indeed machines. It was just that both were in really slim chassis and located in the topmost and bottommost slots. In addition, one was turned off, so I missed both when looking less carefully. Oh, yeah. Our stuff was in the only unlabelled rack, because of course it still was. I really hope the people in charge don't have microphones there since I might've been swearing quite a lot.

The rest was easy work. Change the disk, make sure the RAID recognizes the new drive, pack everything up and go home.

I'm morbidly curious what surprises the next drive change will offer me.

PS: Yes, labelling our rack is on top of my TODOs.


Unattended-Upgrades patch for Remove-unused-dependencies backported to Trusty, Precise

Posted on Wed 06 January 2016 in work • Tagged with ICG

As of today my contribution to unattended-upgrades has been backported into Ubuntu Trusty Tahr and Ubuntu Precise Pangolin, which are both LTS versions currently in use. I'm probably more proud of myself than I should be, but it was a great feeling to be of help to a global community and prevent further issues with automatic updates making systems unusable.

I will be removing the manually patched packages at the ICG soon and look forward to not maintaining a fork of the software for internal use as that tended to eat up valuable time from other projects.


Media Recap 2015 - II

Posted on Wed 02 December 2015 in media recap

After watching TotalBiscuit's video for There Came an Echo I wasn't really into playing the game, but when the soundtrack went up at Big Giant Circles I couldn't pass it up. Bought it some time ago and still love it, especially "Ignite Defense" and "LAX" (those are great Audiosurf tracks, BTW).

You should really listen to the soundtrack.

Video Games

  • Audiosurf 2 (Steam, formerly Early Access)
  • Deponia (Steam) - Hard to like the game given that none of its characters is written in a likable way. It does contain some memorable scenes though. "Rufus has stolen the screws from the children's merry-go-round."
  • Dragon Age 2 (Xbox 360) - Playing again with all the DLC to show the girlfriend how constrained the team was making this, as well as how great the dialogue was.
  • Guild of Dungeoneering (Steam) - Yes, you can actually sell games based on their trailer soundtrack.
  • Halo 1 (Xbox One, Master Chief Collection) - Bought this one together with my Xbox One in order to relive the old times. Have fond memories of ploughing through the actual Halo 1 with Martin.
  • Halo Spartan Ops (Xbox One, Master Chief Collection) - Unless I got something wrong this seems to be the multiplayer replacement for Firefight. I loved ODST's Firefight and am deeply disappointed by this. I used Firefight as a kind of training ground for the campaign, but the Spartan Op I played solo was boring.
  • Ironcast (Steam) - I couldn't resist buying a new "match 3" game, especially one with elements of a roguelike. It was marked down during the Steam Exploration sale. I like this one quite a lot. I wish I had found the 'skip' button in dialogues earlier though; I accidentally clicked away quite a few choices.
  • Kingdom (Steam) - Beautiful indie title which is deeper than one would expect at first sight.
  • Kingdom Hearts Re:Chain of Memories HD (Playstation 3) - Last time I played this game was a pirated version of the Game Boy Advance edition some years ago. Still, the later boss fights were as tough as I remembered them and I tended to switch off the PS3 out of rising anger at least once per boss fight in the mid-section.
  • Life is Strange (Xbox One) - While I am not as heavily into Life is Strange as my girlfriend, I can acknowledge it for the interesting and original game that it is. Its contemporary theme struck a nerve for the both of us.
  • Rune Factory 4 (Nintendo 3DS)
  • Secret Files 3 (Steam) - Disappointing. Feels incomplete, almost like sentence.
  • Starbound (Steam, Early Access)
  • Startopia (Steam) - Felt nostalgic. Initially played this title years ago when I borrowed it from Lukas.
  • Terraria (Steam) - Terraria has arrived on the Mac. I don't need to say more.
  • The Witcher 3 (Xbox One) - Holy… I adore the Witcher books and absolutely, wholeheartedly recommend The Witcher 3 to anyone on the lookout for a gritty, mature and sarcastic fantasy adventure RPG. I played this with the girlfriend on a completionist run. The game and its awesome first expansion Hearts of Stone kept us busy from June to November.

Books

There's a Witcher book aside from the main five which was an entertaining read. Then there's Jennifer Estep's series about an assassin, which has the usual fault of her books: she explains everything again in minuscule detail in every single book, even though many people will either still remember it or read the books in a binge.

Books by Richard Schwartz

I bought a stack of books from the friend of a friend who wanted to clear house. Those turned out to be very entertaining fantasy novels by Richard Schwartz. I haven't read the standalone titles in this universe yet, but due to traveling I spent some iTunes credits on the later books in order to avoid packing more. I even turned on data roaming and bought one book on the train in Germany - that should give you a good impression of how much I've enjoyed the series so far.

  • Das erste Horn
  • Die zweite Legion
  • Das Auge der Wüste
  • Der Herr der Puppen
  • Die Feuerinseln
  • Der Kronrat
  • Die Rose von Illian
  • Die weiße Flamme
  • Das blutige Land
  • Die Festung der Titanen
  • Die Macht der Alten

Movies

I've suggested watching the Fast and the Furious movies since I like them, and in turn I watched the Harry Potter ones since I didn't know them. Due to conflicts of time and interest we haven't seen the last two Potters yet.

I recommend Inside Out. I can't remember when I last had such a nice time at the cinema. It's easily my favorite movie of the year. Yeah, don't go and watch Minions - it's disappointing and weak.

  • Fast and the Furious, The
  • Fast and the Furious, The: Tokyo Drift
  • Fracture (Netflix, DE: Das perfekte Verbrechen)
  • Harry Potter and the Philosopher's Stone
  • Harry Potter and the Chamber of Secrets
  • Harry Potter and the Prisoner of Azkaban
  • Harry Potter and the Goblet of Fire
  • Harry Potter and the Order of the Phoenix
  • Harry Potter and the Half-Blood Prince
  • Inside Out (cinema, DE: Alles steht Kopf)
  • Jumper (Netflix)
  • Minions (cinema)
  • Transporter, The (Netflix)
  • V for Vendetta (Netflix)
  • xXx (Netflix)

Videos on Netflix

The Netflix series consumption has been more or less the same. Some Grimm, some Sherlock, a lot of Elementary. I have also checked out a documentary series about famous chefs which proved to be interesting.

Presentations

Podcasts


Using Continuous Integration for puppet

Posted on Sun 01 November 2015 in work • Tagged with ICG

I'll admit the bad stuff right away. I've been checking in bad code, I've had wrong configuration files on our services and it's happened quite often that files referenced in .pp manifests have had a different name than the one specified or were not moved to the correct directory during refactoring. I've made mistakes that in other languages would've been considered "breaking the build".

Given that most of the time I'm both developing and deploying our puppet code, I've found many of my mistakes the hard way. Still, I've wished for a kind of safety net for some time. Gitlab 8.0 finally gave me the chance by integrating easy-to-use CI.

Getting started with Gitlab CI

  1. Set up a runner. We use a private runner on a separate machine for our administrative configuration (puppet, etc.) to have a barrier from the regular CI our researchers are provided with (or, as of the time of this writing, will be provided with soonish). I haven't had any problems with our docker runners yet.
  2. Enable Continuous Integration for your project in the gitlab webinterface.
  3. Add a .gitlab-ci.yml file to the root of your repository to give instructions to the CI.

Test setup

I've improved the test setup quite a bit before writing this and aim to improve it further. I've also considered making the tests completely public on my github account, parameterizing some scripts, handling configuration-specific data in .gitlab-ci.yml and using the github repository as a git submodule.

before_script

In the before_script section, which is run in every instance immediately before a job, I set some environment variables and run apt's update procedure once to ensure only the latest versions of packages are installed when packages are requested.

before_script:
  - export DEBIAN_FRONTEND=noninteractive
  - export NOKOGIRI_USE_SYSTEM_LIBRARIES=true
  - apt-get -qq update
  • DEBIAN_FRONTEND is set to suppress configuration prompts and just tell dpkg to use safe defaults.
  • NOKOGIRI_USE_SYSTEM_LIBRARIES greatly reduces build time for ruby's native extensions by not building its own libraries which are already on the system.

Optimizations

  • Whenever apt-get install is called, I supply -qq and -o=Dpkg::Use-Pty=0 to reduce the amount of text output generated.
  • Whenever gem install is called, I supply --no-rdoc and --no-ri to improve installation speed.

Puppet tests

All tests which I consider to belong to puppet itself run in the build stage. As is usual with Gitlab CI, only if all tests in this stage pass will the tests in the next stage be run. Given that sanity checking application configurations which puppet won't even be able to apply doesn't make a lot of sense, I've moved those checks into another stage.

I employ two of the three default stages for gitlab-ci: build and test. I haven't had the time yet to build everything for automatic deployment after all tests pass using the deploy stage.

puppet:
  stage: build
  script:
    - apt-get -qq -o=Dpkg::Use-Pty=0 install puppet ruby-dev
    - gem install --no-rdoc --no-ri rails-erb-lint puppet-lint
    - make libraries
    - make links
    - tests/puppet-validate.sh
    - tests/puppet-lint.sh
    - tests/erb-syntax.sh
    - tests/puppet-missing-files.py
    - tests/puppet-apply-noop.sh
    - tests/documentation.sh

While puppet-lint exists as a .deb file, this installs it as a gem in order to have the Ubuntu docker containers running the latest puppet-lint.

I use a Makefile in order to install the dependencies of our puppet code quickly as well as to create symlinks to simplify the test process instead of copying files around the test VM.

libraries:
  @echo "Info: Installing required puppet modules from forge.puppetlabs.com."
  puppet module install puppetlabs/stdlib
  puppet module install puppetlabs/ntp
  puppet module install puppetlabs/apt --version 1.8.0
  puppet module install puppetlabs/vcsrepo

links:
  @echo "Info: Symlinking provided modules for CI."
  ln -s `pwd`/modules/core /etc/puppet/modules/core
  ln -s `pwd`/modules/automation /etc/puppet/modules/automation
  ln -s `pwd`/modules/packages /etc/puppet/modules/packages
  ln -s `pwd`/modules/services /etc/puppet/modules/services
  ln -s `pwd`/modules/users /etc/puppet/modules/users
  ln -s `pwd`/hiera.yaml /etc/puppet/hiera.yaml

As you can see, I haven't had the chance to migrate to puppetlabs/apt 2.x yet.

puppet-validate

I use the puppet parser validate command on every .pp file I come across in order to make sure it is parseable. It is my first line of defense, given that files which can't even make it past the parser are certainly not going to do what I want in production.

#!/bin/bash
set -euo pipefail

find . -type f -name "*.pp" | xargs puppet parser validate --debug

puppet-lint

While puppet-lint is by no means perfect, I like to make it a habit to enable linters for most languages I work with so that others have an easier time reading my code should the need arise. I'm not above asking for help in a difficult situation, and having readable code available means getting help with your problems will be much easier.

#!/bin/bash
set -euo pipefail

# allow lines longer than 80 characters
# code should be clean of warnings

puppet-lint . \
--no-80chars-check \
--fail-on-warnings

As you can see I like to consider everything apart from the 80 characters per line check to be a deadly sin. Well, I'm exaggerating but as I said, I like to have things clean when working.

erb-syntax

ERB is a Ruby templating language which is used by puppet. I have only ventured into using templates two or three times, but that has been enough to make me wish for extra checking there too. I initially wanted to use rails-erb-check but after much cursing rails-erb-lint turned out to be easier to use. Helpfully it will just scan the whole directory recursively.

#!/bin/bash
set -euo pipefail

rails-erb-lint check

puppet-missing-files

While I've used puppet-lint locally before, it caught fewer errors than I would've liked because it doesn't check whether the files I referenced as sources or templates actually exist. I was negatively surprised to realize that puppet parser validate doesn't do that either, so I slapped together my own checker for that in Python.

Basically the script first builds a set of all .pp files and then uses grep to check for lines containing either puppet: or template(, which are telltale signs of files or templates respectively. Each entry of the resulting set is then verified by checking for its existence as either a path or a symlink.

#!/usr/bin/env python2
"""Test puppet sourced files and templates for existence."""

import os.path
import subprocess
import sys


def main():
    """The main flow."""

    manifests = get_manifests()
    paths = get_paths(manifests)
    check_paths(paths)


def check_paths(paths):
    """Check the set of paths for existence (or symlinked existence)."""

    for path in paths:
        if not os.path.exists(path) and not os.path.islink(path):
            sys.exit("{} does not exist.".format(path))


def get_manifests():
    """Find all .pp files in the current working directory and subfolders."""

    try:
        manifests = subprocess.check_output(["find", ".", "-type", "f",
                                             "-name", "*.pp"])
        manifests = manifests.strip().splitlines()
        return manifests
    except subprocess.CalledProcessError as error:
        sys.exit(error)


def get_paths(manifests):
    """Extract and construct paths to check."""

    paths = set()

    for line in manifests:
        try:
            results = subprocess.check_output(["grep", "puppet:", line])
            hits = results.splitlines()

            for hit in hits:
                working_copy = hit.strip()
                working_copy = working_copy.split("'")[1]
                working_copy = working_copy.replace("puppet://", ".")

                segments = working_copy.split("/", 3)
                segments.insert(3, "files")

                path = "/".join(segments)
                paths.add(path)

        # we don't care if grep does not find any matches in a file
        except subprocess.CalledProcessError:
            pass

        try:
            results = subprocess.check_output(["grep", "template(", line])
            hits = results.splitlines()

            for hit in hits:
                working_copy = hit.strip()
                working_copy = working_copy.split("'")[1]

                segments = working_copy.split("/", 1)
                segments.insert(0, ".")
                segments.insert(1, "modules")
                segments.insert(3, "templates")

                path = "/".join(segments)
                paths.add(path)

        # we don't care if grep does not find any matches in a file
        except subprocess.CalledProcessError:
            pass

    return paths

if __name__ == "__main__":
    main()

puppet-apply-noop

In order to run the most common kind of test in the puppet world, I wanted to test every .pp file in a module's tests directory with puppet apply --noop, which is a kind of dry run. This outputs information about what would be done in case of a real run. Unfortunately this information is highly misleading.

#!/bin/bash
set -euo pipefail

content=(core automation packages services users)

for item in ${content[*]}
do
  printf "Info: Running tests for module $item.\n"
  find modules -type f -path "modules/$item/tests/*.pp" -execdir puppet apply --modulepath=/etc/puppet/modules --noop {} \;
done

When run in this mode, puppet does not seem to perform any sanity checks at all. For example, it can be instructed to install a package with an arbitrary name regardless of the package's existence in the specified (or default) package manager.

Upon deciding that this mode was not providing any value to my testing process I took a stab at implementing "real" tests by running puppet apply instead. The value added by this procedure is mediocre at best, given that puppet returns 0 even if it fails to apply all given instructions. Your CI will not realize that there have been puppet failures at all and will happily report your build as passing.

puppet provides the --detailed-exitcodes flag for checking failure to apply changes. Let me quote the manual for you:

Provide transaction information via exit codes. If this is enabled, an exit code of '2' means there were changes, an exit code of '4' means there were failures during the transaction, and an exit code of '6' means there were both changes and failures.

I'm sure I don't need to point out that this mode is not suitable for testing either given that there will always be changes in a testing VM.

Now, one could solve this by writing a small wrapper around the puppet apply --detailed-exitcodes call which checks for 4 and 6 and fails accordingly. I was tempted to do that. I might still do that in the future. The reason I didn't implement this already was that actually applying the changes slowed things down to a crawl. The installation and configuration of a gitlab instance added more than 90 seconds to each build.
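
For illustration, a minimal sketch of what such a wrapper could look like - not something I actually run, just the idea, with the manifest and any extra options passed as arguments:

#!/bin/bash
# Sketch of a wrapper around puppet apply --detailed-exitcodes.
# No "set -e" here: an exit code of 2 (changes applied) counts as success
# for our purposes and must not abort the script.
set -uo pipefail

puppet apply --detailed-exitcodes "$@"
status=$?

# 0 = no changes, 2 = changes; 4 = failures, 6 = changes and failures.
if [ "$status" -eq 4 ] || [ "$status" -eq 6 ]; then
  printf "Error: puppet apply reported failures (exit code $status).\n"
  exit 1
fi

printf "Info: puppet apply finished without failures (exit code $status).\n"
exit 0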

A shortened sample of what is done in the gitlab build:

  • add gitlab repository
  • make sure apt-transport-https is installed
  • install gitlab
  • overwrite gitlab.rb
  • provide TLS certificate
  • start gitlab

Should I ever decide to implement tests which really apply their changes, the infrastructure needed to run those checks for everything we do with puppet in a timely manner would drastically increase.

documentation

I am adamant when it comes to documenting software since I don't want to imagine working without docs, ever.

In my Readme.markdown each H3 header is equivalent to one puppet class.

This test checks whether the amount of documentation in my preferred style matches the number of puppet manifest files (.pp). If the Readme.markdown does not contain exactly as many ### headers as there are puppet manifest files, it counts as a build failure since someone obviously forgot to update the documentation.

#!/bin/bash
set -euo pipefail

count_headers=$(grep -e "^### " Readme.markdown | wc -l | awk '{print $1}')
count_manifests=$(find . -type f -name "*.pp" | grep -v "tests" | wc -l | awk '{print $1}')

if test $count_manifests -eq $count_headers
  then printf "Documentation matches number of manifests.\n"
  exit 0
else
  printf "Documentation does not match number of manifests.\n"
  printf "There might be missing manifests or missing documentation entries.\n"
  printf "Manifests: $count_manifests, h3 documentation sections: $count_headers\n"
  exit 1
fi

Application tests

As previously mentioned, I use the test stage for testing configurations for other applications. Currently I only test postfix's /etc/aliases file as well as our /etc/postfix/forwards, which is an extension of the former.

applications:
  stage: test
  script:
      - apt-get -qq -o=Dpkg::Use-Pty=0 install postfix
      - tests/postfix-aliases.py

Future: There are plans for handling both shorewall as well as isc-dhcp-server configurations with puppet. Both of those would profit from having automated tests available.

Future: The different software setups will probably be done in different jobs to allow concurrent running as soon as the CI solution is ready for general use by our researchers.

postfix-aliases

In order to test the aliases, an extremely minimalistic configuration for postfix is installed and newaliases is run against it. If there is any output whatsoever, I assume that the test failed.

Future: I plan to automatically apply both a minimal configuration and a full configuration in order to test both the main server and relay configurations for postfix.

#!/usr/bin/env python2
"""Test postfix aliases and forwards syntax."""

import subprocess
import sys


def main():
    """The main flow."""
    write_configuration()
    copy_aliases()
    copy_forwards()
    run_newaliases()


def write_configuration():
    """Write /etc/postfix/main.cf file."""

    configuration_stub = ("alias_maps = hash:/etc/aliases, "
                          "hash:/etc/postfix/forwards\n"

                          "alias_database = hash:/etc/aliases, "
                          "hash:/etc/postfix/forwards")

    with open("/etc/postfix/main.cf", "w") as configuration:
        configuration.write(configuration_stub)


def copy_aliases():
    """Find and copy aliases file."""

    aliases = subprocess.check_output(["find", ".", "-type", "f", "-name",
                                       "aliases"])
    subprocess.call(["cp", aliases.strip(), "/etc/"])


def copy_forwards():
    """Find and copy forwards file."""

    forwards = subprocess.check_output(["find", ".", "-type", "f", "-name",
                                        "forwards"])
    subprocess.call(["cp", forwards.strip(), "/etc/postfix/"])


def run_newaliases():
    """Run newaliases and report errors."""

    result = subprocess.check_output(["newaliases"], stderr=subprocess.STDOUT)
    if result != "":
        print result
        sys.exit(1)

if __name__ == "__main__":
    main()

Conclusion

While I've run into plenty of frustrating moments, building a CI for puppet was quite fun and I'm constantly thinking about how to improve this further. One way would be to create "real" test instances for configurations, like "spin up one gitlab server with all its required classes".

The main drawback of our current setup is two-fold:

  1. I haven't enabled more than one concurrent instance of our private runner.
  2. I haven't considered the performance impact of moving to whole instance testing in other stages and parallelizing those tests.

I look forward to implementing deployment on passing tests instead of my current method of automatically deploying every change in master.


Notes

  • Build stages run after each other; however, they do not use the same instance of the docker container and are therefore not suited for installing prerequisites in one stage and running tests in another. Read: if you need an additional package in every stage, you need to install it during every stage.
  • If you are curious what the set -euo pipefail commands on top of all my shell scripts do, refer to Aaron Maxwell's Use the Unofficial Bash Strict Mode.
  • Our runners as of the time of this writing use buildpack-deps:trusty as their image.

Retaining your sanity while working on SWEB

Posted on Fri 14 August 2015 in university • Tagged with operating systems, sweb

  • This post has been updated twice.

I'll openly admit, I'm mostly complaining. This is part of who I am. Mostly I don't see things for how great they are; I just see what could be improved. While that is a nice skill to have, it often gives people the impression that I'm not noticing all the good stuff and only ever talk about negative impressions. That's wrong. I try to make things better by improving them for everyone.

Sometimes that involves a bit of ranting or advice which may sound useless or like minuscule improvements to others. This post will contain a lot of that. I'll mention small things that can make your work with your group easier.

Qemu

Avoid the "Matrix combo"

You are working in a university setting, and probably don't spend your time in a dark cellar at night staring into one tiny terminal window coding in the console. Don't live like that - unless you really enjoy it.

Set your qemu console color scheme to some sensible default, like white on black or black on white instead of the Matrix-styled green on black.

In common/source/kernel/main.cpp:

-term_0->initTerminalColors(Console::GREEN, Console::BLACK);
+term_0->initTerminalColors(Console::WHITE, Console::BLACK);

Prevent automatic rebooting

Update: I've submitted a PR for this issue: #55 has been merged.

When you want to try and find a specific problem which causes your SWEB to crash, you don't want qemu to automatically reboot and fill your terminal or log with junk. Fortunately, you can disable automatic rebooting.

In arch/YOUR_ARCHITECTURE/CMakeLists.include (e.g. x86/32):

- COMMAND qemu-system-i386 -m 8M -cpu qemu32 -hda SWEB-flat.vmdk -debugcon stdio
+ COMMAND qemu-system-i386 -m 8M -cpu qemu32 -hda SWEB-flat.vmdk -debugcon stdio -no-reboot

- COMMAND qemu-system-i386 -no-kvm -s -S -m 8M -hda SWEB-flat.vmdk -debugcon stdio
+ COMMAND qemu-system-i386 -no-kvm -s -S -m 8M -hda SWEB-flat.vmdk -debugcon stdio -no-reboot

Automatically boot the first grub entry

If you are going for rapid iteration, you'll grow impatient always hitting Enter to select the first entry in the boot menu. Lucky you! You can skip that and boot directly to the first option. Optionally delete all other entries.

In utils/images/menu.lst:

default=0
timeout=0 

title = Sweb
root (hd0,0)
kernel = /boot/kernel.x

Code

Use Debug color flags different from black and white

The most popular color schemes for Terminal use one of two background colors - black and white. Don't ever use those for highlighting important information unless you want your information to be completely unreadable in one of the most common setups. You can change them to any other color you like.

In common/include/console/debug.h:

-const size_t LOADER             = Ansi_White;
+const size_t LOADER             = Ansi_WHATEVER_YOU_LIKE;

-const size_t RAMFS              = Ansi_White;
+const size_t RAMFS              = Ansi_NOT_WHITE_OR_BLACK;

Use C++11 style foreach loops

You may use C++11 standard code, which brings many features; of these, I found the easier syntax for writing foreach loops the most beneficial. This way of writing foreach loops is shorter and improves the readability of your code a lot.

This is the old style for iterating over a container:

typedef ustl::map<example, example>::iterator it_type;
for(it_type iterator = data_structure.begin();
  iterator != data_structure.end(); iterator++)
{
  iterator->doSomething();
  printf("This isn't really intuitive unless you've more experience with C++.\n");
}

This is the newer method I strongly suggest:

for(auto example: data_structure)
{
  example.doSomething();
  printf("This is much more readable.\n");
}

Have your code compile without warnings

Truth be told, this should go without saying. If your code compiles with warnings, it is likely it does not do exactly what you want. We saw that a lot during the practicals: parts that only looked like they did what you wanted, but on a second glance turned out to be wrong, had already been hinted at by compiler warnings.

If you don't know how to fix a compiler warning, look it up or throw another compiler at it. Since you are compiling with gcc and linting with clang you already have a good chance of being provided with at least one set of instructions on how to fix your code. Or, you know, ask your team members. You're in this together.

Besides, this is about sanity. Here, it's also about code hygiene.

Your code should be clean enough to eat off of. So take the time to leave your [...] files better than how you found them. ~Mattt Thompson

Git

I assume you know the git basics. I am a naturally curious person when it comes to tech (and a slew of other topics) and know a lot of things that have no relation to my previous work, but I've been told that a lot of people don't know the workflow around github which has become popular with open source. I'll try to be brief. The same workflow can be applied to the gitlab software (an open source solution similar to github).

Let's assume you want to make a change to an open source project of mine, homebrew-sweb. You'd go through the following steps:

  1. Click "fork" on my repository site.
  2. Create a new branch in your clone of the project.
  3. Make changes and commit them.
  4. Push your new branch to your remote.
  5. Click the "submit pull request" button.

This means you don't have write access to their repository but they can still accept and merge your changes quickly as part of their regular workflow. Now, some projects may have differing requirements, e.g. you need to send your PRs to the develop branch instead of master.
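
For reference, steps 2 to 4 boil down to a handful of commands. A rough sketch, assuming you've already cloned your fork of homebrew-sweb and with the branch name only serving as an example:

# inside your clone of the forked repository
git checkout -b fix-readme-typo        # step 2: create a new branch (example name)
# ...make and test your changes...
git commit -a -m "Fix a typo"          # step 3: commit them
git push origin fix-readme-typo        # step 4: push the new branch to your fork

Github will then offer to open a pull request for the freshly pushed branch, which covers step 5.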

A simpler version of this workflow can and should be used when working as a group. Basically use the existing steps without forking the repository.

Have feature branches

You don't want people to work in master; you want one known good branch and others which are in development. By working in branches, you can try and experiment without breaking your existing achievements.

Working with branches that contain single features instead of "all changes by Alex" works better because you can merge single features more easily depending on their stability and how well you tested them. This goes hand in hand with the next point.

When working with Pull Requests this has another upside: A Pull Request is always directly linked to a branch. If the branch gets updated server-side, the PR is automatically updated too, helping you to always merge the latest changes. When a PR is merged, the corresponding branch can be safely deleted since all code up to the merge is in master. This helps you avoid having many stale branches. Please don't push to branches with a PR again after merging.
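
Cleaning up after a merge is quick; a small example, with your-feature-branch standing in for whatever branch was just merged:

git checkout master
git pull                                      # fetch the merge commit
git branch -d your-feature-branch             # delete the merged branch locally
git push origin --delete your-feature-branch  # and delete it on the remote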

Have a prefix in your branch names

Having a prefix in your branch name before its features signals to others who is responsible for a feature or branch. I used alx (e.g. alx-fork) to identify the branches I started and was the main contributor of.

Always commit into a feature branch

Committing directly into master is equal to not believing in code review. You don't want to commit into master directly, ever. The only exception to this rule in the Operating Systems course is pulling from upstream.

Since you probably set up the IAIK repository as upstream, you would do the following to update your repository with fixes provided by the course staff:

git checkout master
git pull upstream master
git push origin master

When it comes to team discipline I will be the one enforcing the rules. If we agreed on never committing into master I will revert your commits in master even if they look perfectly fine.

Have your reviewers merge Pull Requests

Now, you might wonder why you wouldn't just merge a PR someone found to be fine into master yourself. That is very simple. By having the reviewer click the Merge button, you can track who reviewed the PR afterwards.

Also, it doesn't leave the bitter taste of "I'm so great that I can merge without review" in your mouth. :)

Make sure your pull requests can be automatically merged

Nobody likes merge conflicts. You don't and your group members certainly don't. Make sure your branch can be merged automatically into master without conflicts. That means that before opening a Pull Request, you rebase your branch on top of master.

git checkout master
git pull
git checkout your-feature-branch
git rebase master

Repeat this process if master was updated after you submitted your PR to make sure it still can be merged without conflicts.

I want to make one thing very clear: as the person sending the Pull Request, it is your responsibility to make sure it merges cleanly, not the maintainer's nor the project leader's.

The reasoning behind this is taken from open source projects: Whenever you submit a patch but do not intend to keep on working on the software, you are leaving the burden of maintaining your code on the main developer. The least you can do is make sure it fits into their existing code base without additional pain.

Conclusion

There is quite a lot you and your partners can do to make the term with Operating Systems go a lot smoother. Some of it has to do with tech, others with communication and team discipline. In case you're about to enroll in the course or already have, I wish you the best of luck!

Further reading:


I'll talk to Daniel about some of those issues and which of them might be okay to change. He's quite thoughtful about what to include and what not to accept for the project as it's delivered to the students. I'll see which suggestions can be sent upstream and update this post accordingly.