An explanation of a handful of rsync parameters

Posted on Thu 05 October 2017 • Tagged with Institute for Computer Vision and Computer Graphics, Work

If you look at rsync, one of the community’s favourite syncing and file transfer tools, you will notice that it has quite a lot of parameters (rsync --help). Here’s a short explanation of some of them and what they can be used for, taken from the .gitlab-ci.yml files of their respective projects.

example 1

This call is used in our automated deployment process for a web-based project. It deploys the project into the correct directory and ensures that it can still be read and executed after syncing. This is where setting the group and user is important, since those are used for the creation of new files as well as for the reading of code by the interpreter. Since the source is a Git repository, it makes sense to keep a hidden file in it that lists the files which should not be synced. Personally, I prefer to add --stats and --human-readable to every rsync call that’s used for a deployment, since you can then see what changed on site in the GitLab build logs.

# In the .gitlab-ci.yml the command is on one line to avoid errors;
# for the sake of readability I have reformatted the call here.
# The order of the optional parameters has been changed to be
# more coherent.
sudo rsync --recursive \
           --perms \
           --stats \
           --human-readable \
           --times \
           --group \
           --owner \
           --usermap=gitlab-runner:labelme,0:labelme \
           --groupmap=gitlab-runner:labelme,0:labelme \
           --super \
           --exclude-from=.rsyncExcludeFiles \
           . /home/labelme/LabelMeAnnotationTool-master/

Copy everything including all files and subfolders (--recursive) from the current location (.) to the target (/home/labelme/LabelMeAnnotationTool-master).
Ensure that the permissions are the same at the target as they were at the source (--perms).
Afterwards display detailed, human-readable statistics about the transfer (--human-readable, --stats).
Make sure the modification times of the files at the target location match the ones at the source location (--times).
Transfer the group and owner of the files (--group, --owner) and, while doing so, map gitlab-runner and the user and group with ID 0 to labelme (--usermap=FROM_USER:TO_USER,FROM_USERID:TO_USER, --groupmap=FROM_GROUP:TO_GROUP,FROM_GROUPID:TO_GROUP). The mappings only take effect when --owner and --group are set.
Explicitly try to use super-user operations (--super), e.g. changing owners. This will lead to errors if such operations are not permitted on the receiving side, indicating a lack of permissions or filesystem features, enabling you to detect if something went wrong.
Skip the files listed in .rsyncExcludeFiles while syncing (--exclude-from=EXCLUSION_LIST_FILE).
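
To make the exclusion list a bit more tangible, here is a minimal sketch of what such a file could look like. The patterns are assumptions for illustration, not the contents of the real .rsyncExcludeFiles, and the file is written to a scratch directory (repo_dir is a made-up name) so nothing in an actual repository is touched. rsync reads one pattern per line and ignores blank lines as well as lines starting with # or ;.

```shell
# Scratch directory standing in for the repository root (hypothetical).
repo_dir="$(mktemp -d)"

# Hypothetical exclusion list: one rsync pattern per line.
cat > "$repo_dir/.rsyncExcludeFiles" <<'EOF'
# Git metadata and CI configuration are not needed on the target.
.git/
.gitlab-ci.yml
.rsyncExcludeFiles
EOF

# Count the effective patterns (everything that is not a comment or blank).
exclude_count="$(grep -c -v -e '^#' -e '^$' "$repo_dir/.rsyncExcludeFiles")"
echo "patterns in exclusion list: $exclude_count"
```

Listing the exclusion file itself keeps it from being deployed alongside the project.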

example 2

This call is used when deploying our Puppet configuration from Git. It was here that I first had the need to use the --*map features, since the files initially ended up being owned by gitlab-runner. This is fine when every file you deploy via Puppet to another machine is explicitly listed with its permissions and owner set in your codebase. If this is not the case, Puppet (3.x) will implicitly set the owner to the same UID that is used on the Puppetmaster - leading to all kinds of strange situations. To avoid this, I’m mapping owners and groups.

Additionally, I chmod 640 all files and chmod 750 all directories that have been synced by the call, to avoid having unsafe permissions on anything. All critical things should have their permissions set explicitly in our codebase anyway.
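
The permission cleanup is not part of the rsync call itself. A minimal sketch of how it could be done with find, assuming a made-up helper name (normalize_permissions) and a throwaway directory instead of /etc/puppet:

```shell
# Hypothetical helper: normalise permissions below a synced directory.
normalize_permissions() {
    target="$1"
    # Regular files: read/write for the owner, read for the group.
    find "$target" -type f -exec chmod 640 {} +
    # Directories: full access for the owner, read and traverse for the group.
    find "$target" -type d -exec chmod 750 {} +
}

# Demonstration on a scratch directory with deliberately loose permissions.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/modules/example"
touch "$demo_dir/modules/example/init.pp"
chmod 777 "$demo_dir/modules/example"
chmod 666 "$demo_dir/modules/example/init.pp"

normalize_permissions "$demo_dir"
ls -lR "$demo_dir"
```

rsync also has a --chmod option (for example --chmod=D750,F640) that may achieve the same during the transfer. I have not checked how it interacts with the other options used here, so treat that as a pointer rather than a recommendation.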

I arrived at this specific combination of options because I wanted the Deploy step to list only files that have really changed. Since the cache of Puppet modules is invalidated while dependencies are downloaded, all external files are marked as new every time. This can be circumvented via checksumming, but it still leaves the modification dates of directories changed (they are set when the downloaded archives are unpacked), which is why --omit-dir-times is required. Now, with this combination and --verbose, the logs contain only files changed in our codebase and ones that changed due to changes in our dependencies. There are no longer hundreds of files marked as changed just because r10k needed to fetch a module again.
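
The difference between a time-based and a content-based comparison can be illustrated without rsync. The following sketch uses made-up file names and simulates a module file that was fetched again: the content is identical, only the modification time is newer. A check based on modification time flags it as changed, a comparison of the content does not, which is the effect --checksum is used for here.

```shell
workdir="$(mktemp -d)"

# "old.pp" stands in for the file deployed earlier, "refetched.pp" for the
# identical file that was downloaded again (both names are hypothetical).
printf 'class example {}\n' > "$workdir/old.pp"
cp "$workdir/old.pp" "$workdir/refetched.pp"
touch -t 201701010000 "$workdir/old.pp"   # backdate the earlier copy

# Time-based view: the re-fetched file looks newer, hence "changed".
if [ "$workdir/refetched.pp" -nt "$workdir/old.pp" ]; then
    mtime_differs=yes
else
    mtime_differs=no
fi

# Content-based view: byte-for-byte comparison, as a checksum would show.
if cmp -s "$workdir/old.pp" "$workdir/refetched.pp"; then
    content_differs=no
else
    content_differs=yes
fi

echo "modification time differs: $mtime_differs"
echo "content differs: $content_differs"
```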

# In the .gitlab-ci.yml the command is on one line to avoid errors;
# for the sake of readability I have reformatted the call here.
# The order of the optional parameters has been changed to be
# more coherent.
sudo rsync --recursive \
           --times \
           --omit-dir-times \
           --checksum \
           --sparse \
           --force \
           --delete \
           --links \
           --exclude=.git* \
           --group \
           --owner \
           --usermap=gitlab-runner:root \
           --groupmap=gitlab-runner:puppet \
           --human-readable \
           --stats \
           --verbose \
           . /etc/puppet

Copy everything including all files and subfolders (--recursive) from the current location (.) to the target (/etc/puppet).
Make sure the modification times of the files at the target location match the ones at the source location (--times).
Leave the modification times of directories untouched instead of syncing them (--omit-dir-times), and decide which files to transfer by comparing checksums (--checksum) instead of modification time and file size.
Try to intelligently handle sparse files (--sparse). I’m rather sure this ended up here without any actual need, carried over when a meta parameter was expanded into individual parameters.
Delete files at the destination that are not at the source (--delete), and allow a non-empty directory to be deleted when it has to be replaced by something that is not a directory (--force).
Recreate symlinks at the destination if there are any at the source (--links).
Exclude Git specific files and folders (--exclude=.git*).
Transfer the group and owner of the files (--group, --owner); the following mappings only take effect when these are set.
Change the owner from gitlab-runner to root (--usermap=FROM_USER:TO_USER).
Change the group from gitlab-runner to puppet (--groupmap=FROM_GROUP:TO_GROUP).
Afterwards display detailed, human-readable statistics about the transfer (--human-readable, --stats).
Additionally display files transferred and a summary (--verbose).

Notes

If you are using sudo in combination with an automated system in which non-admin users can access sudo without a password for specific tasks, make sure you have appropriate whitelists in place. You could, for example, restrict the use of sudo to a specific user on a specific host with a specific command. Given that the syntax of sudoers is not as precise as it might be with regular expressions, you’ll have to be quite specific about which command you want to allow and where to put wildcards, should you use any. Here’s an example.

# No line breaks here, since it might confuse readers and they might end up with a damaged `sudoers` config.
your_user your_host.fully.qualified.domain= NOPASSWD: /full/path/to/command --a_parameter --another__parameter /the_source /the/target/location
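
Since a damaged sudoers config is exactly what you want to avoid, it can help to check a rule before installing it. The following is a minimal sketch under a few assumptions: the rule is the placeholder line from above, it is written to a scratch file rather than to the real configuration, and visudo may or may not be installed on the machine running the check. visudo -c only checks the syntax and -f selects the file to check.

```shell
# Write the placeholder rule to a scratch file, not to the live config.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
your_user your_host.fully.qualified.domain= NOPASSWD: /full/path/to/command --a_parameter --another__parameter /the_source /the/target/location
EOF

# Check the syntax only; skip the check if visudo is not available.
if command -v visudo >/dev/null 2>&1; then
    if visudo -c -f "$fragment" >/dev/null 2>&1; then
        check_result="syntax ok"
    else
        check_result="syntax error"
    fi
else
    check_result="visudo not available, check skipped"
fi
echo "$check_result"
```

Only once the check passes would the fragment be copied into place.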