Setting up NIX on a multi-user HPC environment
Written by Bruno Bzeznik. Updated: 2018-04-06
Introduction
Note If you’re searching for a tutorial about Nix usage, see our NIX TUTORIAL page. The current page is for system administrators who want to set up a multi-user Nix environment on a Linux HPC platform.
At GRICAD, we provide Nix as a reproducible and portable computing environment for the users of our High Performance Computing facilities. This post describes how we set up our computing clusters to support Nix, along with some custom packages not yet pushed (or not meant to be pushed) into the upstream Nixpkgs repository. We completed this setup successfully on two different platforms: a BullX one (which is actually CentOS based) and a Debian based one.
What we call an HPC cluster here is simply a set of interconnected Linux computing nodes and one or several head nodes. The computing nodes and the head nodes share some common network filesystems, generally NFS or a more performant solution (Lustre, BeeGFS, …). Users log in to the head nodes and submit jobs to the computing nodes using a special piece of software generally called a batch scheduler. So users have no direct access to the computing nodes, but may run some interactive tasks on the head nodes to prepare their jobs.
Regarding this Nix installation:
- on the head nodes: users may set up Nix profiles and install, compile or create Nix packages, …
- on the computing nodes: users just have an execution context (no need for access to profile/package management from the computing nodes).
- on every node: a common Nix store is shared, allowing efficient use of the space occupied by the installed packages, even for the custom packages of a given user, which they may share with others.
As a reference, this blog post helped us a lot.
The /nix store
One of the most important things to set up, to allow users to use Nix efficiently, is a /nix directory shared by all the computing and head nodes. This path must be exactly /nix in order to be able to use the official Nix binary caches. It’s possible to install the Nix store in another path, but in that case, every package and its dependencies will need to be recompiled, as the pre-compiled binaries will not be usable. Package recompilation is a task that Nix does very well, on the fly, at installation time; but it may require quite a lot of time and resources on the head node.
So, for example, set up a mount of an NFS filesystem on all of your nodes:
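A minimal sketch, assuming a hypothetical NFS server `nfs-server` exporting `/export/nix`; adapt to your own infrastructure:

```bash
# On every head and computing node (hypothetical server and export path):
mkdir -p /nix
echo "nfs-server:/export/nix  /nix  nfs  defaults,hard  0 0" >> /etc/fstab
mount /nix
```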
Install NIX
The second step is to install the Nix command line tools. Download the source tarball from the Getting Nix page. Don’t follow the quick way with the install script: it’s not suitable for a multi-user installation.
In our computing center, we use environment modules and a software repository shared by all the nodes under the path /applis. Even if it becomes obsolete once you use Nix, it’s not incompatible, so we chose to install the Nix tools the same way we install other modules. For example, here’s how we did it (of course, it will not be suitable for you as-is, and you’ll have to adapt it to your software environment):
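A sketch of a from-source build into a shared prefix; the version number is hypothetical (use the one you downloaded), and the usual build dependencies of that Nix generation (Perl, curl, bzip2, SQLite development files, …) must be available:

```bash
# Unpack and build the Nix tools into the shared /applis tree
tar xf nix-1.11.16.tar.xz          # hypothetical version number
cd nix-1.11.16
./configure --prefix=/applis/site/stow/gcc_4.8.2/nix_1.11
make
make install
```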
Note that there’s no need to perform this installation as root. You just have to make the Nix binaries available on all of your nodes.
So, as a result, we have working Nix tools binaries compiled into the path /applis/site/stow/gcc_4.8.2/nix_1.11, and we make them available with:
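For instance, with a plain PATH export (or the equivalent lines in an environment module file):

```bash
export PATH=/applis/site/stow/gcc_4.8.2/nix_1.11/bin:$PATH
```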
Then, try a simple test:
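Something like this should print the version of the Nix tools you just installed:

```bash
nix-env --version
```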
Create the build users
Nix allows users to automatically build (i.e. compile) the contents of packages. In a multi-user environment, Nix provides a daemon that is responsible for the security of the shared Nix store. So builds and package installations are not performed directly by the users, but by the nix-daemon, which uses common anonymous build users. This principle also allows you (as the system administrator) to keep some control over the build process, for example to limit the number of build processes that can run at the same time.
The following steps are to be run as root on a head node. If you have several head nodes for one computing cluster, you’ll have to do this on only one of them. Choose a powerful head node, as it will be the one executing the builds. We’ll see later how to allow the other head nodes to interact with the nix-daemon so that they can manage installations and builds (but the builds will always run on the node you configured).
Create a group for the nix build users:
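A sketch, using the conventional `nixbld` group name (it must match the `build-users-group` setting used below):

```bash
groupadd -r nixbld
```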
Then, create 10 build users (still as root on your master head node):
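A sketch of the user creation loop (the nologin shell path may differ on your distribution):

```bash
# Create 10 unprivileged, home-less build users in the nixbld group
for n in $(seq 1 10); do
  useradd -c "Nix build user $n" \
          -d /var/empty -g nixbld -G nixbld -M -N -r \
          -s "$(command -v nologin)" \
          "nixbld$n"
done
```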
Finally, initiate the configuration file and the store:
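A sketch of that initialization; note that, depending on the `./configure` flags you used, Nix may look for its configuration file under your installation prefix rather than under `/etc/nix`:

```bash
# Create the store and state directories on the shared /nix
mkdir -p /nix/store /nix/var/nix
chgrp nixbld /nix/store
chmod 1775 /nix/store

# Tell Nix which group holds the build users
mkdir -p /etc/nix
echo "build-users-group = nixbld" >> /etc/nix/nix.conf
```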
The multi-user profile script
To use NIX, your users will have to source a shell script into their environment. Here is a simple nix-multiuser.sh based on the one we are currently using (a sketch is shown after the list below). You might have to add or customize some environment variables.
Basically, this script:
- sets the PATH to nix tools binaries
- sets the NIX_PATH variable, which may be necessary for some advanced operations and for the use of a custom channel. We’ll see this later…
- initializes per-user directories and configuration files
- sets the NIX_REMOTE variable that is necessary to use the NIX daemon
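A minimal sketch of such a script (the paths and the default channel are examples; adapt them to your site):

```bash
#!/bin/sh
# nix-multiuser.sh -- to be sourced by users

# Make the shared Nix tools available
export PATH=/applis/site/stow/gcc_4.8.2/nix_1.11/bin:$PATH

# Where Nix expressions (nixpkgs, custom channels) are looked up
export NIX_PATH="nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs"

# Per-user initialization: profile symlink and default channel
NIX_LINK="$HOME/.nix-profile"
if [ ! -L "$NIX_LINK" ]; then
    ln -s "/nix/var/nix/profiles/per-user/$USER/profile" "$NIX_LINK"
fi
export PATH="$NIX_LINK/bin:$PATH"

if [ ! -e "$HOME/.nix-channels" ]; then
    echo "https://nixos.org/channels/nixpkgs-unstable nixpkgs" > "$HOME/.nix-channels"
fi

# Delegate all store operations to the nix-daemon
export NIX_REMOTE=daemon
```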
Put this file in a convenient place for your users, in a shared directory that is visible from all of your computing and head nodes. For us, it is /applis/site/nix.sh.
Our users are told to do this in order to load NIX:
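```bash
source /applis/site/nix.sh
```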
Starting the NIX daemon
The daemon must be started as root, after loading the Nix environment multiuser script:
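```bash
# As root, on the head node chosen to run the builds:
source /applis/site/nix.sh
nix-daemon &
```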
Of course, you’ll have to place this in a startup script in a convenient place for your distribution.
The Channel
As root, first, add and update a channel:
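For example, to subscribe to the upstream nixpkgs channel (pick the channel that suits your needs):

```bash
nix-channel --add https://nixos.org/channels/nixpkgs-unstable nixpkgs
nix-channel --update
```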
Testing
Now, you should be able to use Nix as a simple user from the head node running the daemon. Let’s do some basic operations:
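For example (`hello` is just the classic test package):

```bash
source /applis/site/nix.sh
nix-env -qaP hello     # search the available packages
nix-env -i hello       # install into your per-user profile
hello                  # run the freshly installed program
```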
The installed packages should also be usable from a computing node. Log in to a node (probably through your batch scheduler) and do some tests:
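For example, from a shell on the computing node:

```bash
source /applis/site/nix.sh
nix-env -q             # the profile populated from the head node is visible
hello                  # and the installed programs run from the shared store
```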
Setting up other head nodes for nix-daemon access through socat (optional)
The NIX daemon only listens on a Unix socket; there’s no TCP socket. So if you have several head nodes, or if you want package manipulations (installation, compilation, removal, …) to be possible from the computing nodes, you can set up a socat TCP tunnel to make the Unix socket available through the network.
In order to do that, you need to make the socket directory point to a local directory (before starting the Nix daemon):
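A sketch of that redirection; note that /nix is shared, so this symlink is visible everywhere, but it resolves to a node-local directory on each machine:

```bash
# Run before starting nix-daemon (or socat on the other nodes)
mkdir -p /var/run/nix
rm -rf /nix/var/nix/daemon-socket
ln -s /var/run/nix /nix/var/nix/daemon-socket
```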
Then, you can start the socat tunnel on /var/run/nix/socket. Nix will use /nix/var/nix/daemon-socket/socket, which points to the local socket:
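A sketch, with a hypothetical TCP port (3049) and daemon host name (`head1`); make sure your firewall restricts access to trusted nodes:

```bash
# On the head node running nix-daemon: expose the Unix socket over TCP
socat TCP-LISTEN:3049,fork,reuseaddr UNIX-CONNECT:/var/run/nix/socket &

# On the other head nodes (and optionally the computing nodes):
# recreate a local Unix socket that forwards to the daemon
mkdir -p /var/run/nix
socat UNIX-LISTEN:/var/run/nix/socket,fork TCP:head1:3049 &
```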
Setting up a local Nix channel (optional)
We wanted to provide our users with some non-official packages (for example, packages that are not yet merged into the upstream nixpkgs repository).
A channel is no more than an HTTP server providing a binary-cache directory and an archive file of the maintained Nix expressions.
Feel free to check our ciment-channel on GitHub. This channel wraps the nixpkgs repository (as a git submodule) into a ciment channel that may provide some custom packages.
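For example, once the channel is served over HTTP, users can subscribe to it (the URL is hypothetical; point it at wherever you serve the channel):

```bash
nix-channel --add http://nix.example.org/channels/ciment ciment
nix-channel --update ciment
```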