Running NixOS in Production

NixOS logo created by Tim Cuthbertson (@timbertson), licenced under CC-BY 4.0.

I moved some items from my previous site infrastructure over to a VPS running NixOS this past weekend. What made me do it? A few reasons came to mind.

The first is that I’m sort of done running on OpenVZ: the performance is pretty bare compared to other options out there, and there are only a few choices of operating system to run on it. Ubuntu 16.04 (Xenial) is okay to work with, but on the limited resources, things always took a lot longer than they should have, whether it was running Ghost or trying to build this site on Gatsby.

Another reason is that I don’t remember a lot of what I did when setting up the site. Could I tell you I installed Certbot? Nope. Could I recite the hundred-word set of compiler arguments I used to build nginx with PageSpeed support? Nope. Would I know how to update PHP from 7.2 once it reaches end of life? Nope, I’d have to look that one up (and I already have to do that in my day-to-day work!).

I could have picked any other OS to operate, like Debian or CentOS. But I ended up on NixOS for the following reasons:

  1. I wanted to learn more about NixOS, and how to run through application deployment and management to see how it stacks up against other methods.
  2. The idea of declaring my system configuration as code I could deploy was interesting to me, since I spend a lot of time running through package installation and configuration in my day-to-day—making that process easier would definitely give me time to do other things.

Some of the details

For some context, I have Nix installed on my Arch Linux system (there’s a useful guide to install it on their wiki). From there, it was a matter of putting together a few pieces of configuration. The build script itself is powered by Elixir—mainly a result of the fact I couldn’t get the Haskell script in the guide I was following to work.

The VPS is an Amazon EC2 instance, which was created using the instructions provided by NixOS to launch a provided Amazon Machine Image (AMI).

The first stop is server.nix, which is built like this:

let
  nixos = import <nixpkgs/nixos> {
    configuration = import ./configuration.nix;
  };
in
  nixos.system

configuration.nix contains the bulk of the configuration which is set up on the system:

{config, pkgs, ...}:
let
  unstable = import <nixos-unstable> {};
in
{
  imports = [ <nixpkgs/nixos/modules/virtualisation/amazon-image.nix> ./users.nix ./firewall.nix ./webserver.nix ];
  ec2.hvm = true;
  networking.hostName = "mnguyen-nix-demo";
  environment.systemPackages = [
    unstable.caddy2
    pkgs.fish
    pkgs.htop
    pkgs.mosh
    pkgs.vim
  ];

  programs.fish.enable = true;

  # sudo without requiring password
  security.sudo.wheelNeedsPassword = false;

  ## Enable BBR module
  boot.kernelModules = [ "tcp_bbr" ];

  ## Network hardening and performance
  boot.kernel.sysctl = {
    # Disable magic SysRq key
    "kernel.sysrq" = 0;
    # Ignore ICMP broadcasts to avoid participating in Smurf attacks
    "net.ipv4.icmp_echo_ignore_broadcasts" = 1;
    # Ignore bad ICMP errors
    "net.ipv4.icmp_ignore_bogus_error_responses" = 1;
    # Reverse-path filter for spoof protection
    "net.ipv4.conf.default.rp_filter" = 1;
    "net.ipv4.conf.all.rp_filter" = 1;
    # SYN flood protection
    "net.ipv4.tcp_syncookies" = 1;
    # Do not accept ICMP redirects (prevent MITM attacks)
    "net.ipv4.conf.all.accept_redirects" = 0;
    "net.ipv4.conf.default.accept_redirects" = 0;
    "net.ipv4.conf.all.secure_redirects" = 0;
    "net.ipv4.conf.default.secure_redirects" = 0;
    "net.ipv6.conf.all.accept_redirects" = 0;
    "net.ipv6.conf.default.accept_redirects" = 0;
    # Do not send ICMP redirects (we are not a router)
    "net.ipv4.conf.all.send_redirects" = 0;
    # Do not accept IP source route packets (we are not a router)
    "net.ipv4.conf.all.accept_source_route" = 0;
    "net.ipv6.conf.all.accept_source_route" = 0;
    # Protect against tcp time-wait assassination hazards
    "net.ipv4.tcp_rfc1337" = 1;
    # TCP Fast Open (TFO)
    "net.ipv4.tcp_fastopen" = 3;
    ## Bufferbloat mitigations
    # Requires >= 4.9 & kernel module
    "net.ipv4.tcp_congestion_control" = "bbr";
    # Requires >= 4.19
    "net.core.default_qdisc" = "cake";
  };

  # Disable password-based SSH logins (key authentication only)
  services.openssh.passwordAuthentication = false;

  # Let trusted users upload unsigned packages
  nix.trustedUsers = ["@wheel"];

  # Clean up packages after a while
  nix.gc = {
    automatic = true;
    dates = "weekly UTC";
  };

  # Disable reinitialisation of AMI on restart or power cycle
  systemd.services.amazon-init.enable = false;

  swapDevices = [
    {
      device = "/swapfile";
      priority = 10;
      size = 1024;
    }
  ];

  systemd.services.fathom = {
    description = "Fathom Server";
    requires = ["network.target"];
    after = ["network.target"];
    wantedBy = ["multi-user.target"];

    serviceConfig = {
      Type = "simple";
      User = "minh";
      Restart = "on-failure";
      RestartSec = 3;
      WorkingDirectory = "/var/lib/fathom";
      ExecStart = "/home/minh/bin/fathom --config=/etc/fathom.env server";
    };
  };
}

The imports list pulls in the NixOS configuration specific to EC2, where this system runs, and I’ve started splitting the rest of the configuration into separate files: ./users.nix, ./firewall.nix, and ./webserver.nix.

Packages

Near the top of configuration.nix there’s an option called environment.systemPackages. This lists all the additional software packages I want installed. In this case, I have fish, htop, mosh, and vim installed from the standard channel, as well as Caddy 2 from the unstable NixOS channel.

Swap

I don’t have any swap on the system, so the option swapDevices is set to create a swap file at /swapfile. The size is given in megabytes, so this file is 1 gigabyte in size.

Firewall

NixOS comes with a firewall set up out of the box that only allows SSH on port 22. To change that, we can set networking.firewall.allowedTCPPorts to a list of the ports we want open.

I also want to move SSH off the default port; this is done with services.openssh.listenAddresses.

In the end, my firewall.nix file looks like this:

{ config, pkgs, ...}:
{
  # SSHD Port reassignment
  services.openssh.listenAddresses = [
    { addr = "0.0.0.0"; port = 37586; }
  ];

  # Allowed TCP ports
  networking.firewall.allowedTCPPorts = [ 80 443 37586 ];

  # Allow Mosh connections
  networking.firewall.allowedUDPPortRanges = [{ from = 60000; to = 60010; }];
}
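
Changing the SSH port without also opening it in the firewall will lock you out, so one way to keep the two in step is to bind the port number once and reuse it. This is only a sketch of that idea: the name sshPort is my own for illustration, and everything else mirrors the firewall.nix above.

```nix
{ config, pkgs, ... }:
let
  # Hypothetical name: a single definition shared by sshd and the firewall
  sshPort = 37586;
in
{
  # sshd listens on the custom port
  services.openssh.listenAddresses = [
    { addr = "0.0.0.0"; port = sshPort; }
  ];

  # The same value is opened in the firewall, so the two cannot drift apart
  networking.firewall.allowedTCPPorts = [ 80 443 sshPort ];

  # Allow Mosh connections
  networking.firewall.allowedUDPPortRanges = [ { from = 60000; to = 60010; } ];
}
```

With this layout, moving SSH to another port is a one-line change.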

Caddy server

Moving on, I set up the web server within webserver.nix:

{config, pkgs, ...}:
let
  unstable = import <nixos-unstable> {};
  caddyDir = "/var/lib/caddy";
  caddyConfig = pkgs.writeText "Caddyfile"
    ''{
  storage file_system {
    root /var/lib/caddy
  }
}
mnguyen.io {
  root * /srv/www/mnguyen.io
  file_server
  header {
    X-Content-Type-Options "nosniff"
    X-Frame-Options "sameorigin"
    Referrer-Policy "no-referrer-when-downgrade"
    X-UA-Compatible "IE=edge,chrome=1"
    X-XSS-Protection "1; mode=block"
    Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
    Content-Security-Policy "default-src 'self'; connect-src https://analytics.mnguyen.io 'self'; font-src 'self' data:; img-src https://analytics.mnguyen.io 'self' data:; script-src https://analytics.mnguyen.io 'self' 'unsafe-inline'; style-src 'unsafe-inline'; worker-src 'self'; prefetch-src 'self'; report-uri https://mnguyen.report-uri.com/r/d/csp/enforce; report-to https://mnguyen.report-uri.com/r/d/csp/enforce"
  }
}

www.mnguyen.io {
  redir https://mnguyen.io{uri}
}

analytics.mnguyen.io {
  reverse_proxy localhost:9000
}
'';
in
{
  systemd.services.caddy = {
    description = "Caddy web server";
    after = [ "network-online.target" ];
    wants = [ "network-online.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      User = "caddy";
      Group = "caddy";
      ExecStart = ''
        ${unstable.caddy2}/bin/caddy run --config ${caddyConfig} --adapter caddyfile
      '';
      ExecReload = ''
        ${unstable.caddy2}/bin/caddy reload --config ${caddyConfig} --adapter caddyfile
      '';
      TimeoutStopSec = "5s";
      LimitNOFILE = 1048576;
      LimitNPROC = 512;
      PrivateTmp = true;
      ProtectSystem = "full";
      AmbientCapabilities = "cap_net_bind_service";
    };
  };

  users.users.caddy = {
    group = "caddy";
    uid = config.ids.uids.caddy;
    home = caddyDir;
    createHome = true;
    extraGroups = [ "users" ];
  };

  users.groups.caddy.gid = config.ids.uids.caddy;
}

This sets up the Caddyfile that Caddy 2 will use, along with the systemd service that runs it. The service is granted the capability to bind to privileged ports (those below 1024) and runs under a dedicated caddy user.

Users

Speaking of which, users in NixOS can be defined in a few lines. This is what can go into users.nix:

{ config, pkgs, ...}:
{
  users.users.minh = {
    isNormalUser = true;
    extraGroups = ["wheel"];
    shell = pkgs.fish;

    # password = "my secure password";
    # hashedPassword = "$6$qTK.7QsrnONOr$ZsAfPlnEPLtpiO9j1qp/POkDga2LtK1UOD0nrG497CegYEq5e.E6iHf5tDqwfLViBSWEsw8sn5t885p6HyRgS1";

    openssh.authorizedKeys.keys = [
      "ssh-rsa PUBLICKEYHERE"
    ];
  };
}

The flag isNormalUser tells NixOS to give us a home directory at /home/minh. I then grant sudo access by adding this user to the wheel group, set the shell to fish, and list the SSH key that is allowed to sign in as this user. I cannot sign in with a password unless one is explicitly set.

If I did want to set a password, I could use the insecure option password, which takes a plain-text string, or the slightly more secure hashedPassword, which takes a hash generated by the command mkpasswd -m sha-512 (detailed on the hashedPassword options page).
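
As a minimal sketch of the hashed variant, assuming the option name hashedPassword from the commented-out line in users.nix above, the user definition might look like this. The hash string is a placeholder, not a real hash.

```nix
{ config, pkgs, ... }:
{
  users.users.minh = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    shell = pkgs.fish;

    # Placeholder: replace with the output of `mkpasswd -m sha-512`
    hashedPassword = "REPLACE_WITH_MKPASSWD_OUTPUT";
  };
}
```

As far as I understand it, while users.mutableUsers is left at its default of true, a password set this way is only applied when the account is first created, so changing it later on an existing system may need passwd instead.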

Fathom analytics

I pulled the last publicly available release of Fathom Lite from their repository, and created a systemd service to run it. I was worried for a bit that it wouldn’t run, as there were a lot of discussions on the community forums about having to patch or otherwise emulate a standard Linux system to make binaries work.

Thankfully, Fathom isn’t one of those programs, and it worked flawlessly without any other work. It’s not packaged for NixOS, so this is probably the least NixOS part of this system.

Building NixOS

Two commands build our NixOS packages and send them to the remote system, and two more switch profiles and activate the new setup:

  1. nix-build: builds the NixOS packages
  2. nix-copy-closure: sends the NixOS packages to our remote
  3. nix-env: sets the Nix profile to our uploaded packages
  4. switch-to-configuration: switches the system to our new configuration

These all come together in an Elixir script:

# Address of the remote system, read from a one-line text file
server_address =
  "server_address.txt"
  |> File.read!()
  |> String.trim()

defmodule Build do
  # Takes the {output, exit_status} tuple that System.cmd/2 returns for nix-build
  def upload_to_system({path_str, _status}, address) do
    # nix-build prints the store path followed by a newline
    store_path = String.trim(path_str)
    System.cmd("nix-copy-closure", ["--to", "--use-substitutes", address, store_path])
    activate_nix(store_path, address)
  end

  def activate_nix(path, address) do
    profile = "/nix/var/nix/profiles/system"
    # Point the system profile at the uploaded closure, then activate it
    System.cmd("ssh", [address, "sudo nix-env --profile #{profile} --set #{path}"])
    System.cmd("ssh", [address, "sudo #{profile}/bin/switch-to-configuration switch"])
  end
end

System.cmd("nix-build", ["server.nix", "--no-out-link"])
|> Build.upload_to_system(server_address)

Running the script executes all of those commands in sequence. One thing to note is that the script hangs if there’s an issue with any of the configuration files, which could leave the server unresponsive. You may need to power cycle your remote instance if that happens.

Concluding remarks

I learned quite a bit about NixOS, but like most things, it involved quite a bit of trial and error. Thankfully, having this sort of configuration means I can easily rebuild the system (such as when I found myself unable to connect via SSH after I changed the port but didn’t realise the firewall needed to be changed as well).

If you’re interested in NixOS, I’d encourage you to take a look.

Useful links

If you want to delve further, these are some of the resources I looked up while I was getting this set up:

  • Deploying NixOS to Amazon EC2 by Type Classes: This was very useful starting out to understand how to build NixOS locally, and how to ship it off to a different installation.
  • Running a local dns and web server by José Luis Lafuente: I found the section to define the custom systemd service very useful (just a heads up you want to set home for the user in that config to the Caddy data directory—mine was /var/lib/caddy)