(Ab)using the NixOS Test framework for clean room PCAP generation

Generating clean room PCAP files can be a difficult orchestration problem. The NixOS Test framework makes this easy.

The Problem

I work with Zeek and Suricata which process live network traffic. When a performance or correctness issue arises on a customer network, a PCAP file is required to reproduce and resolve the issue. Unfortunately, capturing a live PCAP from a customer network is typically not feasible. Even if it were possible, such a file could contain sensitive information, making it unsuitable for inclusion in a public test suite. Therefore, it is necessary to generate a PCAP file from scratch.

A basic example without Nix or NixOS

We can automate the creation of a PCAP showing curl making an HTTP request over a TLS connection using a script like this:

# capture traffic to and from the web server in the background
tcpdump -n -i wlp1s0 -w curl.pcap 'host 162.243.250.187' &
pid=$!
# give tcpdump a moment to start capturing
sleep 2
curl -4 -o /dev/null https://justin.azoff.dev/
# let the final packets arrive, then stop the capture
sleep 1
kill $pid
wait $pid

Running that script generates a PCAP:

❯ sudo bash get-pcap.sh  
tcpdump: listening on wlp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15407  100 15407    0     0  93160      0 --:--:-- --:--:-- --:--:-- 93375
39 packets captured
39 packets received by filter
0 packets dropped by kernel

And the resulting PCAP shows the TLS session:

❯ tshark -r curl.pcap    
    1   0.000000 192.168.5.157 → 162.243.250.187 TCP 74 56546 → 443 [SYN] Seq=0 ...
    2   0.012140 162.243.250.187 → 192.168.5.157 TCP 74 443 → 56546 [SYN, ACK] Seq=0 ...
    3   0.012265 192.168.5.157 → 162.243.250.187 TCP 66 56546 → 443 [ACK] Seq=1 Ack=1 ...
    4   0.015841 192.168.5.157 → 162.243.250.187 TLSv1 583 Client Hello (SNI=justin.azoff.dev)
    ...

This process works, but it has a number of issues:

  • The version of curl wasn’t pinned. This likely doesn’t matter for curl, but not every tool will behave the same months or years down the road.
  • It depended on an external server to respond to the request. The script reached out to the internet and was not 100% self-contained.
  • Testing more complicated setups like VPN tunnels or network filesystems is not easy.

Tools like Docker and docker-compose can solve some of these issues, but the shared kernel makes reproducing anything involving VPN tunnels or filesystems difficult.

Enter the NixOS Test framework

The NixOS test framework is not specifically built for PCAP generation. However, it has the following features which make it particularly good at this task:

  • It supports creating multiple concurrent VMs that do not share a kernel.
  • The VMs do not have internet access, ensuring that tests are self-contained.
  • VMs can be made to run arbitrary services and commands.
  • VMs bind-mount the Nix store (packages) from the host, making iteration fast.
  • Files can be copied off a VM as part of the test process.

A basic example using NixOS Test

We can start with an example from the tutorial as-is (other than bumping the NixOS release).

let
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-24.05";
  pkgs = import nixpkgs { config = {}; overlays = []; };
in

pkgs.testers.runNixOSTest {
  name = "client-server-test";

  nodes.server = { pkgs, ... }: {
    networking = {
      firewall = {
        allowedTCPPorts = [ 80 ];
      };
    };
    services.nginx = {
      enable = true;
      virtualHosts."server" = {};
    };
  };

  nodes.client = { pkgs, ... }: {
    environment.systemPackages = with pkgs; [
      curl
    ];
  };

  testScript = ''
    server.wait_for_unit("default.target")
    client.wait_for_unit("default.target")
    client.succeed("curl http://server/ | grep -o \"Welcome to nginx!\"")
  '';
}

Saving this to a file named curl.nix and running nix-build curl.nix will build and run two virtual machines: server running nginx, and client with curl installed. curl is then run against the nginx server on an isolated internal network.
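
For debugging or iterating on a test, the same file can also drive the VMs interactively. This is a sketch, assuming the standard driverInteractive passthru attribute that NixOS test derivations expose; it drops you into a Python REPL where start_all() and the usual test methods are available:

❯ nix-build curl.nix -A driverInteractive
❯ ./result/bin/nixos-test-driver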

This solves all of our issues except that it does not actually generate a PCAP file. Three additional steps are required to extend the test to create a PCAP file:

  • tcpdump must be started before curl runs.
  • tcpdump must be stopped after curl finishes.
  • The final PCAP file needs to be copied off of the VM, otherwise it won’t be persisted.

We can use a regular systemd service to manage tcpdump, and client.copy_from_vm to save the PCAP file at the end. Using a variable for the name of the PCAP helps remove some repetition.

let
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-24.05";
  pkgs = import nixpkgs { config = {}; overlays = []; };
  pcap = "curl";
in

pkgs.testers.runNixOSTest {
  name = "client-server-test";

  nodes.server = { pkgs, ... }: {
    networking = {
      firewall = {
        allowedTCPPorts = [ 80 ];
      };
    };
    services.nginx = {
      enable = true;
      virtualHosts."server" = {};
    };
  };

  nodes.client = { pkgs, ... }: {
    environment.systemPackages = with pkgs; [
      curl
    ];
    systemd.services.tcpdump = {
      enable = true;
      wantedBy = [ "default.target" ];
      after = [ "default.target" ];
      description = "write pcap";
      serviceConfig = {
        Type = "simple";
        ExecStart = ''${pkgs.tcpdump}/bin/tcpdump -n -i eth1 -w /tmp/${pcap}.tmp'';
        ExecStopPost = ''${pkgs.coreutils}/bin/mv /tmp/${pcap}.tmp /tmp/${pcap}.pcap'';
      };
    };
  };

  testScript = ''
    server.wait_for_unit("default.target")
    client.wait_for_unit("default.target")

    client.wait_for_file("/tmp/${pcap}.tmp", 5)

    client.succeed("sleep 2")
    client.succeed("curl http://server/ | grep -o \"Welcome to nginx!\"")
    client.succeed("sleep 2")

    client.stop_job("tcpdump")
    client.wait_for_file("/tmp/${pcap}.pcap", 5)
    client.copy_from_vm("/tmp/${pcap}.pcap", "")
  '';
}

A small sleep is needed before and after running curl: I have found that tcpdump creates the output file before it has actually started capturing packets, and the trailing sleep gives the final packets time to be written before the capture is stopped.

Running nix-build curl-pcap.nix takes about 30 seconds and results in a PCAP generated from entirely self-contained resources:

❯ tshark -r ./result/curl.pcap ip
    4   1.786357  192.168.1.1 → 192.168.1.2  TCP 74 36496 → 80 [SYN] Seq=0 ...
    5   1.786709  192.168.1.2 → 192.168.1.1  TCP 74 80 → 36496 [SYN, ACK] Seq=0 ...
    6   1.786752  192.168.1.1 → 192.168.1.2  TCP 66 36496 → 80 [ACK] Seq=1 Ack=1 ...
    7   1.787244  192.168.1.1 → 192.168.1.2  HTTP 135 GET / HTTP/1.1
    8   1.787470  192.168.1.2 → 192.168.1.1  TCP 66 80 → 36496 [ACK] Seq=1 Ack=70 ...
    9   1.790129  192.168.1.2 → 192.168.1.1  HTTP 890 HTTP/1.1 200 OK  (text/html)

A more complicated example

To reproduce a performance problem I needed to generate a PCAP of a decently sized OpenVPN connection using TLS. Generating such a PCAP requires many more steps than a curl request:

  • A TLS CA needs to be bootstrapped.
  • Two certs need to be generated, one for the server and one for the client.
  • The certs need to be available on both nodes so they can trust each other.
  • The corresponding OpenVPN configuration files need to be generated for the server and the client.
  • The server should run iperf to act as a traffic generator.
  • The tunnel needs to come up after tcpdump is running to ensure we capture the handshake.
  • The client needs to wait for the tunnel to establish.
  • The client needs to run iperf to trigger the traffic generation.

OpenVPN documentation often uses EasyRSA to bootstrap a CA, but minica makes the process of bootstrapping a CA and generating certs even easier.
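
To make the directory layout the test relies on concrete, here is a sketch of what running minica by hand produces; the ca.pem and ca-key.pem names come from the flags we pass, and minica places each certificate in a directory named after its first domain:

❯ minica -ca-cert ca.pem -ca-key ca-key.pem --domains server
❯ minica -ca-cert ca.pem -ca-key ca-key.pem --domains client
# resulting layout: ca.pem, ca-key.pem,
#                   server/cert.pem, server/key.pem,
#                   client/cert.pem, client/key.pem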

We can use Nix to build a keys package using minica and share it with both VMs. Adding secrets to the Nix store like this is not recommended, but these keys are ephemeral and are only used for this test.

We can also use some basic string interpolation to ensure the two configuration files have matching sets of IP addresses without needing to copy and paste things. The transferSize variable gets passed to iperf and controls how much data is sent across the tunnel.

To support generating a PCAP over 1 GB in size, we set the VM disk size to be larger than normal.

The final test looks like this:

let
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-24.05";
  pkgs = import nixpkgs { config = {}; overlays = []; };

  pcap = "openvpn-tls";
  transferSize = "100M";
  port = 1194;
  remoteIP = "10.8.0.1";
  localIP = "10.8.0.2";
  keys = pkgs.stdenv.mkDerivation {
    name = "vpn-keys";
    dontUnpack = true;
    installPhase = with pkgs;
      ''
        mkdir -p $out/
        cd $out/
        ${minica}/bin/minica -ca-cert ca.pem -ca-key ca-key.pem --domains server
        ${minica}/bin/minica -ca-cert ca.pem -ca-key ca-key.pem --domains client
      '';
  };
in
  pkgs.testers.runNixOSTest {
    name = "openvpn-client-server-test";

    nodes.server = {pkgs, ...}: {
      virtualisation.memorySize = 4096;
      virtualisation.cores = 4;
      virtualisation.diskSize = 4096;
      services.iperf3.enable = true;
      networking.firewall.enable = false;
      services.openvpn.servers.server.config = ''
        dev tun
        proto udp
        ifconfig ${remoteIP} ${localIP}
        port ${toString port}

        tls-server
        ca ${keys}/ca.pem
        cert ${keys}/server/cert.pem
        key ${keys}/server/key.pem
        dh none

        cipher AES-256-CBC
        data-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC
        auth SHA1

        auth-nocache

        keepalive 10 60
        ping-timer-rem
        persist-tun
        persist-key
      '';

    };

    nodes.client = {pkgs, ...}: {
      virtualisation.memorySize = 2048;
      virtualisation.cores = 2;
      virtualisation.diskSize = 4096;
      networking.firewall.enable = false;
      environment.systemPackages = with pkgs; [
        iperf3
      ];
      systemd.services.tcpdump = {
        enable = true;
        wantedBy = ["default.target"];
        after = ["default.target"];
        description = "write pcap";
        serviceConfig = {
          Type = "simple";
          # the command to execute when the service starts up
          ExecStart = ''${pkgs.tcpdump}/bin/tcpdump -n -i eth1 -w /tmp/${pcap}.tmp'';
          ExecStopPost = ''${pkgs.coreutils}/bin/mv /tmp/${pcap}.tmp /tmp/${pcap}.pcap'';
        };
      };

      services.openvpn.servers.client.autoStart = false;
      services.openvpn.servers.client.config = ''
        #client
        remote server
        dev tun
        port ${toString port}
        ifconfig ${localIP} ${remoteIP}

        tls-client
        ca ${keys}/ca.pem
        cert ${keys}/client/cert.pem
        key ${keys}/client/key.pem

        cipher AES-256-CBC
        data-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC
        auth SHA1
      '';
    };

    testScript = ''
      start_all()

      server.wait_for_open_port(5201)

      client.wait_for_file("/tmp/${pcap}.tmp", 5)
      client.succeed("sleep 2")

      # start the vpn here so we ensure the pcap has the full setup
      client.systemctl("start openvpn-client")

      client.wait_until_succeeds("ping -c 1 ${remoteIP}", timeout=30)
      client.wait_for_open_port(5201, "${remoteIP}", timeout=10)

      client.succeed("sleep 2")
      client.succeed("iperf3 -R -c ${remoteIP} -n ${transferSize}")
      client.succeed("sleep 2")

      client.stop_job("tcpdump")
      client.wait_for_file("/tmp/${pcap}.pcap", 5)
      client.copy_from_vm("/tmp/${pcap}.pcap", "")
    '';
  }

The resulting PCAP is the expected size and contains an OpenVPN TLS tunnel.

❯ du -hs ./result/openvpn-tls.pcap 
114M	./result/openvpn-tls.pcap
❯ tshark -r ./result/openvpn-tls.pcap ip | head
   13   3.217590  192.168.1.1 → 192.168.1.2  OpenVPN 56 MessageType: P_CONTROL_HARD_RESET_CLIENT_V2
   14   3.221703  192.168.1.2 → 192.168.1.1  OpenVPN 68 MessageType: P_CONTROL_HARD_RESET_SERVER_V2
   15   3.226324  192.168.1.1 → 192.168.1.2  TLSv1 345 Client Hello
   16   3.237735  192.168.1.2 → 192.168.1.1  TLSv1.3 1264 Server Hello, Change Cipher Spec, Application Data, Application Data
   17   3.237759  192.168.1.2 → 192.168.1.1  TLSv1.3 1207 Continuation Data
   18   3.241191  192.168.1.1 → 192.168.1.2  OpenVPN 68 MessageType: P_ACK_V1
   19   3.247758  192.168.1.1 → 192.168.1.2  TLSv1.3 1264 Change Cipher Spec
   20   3.248055  192.168.1.1 → 192.168.1.2  TLSv1.3 1264 Continuation Data
   21   3.248127  192.168.1.1 → 192.168.1.2  TLSv1.3 118 Continuation Data
   22   3.248638  192.168.1.2 → 192.168.1.1  OpenVPN 72 MessageType: P_ACK_V1
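
With the PCAP checked into a test suite, the original issue can be reproduced completely offline by replaying the file instead of sniffing live customer traffic. A minimal sketch, using the standard read-from-file flags of both tools:

❯ zeek -r ./result/openvpn-tls.pcap
❯ suricata -r ./result/openvpn-tls.pcap   # also needs its usual suricata.yaml and rules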