(Ab)using the NixOS Test framework for clean room PCAP generation
Generating clean room PCAP files can be a difficult orchestration problem. The NixOS Test framework makes this easy.
The Problem
I work with Zeek and Suricata which process live network traffic. When a performance or correctness issue arises on a customer network, a PCAP file is required to reproduce and resolve the issue. Unfortunately, capturing a live PCAP from a customer network is typically not feasible. Even if it were possible, such a file could contain sensitive information, making it unsuitable for inclusion in a public test suite. Therefore, it is necessary to generate a PCAP file from scratch.
A basic example without Nix or NixOS
We can automate the creation of a PCAP showing curl making an HTTP request over a TLS connection using a script like this:
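# capture traffic to and from the target host in the background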
tcpdump -n -i wlp1s0 -w curl.pcap 'host 162.243.250.187' &
pid=$!
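# give tcpdump a moment to start capturing before sending traffic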
sleep 2
curl -4 -o /dev/null https://justin.azoff.dev/
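# allow the final packets to arrive before stopping the capture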
sleep 1
kill $pid
wait $pid
Running that script generates a PCAP:
❯ sudo bash get-pcap.sh
tcpdump: listening on wlp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15407  100 15407    0     0  93160      0 --:--:-- --:--:-- --:--:-- 93375
39 packets captured
39 packets received by filter
0 packets dropped by kernel
And the resulting PCAP shows the TLS session:
❯ tshark -r curl.pcap
1 0.000000 192.168.5.157 → 162.243.250.187 TCP 74 56546 → 443 [SYN] Seq=0 ...
2 0.012140 162.243.250.187 → 192.168.5.157 TCP 74 443 → 56546 [SYN, ACK] Seq=0 ...
3 0.012265 192.168.5.157 → 162.243.250.187 TCP 66 56546 → 443 [ACK] Seq=1 Ack=1 ...
4 0.015841 192.168.5.157 → 162.243.250.187 TLSv1 583 Client Hello (SNI=justin.azoff.dev)
...
This process works, but it has a number of issues:
- The version of curl wasn’t pinned to a specific version. This likely doesn’t matter for curl, but not every tool will work the same months or years down the road.
- It depended on an external server to respond to the request. This script reached out to the internet and was not 100% self contained.
- Testing more complicated setups like VPN tunnels or network filesystems is not easy.
Tools like docker and docker-compose can solve some of these issues, but the shared kernel makes reproducing anything involving VPN tunnels or filesystems difficult.
Enter the NixOS Test framework
The NixOS test framework is not specifically built for PCAP generation. However, it has the following features which make it particularly good at this task:
- It supports creating multiple concurrent VMs that do not share a kernel.
- The VMs do not have internet access, ensuring that tests are self contained.
- VMs can be made to run arbitrary services and commands.
- VMs bind mount the nix store (packages) from the host, making iteration fast.
- Files can be copied off a VM as part of the test process.
A basic example using NixOS Test
We can start with an example from the tutorial as-is (other than bumping the NixOS release).
let
nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-24.05";
pkgs = import nixpkgs { config = {}; overlays = []; };
in
pkgs.testers.runNixOSTest {
name = "client-server-test";
nodes.server = { pkgs, ... }: {
networking = {
firewall = {
allowedTCPPorts = [ 80 ];
};
};
services.nginx = {
enable = true;
virtualHosts."server" = {};
};
};
nodes.client = { pkgs, ... }: {
environment.systemPackages = with pkgs; [
curl
];
};
testScript = ''
server.wait_for_unit("default.target")
client.wait_for_unit("default.target")
client.succeed("curl http://server/ | grep -o \"Welcome to nginx!\"")
'';
}
Saving this to a file named curl.nix and running nix-build curl.nix will build and run two virtual machines: server running nginx, and client with curl installed. curl is then run against the nginx server on an isolated internal network.
This solves all of our issues except that it does not actually generate a PCAP file. Three additional steps are required to extend the test to create a PCAP file:
- tcpdump must be started before curl runs.
- tcpdump must be stopped after curl runs.
- The final PCAP file needs to be copied off of the VM, otherwise it won’t be persisted.
We can use a regular systemd service to manage tcpdump and client.copy_from_vm to save the PCAP file at the end. Using a variable for the name of the PCAP helps remove some repetition.
let
nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-24.05";
pkgs = import nixpkgs { config = {}; overlays = []; };
pcap = "curl";
in
pkgs.testers.runNixOSTest {
name = "client-server-test";
nodes.server = { pkgs, ... }: {
networking = {
firewall = {
allowedTCPPorts = [ 80 ];
};
};
services.nginx = {
enable = true;
virtualHosts."server" = {};
};
};
nodes.client = { pkgs, ... }: {
environment.systemPackages = with pkgs; [
curl
];
systemd.services.tcpdump = {
enable = true;
wantedBy = [ "default.target" ];
after = [ "default.target" ];
description = "write pcap";
serviceConfig = {
Type = "simple";
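        # capture on eth1 (the VM's test-network interface) until the service is stopped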
ExecStart = ''${pkgs.tcpdump}/bin/tcpdump -n -i eth1 -w /tmp/${pcap}.tmp'';
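        # the rename runs when the service stops, so a finished .pcap file appearing signals a complete capture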
ExecStopPost = ''${pkgs.coreutils}/bin/mv /tmp/${pcap}.tmp /tmp/${pcap}.pcap'';
};
};
};
testScript = ''
server.wait_for_unit("default.target")
client.wait_for_unit("default.target")
client.wait_for_file("/tmp/${pcap}.tmp", 5)
client.succeed("sleep 2")
client.succeed("curl http://server/ | grep -o \"Welcome to nginx!\"")
client.succeed("sleep 2")
client.stop_job("tcpdump")
client.wait_for_file("/tmp/${pcap}.pcap", 5)
client.copy_from_vm("/tmp/${pcap}.pcap", "")
'';
}
A small sleep is needed before running curl because I have found that even though tcpdump has created the output file, it hasn't necessarily started capturing packets yet. Another small sleep afterwards lets the final packets make it into the capture before tcpdump is stopped.
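If the fixed sleeps ever prove flaky, one alternative (a sketch I have not battle-tested, relying on systemd sending the unit's stderr to the journal) is to wait for tcpdump's "listening on" message instead of sleeping:

client.wait_for_file("/tmp/${pcap}.tmp", 5)
# wait until tcpdump has logged that it is actually capturing
client.wait_until_succeeds("journalctl -u tcpdump -b | grep -q 'listening on'")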
Running nix-build curl-pcap.nix takes about 30 seconds and results in a PCAP that was generated from entirely self-contained resources:
❯ tshark -r ./result/curl.pcap ip
4 1.786357 192.168.1.1 → 192.168.1.2 TCP 74 36496 → 80 [SYN] Seq=0 ...
5 1.786709 192.168.1.2 → 192.168.1.1 TCP 74 80 → 36496 [SYN, ACK] Seq=0 ...
6 1.786752 192.168.1.1 → 192.168.1.2 TCP 66 36496 → 80 [ACK] Seq=1 Ack=1 ...
7 1.787244 192.168.1.1 → 192.168.1.2 HTTP 135 GET / HTTP/1.1
8 1.787470 192.168.1.2 → 192.168.1.1 TCP 66 80 → 36496 [ACK] Seq=1 Ack=70 ...
9 1.790129 192.168.1.2 → 192.168.1.1 HTTP 890 HTTP/1.1 200 OK (text/html)
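Since each full run takes about 30 seconds, the framework's interactive driver is handy while iterating on the test script itself. Per the NixOS manual, the test derivation exposes a driverInteractive attribute (exact attribute names may vary between releases):

❯ nix-build curl-pcap.nix -A driverInteractive
❯ ./result/bin/nixos-test-driver
# this drops you into a Python REPL where you can run, e.g.:
# >>> start_all()
# >>> client.shell_interact()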
A more complicated example
To reproduce a performance problem I needed to generate a PCAP of a decently sized OpenVPN connection using TLS. Generating such a PCAP requires many more steps than a curl request:
- A TLS CA needs to be bootstrapped.
- Two certs need to be generated, one for the server and one for the client.
- The certs need to be available on both nodes so they can trust each other.
- The corresponding OpenVPN configuration files need to be generated for the server and the client.
- The server should run iperf to act as a traffic generator.
- The tunnel needs to come up after tcpdump is running to ensure we capture the handshake.
- The client needs to wait for the tunnel to establish.
- The client needs to run iperf to trigger the traffic generation.
OpenVPN documentation often uses EasyRSA to bootstrap a CA, but minica makes the process of bootstrapping a CA and generating certs even easier. We can use nix to build a keys package using minica and share it with both VMs. Adding secrets to the nix store like this is not recommended, but these keys are ephemeral and only going to be used for this test.
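For reference, this is roughly the layout minica produces when run by hand, and the layout the cert paths in the configs below assume:

❯ minica -ca-cert ca.pem -ca-key ca-key.pem --domains server
❯ minica -ca-cert ca.pem -ca-key ca-key.pem --domains client
❯ ls
ca-key.pem  ca.pem  client  server
❯ ls server
cert.pem  key.pem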
We can also use some basic string interpolation to ensure the two configuration files have matching sets of IP addresses without needing to copy-paste things. The transferSize variable gets passed to iperf and controls how much data is sent across the tunnel.
To support generating a PCAP over 1GB in size, we set the VM disk size to be larger than normal.
The final test looks like this:
let
nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-24.05";
pkgs = import nixpkgs { config = {}; overlays = []; };
pcap = "openvpn-tls";
transferSize = "100M";
port = 1194;
remoteIP = "10.8.0.1";
localIP = "10.8.0.2";
keys = pkgs.stdenv.mkDerivation {
name = "vpn-keys";
dontUnpack = true;
installPhase = with pkgs;
''
mkdir -p $out/
cd $out/
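      # minica creates ca.pem/ca-key.pem on the first run, then one directory
      # per domain (server/, client/) containing cert.pem and key.pem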
${minica}/bin/minica -ca-cert ca.pem -ca-key ca-key.pem --domains server
${minica}/bin/minica -ca-cert ca.pem -ca-key ca-key.pem --domains client
'';
};
in
pkgs.testers.runNixOSTest {
name = "openvpn-client-server-test";
nodes.server = {pkgs, ...}: {
virtualisation.memorySize = 4096;
virtualisation.cores = 4;
virtualisation.diskSize = 4096;
services.iperf3.enable = true;
networking.firewall.enable = false;
services.openvpn.servers.server.config = ''
dev tun
proto udp
ifconfig ${remoteIP} ${localIP}
port ${toString port}
tls-server
ca ${keys}/ca.pem
cert ${keys}/server/cert.pem
key ${keys}/server/key.pem
dh none
cipher AES-256-CBC
data-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC
auth SHA1
auth-nocache
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
'';
};
nodes.client = {pkgs, ...}: {
virtualisation.memorySize = 2048;
virtualisation.cores = 2;
virtualisation.diskSize = 4096;
networking.firewall.enable = false;
environment.systemPackages = with pkgs; [
iperf3
];
systemd.services.tcpdump = {
enable = true;
wantedBy = ["default.target"];
after = ["default.target"];
description = "write pcap";
serviceConfig = {
Type = "simple";
# the command to execute when the service starts up
ExecStart = ''${pkgs.tcpdump}/bin/tcpdump -n -i eth1 -w /tmp/${pcap}.tmp'';
ExecStopPost = ''${pkgs.coreutils}/bin/mv /tmp/${pcap}.tmp /tmp/${pcap}.pcap'';
};
};
services.openvpn.servers.client.autoStart = false;
services.openvpn.servers.client.config = ''
#client
remote server
dev tun
port ${toString port}
ifconfig ${localIP} ${remoteIP}
tls-client
ca ${keys}/ca.pem
cert ${keys}/client/cert.pem
key ${keys}/client/key.pem
cipher AES-256-CBC
data-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC
auth SHA1
'';
};
testScript = ''
start_all()
server.wait_for_open_port(5201)
client.wait_for_file("/tmp/${pcap}.tmp", 5)
client.succeed("sleep 2")
# start the vpn here so we ensure the pcap has the full setup
client.systemctl("start openvpn-client")
client.wait_until_succeeds("ping -c 1 ${remoteIP}", timeout=30)
client.wait_for_open_port(5201, "${remoteIP}", timeout=10)
client.succeed("sleep 2")
client.succeed("iperf3 -R -c ${remoteIP} -n ${transferSize}")
client.succeed("sleep 2")
client.stop_job("tcpdump")
client.wait_for_file("/tmp/${pcap}.pcap", 5)
client.copy_from_vm("/tmp/${pcap}.pcap", "")
'';
}
The resulting PCAP is the expected size and contains an OpenVPN TLS tunnel.
❯ du -hs ./result/openvpn-tls.pcap
114M ./result/openvpn-tls.pcap
❯ tshark -r ./result/openvpn-tls.pcap ip | head
13 3.217590 192.168.1.1 → 192.168.1.2 OpenVPN 56 MessageType: P_CONTROL_HARD_RESET_CLIENT_V2
14 3.221703 192.168.1.2 → 192.168.1.1 OpenVPN 68 MessageType: P_CONTROL_HARD_RESET_SERVER_V2
15 3.226324 192.168.1.1 → 192.168.1.2 TLSv1 345 Client Hello
16 3.237735 192.168.1.2 → 192.168.1.1 TLSv1.3 1264 Server Hello, Change Cipher Spec, Application Data, Application Data
17 3.237759 192.168.1.2 → 192.168.1.1 TLSv1.3 1207 Continuation Data
18 3.241191 192.168.1.1 → 192.168.1.2 OpenVPN 68 MessageType: P_ACK_V1
19 3.247758 192.168.1.1 → 192.168.1.2 TLSv1.3 1264 Change Cipher Spec
20 3.248055 192.168.1.1 → 192.168.1.2 TLSv1.3 1264 Continuation Data
21 3.248127 192.168.1.1 → 192.168.1.2 TLSv1.3 118 Continuation Data
22 3.248638 192.168.1.2 → 192.168.1.1 OpenVPN 72 MessageType: P_ACK_V1
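For a quick summary of a generated file beyond eyeballing tshark output, capinfos, which ships with Wireshark's command-line tools, reports packet counts, capture duration, and data rates:

❯ capinfos ./result/openvpn-tls.pcap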