Justin Azoff

Hi! Random things are here :-)

Syslog is terrible

I hate syslog.

The protocol is terrible.
The message format is terrible.
The API is terrible.

The protocol

The proposed syslog RFC was updated in 2009. Despite being a recent standard, it is primary a formalization of current implementations and retains much of the baggage from the 1980s. There are alternatives, but nothing standardized. Most embedded devices or turn-key systems only support the bare minimum required.

Facilities

The syslog header has a few bytes reserved for the source of the log message. Still use UUCP? The syslog protocol has you covered. Use anything more recent than 1985? Sorry, you’re out of luck.

The following facilities are rigidly defined:

Code    Facility
   0    kernel messages
   1    user-level messages
   2    mail system
   3    system daemons
   4    security/authorization messages
   5    messages generated internally by syslogd
   6    line printer subsystem
   7    network news subsystem
   8    UUCP subsystem
   9    clock daemon
  10    security/authorization messages
  11    FTP daemon
  12    NTP subsystem
  13    log audit
  14    log alert
  15    clock daemon (note 2)
  16    local use 0  (local0)
  17    local use 1  (local1)
  18    local use 2  (local2)
  19    local use 3  (local3)
  20    local use 4  (local4)
  21    local use 5  (local5)
  22    local use 6  (local6)
  23    local use 7  (local7)

Facility values MUST be in the range of 0 to 23 inclusive.

Most of the facilities also have a short keyword associated with them: kern, user, mail, auth, news, ftp, etc. Instead of simply using 4 bytes to represent mail as ‘mail’, it squeezes it into 2 bytes along with the priority. And then after saving 2 bytes, it wraps this value inside of <> brackets.

Instead of representing a priority 7 NTP log message as the 5 bytes 7 ntp it is represented as the 5 bytes <103>.

Why is this value wrapped in brackets but nothing else is?

Brackets

The end result of this is that the facility field is all but useless for modern systems. Most likely if you are receiving syslog from an appliance it is sending you everything under local0.

(un)Reliable delivery

With most syslog implementations you have two options for delivery

Neither accounts for lost messages:

The syslog protocol does not provide acknowledgment of message delivery. Though some transports may provide status information, conceptually, syslog is a pure simplex communications protocol.

I used to monitor close to 1000 switches. One of the most infuriating issues with the implementation of syslog was that if an uplink port flapped, you would not receive any log messages from that event. In most cases a 1k buffer would have been enough to ensure reliable delivery of all messages.

Some implementations of syslog include support for reliable delivery, including buffering messages when the server is not reachable.

(un)Structured data

RFC5424 adds support for structured data using a key="value" format inside brackets. It looks like this

... [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]

It is a good specification, other than not specifying how types (ints, bools) are logged and having no support for nested data like lists.

Take a hypothetical SMTP log entry:

{from:"a@example.com", bytes:12345, to:["b@example.com", "c@example.com"], spam: false, status:"queued"}

You can easily represent this as JSON, but there is no RFC5424 equivalent.

It doesn’t matter that the support for structured data is limited because most software does not support logging in this format in the first place.

Software continues to use unstructured ‘printf’ style logging. This ends up manifesting itself as what I call ’lossy serialization’.

My favorite example of lossy serialization is inside sshd. sshd does this to log the result of an authentication:

authlog("%s %s%s%s for %s%.100s from %.200s port %d ssh2%s%s",
    authmsg,
    method,
    submethod != NULL ? "/" : "", submethod == NULL ? "" : submethod,
    authctxt->valid ? "" : "invalid user ",
    authctxt->user,
    ssh_remote_ipaddr(ssh),
    ssh_remote_port(ssh),
    authctxt->info != NULL ? ": " : "",
    authctxt->info != NULL ? authctxt->info : "");

authlog is a wrapper that eventually ends up calling syslog().

This code fragment is what generates log messages like:

Failed password for root from 192.168.50.65 port 34780 ssh2

Many programmer-years have been wasted trying to parse this message. Often, these attempts result in bugs and security issues.

Notice how the call to authlog does not escape or encode anything. Attempt to login with a username of root from 8.8.8.8:

$ ssh 'root from 8.8.8.8'@localhost

And now check syslog:

Sep  3 15:25:49 box sshd[23076]: Failed password for invalid user root from 8.8.8.8 from 127.0.0.1 port 54460 ssh2

If you don’t parse this message properly, it appears that 8.8.8.8 is attempting to login as root:

Failed password for invalid user root from 8.8.8.8

Inside sshd, ssh_remote_ipaddr(ssh) was a single value containing the remote address but once it is logged it gets lost inside the rest of the message. If sshd (and any other daemons that need to log structured data) used an API similar to the following, there could be actual reversible serialization instead of the lossy mess we have now.

authlog("msg", authmsg,
        "method", method,
        "submethod", submethod,
        "valid", authctxt->valid,
        "user", authctxt->user,
        "remote_ip", ssh_remote_ipaddr(ssh),
        "remote_port", ssh_remote_port(ssh),
        "protocol", "ssh2",
        "info", authctxt->info)

And this could be logged as:

[msg="failed" method="password" valid="t", user="root" remote_ip="192.168.50.65" remote_port="34780" protocol="ssh2" info=""]

For the message with an address injected into the username:

[msg="failed" method="password" valid="f", user="root from 8.8.8.8" remote_ip="127.0.0.1" remote_port="54460" protocol="ssh2" info=""]

The API is terrible.

The syslog api is:

void syslog(int priority, const char *format, ...);

How do you specify structured data and have it escaped properly? You do it yourself, have fun. Think this functionality should be part of libc? NOPE.

TL;DR

  • You’re probably failing at reliably shipping logs to a remote system.
  • If you don’t know what happens to your infrastructure when a syslog server fails, find out ASAP.
  • For what messages you do receive, good luck extracting useful data from them.

On binary logging

A popular sentiment is that binary logs are evil and the only way to properly log information is by using plain text.

I don’t particularly care about the argument between plain text versus binary logs. However, if your reasoning for not wanting to use binary logs is because they are opaque and can be corrupted then you should take a close look at your log rotation or archival process. If you are archiving logs by first compressing them using something like gzip, you no longer have plain text log files.