Syslog is terrible
I hate syslog.
The protocol is terrible.
The message format is terrible.
The API is terrible.
The protocol
The proposed syslog RFC was updated in 2009. Despite being a recent standard, it is primary a formalization of current implementations and retains much of the baggage from the 1980s. There are alternatives, but nothing standardized. Most embedded devices or turn-key systems only support the bare minimum required.
Facilities
The syslog header has a few bytes reserved for the source of the log message. Still use UUCP? The syslog protocol has you covered. Use anything more recent than 1985? Sorry, you’re out of luck.
The following facilities are rigidly defined:
Code Facility
0 kernel messages
1 user-level messages
2 mail system
3 system daemons
4 security/authorization messages
5 messages generated internally by syslogd
6 line printer subsystem
7 network news subsystem
8 UUCP subsystem
9 clock daemon
10 security/authorization messages
11 FTP daemon
12 NTP subsystem
13 log audit
14 log alert
15 clock daemon (note 2)
16 local use 0 (local0)
17 local use 1 (local1)
18 local use 2 (local2)
19 local use 3 (local3)
20 local use 4 (local4)
21 local use 5 (local5)
22 local use 6 (local6)
23 local use 7 (local7)
Facility values MUST be in the range of 0 to 23 inclusive.
Most of the facilities also have a short keyword associated with them: kern, user, mail, auth, news, ftp, etc.
Instead of simply using 4 bytes to represent mail as ‘mail’, it squeezes it into
2 bytes along with the priority. And then after saving 2 bytes, it wraps this
value inside of <>
brackets.
Instead of representing a priority 7 NTP log message as the 5 bytes 7 ntp
it
is represented as the 5 bytes <103>
.
Why is this value wrapped in brackets but nothing else is?
The end result of this is that the facility field is all but useless for modern
systems. Most likely if you are receiving syslog from an appliance it is
sending you everything under local0
.
(un)Reliable delivery
With most syslog implementations you have two options for delivery
- Use TCP and risk your entire application locking up if the server is not reachable.
- Use UDP and risk dropping messages under load.
Neither accounts for lost messages:
The syslog protocol does not provide acknowledgment of message delivery. Though some transports may provide status information, conceptually, syslog is a pure simplex communications protocol.
I used to monitor close to 1000 switches. One of the most infuriating issues with the implementation of syslog was that if an uplink port flapped, you would not receive any log messages from that event. In most cases a 1k buffer would have been enough to ensure reliable delivery of all messages.
Some implementations of syslog include support for reliable delivery, including buffering messages when the server is not reachable.
- RFC 3195 contains a 15 year old proposal for reliable syslog.
- rsyslog has RELP
- syslog-ng has RLTP
(un)Structured data
RFC5424 adds support for structured data using a key="value"
format inside brackets. It looks like this
... [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]
It is a good specification, other than not specifying how types (ints, bools) are logged and having no support for nested data like lists.
Take a hypothetical SMTP log entry:
{from:"a@example.com", bytes:12345, to:["b@example.com", "c@example.com"], spam: false, status:"queued"}
You can easily represent this as JSON, but there is no RFC5424 equivalent.
It doesn’t matter that the support for structured data is limited because most software does not support logging in this format in the first place.
Software continues to use unstructured ‘printf’ style logging. This ends up manifesting itself as what I call ’lossy serialization’.
My favorite example of lossy serialization is inside sshd. sshd does this to log the result of an authentication:
authlog("%s %s%s%s for %s%.100s from %.200s port %d ssh2%s%s",
authmsg,
method,
submethod != NULL ? "/" : "", submethod == NULL ? "" : submethod,
authctxt->valid ? "" : "invalid user ",
authctxt->user,
ssh_remote_ipaddr(ssh),
ssh_remote_port(ssh),
authctxt->info != NULL ? ": " : "",
authctxt->info != NULL ? authctxt->info : "");
authlog
is a wrapper that eventually ends up calling syslog()
.
This code fragment is what generates log messages like:
Failed password for root from 192.168.50.65 port 34780 ssh2
Many programmer-years have been wasted trying to parse this message. Often, these attempts result in bugs and security issues.
Notice how the call to authlog does not escape or encode anything. Attempt to login with a username of root from 8.8.8.8
:
$ ssh 'root from 8.8.8.8'@localhost
And now check syslog:
Sep 3 15:25:49 box sshd[23076]: Failed password for invalid user root from 8.8.8.8 from 127.0.0.1 port 54460 ssh2
If you don’t parse this message properly, it appears that 8.8.8.8 is attempting to login as root:
Failed password for invalid user root from 8.8.8.8
Inside sshd, ssh_remote_ipaddr(ssh)
was a single value containing the remote
address but once it is logged it gets lost inside the rest of the message.
If sshd (and any other daemons that need to log structured data) used an API
similar to the following, there could be actual reversible serialization
instead of the lossy mess we have now.
authlog("msg", authmsg,
"method", method,
"submethod", submethod,
"valid", authctxt->valid,
"user", authctxt->user,
"remote_ip", ssh_remote_ipaddr(ssh),
"remote_port", ssh_remote_port(ssh),
"protocol", "ssh2",
"info", authctxt->info)
And this could be logged as:
[msg="failed" method="password" valid="t", user="root" remote_ip="192.168.50.65" remote_port="34780" protocol="ssh2" info=""]
For the message with an address injected into the username:
[msg="failed" method="password" valid="f", user="root from 8.8.8.8" remote_ip="127.0.0.1" remote_port="54460" protocol="ssh2" info=""]
The API is terrible.
The syslog api is:
void syslog(int priority, const char *format, ...);
How do you specify structured data and have it escaped properly? You do it yourself, have fun. Think this functionality should be part of libc? NOPE.
TL;DR
- You’re probably failing at reliably shipping logs to a remote system.
- If you don’t know what happens to your infrastructure when a syslog server fails, find out ASAP.
- For what messages you do receive, good luck extracting useful data from them.
On binary logging
A popular sentiment is that binary logs are evil and the only way to properly log information is by using plain text.
I don’t particularly care about the argument between plain text versus binary logs. However, if your reasoning for not wanting to use binary logs is because they are opaque and can be corrupted then you should take a close look at your log rotation or archival process. If you are archiving logs by first compressing them using something like gzip, you no longer have plain text log files.