Error messages are hard

Error messages are hard. When a program logs an error message, both of these things can be true at the same time:

The error message can be 100% accurate.
The error message can be completely unhelpful, if not outright misleading.

Let’s look at some examples:

Python JSONDecodeError

import json
import sys

with open(sys.argv[1]) as f:
    contents = f.read()
print(json.loads(contents))

What happens if we run that against an invalid file?

$ python3 python_json.py test_one.txt
Traceback (most recent call last):
  File "python_json.py", line 5, in <module>
    print(json.loads(contents))
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What is the root cause here? The file doesn’t contain JSON. It is empty.

Go and JS do a little better in this case. They both say:

unexpected end of JSON input

This is true, but the message could still be improved by special-casing empty input.

You may be thinking at this point that while the message is bad, you can assume that specific error means that the file was empty. You would be wrong.

Let’s run this same program again against another invalid file, and see what we get:

$ python3 python_json.py test_two.txt
...
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

That file was not empty. It was an HTML file:

$ cat test_two.txt
<!doctype html>
...

Go and JS do a bit better here:

invalid character '<' looking for beginning of value
Uncaught SyntaxError: Unexpected token '<', "<!doctype html>" is not valid JSON

Why might you run into this error? It is common in Python programs that use the requests library to fetch a JSON file from a web server, but don’t first call raise_for_status() or check the status_code directly:

Bad

import requests

r = requests.get("https://example.com/file.json")
data = r.json()  # tries to parse whatever body came back, even an error page

Good

import requests

r = requests.get("https://example.com/file.json")
r.raise_for_status()  # raises HTTPError on a 4xx or 5xx response
data = r.json()

Without the call to raise_for_status that program will raise the same json.decoder.JSONDecodeError. With the call to raise_for_status it will raise:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://example.com/file.json

How could this be improved?

The error messages from Go and JS show that there are some easy wins here. Multiple languages could benefit from special-casing the empty string.
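
Until the standard libraries do this, an application can approximate it on its own. The following is only a sketch: the helper name loads_with_hints and the wording of its messages are made up for illustration and are not part of the json module. It checks for the two cases seen above before handing the text to json.loads.

```python
import json


def loads_with_hints(text):
    """Parse JSON, but describe two common non-JSON inputs more clearly.

    Hypothetical helper for illustration; not part of the json module.
    """
    stripped = text.strip()
    if not stripped:
        # For a truly empty string, json.loads reports
        # "Expecting value: line 1 column 1 (char 0)"
        raise ValueError("input is empty, expected a JSON document")
    if stripped.startswith("<"):
        # No valid JSON document starts with '<', so this is likely HTML or XML
        raise ValueError(
            f"input starts with {stripped[:15]!r}, "
            "which looks like HTML or XML, not JSON"
        )
    # Anything else goes to the normal parser and its normal errors
    return json.loads(text)
```

Both hints are raised as ValueError, which is also the base class of JSONDecodeError, so callers that already catch ValueError keep working.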

OpenSSL wrong version number

Let’s run curl and see if we can understand what the problem is from the error message:

$ url=https://...
$ curl $url
curl: (35) OpenSSL/3.0.8: error:0A00010B:SSL routines::wrong version number

This error message may lead you to believe that there is an issue with the TLS version being used: for example, that a client requiring TLS 1.3 is connecting to an old server that only supports 1.2.

If we run curl with --trace -, we can see what the real issue is:

...
<= Recv SSL data, 5 bytes (0x5)
0000: 48 54 54 50 2f                                  HTTP/
== Info: OpenSSL/3.0.8: error:0A00010B:SSL routines::wrong version number

The problem is that the server is speaking plain-text HTTP, not TLS, on that port. Where curl was expecting to receive an SSL/TLS handshake, it received:

HTTP/1.1 400 Bad Request
Server: nginx
...

Normally it’s supposed to get something like this:

<= Recv SSL data, 5 bytes (0x5)
0000: 16 03 03 00 7a                                  ....z

Since HTTP/ in ASCII is not a proper version, OpenSSL complains. The error message is accurate, but not helpful.
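
To see why those five bytes matter, it helps to split them the way a TLS record header is laid out: one byte of content type, two bytes of version, and two bytes of length. The sketch below does only that. The function name is made up, and this is not how OpenSSL itself is structured.

```python
import struct

# Content type 0x16 (22) marks a handshake record
HANDSHAKE = 0x16


def parse_record_header(header):
    """Split the first 5 bytes of a TLS record into (type, version, length)."""
    # 1 byte content type, 2 bytes version, 2 bytes length, all big-endian
    return struct.unpack(">BHH", header[:5])


# The header from a working connection: handshake, version 3.3, 122 bytes
good = parse_record_header(bytes.fromhex("160303007a"))
assert good == (HANDSHAKE, 0x0303, 0x007A)

# The header from the failing connection: the letters "HTTP/"
bad = parse_record_header(b"HTTP/")
assert bad == (0x48, 0x5454, 0x502F)  # 'H', then 'TT' where the version goes
```

Read this way, the two bytes in the version position are 0x5454, the letters "TT", which is presumably what the wrong version number complaint is about.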

The assumption the user might make is that there is some issue with the supported TLS version, and not that the server is not even speaking TLS in the first place.

How could this be improved?

OpenSSL could include the value in the error message, or make it available to the caller somehow.

This would help, but logging 485454502f without converting it to ASCII first isn’t as helpful as it could be. A web search for 485454502f has results full of people who have that value in an error message, but don’t know why.

If the value was available to the caller, libcurl could check for this case.

OpenSSL could check if the invalid version is 485454502f and, if so, handle this specific case. Since OpenSSL is used for more than just HTTP, this might not be appropriate.
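
As a sketch of what such a check could look like on the caller's side, assuming the caller somehow had the raw header bytes as a hex string: the function below is hypothetical, and neither OpenSSL nor libcurl provides anything like it today.

```python
def explain_record_header(hex_bytes):
    """Return a hint for unexpected record header bytes, given as a hex string.

    Hypothetical helper; neither OpenSSL nor libcurl provides this.
    """
    raw = bytes.fromhex(hex_bytes)
    if raw.startswith(b"HTTP/"):
        return "server replied with plain-text HTTP; is this port serving TLS at all?"
    # Show printable ASCII next to the hex so the value is easier to recognize
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return f"unexpected record header {hex_bytes} ({text!r})"


print(explain_record_header("485454502f"))
```

Even the fallback branch helps, because printing the ASCII form next to the hex makes a value like 485454502f recognizable at a glance.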

Connection reset by peer

Connection reset by peer, AKA ECONNRESET, AKA errno 104 (on Linux).

In theory, this error is straightforward. It is generated when an RST packet is received from the remote side of a TCP connection. If the client is logging this error message, the log on the server side should indicate what the problem is.

In the real world, you may find that both sides of a connection are inexplicably logging ECONNRESET at the same time. The usual root cause is a middlebox somewhere resetting the connection, such as an “inline IPS” or a “Next Gen Firewall”.

To further complicate things, admins of such devices are often unaware that the underlying mechanism used to block connections may involve injecting spoofed TCP resets. I was once troubleshooting this exact issue on a customer network and asked, “Do you have an IPS or a Next Gen Firewall such as a Palo Alto?” They responded “yes”, and we got their Palo admin on the call. Their Palo admin proceeded to tell me that the Palo couldn’t be the cause of this problem, because “The Palo doesn’t do that”. We then did a search in the Palo UI for our IP address, and the result was a screen full of entries whose action was not something like allow, but reset-both. The action reset-both corresponds to sending spoofed TCP resets to both the client and the server at the same time.

How could this be improved?

This is a hard one. I suppose a more accurate error message would be:

“Connection reset by peer or (because RST packets are not authenticated in any way) an RST packet was spoofed by a next gen firewall”
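
The operating system is unlikely to change its wording, but an application can add the caveat itself when it logs the error. Here is a minimal sketch in Python, where the wrapper and exception names are invented for illustration:

```python
import errno


class ConnectionResetHint(ConnectionResetError):
    """ConnectionResetError with a note that the RST may not be from the peer."""


def with_reset_hint(func, *args, **kwargs):
    """Call func and re-raise any connection reset with a longer message.

    Hypothetical wrapper for illustration only.
    """
    try:
        return func(*args, **kwargs)
    except ConnectionResetError as exc:
        hint = (
            f"{exc.strerror or exc}: the RST may have come from the peer or "
            "from a middlebox (IPS or firewall) spoofing it; "
            "check the logs on both endpoints"
        )
        # Keep the errno and the original exception for callers that inspect them
        raise ConnectionResetHint(errno.ECONNRESET, hint) from exc
```

Because ConnectionResetHint subclasses ConnectionResetError, existing handlers still catch it, and the original exception stays attached as __cause__.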