Continuous incremental scanning

Many security tools could be better. The problem isn’t that they are buggy or don’t work properly, but that they are designed to work in a way that is not as useful to defenders as it could be.

The primary example I’m going to use is SSH server weak credential scanning. There are many existing tools that do this, such as ncrack or hydra, but I wrote my own called ssh-auditor. Why would I reinvent the wheel when there are many tools that already do this? The answer is to support continuous incremental scanning. Let’s create a tool that doesn’t support this, and see what problems we run into.

A tool without continuous incremental scanning

A simple SSH scanner might look something like this:

def ssh_login(host, username, password):
    """
    Using something like paramiko, attempt to login
    to 'host' using 'username' and 'password' as credentials
    """
    # implementation not important

def discover(cidr):
    """
    do a basic check for each host in 'cidr' to find hosts running sshd
    """
    # implementation not important

def main():
    hosts = discover("192.168.1.0/24")
    usernames = ["root", "admin"]
    passwords = ["password", "12345"]
    for host in hosts:
        for username in usernames:
            for password in passwords:
                if ssh_login(host, username, password):
                    print("Weak credential found:", host, username, password)

This looks very simple, but if a few changes were made it would turn into a fairly useful tool:

Import the multiprocessing module for parallelism.
Load the host, username, and password information from files.
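
As a rough illustration of those two changes, the sketch below reads one entry per line from a file and spreads the login attempts over a pool of workers. The names load_lines and scan_all are hypothetical, and using the thread-backed ThreadPool from the multiprocessing package (rather than separate processes) is an assumption made for this example. The login function is passed in as an argument so that any implementation of ssh_login can be plugged in.

```python
import itertools
from multiprocessing.pool import ThreadPool


def load_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def scan_all(login, hosts, usernames, passwords, workers=8):
    """
    Try every host/username/password combination using 'workers' threads.
    'login' is any callable with the same signature as ssh_login.
    Returns the list of (host, username, password) tuples that succeeded.
    """
    attempts = list(itertools.product(hosts, usernames, passwords))
    with ThreadPool(workers) as pool:
        # starmap blocks until every attempt has finished
        results = pool.starmap(login, attempts)
    return [attempt for attempt, ok in zip(attempts, results) if ok]
```

With files named hosts.txt, usernames.txt, and passwords.txt (hypothetical names), the main function would reduce to a single call: scan_all(ssh_login, load_lines("hosts.txt"), load_lines("usernames.txt"), load_lines("passwords.txt")).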

The problem

Armed with the improved version of the above script (or ncrack, or hydra), you can now test hundreds of credential pairs across thousands of servers. This is exactly what you need if you are performing a one-off audit or penetration test. You can run the tool, crank up the threads to 111 and in a few hours, have your answer.

But what if you aren’t doing a one-off audit? What if, instead of testing every credential once, you had the requirement to scan every known server at least once a week?

A flawed solution

An initial attempt at fulfilling this requirement could be to simply run the tool from cron every Monday morning. This approach has many potential problems:

Some systems running sshd could be laptops that are not on site on Mondays.
Some could be desktops used by night-shift employees that are not powered up Monday mornings.
A system using default credentials could be installed on a Tuesday.
Scanning 100% of the systems in a burst of activity can break things.
If the host list or credential list is changed mid-week, nothing is checked until the next week.

A better solution

Implementing continuous incremental scanning solves all of these problems. Instead of the process running infrequently and scanning everything each time it is run, it can be designed to run frequently, making incremental progress. Some tools do support running slowly and using a checkpoint file, but this is not sufficient in a dynamic environment, where the host and credential lists can change between runs.

For our example scanner, implementing incremental scanning could look something like this:

SCAN_INTERVAL = 60 * 60 * 24 * 7  # one week, in seconds

def set_last_scan(host, username, password):
    """
    Using some data store, set the last time this specific credential was
    attempted to the current time
    """

def get_last_scan_age_in_seconds(host, username, password):
    """
    Using some data store, return the number of seconds since this specific
    credential was last attempted, or SCAN_INTERVAL if never scanned
    """

def incremental_ssh_login(host, username, password):
    if get_last_scan_age_in_seconds(host, username, password) < SCAN_INTERVAL:
        # Already scanned in the last week
        return None
    result = ssh_login(host, username, password)
    set_last_scan(host, username, password)
    return result

def main():
    hosts = discover("192.168.1.0/24")
    usernames = ["root", "admin"]
    passwords = ["password", "12345"]

    for host in hosts:
        for username in usernames:
            for password in passwords:
                if incremental_ssh_login(host, username, password):
                    print("Weak credential found:", host, username, password)

Because the time of the last attempt is recorded for each specific credential, an attempt can be skipped as long as its last scan age is lower than SCAN_INTERVAL.
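
The two data store functions are left unimplemented above. One possible way to back them, sketched here under the assumption that a local SQLite database is an acceptable store, is a single table keyed on host, username, and password. The ScanStore class, the scans table, and the optional now parameter (added so the clock can be controlled) are all hypothetical names chosen for this example.

```python
import sqlite3
import time

SCAN_INTERVAL = 60 * 60 * 24 * 7  # one week, in seconds


class ScanStore:
    """Hypothetical SQLite-backed store for per-credential scan times."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS scans ("
            " host TEXT, username TEXT, password TEXT, last_scan REAL,"
            " PRIMARY KEY (host, username, password))"
        )

    def set_last_scan(self, host, username, password, now=None):
        """Record that this credential was attempted at 'now' (default: current time)."""
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO scans VALUES (?, ?, ?, ?)",
            (host, username, password, now),
        )
        self.db.commit()

    def get_last_scan_age_in_seconds(self, host, username, password, now=None):
        """Return seconds since the last attempt, or SCAN_INTERVAL if never scanned."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT last_scan FROM scans"
            " WHERE host = ? AND username = ? AND password = ?",
            (host, username, password),
        ).fetchone()
        if row is None:
            return SCAN_INTERVAL
        return now - row[0]
```

Passing a file path instead of ":memory:" keeps the scan history between runs, which is what allows an interrupted or repeated scan to skip work that was already done.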

With these changes the tool can work incrementally and has the following features:

If new hosts are added and the tool is run again, then only those hosts will be scanned.
If new usernames or passwords are added and the tool is run again, then they will be tested against all hosts.
As soon as SCAN_INTERVAL seconds pass, credentials will be checked again.
If a scan is interrupted it can be immediately restarted and it will continue from where it left off.
All of the above can happen at the same time and the tool will always do the right thing.

This process could be considered a form of reconciling desired and actual states. The desired state is that every host+username+password should be recorded in the data store as scanned within the last SCAN_INTERVAL seconds. Every time the tool is run it will work towards this state, or do nothing if it has already reached it.
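
The reconciliation idea can be made concrete by computing the gap between the desired and actual states directly. The sketch below is an illustration only: pending_attempts is a hypothetical helper, and it assumes the data store can be viewed as a dictionary mapping (host, username, password) to the timestamp of the last attempt.

```python
import itertools

SCAN_INTERVAL = 60 * 60 * 24 * 7  # one week, in seconds


def pending_attempts(hosts, usernames, passwords, last_scans, now):
    """
    Return every (host, username, password) combination that has never
    been attempted, or was last attempted SCAN_INTERVAL or more seconds ago.
    'last_scans' maps (host, username, password) -> timestamp of last attempt.
    """
    todo = []
    for attempt in itertools.product(hosts, usernames, passwords):
        last = last_scans.get(attempt)
        if last is None or now - last >= SCAN_INTERVAL:
            todo.append(attempt)
    return todo
```

An empty result means the desired state has been reached. Adding a host or a password to the inputs makes only the new combinations appear in the result, and letting SCAN_INTERVAL elapse makes the old ones reappear.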

Making it continuous

Now that the tool can scan incrementally and won’t redundantly check the same credentials, it can simply be run every hour. The scan age tracking will ensure that individual credentials are tested at most once every SCAN_INTERVAL seconds. The hourly frequency will ensure that even infrequently connected hosts are found and scanned.
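
Running the tool from cron every hour is one way to do this. Another, sketched below as an assumption rather than a description of any particular tool, is a long-running loop that performs one incremental pass and then pauses. The run_continuously function and its parameters are hypothetical; max_passes and sleep exist only so the loop can be bounded and exercised without waiting.

```python
import time


def run_continuously(scan_once, pause_seconds=60 * 60, max_passes=None, sleep=time.sleep):
    """
    Call 'scan_once' repeatedly, pausing 'pause_seconds' between passes.
    Runs forever when 'max_passes' is None. Returns the number of passes made.
    """
    passes = 0
    while max_passes is None or passes < max_passes:
        scan_once()  # one incremental pass; recently scanned credentials are skipped
        passes += 1
        if max_passes is None or passes < max_passes:
            sleep(pause_seconds)
    return passes
```

Here scan_once would be the incremental main function from the earlier example. Because each pass skips anything scanned recently, most passes finish quickly and only pick up new hosts, new credentials, or entries whose SCAN_INTERVAL has expired.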

Improving other tools

As I mentioned initially, most tools aren’t buggy, but they could be better. When working on a tool, consider the following questions:

Expected Usage

Is this tool designed to run once, or repeatedly? For example, there are many tools that will scan an organization’s GitHub repos for leaked credentials or code that is potentially vulnerable. An attacker may only run such a tool once, but someone from the organization may want to run it on a daily basis. If the tool is not designed for this use case, it will likely re-scan the same files each time it is run. It may even re-clone entire repositories instead of simply reusing previously downloaded data.
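
The same incremental approach could be applied to such a repository scanner. As a minimal sketch, assuming file contents are available as bytes and that a set of previously scanned content digests is kept between runs, a tool could skip anything it has already examined. The function files_needing_scan is a hypothetical name, not part of any existing tool.

```python
import hashlib


def files_needing_scan(files, seen_digests):
    """
    'files' maps a path to its content as bytes; 'seen_digests' is the set
    of SHA-256 hex digests of content that has already been scanned.
    Returns a list of (path, digest) pairs for content not scanned before.
    """
    todo = []
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen_digests:
            todo.append((path, digest))
    return todo
```

The caller would add each digest to seen_digests only after that file has actually been scanned, so an interrupted run picks up where it left off, in the same way the SSH scanner does.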

Expected Completeness

Is this tool designed to find all potential issues, or just any issue? An attacker may only need to find one problem, but a defender needs to find and correct all of them. Finding all problems means ensuring that the audit has completed and the desired state has been reached.