check_docker.py
===============

The purpose of this plugin is to monitor a Docker installation over NRPE.
Put this script on the same machine as your Docker installation, then connect using Nagios (Core or XI).

There are currently five check types: containers_exist, containers_running, containers_healthy, containers_CPU, and containers_memory.

Prerequisites
-------------

First, add the `nagios` user to the `docker` group so it can access the docker socket:

    usermod -a -G docker nagios

Then, make sure Docker is running by executing:

    docker version

This will also give you the API version you're using. 
This plugin was written and tested against docker API v1.30, and will default to using http:/v1.30/ as its base URL.
If you're using an earlier or later version, or if you're using a non-standard socket location, you can change these like so:

    ./check_docker.py --baseurl='http:/v1.29/' --socket='path/to/docker.sock'

This plugin does not currently support external connections (IP + port).

Then, make sure cURL is installed and available on your PATH as "curl".
For the default configuration, make sure this terminal command returns data:

    curl --unix-socket /var/run/docker.sock 'http:/v1.30/containers/json?all=true' -g

You should see a relatively long JSON array, starting with [{"Id":"...
As long as those two things worked, you should be good to go!
Otherwise, see Installation.
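
If you would rather verify the socket from Python than from a shell, the same request can be sketched with the standard library alone. This is an illustration, not the plugin's own code (the plugin relies on cURL); the names `UnixHTTPConnection`, `api_path`, and `list_containers` are made up for this example, and the defaults simply mirror the socket path and API version mentioned above.

```python
import http.client
import json
import socket
from urllib.parse import urlencode


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # "localhost" is only a placeholder used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def api_path(api_version, endpoint, **query):
    """Build a request path such as /v1.30/containers/json?all=true."""
    path = "/{}/{}".format(api_version.strip("/"), endpoint.lstrip("/"))
    if query:
        path += "?" + urlencode(query)
    return path


def list_containers(socket_path="/var/run/docker.sock", api_version="v1.30"):
    """Return the parsed JSON list from GET /containers/json?all=true."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", api_path(api_version, "containers/json", all="true"))
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError("Docker API returned HTTP {}".format(response.status))
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()
```

Calling `list_containers()` as the `nagios` user should return the same list that the cURL command prints. A permission error at this point usually means the group change has not taken effect yet.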

Installation
------------

For testing purposes, here's how I set up Docker on CentOS 7 (as root):

    yum remove docker docker-common docker-selinux docker-engine -y
    yum install -y yum-utils device-mapper-persistent-data lvm2
    yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    yum makecache fast
    yum install docker-ce -y

To start docker:

    systemctl start docker

To make containers that exit immediately:

    sudo docker run hello-world

To make containers that run without a health check (this will take over your terminal):

    sudo docker run --rm -it ubuntu bash

To make containers that are healthy:

    sudo docker run --rm -it --health-cmd="exit 0" --health-interval=5s --health-retries=5 --health-timeout=2s ubuntu bash

To make containers that are unhealthy:

    sudo docker run --rm -it --health-cmd="exit 1" --health-interval=5s --health-retries=5 --health-timeout=2s ubuntu bash
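
To confirm what the test containers above actually report, you can look at the `State.Health.Status` field of each container's inspect output (from `docker inspect` or the `/containers/{id}/json` endpoint). The sketch below tallies those states. It assumes that a container without a health check has no `Health` entry and labels it "no_check" to match the wording used later in this document; the function names are invented for the example. Note that a freshly started container may report "starting" until its first health checks have run.

```python
from collections import Counter


def health_of(inspect_data):
    """Return the health status of one container, or "no_check" if it has none."""
    health = inspect_data.get("State", {}).get("Health")
    if not health:
        return "no_check"
    status = health.get("Status", "no_check")
    # Assumption: a reported status of "none" also means no health check is configured.
    return "no_check" if status == "none" else status


def count_health(inspect_results):
    """Tally health states across a list of inspect dictionaries."""
    return Counter(health_of(item) for item in inspect_results)


# Hand-written sample data shaped like inspect output (not real containers).
sample = [
    {"State": {"Status": "running", "Health": {"Status": "healthy"}}},
    {"State": {"Status": "running", "Health": {"Status": "unhealthy"}}},
    {"State": {"Status": "running"}},
]
counts = count_health(sample)
print(counts["healthy"], counts["unhealthy"], counts["no_check"])
```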

How to use this plugin
----------------------

For the most basic possible check, run

    ./check_docker.py --check-type=containers_exist

This should return OK if you have fewer than 50 containers, warning if you have fewer than 75, or critical otherwise.

If you don't specify any containers or networks, the plugin tries to check every container that exists. 
You can make this explicit using the option "--all":

    ./check_docker.py --check-type=containers_exist --all

This is equivalent to the previous check.
To specify a network or container, use the following options (names, partial IDs, and full IDs should all work):

    ./check_docker.py --networks="first_network_name,2dc" --containers="985f,second_container_name" --check-type=containers_exist

This counts the containers in the network named "first_network_name" and in the network whose ID matches the characters "2dc", then adds the listed containers
to the count if they exist.

If any of the specified containers don't exist, you can list them in the long output with --list-bad-containers:

    ./check_docker.py --list-bad-containers --networks="first_network_name,2dc" --containers="985f,second_container_name" --check-type=containers_exist

You can modify the warning and critical thresholds like any standard Nagios plugin:

    ./check_docker.py --check-type=containers_exist -w'5:' -c'1:'

This should return OK if you have 5 or more containers, warning if you have fewer than 5, and critical only if you have 0.

The other checks are as follows:

containers_running counts the containers that list "Running" as their status.

containers_healthy counts the containers whose health is "healthy", "unhealthy", or "no_check", and compares the thresholds to the "healthy" count.

containers_CPU gets the CPU usage of each container as a percentage of the system's CPU and aggregates the results. It compares the aggregate usage to the warning and critical thresholds and returns perfdata based on the containers and networks specified.

containers_memory gets the memory usage of each container as a percentage of the memory limit specified for the container. It calculates the return code and perfdata in the same manner as containers_CPU.
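
As a rough illustration of the CPU and memory arithmetic, the sketch below works on one sample from the Docker API's `/containers/{id}/stats` endpoint. It is an assumption about how such a calculation can be done, not a copy of the plugin's code: CPU is taken as the container's usage delta divided by the system usage delta between the previous and current readings, and memory as resident set size divided by the limit, falling back to the `usage` field where `rss` is not reported. The function names and sample numbers are made up.

```python
def cpu_percent(stats):
    """Container CPU time as a percentage of system CPU time over one sample."""
    cpu = stats.get("cpu_stats", {})
    pre = stats.get("precpu_stats", {})
    cpu_delta = (cpu.get("cpu_usage", {}).get("total_usage", 0)
                 - pre.get("cpu_usage", {}).get("total_usage", 0))
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # no usable previous reading
    return 100.0 * cpu_delta / system_delta


def memory_percent(stats):
    """Memory in use as a percentage of the container's memory limit."""
    mem = stats.get("memory_stats", {})
    used = mem.get("stats", {}).get("rss", mem.get("usage", 0))
    limit = mem.get("limit", 0)
    return 100.0 * used / limit if limit else 0.0


# Hand-written sample shaped like one stats reading (values are illustrative).
sample = {
    "precpu_stats": {"cpu_usage": {"total_usage": 400}, "system_cpu_usage": 1000},
    "cpu_stats": {"cpu_usage": {"total_usage": 600}, "system_cpu_usage": 2000},
    "memory_stats": {"stats": {"rss": 256}, "usage": 300, "limit": 1024},
}
total_cpu = sum(cpu_percent(s) for s in [sample, sample])  # aggregate over containers
print(cpu_percent(sample), memory_percent(sample), total_cpu)
```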

Full list of check types, relevant options, and their explanations
------------------------------------------------------------------

### Generic options (for all checks)

--baseurl

> The base URL of the Docker API. Under most circumstances, this should reflect the version number of the API.

Example: --baseurl='http:/v1.30/'

--socket

> The Unix socket used to communicate with the Docker API. Normally, this will be /var/run/docker.sock (the default).

Example: --socket='/path/to/docker.sock'

--timeout

> The time, in seconds, before the plugin will automatically return UNKNOWN. Setting this to 0 (the default) means that
the plugin will never time out.

--containers

> A comma-delimited list of container names (exact) or IDs (partial or full) that you want to monitor. Incompatible with
--networks and --all.

Example: --containers="angry_wilson,2dc,b22d80264b773511419457049a490954655f00dc3b061cbc49efc0ed04d2eb7d"
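
The matching rule described above (exact names, partial or full IDs) could be sketched as follows. This is a guess at reasonable behavior rather than the plugin's actual logic: it assumes each container record has the `Id` and `Names` fields returned by `/containers/json`, that names carry a leading slash, and that a partial ID is a prefix of the full ID. `split_specs`, `matches`, and `select_containers` are hypothetical helpers.

```python
def split_specs(option_value):
    """Split a comma-delimited option value, dropping blanks and stray spaces."""
    return [part.strip() for part in option_value.split(",") if part.strip()]


def matches(container, spec):
    """True if spec is an exact container name or a prefix of its ID."""
    names = [name.lstrip("/") for name in container.get("Names", [])]
    return spec in names or container.get("Id", "").startswith(spec)


def select_containers(containers, specs):
    """Return (matched containers, specs that matched nothing)."""
    found, missing = [], []
    for spec in specs:
        hits = [c for c in containers if matches(c, spec)]
        if not hits:
            missing.append(spec)
        for hit in hits:
            if hit not in found:
                found.append(hit)
    return found, missing


# Hand-written sample records (IDs shortened for readability).
containers = [
    {"Id": "2dc4a1f0", "Names": ["/angry_wilson"]},
    {"Id": "985f77ab", "Names": ["/second_container_name"]},
]
found, missing = select_containers(containers, split_specs("angry_wilson,985f,nope"))
print([c["Id"] for c in found], missing)
```

The `missing` list corresponds to what --list-bad-containers would report for containers_exist.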

--networks

> A comma-delimited list of network names (exact) or IDs (partial or full) whose containers you want to monitor. Listing a network
results in all of its containers being monitored. Incompatible with --containers and --all.

Example: --networks="network_name, 7588aa5f117b, f6c1b7588aa5f117b1e27f194b4436d35be0eecc0cd4291897c9d498aad51461"

--warning

> A comma-delimited list of warning thresholds that use the standard Nagios threshold syntax. Specify one threshold to have it applied to everything being monitored,
or use one threshold per container/network (depending on which option you used) to have them applied in order.

Example: --warning='@12:14,3,:5'
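
For readers unfamiliar with Nagios ranges, here is a small sketch of how a threshold list like the one above could be interpreted. It assumes the plugin follows the range format from the Nagios Plugin Development Guidelines (`[@]start:end`, alert when the value is outside the range, or inside it when prefixed with `@`). Treating an empty start as 0 is an assumption of this sketch, and the plugin's own parser may differ. The function names are hypothetical.

```python
import math


def parse_range(text):
    """Return (start, end, alert_inside) for a Nagios range string."""
    text = text.strip()
    alert_inside = text.startswith("@")
    if alert_inside:
        text = text[1:]
    if ":" in text:
        start_text, end_text = text.split(":", 1)
    else:
        start_text, end_text = "", text
    if start_text == "~":
        start = -math.inf          # "~" means negative infinity
    elif start_text == "":
        start = 0.0                # assumption: an omitted start means 0
    else:
        start = float(start_text)
    end = float(end_text) if end_text else math.inf
    if start > end:
        raise ValueError("range start must not exceed range end: " + text)
    return start, end, alert_inside


def alerts(value, text):
    """True if value should raise an alert for the given range."""
    start, end, alert_inside = parse_range(text)
    in_range = start <= value <= end
    return in_range if alert_inside else not in_range


def thresholds_for(items, option_value):
    """Apply one threshold to every item, or one per item in order."""
    ranges = [part.strip() for part in option_value.split(",")]
    if len(ranges) == 1:
        return {item: ranges[0] for item in items}
    if len(ranges) != len(items):
        raise ValueError("expected 1 or {} thresholds".format(len(items)))
    return dict(zip(items, ranges))


assigned = thresholds_for(["web", "db", "cache"], "@12:14,3,:5")
print(assigned)
```

Under these assumptions, `alerts(13, "@12:14")` is true because 13 lies inside the range, while `alerts(5, "5:")` is false because 5 is still within "5 to infinity".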

--critical

> A comma-delimited list of critical thresholds that use the standard Nagios threshold syntax. See --warning.

--perfdata-max

> A value used by performance data processors to create graphs. It sets the maximum value shown.

Example: --perfdata-max='100'

--perfdata-min

> A value used by performance data processors to create graphs. It sets the minimum value shown.

Example: --perfdata-min='0'
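
To show where the minimum and maximum end up, the sketch below formats one metric in the standard Nagios performance data layout, `'label'=value[UOM];[warn];[crit];[min];[max]`. The `perfdata` helper and the labels are invented for illustration; the plugin's real labels and output may look different.

```python
def perfdata(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one metric as 'label'=value[UOM];warn;crit;min;max."""
    fields = ["{}{}".format(value, uom), str(warn), str(crit),
              str(minimum), str(maximum)]
    # Trailing empty fields may be omitted in the Nagios format.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return "'{}'={}".format(label, ";".join(fields))


line = perfdata("containers", 12, warn="50", crit="75", minimum="0", maximum="100")
print(line)
```

A graphing tool reading this line would scale its axis from the fourth field (0) to the fifth (100).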

--check-type

> Specify the type of check you want to run. Supported values are containers_exist, containers_running, containers_healthy, containers_CPU, and containers_memory.

> > containers_exist counts the number of containers that exist in the networks/list of containers that you're monitoring.
Only the total count is monitored. 
If you need to check the number of containers that exist per-network, use multiple checks.

> > containers_running counts the number of containers that have "Status" as "Running" in the networks/lists of containers that you're monitoring.
Only the total count is checked.

> > containers_healthy counts the number of containers that are passing/failing their healthchecks, as well as those that are missing healthchecks.
By default, it treats containers without healthchecks as failing, but this can be configured.

> > containers_CPU gathers the CPU stats for each container for the last second, and calculates percent usage based on system usage over the same period. 
It then checks these against the warning and critical thresholds.

> > containers_memory gathers the memory usage (resident set size) stats for each container, and calculates usage in bytes (or as a percentage).


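--verbose

--version

--all

> Check every container that exists. This is the default when no containers or networks are specified.

--list-bad-containers

> List the specified containers that don't exist in the long output.

--total-usage

--total-average

--networks-use-avg

--percentage

--no-check-is-healthy

--ignore-no-healthcheck

--debug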


Integration
-----------

The Docker daemon's own logs already go into the system logs, so no additional
integration is needed there.

You can also retrieve containers' logs through the Docker API: for each
container, access the endpoint /containers/{id}/logs. To get only the most recent
logs, use since={UNIX timestamp} in the query. The endpoint also expects at least
one of stdout=true or stderr=true.

So a call might look something like

    curl --unix-socket /var/run/docker.sock -g 'http:/v1.30/containers/12924892c2938124984298/logs?stdout=true&stderr=true&since=209348582395'
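
If the request is built from a script, letting a URL library encode the query avoids hand-escaping mistakes. The sketch below only constructs the request path. It assumes the Docker Engine API's `stdout`, `stderr`, and `since` query parameters for the logs endpoint, and the helper name `logs_path` is made up.

```python
from urllib.parse import urlencode


def logs_path(container_id, since, api_version="v1.30", stdout=True, stderr=True):
    """Build the request path for a container's logs since a UNIX timestamp."""
    query = urlencode({
        "stdout": str(bool(stdout)).lower(),  # the API expects at least one stream
        "stderr": str(bool(stderr)).lower(),
        "since": int(since),                  # seconds since the epoch
    })
    return "/{}/containers/{}/logs?{}".format(api_version, container_id, query)


path = logs_path("12924892c2938124984298", 1500000000)
print(path)
```

In practice `since` would usually be something like `int(time.time()) - 300` to fetch the last five minutes of output.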

As far as getting netflow data from a docker container, I'm not really sure
how it would differ from a normal machine. There are, for instance, 
containers that generate mock netflow and send it to a specific IP/port
(https://hub.docker.com/r/networkstatic/nflow-generator/). Hopefully we can
figure this out better as the time for NNA integrations approaches.
