Docker
The data gathering scripts can be run on a centralized machine with the appropriate setup (see Installation) or within one or more Docker instances which collect (a part of) the data.
First, you must have a (self-signed) SSL certificate for the controller server
which provides the API endpoints. Place the public certificate in the certs/
directory.
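If you do not have a certificate yet, you can generate a self-signed one with OpenSSL. This is only a minimal sketch: the key and certificate file names and the common name are placeholders to adjust for your controller host.

```
# Generate a self-signed certificate (names and validity period are examples)
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
    -subj "/CN=controller.example" \
    -keyout controller.key -out certs/controller.crt
```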
Then create a VERSION file in the repository root containing a version string for the agent, which usually indicates the branch name and commit hash behind the module version, separated by dashes. If the VERSION file is not available, then the module version is used as-is for internal purposes, but the version information reported to other components will be degraded. Therefore, we recommend creating the VERSION file from the repository as follows:

```
echo $(grep __version__ gatherer/__init__.py | \
sed -E "s/__version__ = .([0-9\\.]+)./\\1/")-$(git rev-parse \
--abbrev-ref HEAD)-$(git rev-parse HEAD) > VERSION
```
Now build the Docker image:

```
docker build -t gros/data-gathering .
```

You may wish to prefix the image name with a registry URL and push the image there for distributed deployments.
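For example, with a hypothetical registry at registry.example.com, building and pushing could look like this:

```
# Build with a registry-prefixed tag and push it for distributed deployments
docker build -t registry.example.com/gros/data-gathering .
docker push registry.example.com/gros/data-gathering
```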
Next, start a Docker instance based on the image. For example, run:

```
docker run --name data-gathering -v env:/home/agent/env gros/data-gathering
```

This starts the instance using environment variables from a file called env to set the configuration, according to the file format specified in that section. Ensure that the env file is not actually your virtual environment directory, as such directories are also often called env. You can also set environment variables using the -e VARIABLE=value flag before the image name in the docker run command, but this is less versatile.
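As an illustration, assuming the format is one VARIABLE=value pair per line (see the environment section for the exact format), an env file could look like this sketch, using variables described elsewhere in this document; the values are placeholders:

```
JIRA_KEY=PROJ
CRON_PERIOD=15min
PREFLIGHT_ARGS=--no-secrets --no-ssh
```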
Depending on this configuration, the Docker instance can run in ‘Daemon’ mode or in ‘Jenkins’ mode. In ‘Daemon’ mode, the instance periodically checks whether it should scrape data; it should therefore be started in daemonized form using the option -d. Set the environment variables CRON_PERIOD (required) and BIGBOAT_PERIOD (optional) to appropriate periodic values (15min, hourly, daily) for this purpose. To start a ‘Jenkins-style’ run, set the environment variable JENKINS_URL=value, which immediately performs one scrape job for the projects defined in the JIRA_KEY environment variable and then terminates.
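To make the two modes concrete, the following sketches start an instance in each mode; the periods, Jenkins URL and project key are placeholder values:

```
# 'Daemon' mode: run detached and scrape periodically
docker run -d --name data-gathering -e CRON_PERIOD=15min \
    -e BIGBOAT_PERIOD=hourly gros/data-gathering

# 'Jenkins' mode: perform one scrape job for the given projects, then exit
docker run --rm -e JENKINS_URL=http://jenkins.example:8080 \
    -e JIRA_KEY=PROJ gros/data-gathering
```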
As mentioned, you can pass environment variables using the Docker parameter -e, or with the environment section of a Docker Compose file. Additionally, configuration is read from environment files stored in /home/agent/env or /home/agent/config/env, via volume mounts as mentioned before. For example, you can skip some of the pre-flight checks using PREFLIGHT_ARGS="--no-secrets --no-ssh" (see all of these options in scraper/preflight.py). Note that you can enter a running Docker instance using docker exec -it data-gathering /home/agent/scraper/agent/env.sh, which sets up the correct environment to run any of the scripts described in the overview.
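For instance, the pre-flight options above could be passed directly on the command line rather than through an environment file:

```
# Skip the secrets and SSH pre-flight checks for this run
docker run -e PREFLIGHT_ARGS="--no-secrets --no-ssh" gros/data-gathering
```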
Normal operation of an agent requires a controller setup that can handle the pre-flight checks, including registration for exchanging SSH keys and encryption tokens, as well as the eventual export of the collected data.
More details regarding the specific configuration of the environment within the Docker instance can be found in the environment section.
Compose
For advanced setups with many configuration variables or volume mounts, it is advisable to create a docker-compose file to manage the Docker environment and the resulting scraper configuration. Any environment variables defined for the container are passed into the configuration. During the build, a file called env can be added to the build context in order to set up environment variables that apply to all instances. For even more versatility, a separate configuration tool can alter the configuration and environment files via shared volumes.
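A minimal sketch of such a compose file is shown below, assuming the image name from the build step above; the period, project key and volume path are placeholders to adjust:

```
# Hypothetical docker-compose.yml for a single data-gathering service
version: '3'
services:
  data-gathering:
    image: gros/data-gathering
    environment:
      - CRON_PERIOD=15min
      - JIRA_KEY=PROJ
    volumes:
      - ./env:/home/agent/env
```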
Example setups for Docker Compose can be found in separate repositories for BigBoat compose and agent configuration.