Aeron Insights - Prometheus Exporters

The Prometheus Exporters publish data about your Aeron application to a Prometheus-compatible monitoring system. There are two exporters. The HTTP exporter is an HTTP server that runs alongside each of your cluster nodes, and Prometheus scrapes each exporter directly. The Push exporter is a client that sends the metrics for each node to a Prometheus Push Gateway, and Prometheus then scrapes all the data from the Push Gateway. The HTTP server option is the more common approach for Prometheus-based systems and does not require the additional Push Gateway component. The push model may be preferable if security requirements preclude running a server process that listens on a port on your production systems.

Each exporter has its own jar, which you use to run it. The exporters gather data by reading the files created by Aeron, principally the CnC file in the media driver directory. To publish metrics for Archive and Cluster, the exporter also needs to read the Archive and Cluster data directories and MarkFile directories.

How to run the HTTP Exporter

To run the HTTP version of the exporter, execute the insights-prometheus-exporter-http-server.jar jar. This application supports the following options, which can be set through system properties:

Option | Description | Default
aeron.observable.sources | Paths to specific properties files, or to directories that contain properties files, for Aeron media driver, archive and cluster. See Observable Property Files for more details. | (not set, mandatory)
aeron.insights.exporter.listen.host | Address of the interface to bind to when starting the HTTP server | localhost
aeron.insights.exporter.listen.port | Port to bind to when starting the HTTP server | 8080
aeron.insights.exporter.counters.custom | Path to the custom counters configuration file. (See below) | custom-counters.csv

Example:

java -Daeron.observable.sources=/aeron/config/ \
    -Daeron.insights.exporter.listen.host=${HOSTNAME} \
    -Daeron.insights.exporter.counters.custom=/aeron/config/example-counters.csv \
    -jar /aeron/libs/insights-prometheus-exporter-http-server.jar

You can see an example of running the Prometheus HTTP Exporter in the Insights Example.
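On the Prometheus side, each HTTP exporter is then listed as a scrape target. The fragment below is a minimal sketch, not a configuration taken from the Insights Example: the job name and the node0 to node2 host names are hypothetical, and it assumes the exporter serves its metrics on Prometheus's default metrics path, which this document does not state.

```yaml
scrape_configs:
  - job_name: aeron-insights            # hypothetical job name
    static_configs:
      - targets:                        # hypothetical hosts, one exporter per cluster node
          - "node0:8080"
          - "node1:8080"
          - "node2:8080"
```

The port matches the default value of aeron.insights.exporter.listen.port. Note that the default listen host is localhost, so an exporter scraped from another machine needs aeron.insights.exporter.listen.host set, as in the example above.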

How to run the Push Exporter

To run the Push Exporter, execute the insights-prometheus-exporter-push-client.jar jar. This application supports the following options, which can be set through system properties:

Option | Description | Default
aeron.observable.sources | Paths to specific properties files, or to directories that contain properties files, for Aeron media driver, archive and cluster. See Observable Property Files for more details. | (not set, mandatory)
aeron.insights.exporter.pushgateway.address | The base URI for the Prometheus Push Gateway | (not set, mandatory)
aeron.insights.exporter.job | The name of the job configured in Prometheus | (not set, mandatory)
aeron.insights.exporter.instance | The name or address of the instance this exporter is publishing for | (not set, mandatory)
aeron.insights.exporter.interval | How often to publish metrics | 1s
aeron.insights.exporter.counters.custom | Path to the custom counters configuration file. (See below) | custom-counters.csv

Example:

java -Daeron.observable.sources=/aeron/config/ \
    -Daeron.insights.exporter.pushgateway.address=http://172.18.100.31:9091 \
    -Daeron.insights.exporter.job=aeron-insights \
    -Daeron.insights.exporter.instance=172.18.100.20:8080 \
    -Daeron.insights.exporter.counters.custom=/aeron/config/example-counters.csv \
    -jar /aeron/libs/insights-prometheus-exporter-push-client.jar

When configuring a Prometheus instance to scrape from a Push Gateway, you should set honor_labels: true. This will ensure that Prometheus recognises that the metrics were published from the Push Exporters rather than appearing to come from the Push Gateway itself. For more details, see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
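As a rough illustration, a scrape configuration for the Push Gateway could look like the fragment below. It is a sketch only: the job name is hypothetical, and the target reuses the Push Gateway address from the example command above.

```yaml
scrape_configs:
  - job_name: pushgateway               # hypothetical name for the scrape job itself
    honor_labels: true                  # keep the job and instance labels pushed by the exporters
    static_configs:
      - targets: ["172.18.100.31:9091"] # Push Gateway address from the example above
```

With honor_labels left at its default of false, Prometheus renames conflicting labels from the scraped data (for example to exported_job and exported_instance) and attaches the Push Gateway's own job and instance values instead.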

You can see an example of running the Prometheus Push Exporter in the Insights Example.

How to export custom counters

In addition to the standard set of Aeron metrics published by the Insights Prometheus Exporters, you can also include your own custom application metrics. These come from custom counters that you can add using the addCounter method on the Aeron class.

To include these counters in data exported to Prometheus, you need to supply the Exporter with a list of the counters to include. You provide this by specifying a CSV file that contains the required data. The Insights Example includes a sample of this file (example-counters.csv). The fields are:

  • Type ID - The typeId value specified when creating the counter. This is how Insights will locate the counter. The type ID must be greater than 1000, as values below that are reserved for internal Aeron use.

  • Name - The name given to the metric in the exported data. Insights will apply a prefix custom_ to these counters to ensure they do not conflict with other metrics exported by Insights.

  • Help text - A description of the metric to aid users creating dashboards based on this data.

  • Type - Must be either COUNTER or GAUGE. Counters are expected to always increment, and will only reset on application restart. Gauges can increase or decrease in value.

The Prometheus data model documentation describes the rules for the format of metric names and descriptions. The Prometheus documentation also provides best practices for naming and units.

Insights can only export one instance of a particular custom counter type. The Prometheus format supports multiple instances of a metric through labels, but this is not supported for custom counters.
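The rules above can be checked before the file is handed to an exporter. The following Python sketch is not part of Insights; it assumes the four fields appear in the order listed above, separated by commas, with no header row, and the sample rows are hypothetical. Check the example-counters.csv file in the Insights Example for the authoritative layout.

```python
import csv
import io
import re

# Metric name rule from the Prometheus data model documentation.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
VALID_TYPES = {"COUNTER", "GAUGE"}


def check_row(row):
    """Return a list of problems found in one row (empty if the row looks valid)."""
    if len(row) != 4:
        return [f"expected 4 fields, found {len(row)}"]
    # Assumed field order: type ID, name, help text, type.
    type_id, name, help_text, metric_type = (field.strip() for field in row)
    problems = []
    try:
        valid_id = int(type_id) > 1000
    except ValueError:
        valid_id = False
    if not valid_id:
        problems.append("type ID must be an integer greater than 1000")
    if not METRIC_NAME.match(name):
        problems.append("name is not a valid Prometheus metric name")
    if metric_type not in VALID_TYPES:
        problems.append("type must be COUNTER or GAUGE")
    return problems


# Hypothetical file contents, used only to exercise the checks.
sample = io.StringIO(
    "1001,orders_processed_total,Number of orders processed,COUNTER\n"
    "1002,open_sessions,Number of open sessions,GAUGE\n"
    "999,bad_counter,Uses a reserved type ID,COUNTER\n"
)

for line_number, row in enumerate(csv.reader(sample), start=1):
    for problem in check_row(row):
        print(f"line {line_number}: {problem}")
```

Only the third sample row is reported, because its type ID falls in the range reserved for internal Aeron use.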

Log levels

The Prometheus Exporters use Java Util Logging to produce output. You can configure the log levels by specifying a different configuration file using the java.util.logging.config.file system property. The default configuration is in the insights-logging.properties file, which is included in the exporter jars.

How to interpret the exported metrics

The Exporters produce metrics in the Prometheus Exposition format. This consists of a set of name-value pairs with some associated metadata. For details of how to interpret particular values, and suggested alerting rules, see the accompanying document on Aeron Metrics. Metrics in the Prometheus format look like this:

# HELP insights_timestamp_seconds Time at which metrics were collected
# TYPE insights_timestamp_seconds gauge
insights_timestamp_seconds 1.741857723923E9
# HELP driver_heartbeat_age_seconds Driver heartbeat age. How long it has been since the last media driver heartbeat.
# TYPE driver_heartbeat_age_seconds gauge
driver_heartbeat_age_seconds 0.099
# HELP driver_bytes_received_total Bytes received
# TYPE driver_bytes_received_total counter
driver_bytes_received_total 8960.0

The value will be a double. In some cases it may be in scientific notation. The Prometheus specification also permits the special values: NaN, +Inf, and -Inf.
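For anyone consuming these values in a script, the following Python sketch shows one way to read the value field. It relies on the built-in float(), which accepts scientific notation as well as NaN, +Inf, and -Inf. The helper name parse_sample_value is hypothetical, and treating insights_timestamp_seconds as seconds since the Unix epoch is an assumption based on the sample value.

```python
import math
from datetime import datetime, timezone


def parse_sample_value(text: str) -> float:
    """Convert the value field of an exposition-format sample to a float."""
    # float() handles plain decimals, scientific notation, NaN, +Inf and -Inf.
    return float(text.strip())


# Values taken from the sample output above.
timestamp = parse_sample_value("1.741857723923E9")
heartbeat_age = parse_sample_value("0.099")

# Assumption: the timestamp is measured in seconds from the Unix epoch,
# so it can be converted to a UTC datetime.
collected_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(collected_at.isoformat())

# Special values need an explicit check before they are used in arithmetic.
for raw in ("NaN", "+Inf", "-Inf"):
    value = parse_sample_value(raw)
    print(raw, math.isnan(value), math.isinf(value))
```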

The metric name states what the metric refers to. Each name has a prefix that indicates what produces the data for the metric: 'driver', 'archive', 'cluster', 'insights' (for metrics produced by the Exporter itself), or 'custom' (for custom application metrics).

The metadata consists of a type, which indicates if the metric is a counter or a gauge, and a help section that contains a human-readable description of the metric. The help text appears as a tooltip when using the metric explorer in Grafana.

Some metrics have multiple instances and are qualified with a label, which appears in braces after the metric name. These are for metrics that may appear multiple times within a single component, or for metrics that may be created by multiple components using the same media driver. For example, it is possible to run multiple Archives on the same media driver and each Archive will create the same set of metrics. These metrics will be qualified with the archiveId. Metrics may have multiple labels if they can appear multiple times within a single component, and there may be multiple components creating that metric. System metrics created by the media driver do not have a label, as they are guaranteed to appear at most once. Metrics with labels look like this:

# HELP driver_max_cycle_time_seconds Driver - Maximum time spent executing a duty cycle
# TYPE driver_max_cycle_time_seconds gauge
driver_max_cycle_time_seconds{agent="conductor"} 0.014726482
driver_max_cycle_time_seconds{agent="sender"} 0.00721809
driver_max_cycle_time_seconds{agent="receiver"} 0.007194907
# HELP archive_max_cycle_time_seconds Archive - Maximum time spent executing a duty cycle
# TYPE archive_max_cycle_time_seconds gauge
archive_max_cycle_time_seconds{archiveId="1",agent="archive-conductor"} 0.007397414
archive_max_cycle_time_seconds{archiveId="1",agent="archive-recorder"} 0.006716045
archive_max_cycle_time_seconds{archiveId="1",agent="archive-replayer"} 0.006411037

Labels correspond to the cardinality of the metric. If a metric appears once per archive, it has an archiveId label. If it appears once per cluster, it has a clusterId label. If it appears once per agent in an archive, it will have agent and archiveId labels. If it appears once per driver, it does not need a label.
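To make the structure of labelled samples concrete, the following Python sketch splits a sample line into its metric name, labels, and value. It is a simplified illustration rather than a complete parser: it ignores optional timestamps and escaped characters in label values, and the function name parse_sample is hypothetical. For real use, a maintained parser such as the one in the prometheus_client Python package is a better fit.

```python
import re

# Matches a metric name, an optional {labels} section, then the value.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str):
    """Split one sample line into (metric name, labels dict, value)."""
    match = SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, label_text, value = match.groups()
    # Unlabelled metrics have no braces, so label_text is None for them.
    labels = dict(LABEL.findall(label_text or ""))
    return name, labels, float(value)


# Lines copied from the sample output in this document.
text = """\
# TYPE archive_max_cycle_time_seconds gauge
archive_max_cycle_time_seconds{archiveId="1",agent="archive-conductor"} 0.007397414
archive_max_cycle_time_seconds{archiveId="1",agent="archive-recorder"} 0.006716045
driver_bytes_received_total 8960.0
"""

for line in text.splitlines():
    if line.startswith("#") or not line.strip():
        continue  # skip HELP/TYPE metadata and blank lines
    name, labels, value = parse_sample(line)
    print(name, labels, value)
```

A per-archive-agent metric yields both an archiveId and an agent label, while a per-driver metric such as driver_bytes_received_total yields an empty label set.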

The full set of metric cardinalities is:

  • per-driver

  • per-driver-agent

  • per-stream

  • per-archive

  • per-archive-agent, per-archive

  • per-cluster

  • per-clustered-service, per-cluster