System Monitoring

Overview

System monitoring serves several common purposes:

  • Performance optimization: monitoring CPU and memory usage helps identify and address important performance issues. For example, abnormally high CPU usage may signal the need for a hardware upgrade or a software optimization.
  • Resource management: analyze your system's demands to plan capacity upgrades and prepare for load spikes.
  • System maintenance: get actionable data in time to ensure the system meets users' requirements.
  • Security: monitoring network metrics can help identify and eliminate potential security threats.

Aggregator can be configured to expose application and standard JVM metrics for monitoring your system - refer to Description of Metrics to learn more. You can use Prometheus to collect and store the metrics, and Grafana to visualize them.

Step 1: Enable Metrics

To enable application and JVM metrics, configure Aggregator via Java System Properties or the Aggregator admin.properties configuration file. Once configured, you can access metrics via a dedicated endpoint /agg/api/metrics (e.g., http://localhost:port/agg/api/metrics) in a Prometheus-compatible format.
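
As a quick check, you can request the endpoint directly. The sketch below assumes Aggregator listens on port 8011 (the port used in the Prometheus example later on this page) and that anonymous access is enabled:

# fetch metrics in the Prometheus exposition format
curl http://localhost:8011/agg/api/metrics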

Java System Properties

To enable system metrics, add these variables to your Docker or Docker Compose configuration:

# docker-compose.yml example

environment:
  - JAVA_OPTS=
    -DAggregator.metrics.enable=true               # enable metrics for Aggregator
    -DAggregator.metrics.enableJvmMetrics=true     # enable JVM metrics (disabled by default)
    -DAggregator.metrics.enableTomcatMetrics=true  # enable Tomcat metrics (disabled by default)
    -DAggregator.metrics.anonymousAccess=true      # enable anonymous access to metrics endpoints (disabled by default)
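
For orientation, this is a minimal sketch of where these options sit in a complete service definition; the service name, image, and port mapping are placeholders rather than values from this page:

# hypothetical docker-compose.yml service (image name and port mapping are assumptions)
services:
  aggregator:
    image: example/aggregator:latest    # placeholder image name
    ports:
      - "8011:8011"                     # port used in the Prometheus example below
    environment:
      - JAVA_OPTS=
        -DAggregator.metrics.enable=true
        -DAggregator.metrics.enableJvmMetrics=true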

admin.properties

To enable system metrics, specify the following parameters in the admin.properties configuration file:

Aggregator.metrics.enable=true                 # enable metrics for Aggregator
Aggregator.metrics.enableJvmMetrics=true       # enable JVM metrics (disabled by default)
Aggregator.metrics.enableTomcatMetrics=true    # enable Tomcat metrics (disabled by default)
Aggregator.metrics.anonymousAccess=true        # enable anonymous access to metrics endpoints (disabled by default)

By default, metrics endpoints follow the configured security policy. Note that for remote access to metrics endpoints, you must also set QuantServer.enableRemoteAccess=true. Refer to admin properties, User Access Control, and Aggregator config to learn more.
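
Taken together, a minimal admin.properties fragment that allows remote, unauthenticated scraping might look like this (a sketch combining the properties named above):

Aggregator.metrics.enable=true             # enable metrics for Aggregator
Aggregator.metrics.anonymousAccess=true    # allow anonymous access to the metrics endpoint
QuantServer.enableRemoteAccess=true        # required for remote access to metrics endpoints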

Kubernetes Configuration

To enable all types of metrics in Kubernetes:

Enable serviceMonitor in the Aggregator values.yaml. Note that this service is enabled by default.

# include this in Aggregator values.yaml to enable system metrics monitoring
serviceMonitor:
  enabled: true
  namespace: monitoring
  interval: "30s"
  labels:
    monitoring: application
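
If your cluster runs the Prometheus Operator, you can verify that the ServiceMonitor resource was created; this is a quick check, and the exact resource name depends on your release:

# list ServiceMonitor resources in the monitoring namespace
kubectl get servicemonitors -n monitoring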

Refer to Deployment in Kubernetes to learn more.

Step 2: Get Metrics

You can use Prometheus to collect and store system metrics. To scrape metrics from Aggregator periodically, add the Aggregator endpoint /agg/api/metrics as the metrics_path of a scrape job in the Prometheus configuration file:

# Example of configuration for Prometheus

global:
  scrape_interval: 1s    # scrape targets every second (by default, Prometheus scrapes every 15 seconds)

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'external-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# here it's the Aggregator metrics endpoint.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'Aggregator'

    metrics_path: /agg/api/metrics

    # Override the global default and scrape targets from this job every second.
    scrape_interval: 1s

    static_configs:
      - targets: ['localhost:8011']
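
After reloading Prometheus, you can confirm that the Aggregator target is up and being scraped. This sketch assumes Prometheus runs on its default port 9090:

# list scrape targets and their health
curl http://localhost:9090/api/v1/targets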

To learn how to install and configure Prometheus, refer to Prometheus Installation and Prometheus Getting Started.

Step 3: Use Metrics

You can visualize the collected system metrics in Grafana dashboards:

  1. Install Grafana.

  2. Install and configure the Prometheus data source (see the provisioning sketch after this list).

  3. Add and configure Grafana dashboards.
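
For step 2, the Prometheus data source can also be provisioned declaratively. Below is a minimal sketch in Grafana's data source provisioning format; the file path and Prometheus URL are assumptions:

# grafana/provisioning/datasources/prometheus.yml (path and URL are assumptions)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090    # assumed Prometheus address
    isDefault: true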

[Sample illustration: JVM Actuator Grafana dashboard with Aggregator JVM metrics]

Disable JVM Metrics

There are two ways to disable JVM metrics, depending on the method you used to enable them:

Java System Properties

To disable JVM metrics, add these variables to your Docker or Docker Compose file:

# docker-compose.yml example

environment:
  - JAVA_OPTS=
    -DAggregator.metrics.disableJvmMetrics=true    # disable JVM metrics

admin.properties

To disable JVM metrics, specify the following parameters in the admin.properties configuration file:

Aggregator.metricsService=QSMetricsServiceInfo
Aggregator.metricsService.disableJvmMetrics=true    # disable JVM metrics

Description of Metrics

Aggregator Metrics

Aggregator Application Metric         Description
aggregator_processes_deployed         The number of configured data connectors.
aggregator_processes_active           The number of successfully running connectors.
aggregator_processes_failed           The number of failed connectors.

JVM Metrics

JVM Metric                            Description
jvm_memory_committed_bytes            The amount of memory (in bytes) committed for the Java virtual machine to use.
jvm_memory_used_bytes                 The amount of used memory (in bytes) in the Java virtual machine runtime.
jvm_memory_max_bytes                  The maximum amount of memory (in bytes) that the Java virtual machine can use.
process_cpu_utilization               The CPU usage (in percent) of the Java virtual machine.
process_uptime_seconds                The amount of time (in seconds) that the Java virtual machine has been running.
jvm_gc_total                          The number of garbage collection calls.
jvm_gc_interval_time_seconds_total    The total time (in seconds) elapsed between garbage collection calls.
jvm_gc_interval_total                 The total number of intervals between garbage collection calls. Per-server counter.
jvm_gc_duration_seconds_total         The total time (in seconds) spent in garbage collection.
jvm_threads_daemon_threads            The current number of live daemon threads.
jvm_threads_live_threads              The current number of live threads, including both daemon and non-daemon threads.
jvm_threads_peak_threads              The peak live thread count since the Java virtual machine started or the peak was reset.
jvm_classes_loaded_classes            The number of classes currently loaded in the Java virtual machine.
jvm_classes_unloaded_classes_total    The total number of classes unloaded since the Java virtual machine started execution.
process_files_open_files              The number of open file descriptors.
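
Once collected, these metrics can be queried in Prometheus or plotted in Grafana panels. A few illustrative PromQL sketches (starting points, not tuned alert rules):

# approximate fraction of available JVM memory currently in use
sum(jvm_memory_used_bytes) / sum(jvm_memory_max_bytes)

# returns a series whenever at least one connector has failed
aggregator_processes_failed > 0

# JVM uptime in hours
process_uptime_seconds / 3600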