System Monitoring
Overview
There are several common reasons for system monitoring:
- Performance optimization: monitoring CPU and memory usage helps identify and address performance issues. For example, abnormally high CPU usage may signal the need for a hardware upgrade or a software optimization.
- Resource management: analyze system demand to plan capacity upgrades and prepare for performance spikes.
- System maintenance: get actionable data in time to ensure the system meets user requirements.
- Security: monitoring network metrics can help identify and eliminate potential security threats.
Aggregator can be configured to expose application and standard JVM metrics for system monitoring - refer to Description of Metrics to learn more. You can use Prometheus to collect and store the metrics, and Grafana to visualize them.
Step 1: Enable Metrics
To enable application and JVM metrics, configure Aggregator via Java System Properties or the Aggregator admin.properties configuration file. Once configured, metrics are available via a dedicated endpoint /agg/api/metrics (e.g., http://localhost:port/agg/api/metrics) in a format compatible with Prometheus.
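The endpoint serves metrics in the Prometheus text exposition format (one `name{labels} value` line per sample, with `# HELP`/`# TYPE` comment lines). As a rough illustration of what a scrape returns, here is a minimal Python sketch that parses such a payload; the sample text and its values are illustrative, not real Aggregator output:

```python
# Minimal sketch: parse the Prometheus text exposition format returned by
# an endpoint such as /agg/api/metrics. Labels are stripped and repeated
# metric names overwrite earlier ones - enough for a quick inspection,
# not a full parser.

def parse_prometheus_text(payload: str) -> dict:
    """Return a {metric_name: value} dict, ignoring HELP/TYPE comments and labels."""
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        name_part, _, value_part = line.rpartition(" ")
        # Drop any label set, e.g. jvm_memory_used_bytes{area="heap"}
        name = name_part.split("{", 1)[0]
        metrics[name] = float(value_part)
    return metrics

# Illustrative payload in the exposition format
sample = """\
# HELP aggregator_processes_active The number of successfully running connectors.
# TYPE aggregator_processes_active gauge
aggregator_processes_active 12
aggregator_processes_failed 0
jvm_memory_used_bytes{area="heap"} 268435456
"""

if __name__ == "__main__":
    parsed = parse_prometheus_text(sample)
    print(parsed["aggregator_processes_active"])  # 12.0
```

In practice Prometheus does this parsing for you; the sketch is only useful for ad-hoc checks against the endpoint.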
Java System Properties
To enable system metrics, add these variables to your Docker or docker-compose configuration:
- Version 5.6
- Version 5.5
# docker-compose.yml example
environment:
  - JAVA_OPTS=
    -DAggregator.metrics.enable=true # enable metrics for Aggregator
    -DAggregator.metrics.enableJvmMetrics=true # enable JVM metrics (disabled by default)
    -DAggregator.metrics.enableTomcatMetrics=true # enable Tomcat metrics (disabled by default)
    -DAggregator.metrics.anonymousAccess=true # enable anonymous access to metrics endpoints (disabled by default)
admin.properties
To enable system metrics, specify the following parameters in the admin.properties configuration file:
Aggregator.metrics.enable=true # enable metrics for Aggregator
Aggregator.metrics.enableJvmMetrics=true # enable JVM metrics (disabled by default)
Aggregator.metrics.enableTomcatMetrics=true # enable Tomcat metrics (disabled by default)
Aggregator.metrics.anonymousAccess=true # enable anonymous access to metrics endpoints (disabled by default)
info
By default, metrics endpoints follow the configured security policy.
Note that for remote access to metrics endpoints, you must also set QuantServer.enableRemoteAccess=true.
Refer to admin properties, User Access Control and Aggregator config to learn more.
# docker-compose.yml example
environment:
  - JAVA_OPTS=
    -DAggregator.metrics.enable=true # enable both application and JVM metrics
admin.properties
To enable system metrics, specify the following parameters in the admin.properties configuration file:
Aggregator.enableMetrics=true # enable both application and JVM metrics
Kubernetes Configuration
To enable all types of metrics in Kubernetes:
Enable serviceMonitor in the Aggregator values.yaml. Note that this service is enabled by default.
# include this in Aggregator values.yaml to enable system metrics monitoring
serviceMonitor:
  enabled: true
  namespace: monitoring
  interval: "30s"
  labels:
    monitoring: application
info
Refer to Deployment in Kubernetes to learn more.
Step 2: Get Metrics
You can use Prometheus to collect and store system metrics. To scrape metrics from Aggregator periodically, set metrics_path to the Aggregator endpoint /agg/api/metrics in the Prometheus configuration file:
# Example of configuration for Prometheus
global:
  scrape_interval: 1s # How frequently to scrape targets by default (Prometheus' own default is 15s).
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'external-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'Aggregator'
    metrics_path: /agg/api/metrics
    # Override the global default scrape interval for this job.
    scrape_interval: 1s
    static_configs:
      - targets: ['localhost:8011']
info
To learn how to install and configure Prometheus, refer to Prometheus Installation and Prometheus Getting Started.
Step 3: Use Metrics
You can visualize the collected system metrics in Grafana dashboards:
1. Install and configure the Prometheus data source.
2. Add and configure Grafana dashboards.
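Both steps can also be automated with Grafana's provisioning mechanism. Below is a minimal sketch of a data source provisioning file; the name and URL are assumptions (the URL presumes Prometheus runs locally on its default port):

```yaml
# Sketch of a Grafana data source provisioning file (e.g., provisioning/datasources/prometheus.yaml)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090 # assumed Prometheus address
    isDefault: true
```

With this file in place, Grafana registers the data source on startup and dashboards can reference it directly.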
This is a sample illustration of the JVM Actuator Grafana dashboard with Aggregator JVM metrics:
Disable JVM Metrics
There are two ways to disable JVM metrics, depending on the method you used to enable them:
Java System Properties
To disable JVM metrics, add these variables to your Docker or docker-compose file:
# docker-compose.yml example
environment:
  - JAVA_OPTS=
    -DAggregator.metrics.disableJvmMetrics=true # disable JVM metrics
admin.properties
To disable JVM metrics, specify the following parameters in the admin.properties configuration file:
Aggregator.metricsService=QSMetricsServiceInfo
Aggregator.metricsService.disableJvmMetrics=true # disable JVM metrics
Description of Metrics
Aggregator Metrics
| Aggregator Application Metric | Description |
|---|---|
| aggregator_processes_deployed | The number of configured data connectors. |
| aggregator_processes_active | The number of successfully running connectors. |
| aggregator_processes_failed | The number of failed connectors. |
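These connector metrics lend themselves to alerting. As an illustration, here is a sketch of a Prometheus alerting rule built on aggregator_processes_failed; the rule file syntax is standard Prometheus, but the alert name, threshold, duration, and labels are illustrative assumptions:

```yaml
# Sketch of a Prometheus alerting rule file (illustrative values)
groups:
  - name: aggregator-alerts
    rules:
      - alert: AggregatorConnectorsFailed
        expr: aggregator_processes_failed > 0
        for: 5m # fire only if the condition holds for 5 minutes
        labels:
          severity: warning
        annotations:
          summary: "One or more Aggregator connectors have failed"
```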
Available JVM Metrics
| JVM Metric | Description |
|---|---|
| jvm_memory_committed_bytes | The amount of memory (in bytes) committed for the Java virtual machine to use. |
| jvm_memory_used_bytes | The amount of used memory (in bytes) in the Java virtual machine runtime. |
| jvm_memory_max_bytes | The maximum amount of memory (in bytes) available to the Java virtual machine runtime. |
| process_cpu_utilization | The CPU Usage (in percent) of the Java virtual machine. |
| process_uptime_seconds | The amount of time (in seconds) the Java virtual machine has been running. |
| jvm_gc_total | The number of garbage collection calls. |
| jvm_gc_interval_time_seconds_total | The total time (in seconds) that elapsed between garbage collection calls. |
| jvm_gc_interval_total | The total number of intervals between garbage collection calls (per-server counter). |
| jvm_gc_duration_seconds_total | The total time (in seconds) spent in garbage collection. |
| jvm_threads_daemon_threads | The current number of live daemon threads. |
| jvm_threads_live_threads | The current number of live threads including both daemon and non-daemon threads. |
| jvm_threads_peak_threads | The peak live thread count since the Java virtual machine started or peak was reset. |
| jvm_classes_loaded_classes | The number of classes that are currently loaded in the Java virtual machine. |
| jvm_classes_unloaded_classes_total | The total number of classes unloaded since the Java virtual machine has started execution. |
| process_files_open_files | The open file descriptor count. |
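The JVM metrics above can be combined in PromQL queries, for example in Grafana panels. The queries below are illustrative sketches using only metric names from the table:

```promql
# Heap utilization as a fraction of the maximum available memory
sum(jvm_memory_used_bytes) / sum(jvm_memory_max_bytes)

# Time spent in garbage collection per second, averaged over the last 5 minutes
rate(jvm_gc_duration_seconds_total[5m])
```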