Streams metrics

Understand the metrics provided by Streams

Introduction

Streams exposes metrics in the Prometheus format when the monitoring parameters are enabled. They are meant to be scraped by a Prometheus server and visualized in Grafana on our custom dashboards.

Metric types

Although we assume you are familiar with the Prometheus metrics format, here is a quick explanation of the main metric types produced by Micrometer and how to use them:

  • Gauge: A gauge is the simplest metric to use. Like the fuel gauge of a car, it is a value that goes up and down and has immediate meaning.

  • Counter: The most common metric type in Prometheus, a counter is a number that only ever goes up (a restriction that makes it more reliable). It is usually turned into a rate to show how fast the number is increasing.

  • Timer: A timer records task durations by keeping a counter of completed tasks and a counter of their cumulative time. Use these metrics through ratios and rates.
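
As a rough illustration of these three patterns, the queries below use metric names listed further down this page. Each expression is a separate query, and the 2m window is only the one used in this page's other examples, not a recommendation.

```promql
# Gauge: read the current value directly.
streams_active_subscriptions

# Counter: convert the ever-increasing total into a per-second rate.
rate(streams_input_events_total[2m])

# Timer: the rate of its task counter gives the request throughput;
# dividing the rate of cumulative time by this rate gives the average duration.
rate(http_server_requests_seconds_count[2m])
```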

Metrics

Custom metrics

The following custom metrics have been implemented. They include labels describing the service name, type, and unique identifier, as well as some custom labels when suitable.

| Type | Metric | Description | Example usage |
|---|---|---|---|
| Gauge | streams_global_topics | Total number of Streams topics | |
| Gauge | streams_global_persistent_subscriptions | Total number of persistent subscriptions | streams_global_persistent_subscriptions{subscription_status="active"} |
| Gauge | streams_topics | Number of active topics on a Streams publisher | streams_topics{streams_service="publisher-http-poller"} |
| Gauge | streams_active_subscriptions | Number of active subscriptions on a Streams subscriber | streams_active_subscriptions{streams_service="subscriber-sse"} |
| Counter | streams_input_events_total | Number of messages received by a Streams service | rate(streams_input_events_total{streams_name="streams-subscriber-sse"}[2m]) |
| Counter | streams_output_events_total | Number of messages emitted by a Streams service | rate(streams_output_events_total{data_type="patch"}[2m]) |

The keyword “global” refers to the scope of the metric. Global metrics are reported by all instances of the same type of service and have the same value regardless of which instance reports them, whereas the other metrics only describe the service instance that reported them.
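
This difference in scope matters when aggregating across instances. A minimal sketch, assuming all instances are scraped by the same Prometheus server (each expression is a separate query):

```promql
# Global metric: every instance reports the same value, so take the max.
# Using sum here would multiply the value by the number of reporting instances.
max(streams_global_topics)

# Per-instance metric: add up the values reported by each instance.
sum(streams_active_subscriptions)
```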

Default Spring Boot metrics

Our framework provides additional metrics; however, we recommend you pay attention to the following timers:

  • http_client_requests_seconds : timer of the web requests emitted by Streams (i.e., polling or webhook).

  • http_server_requests_seconds : timer of the web requests received and handled by Streams (i.e., REST API, self-health check, and http-post).

Each timer is split into three series: a sum of the cumulative seconds spent waiting for the requests (suffix _sum), a counter of the number of requests (suffix _count), and a gauge of the maximum request time over the latest period of time (suffix _max). For example, to get the average processing time of incoming requests, you could plot:

rate(http_server_requests_seconds_sum[2m])/rate(http_server_requests_seconds_count[2m])
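
The same ratio can be broken down per endpoint, and the maximum can be read directly. This is only a sketch: it assumes Streams keeps Spring Boot's default `uri` label on this timer, which is not confirmed on this page.

```promql
# Average processing time per endpoint (assumes the default "uri" label is present).
sum by (uri) (rate(http_server_requests_seconds_sum[2m]))
  /
sum by (uri) (rate(http_server_requests_seconds_count[2m]))

# Longest request observed over the latest period, as reported by the timer's gauge.
http_server_requests_seconds_max
```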

Monitoring

In addition to basic system metrics, such as CPU or memory usage, Streams exposes custom metrics that need to be monitored to prevent issues with your Streams installation:

  • Number of active subscriptions:
    • It must be under 1500 for each subscriber pod.
    • It is reported by streams_active_subscriptions.
  • Rate of input events for Streams hub:
    • It must be under 500 events per second.
    • It is reported by rate(streams_input_events_total{streams_service="hub", data_type="snapshot"}[2m]).
  • JVM memory used:
    • It must be lower than the total pod memory for each pod; otherwise, the pod will be OOM-killed after a short period of time.
    • They are reported by jvm_memory_used_bytes and container_memory_working_set_bytes, respectively.
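
The first two limits above could be expressed as threshold queries that only return series once a limit is reached. This is a sketch rather than a recommended alerting setup, and the `pod` label in the last query is an assumption that depends on how your Prometheus relabels its targets.

```promql
# Subscribers that have reached the limit of 1500 active subscriptions.
streams_active_subscriptions >= 1500

# Hub input rate that has reached the limit of 500 events per second.
rate(streams_input_events_total{streams_service="hub", data_type="snapshot"}[2m]) >= 500

# JVM memory used per pod, summed over all memory pools (the "pod" label is assumed),
# to plot alongside container_memory_working_set_bytes for the same pod.
sum by (pod) (jvm_memory_used_bytes)
```

In practice you would likely alert somewhat below these limits to leave time to react before they are hit.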
Last modified December 14, 2021: Add Customize Installation section (#132) (80decca)