With Okctl we create an observability stack in the cluster that provides metrics, traces, and logs from the Kubernetes cluster, relevant AWS resources, and the applications running in the cluster.

Observability stack

The observability stack relies on Grafana at its core. Grafana's data sources let us integrate with a variety of backends. The ones covered on this page are Prometheus (metrics), Loki (logs), and Tempo (traces).

These backends provide the basic building blocks we need to build a fully functional observability stack.

Declarative configuration

We love declarative configuration: being able to check everything into git is the best thing to happen since sliced bread. We use declarative configuration to add dashboards, alerts, and scrapers to Grafana and Prometheus. Dashboards are added to Grafana through labelled ConfigMaps, and scrape targets are added to Prometheus through ServiceMonitor resources, so both can be tracked in version control.

Alerting

We plan to use Alertmanager for setting up alerts eventually. Feel free to set it up yourself now, but we have not yet looked at it in depth.
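If you want to experiment with alerts in the meantime, prometheus-operator exposes alerting rules through its PrometheusRule resource. The following is a minimal sketch and not something Okctl sets up for you: the resource name, the alert name, and the job label value are hypothetical, and whether the rule is picked up depends on the ruleSelector of the Prometheus resource in your cluster, which may require specific labels on the PrometheusRule.

```yaml
# Sketch only: all names below are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-app-rules          # hypothetical name
  # Add whatever labels your Prometheus resource's ruleSelector expects.
spec:
  groups:
  - name: example-app.rules
    rules:
    - alert: ExampleAppScrapeFailing
      # "up" is 0 when Prometheus fails to scrape a target.
      # The job label value is assumed to match your Service name.
      expr: up{job="example-app"} == 0
      for: 5m                      # condition must hold for 5 minutes before firing
      labels:
        severity: warning
      annotations:
        summary: Scraping example-app has been failing for 5 minutes
```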

Prometheus

NB: We currently support only Prometheus for metrics, so you need to ensure that your application exposes a metrics endpoint that can be scraped through the ServiceMonitor you set up.

We integrate Prometheus into the cluster by using kube-prometheus, where prometheus-operator is used to automatically start scraping an application for metrics.

The full list of available CustomResourceDefinitions provides a good overview of the capabilities provided by this operator.

Configuring Prometheus with application.yaml

To enable Prometheus scraping with an application.yaml, uncomment the following lines and run okctl apply application:

kind: Application
...
prometheus:
  path: /metrics

Configuring Prometheus manually

For setting up monitoring of your application, we recommend reading this guide. The most relevant part is the setup of the ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

Once you have set up a ServiceMonitor for your application, you can log in to Grafana, open the query explorer with the Prometheus data source selected, and start searching for the metrics you have defined.
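The ServiceMonitor shown above does not point at pods directly. It selects Services by label and scrapes the named port on them. As a sketch of what the matching side could look like, here is a hypothetical Service for example-app; the port number and the pod selector are assumptions for illustration and should be replaced with the values from your own Deployment.

```yaml
# Hypothetical Service that the example-app ServiceMonitor would select.
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    app: example-app       # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: example-app       # assumed pod label; use your Deployment's pod labels
  ports:
  - name: web              # must equal the "port" named in the ServiceMonitor endpoint
    port: 8080             # assumed port number, for illustration only
    targetPort: 8080
```

Unless the ServiceMonitor sets a namespaceSelector, it only looks for Services in its own namespace, so the two resources are usually created side by side. When no path is given on the endpoint, Prometheus scrapes /metrics.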

Loki

We scrape the logs of all pods in the Kubernetes cluster using Promtail and send these on to Loki. For details on how best to set up your logging for use with Loki, we recommend reading the documentation. Essentially, these logs will be available and can be queried from the query explorer in Grafana when the Loki data source is selected.

Tempo

Tempo receives the traces from your application and makes them available for querying. For more details on how to use Tempo, we recommend reading the documentation.

Grafana

We have set up Grafana for you, along with multiple data sources. Once you have found the metrics, traces, or logs you want to follow, you can set up a dashboard for easy viewing. The easiest way to create a dashboard is through the UI. Once you are satisfied with the result, you can export it as JSON and add it to Grafana through declarative configuration.

This works because we have enabled a sidecar for dashboards. In essence, you define a ConfigMap like so:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
    grafana_dashboard: "1"
data:
  k8s-dashboard.json: |-
    [...]

For more inspiration, you can take a look at some of the default dashboards.

The important part is the grafana_dashboard label. Also remember that the name of the dashboard (the key under data, e.g. k8s-dashboard.json) needs to be unique; if you use the same name everywhere, the dashboards will overwrite each other.
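To make the naming rule concrete, here is a sketch of a second dashboard ConfigMap that can live alongside the sample without clashing. The ConfigMap name, the data key, and the dashboard JSON are hypothetical placeholders. In practice you would paste the JSON you exported from the Grafana UI in place of the stub.

```yaml
# Sketch only: names and the stub dashboard JSON are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-app-dashboard        # hypothetical ConfigMap name
  labels:
    grafana_dashboard: "1"           # the label the dashboard sidecar looks for
data:
  # The key must differ from k8s-dashboard.json and from every other dashboard key.
  example-app-dashboard.json: |-
    {
      "title": "Example app",
      "uid": "example-app",
      "panels": []
    }
```

Giving each dashboard its own file name, and ideally its own uid, keeps one dashboard from silently replacing another.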

Roadmap

  • Collecting logs, metrics, and traces
  • Setting up alarms and alerts