With Okctl we create an observability stack in the cluster that provides metrics, traces, and logs from the Kubernetes cluster, relevant AWS resources, and the applications running in the cluster.
The observability stack relies on Grafana at its core. Using the data sources capability of Grafana we can easily integrate with a variety of backends. The ones we currently support are:
- AWS CloudWatch for AWS resources, including EKS control plane logs
- Loki for logs
- Prometheus for metrics
- Tempo for traces
These backends provide us with the basic building blocks we need to build a fully functional observability stack.
We love declarative configuration; being able to check everything into git is the best thing to happen since sliced bread. We use declarative configuration to add dashboards, alerts, and scrapers to Grafana and Prometheus. By using ConfigMaps with labels to add dashboards to Grafana, we can easily track these resources in our version control system. The same applies to the ServiceMonitor type for Prometheus.
We will eventually use Alertmanager for setting up alerts. Feel free to do so now, but we haven't started looking at this in depth yet.
NB: We currently only support Prometheus for metrics. As such, you need to ensure that your application exposes a metrics endpoint that can be scraped via the ServiceMonitor you set up.
We integrate Prometheus into the cluster by using kube-prometheus, where prometheus-operator is used to automatically start scraping an application for metrics.
The full list of available CustomResourceDefinitions provides a good overview of the capabilities provided by this operator.
Configuring Prometheus with application.yaml
To enable Prometheus scraping with an application.yaml, uncomment the following lines and run okctl apply application:

    kind: Application
    ...
    prometheus:
      path: /metrics
Configuring Prometheus manually
For setting up monitoring of your application, we recommend reading this guide. The most relevant part is the setup of the ServiceMonitor:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app
      labels:
        team: frontend
    spec:
      selector:
        matchLabels:
          app: example-app
      endpoints:
      - port: web
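The ServiceMonitor above does not point at pods directly: it selects Services by label and refers to the scrape port by name. As a minimal sketch, a matching Service could look like the following. The Service name, the pod label, and port 8080 are assumptions made for illustration and do not come from Okctl.

```yaml
# Hypothetical Service for the example-app workload.
# The name and port numbers are assumptions; adjust them to your application.
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    app: example-app      # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: example-app      # assumed label on the application's pods
  ports:
  - name: web             # referenced by "port: web" in the ServiceMonitor endpoint
    port: 8080            # assumed port; use the one your application listens on
    targetPort: 8080
```

When an endpoint sets no path, Prometheus scrapes /metrics. Whether the labels on the ServiceMonitor itself (team: frontend in the example) matter depends on how the serviceMonitorSelector of the Prometheus resource is configured, which is not stated here.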
Once you have set up a ServiceMonitor for your application, you can log in to the Grafana dashboard, open the query explorer with the Prometheus data source selected, and start searching for the metrics you have defined.
We scrape the logs of all pods in the Kubernetes cluster using Promtail and send these on to Loki. For details on how best to set up your logging for use with Loki, we recommend reading the documentation. Essentially, these logs will be available and can be queried from the query explorer in Grafana when the Loki data source is selected.
Tempo receives the traces from your application and makes them available for querying. For more details on how to use Tempo, we recommend reading the documentation.
We have set up Grafana for you, along with multiple data sources. Once you have found the metrics, traces, or logs you are interested in following, you can set up a dashboard for easy viewing. The easiest way of creating a dashboard is through the UI. Once you are satisfied with the result, you can export it as JSON and add it to Grafana via a declarative config.
We achieve this because we have enabled a sidecar for dashboards. In essence, you define a ConfigMap like so:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sample-grafana-dashboard
      labels:
        grafana_dashboard: "1"
    data:
      k8s-dashboard.json: |-
        [...]
For more inspiration, you can take a look at some of the default dashboards.
The important part is the grafana_dashboard label. Also, please remember that the name of the dashboard, e.g. k8s-dashboard.json, needs to be unique; if you use the same name everywhere, the dashboards will overwrite each other.
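To illustrate the uniqueness requirement, here is a sketch of two dashboards shipped as separate ConfigMaps, each with its own file name under data. The ConfigMap names, file names, and JSON bodies are hypothetical placeholders; in practice the JSON would be what you exported from the Grafana UI.

```yaml
# Two hypothetical dashboards; every name and JSON body here is a placeholder.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-app-overview-dashboard
  labels:
    grafana_dashboard: "1"            # label the dashboard sidecar looks for
data:
  example-app-overview.json: |-       # unique file name for this dashboard
    {
      "title": "Example App Overview",
      "panels": []
    }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-app-latency-dashboard
  labels:
    grafana_dashboard: "1"
data:
  example-app-latency.json: |-        # differs from the key above, so nothing is overwritten
    {
      "title": "Example App Latency",
      "panels": []
    }
```

Prefixing the file name with the application name is one simple way to keep keys from colliding across teams.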
- Collecting logs, metrics, and traces
- Setting up alarms and alerts.