Smoother sailing is ahead when it comes to keeping your okctl environment and cluster up to date.

This will primarily be achieved with the okctl-upgrade command (see https://www.okctl.io/upgrading-okctl/).

However, to get to the point where a single command covers all your upgrade needs, there are some things to sort out first (if you built your cluster on okctl < v0.0.79).
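Once those one-time steps are done, the routine should be about this simple (a sketch; the exact syntax is described in the docs linked above, and the cluster file name is a placeholder):

okctl venv -c cluster.yaml # start a shell with your cluster's environment
okctl upgrade # fetch and apply any pending upgrades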

Loki persistence patch

This patch makes sure your Loki logs are persisted even if the Loki pod has to restart. Please note that you will lose your current logs when you run this patch, so back up anything you need to keep before running it.
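If there are logs you want to keep, one way to export them beforehand is with Grafana's logcli against a port-forwarded Loki. This is only a sketch: the service name, label selector and time range are assumptions you must adapt to your own setup.

kubectl -n monitoring port-forward svc/loki 3100:3100 # run in a separate terminal; service name assumed
export LOKI_ADDR=http://localhost:3100
logcli query '{namespace="my-app"}' --since=168h --limit=100000 > loki-backup.log # example selector, last 7 days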

okctl venv -c my-cluster.yaml

curl https://raw.githubusercontent.com/oslokommune/okctl-upgrade/main/gists/loki-persistent-storage/loki-pvc.yaml | kubectl apply -f -
helm -n monitoring upgrade --install loki grafana/loki -f https://raw.githubusercontent.com/oslokommune/okctl-upgrade/main/gists/loki-persistent-storage/loki-values.yaml
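To verify that the patch took effect, you can check that the persistent volume claim is bound and that the Loki pod came back up (the label selector is an assumption based on the grafana/loki chart):

kubectl -n monitoring get pvc # the Loki PVC should have STATUS Bound
kubectl -n monitoring get pods -l app=loki # the pod should be Running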

See the thread here: https://oslokommune.slack.com/archives/CV9EGL9UG/p1629718366030700

Resource quota patch (this one is critical)

This patch was written before we had an upgrade system. It applies resource quotas to pods, so that if, for example, Loki logs exceed the available disk space, they no longer bring down the cluster.

Run the following to apply the patch:

okctl venv -c cluster.yaml # replace with name of your cluster.yaml file

curl --silent https://raw.githubusercontent.com/oslokommune/okctl-upgrade/main/gists/resource-quota-patch/resource-quota-patch.sh > resource-patch.sh && bash resource-patch.sh
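As a rough sanity check, you can inspect the resources set on the Loki pod afterwards; the pod name loki-0 assumes the default StatefulSet from the grafana/loki chart:

kubectl -n monitoring get pod loki-0 -o jsonpath='{.spec.containers[0].resources}' # should now show requests/limits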

It is safe to answer yes to the "Are you comfortable losing your logs? [y/N]" question if you ran the Loki persistence patch first. And anyway, you backed up your logs first, like we suggested, right?

See the original slack thread here: https://oslokommune.slack.com/archives/CV9EGL9UG/p1630391547006400

This was discovered, and the patch provided, in August 2021. Since then we have had two incidents of a cluster going down due to the same error that the patch fixes. If in doubt, ask for help now rather than tomorrow; we will help you over at #kjøremiljø-support.

Did you apply that patch yet? Your cluster might go down at any time, tick-tock.

Release v0.0.80 and some manual maintenance

If you read the okctl release notes and do as they suggest, skip this section. For the remaining 95% out there, the steps are gathered here for your convenience :)

As of okctl v0.0.80 we put the state.db in a remote S3 bucket to avoid nasty race conditions occurring if, for example, apply cluster is started from two places at once.

  1. Run okctl maintenance state-upload <path to state.db> to move the state.db file to a remote location. The state.db file usually resides in <iac root>/infrastructure/<cluster name>/state.db.
  2. Delete the relevant state.db file, then commit and push the change (see the sketch after this list).
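Concretely, for a cluster named my-cluster with the IAC repository as your working directory, the whole thing might look like this (the cluster name is a placeholder):

okctl maintenance state-upload infrastructure/my-cluster/state.db
git rm infrastructure/my-cluster/state.db
git commit -m "Move okctl state.db to remote storage"
git push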

You can also have a look at the actual release notes here: https://github.com/oslokommune/okctl/releases/tag/v0.0.80

This is all good, but do I seriously need to read okctl release notes and pay attention to mentions in the support channel to stay up to date?!

Short answer: no.

We realize that it has not been easy to keep track as an okctl user, and we have decided to make things clearer. From now on, all upgrades will be announced in a dedicated channel: #okctl-viktig.

We may also nag you a bit more if critical upgrades are released and we see that they are not being applied.
