ArgoCD is not working, but I need to deploy stuff

If ArgoCD for whatever reason doesn't work, you can apply changes yourself, the same way ArgoCD does it.

okctl venv -c my-cluster.yaml
kustomize build infrastructure/applications/my-app/overlays/my-cluster | kubectl apply -f -
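
Before applying, it can help to preview what would change in the cluster. The sketch below assumes the overlay directory contains a kustomization.yaml and that kubectl already points at the right cluster (for example inside okctl venv). The function name preview_overlay is made up for illustration.

```shell
# Hypothetical helper: show what applying an overlay would change, without changing anything.
preview_overlay() {
  overlay_dir="$1"
  # kustomize build expects a directory holding a kustomization file.
  if [ ! -f "$overlay_dir/kustomization.yaml" ]; then
    echo "no kustomization.yaml in: $overlay_dir" >&2
    return 2
  fi
  # kubectl diff exits 0 when nothing differs and 1 when there are differences.
  kustomize build "$overlay_dir" | kubectl diff -f -
}

# Usage (the path is the example used above):
# preview_overlay infrastructure/applications/my-app/overlays/my-cluster
```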

I get an error message containing "ssh" and/or "knownhosts"

Some examples:

Error: synchronizing declaration with state: reconciling nameserver delegation: initiating dns zone delegation: staging repository: cloning repository: knownhosts: /home/x/.ssh/known_hosts:10: illegal base64 data at input byte 140
Error: synchronizing declaration with state: reconciling nameserver delegation: initiating dns zone delegation: staging repository: cloning repository: ssh: handshake failed: knownhosts: key mismatch
Error: synchronizing declaration with state: reconciling nameserver delegation: initiating dns zone delegation: staging repository: cloning repository: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

Possible solution: known_hosts contains an invalid or outdated entry

mv ~/.ssh/known_hosts{,.bak}
ssh-keyscan github.com > ~/.ssh/known_hosts
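
If you would rather keep the rest of known_hosts, note that the error message names the offending line (known_hosts:10 in the first example). A minimal sketch that removes only that line and keeps a .bak copy; the function name is hypothetical.

```shell
# Hypothetical helper: delete one line from a known_hosts file, keeping FILE.bak as a backup.
remove_known_hosts_line() {
  file="$1"
  line="$2"
  # Only accept digits as the line number.
  case "$line" in
    ''|*[!0-9]*) echo "line must be a number" >&2; return 2 ;;
  esac
  sed -i.bak "${line}d" "$file"
}

# Usage, for the error "known_hosts:10: illegal base64 data":
# remove_known_hosts_line ~/.ssh/known_hosts 10
#
# For a "key mismatch" error, ssh-keygen -R github.com removes the stored keys for that host.
```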

Possible solution: the SSH agent doesn't know about the private key for your IAC repository

ssh-add ~/.ssh/<relevant private key>
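
To see whether the agent is the problem, ssh-add -l lists the loaded keys. According to its manual page it exits with 0 on success, 1 if the command fails (for instance when the agent holds no identities), and 2 if no agent can be contacted. The sketch below turns that exit status into a hint; the function name is hypothetical.

```shell
# Hypothetical helper: translate the exit status of `ssh-add -l` into a hint.
explain_agent_status() {
  case "$1" in
    0) echo "agent has keys loaded" ;;
    1) echo "agent has no keys: run ssh-add with your private key" ;;
    2) echo "no agent reachable: start one with: eval \"\$(ssh-agent -s)\"" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Usage:
# ssh-add -l >/dev/null 2>&1; explain_agent_status $?
# ssh -T git@github.com   # GitHub greets you by username if the key is accepted
```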

When running okctl upgrade, I get Body: Not Found

Example

Error: upgrading: parsing upgrade binaries: validating release: fetching checksums: downloading checksum file: http call did not return status OK. URL: https://github.com/oslokommune/okctl-upgrade/releases/download/untagged-f95a8beee472bebe3a19/okctl-upgrade-checksums.txt. Status: 404 Not Found. Body: Not Found

This can happen if you run an upgrade while we are in the middle of uploading a release. The solution is to wait 10 minutes and try again.

Specifically, if any instances of "goreleaser" are running on https://github.com/oslokommune/okctl-upgrade/actions, they must complete before you attempt the upgrade again.

This bug is tracked in issue: https://trello.com/c/xYHF2vVe/596-okctl-upgrade-fails-if-run-at-the-same-time-as-were-making-a-release
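
If you prefer to script the wait, a generic retry loop can be wrapped around the upgrade. This is only a sketch: the function name retry and the attempt count are arbitrary choices, not something okctl provides.

```shell
# Hypothetical helper: retry COMMAND up to ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: try up to 3 times, 10 minutes apart.
# retry 3 600 okctl upgrade
```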


I get an error message with "getting existing cluster <cluster-name>: not found"

This can indicate that your state.db was not found in AWS S3. If you have just updated okctl from 0.0.79 or lower, you need to:

  1. Run okctl maintenance state-upload <path-to-state.db> to move the state.db file to a remote location.
    The state.db usually resides in /infrastructure/<cluster-name>/state.db
  2. Delete the relevant state.db file, commit and push the changes.

See release notes for 0.0.80
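
Put together, the two steps might look like the sketch below. The function name and commit message are made up, and the path is the usual location mentioned above, written relative to the root of your IAC repository.

```shell
# Hypothetical helper: upload a local state.db, then remove it from the repository.
migrate_state_db() {
  state_db="$1"
  if [ ! -f "$state_db" ]; then
    echo "state.db not found: $state_db" >&2
    return 1
  fi
  # Step 1: move the state to the remote location.
  okctl maintenance state-upload "$state_db" || return 1
  # Step 2: delete the local file, then commit and push.
  git rm "$state_db" &&
    git commit -m "Remove local state.db after upload" &&
    git push
}

# Usage, from the root of the IAC repository:
# migrate_state_db infrastructure/<cluster-name>/state.db
```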


Okctl keeps trying to do the GitHub Device Authentication Flow while trying to do <any action>

This is known to happen if pass init <gpg-key-id> has not been run after installing pass.
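
pass records the key it was initialised with in a .gpg-id file inside the password store (~/.password-store unless PASSWORD_STORE_DIR is set), so checking for that file is a quick way to tell whether pass init has been run. A sketch, with a hypothetical function name:

```shell
# Hypothetical helper: succeed if the password store has been initialised.
pass_is_initialised() {
  store="${PASSWORD_STORE_DIR:-$HOME/.password-store}"
  # -s: the file exists and is not empty.
  [ -s "$store/.gpg-id" ]
}

# Usage:
# pass_is_initialised || echo "run: pass init <gpg-key-id>"
# gpg --list-secret-keys --keyid-format LONG   # lists candidate key IDs
```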


okctl forward postgres fails on applying security group policy

The command okctl forward postgres fails with the following error:

Error: applying security group policy: SecurityGroupPolicy.vpcresources.k8s.aws "xxxxxxxxx-pgbouncer-" is invalid: metadata.name: Invalid value: "xxxxxxxxx-pgbouncer-": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

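This is due to a bug. Note that the resource name in the error ends with a hyphen, which appears to happen when no username is set in your okctl configuration.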

Workaround:

  • Open ~/.okctl/conf.yml
  • Make sure the username is set to your username, like this:
user:
    id: ... # (Don't edit this)
    username: ooo123456 # Replace with your username
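
A quick way to check whether the username is set, assuming the file has the layout shown above. The function name is hypothetical, and an empty quoted value would still pass this simple check.

```shell
# Hypothetical helper: succeed if CONFIG has an indented, non-empty "username:" entry.
has_username() {
  grep -Eq '^[[:space:]]+username:[[:space:]]*[^[:space:]#]' "$1"
}

# Usage:
# has_username ~/.okctl/conf.yml || echo "username is missing in ~/.okctl/conf.yml"
```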

On Okctl delete cluster, some resources are not deleted (automatic deletion is coming in a later version)

Workaround: manually delete the remaining resources, as described in Delete cluster.

It is recommended to delete the infrastructure directory in your IAC repository as the last manual step.


Okctl create cluster: Create identity pool fails / Re-create cluster within short timespan fails

The following sequence might fail if all steps are done within 15 minutes:

  • Create a cluster
  • Delete it
  • Create a new cluster with the same domain name (e.g. whatever.oslo.systems)

This is due to DNS resolvers caching NS records.
More details: https://github.com/oslokommune/okctl/pull/231

Workaround: Wait for up to 15 minutes before creating the cluster again.

15 minutes is the TTL (time to live, i.e. cache expiry) of the NS record. You can see this value in Route 53 -> Hosted zones -> [Your domain] -> [NS record for your top domain] -> Edit -> TTL field.
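
The TTL can also be read from the command line. In the answer section printed by dig, the second column is the TTL in seconds as seen by the resolver you query. The sketch below pulls out the largest value among the NS lines; the function name is hypothetical and whatever.oslo.systems is the example domain from above.

```shell
# Hypothetical helper: read dig answer lines on stdin, print the largest NS TTL in seconds.
max_ttl() {
  awk '$4 == "NS" && $2 > max { max = $2 } END { print max + 0 }'
}

# Usage:
# dig +noall +answer NS whatever.oslo.systems | max_ttl
```

If the printed value is above 0, resolvers may still be serving cached NS records.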


Okctl create cluster: Failed to create external secrets helm chart

You get the following error (shortened):

..  creating: external-secrets (elapsed: 1 second 76 microseconds)WARN[0007] failed to process request, because: failed to create external secrets helm chart: failed to update repository: failed to fetch https://kubernetes-charts-incubator.storage.googleapis.com/index.yaml : 403 Forbidden  endpoint=create service=helm/externalSecrets
✓   creating
Error:
....
request failed with Internal Server Error, because: failed to create external secrets helm chart: failed to update repository: failed to fetch https://kubernetes-charts-incubator.storage.googleapis.com/index.yaml : 403 Forbidden

This happens because Helm moved its chart repositories to new URLs. In your ~/.okctl/helm/repositories.yaml, replace the old URLs with the new ones:

Name       Old Location                                                New Location
stable     https://kubernetes-charts.storage.googleapis.com            https://charts.helm.sh/stable
incubator  https://kubernetes-charts-incubator.storage.googleapis.com  https://charts.helm.sh/incubator
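
The same edit can be scripted with sed. This is a sketch under the assumption that the file stores each URL verbatim; it leaves a .bak copy next to the file, and the function name is hypothetical.

```shell
# Hypothetical helper: rewrite the old Helm chart URLs in FILE, keeping FILE.bak as a backup.
update_helm_repo_urls() {
  sed -i.bak \
    -e 's#https://kubernetes-charts\.storage\.googleapis\.com#https://charts.helm.sh/stable#g' \
    -e 's#https://kubernetes-charts-incubator\.storage\.googleapis\.com#https://charts.helm.sh/incubator#g' \
    "$1"
}

# Usage:
# update_helm_repo_urls ~/.okctl/helm/repositories.yaml
```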

Okctl apply cluster: Always prompts for GitHub machine authentication, even after it has been set

There is an issue with some versions of pinentry-curses where the prompt to enter the password for your PGP key sometimes does not appear. We store the authentication token in a keyring, and since it cannot be decrypted without the password, Okctl just skips ahead. The solution is to export the following environment variable:

GPG_TTY=$(tty)
export GPG_TTY

This can be done in your current shell before you run Okctl commands, or put in your .bashrc or similar to ensure you are always prompted for your encryption key password. A more detailed explanation can be found on Stack Overflow.
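
To make the setting permanent without adding it twice, something like the following could be used. It is a sketch: the function name is hypothetical, and ~/.bashrc is assumed to be the startup file your shell reads.

```shell
# Hypothetical helper: append the GPG_TTY export to RC_FILE unless it is already there.
ensure_gpg_tty() {
  rc_file="$1"
  if ! grep -q 'export GPG_TTY' "$rc_file" 2>/dev/null; then
    # Single quotes keep $(tty) unexpanded, so it is evaluated in each new shell.
    printf '%s\n' 'export GPG_TTY=$(tty)' >> "$rc_file"
  fi
}

# Usage:
# ensure_gpg_tty ~/.bashrc
```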


Okctl is expecting an oslokommune-boundary to be present, but it's missing

You're probably trying to create an okctl cluster on a Crayon account. We've yet to adapt okctl to work on the new
accounts, so until then you can run the following command to create a dummy policy in the new account.

aws iam create-policy \
  --policy-name oslokommune-boundary \
  --path /oslokommune/ \
  --policy-document "{\"Version\": \"2012-10-17\", \"Statement\": [ {\"Sid\": \"AllowAccessToAllServices\", \"Effect\": \"Allow\", \"NotAction\": [\"iam:CreateUser\"], \"Resource\": \"*\"}]}"
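
Escaping JSON inside a shell string is easy to get wrong. An alternative sketch writes the same policy document to a file and passes it with the AWS CLI's file:// prefix. The function name and the file name boundary.json are hypothetical.

```shell
# Hypothetical helper: write the dummy boundary policy document to FILE.
write_boundary_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToAllServices",
      "Effect": "Allow",
      "NotAction": ["iam:CreateUser"],
      "Resource": "*"
    }
  ]
}
EOF
}

# Usage:
# write_boundary_policy boundary.json
# aws iam create-policy \
#   --policy-name oslokommune-boundary \
#   --path /oslokommune/ \
#   --policy-document file://boundary.json
```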