Maintaining an OpenAppStack cluster

Logging

Logs from pods and containers can be read in different ways:

  • In the cluster filesystem at /var/log/pods/ or /var/log/containers/.
  • Using kubectl logs (see the example below).
  • Querying aggregated logs with Grafana, see below.
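For example, to follow the logs of a specific pod with kubectl (the oas namespace and POD_NAME below are placeholders; look up the actual names first):

kubectl get pods --all-namespaces
kubectl logs --namespace oas --follow POD_NAME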

Central log aggregation

We use Promtail, Loki and Grafana for easy access to aggregated logs. The Loki documentation is a good starting point to learn how this setup works, and the Using Loki in Grafana guide gets you started with querying your cluster logs with Grafana.

You will find the Loki Grafana integration on your cluster at https://grafana.oas.example.org/explore, together with some generic query examples.

LogQL query examples

Please also refer to the LogQL documentation.

Query all apps for errors, failures, exceptions and fatal messages:

{job=~".*"} |~ "error|fail|exception|fatal"
{job=~".*"} |= "level=error"

Flux

Flux is responsible for installing applications. It uses helm-operator to deploy the desired Helm releases.

Query all messages from flux:

{app="flux"}

Query all messages from flux and helm-operator:

{app=~"(flux|helm-operator)"}

Flux messages containing wordpress:

{app = "flux"} |= "wordpress"

Flux messages containing wordpress, without unchanged events (to show only the installation messages):

{app = "flux"} |= "wordpress" != "unchanged"

Filter out redundant flux messages:

{ app = "flux" } !~ "(unchanged | event=refreshed | method=Sync | component=checkpoint)"

Debug OAuth2 single sign-on with Rocket.Chat:

{container_name=~"(hydra|rocketchat)"}

Cert-manager

Cert-manager is responsible for requesting Let’s Encrypt TLS certificates.

Query cert-manager messages containing chat:

{app="cert-manager"} |= "chat"

Hydra

Hydra is the single sign-on system.

Show only warnings and errors from hydra:

{container_name="hydra"} != "level=info"

Backup

On your provisioning machine

During the installation process, a cluster config directory is created on your provisioning machine, located in the top-level sub-directory clusters in your clone of the openappstack git repository. You may want to back up this directory: it contains all files generated by the create and install commands of the CLI, together with the generated secrets that are stored during installation.
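As a sketch, you could archive this directory with tar (assuming your clone of the openappstack repository lives at ~/openappstack; adjust the path to your setup):

tar czf openappstack-clusters-backup.tar.gz -C ~/openappstack clusters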

On your cluster

OpenAppStack supports using the program Velero to make backups of your OpenAppStack instance to external storage via the S3 API. See the installation instructions for setup details. By default this will make nightly backups of the entire cluster (minus Prometheus data). To make a manual backup, run

velero create backup BACKUP_NAME --exclude-namespaces velero --wait

from your VPS. See velero --help for other commands, and Velero’s documentation for more information.
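To verify your backups, you can list them and show the details of a single one (BACKUP_NAME is a placeholder):

velero backup get
velero backup describe BACKUP_NAME --details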

Note: in case you want to make an (additional) backup of application data by other means, all persistent volume data of the cluster is stored in directories under /var/lib/OpenAppStack/local-storage.
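As a sketch, such an additional backup could be a plain rsync of that directory to another machine (backup-host and the target path are placeholders; run this on the VPS as root):

rsync -av /var/lib/OpenAppStack/local-storage/ backup-host:/backups/local-storage/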

Restore

Restore instructions will follow; please reach out to us if you need assistance.

Change the IP of your cluster

In case your cluster needs to migrate to another IP address, make sure to update the IP address in /etc/rancher/k3s/k3s.yaml on the VPS and, if applicable, in your local kube config.
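In both files, the address appears in the server field of the cluster entry; NEW_IP below is a placeholder for the new address:

server: https://NEW_IP:6443

Afterwards, kubectl cluster-info is a quick way to check that the cluster is reachable again.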

Delete evicted pods

In case your cluster disk usage is over 80%, Kubernetes taints the node with DiskPressure. It then tries to evict pods, which is pointless in a single-node setup but happens anyway. Hundreds of pods can end up in the Evicted state and still show up after the DiskPressure condition has recovered. See also the out-of-resource handling with kubelet documentation.
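To check whether the node currently has the DiskPressure condition, inspect the node conditions (the grep simply narrows the output):

kubectl describe nodes | grep -i pressure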

You can delete all evicted pods with this:

kubectl get pods --all-namespaces -ojson \
  | jq -r '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | .metadata.name + " " + .metadata.namespace' \
  | xargs -n2 -l bash -c 'kubectl delete pods $0 --namespace=$1'
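Afterwards, a quick check should return no remaining evicted pods:

kubectl get pods --all-namespaces | grep Evicted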