How to Monitor Your VKE Cluster with tobs
Introduction
tobs (The Observability Stack for Kubernetes) is a Kubernetes monitoring stack that collects metrics with Prometheus, visualizes them with Grafana, and keeps them in TimescaleDB as long-term storage. Some key components of tobs are:
- Prometheus to collect metrics
- Grafana to visualize metrics from Prometheus
- Promscale to store metrics in long-term storage and allow analysis with both PromQL and SQL
- TimescaleDB as the long-term database for Promscale
- Other useful components such as AlertManager, Node-Exporter, Kube-State-Metrics, and so on.
This tutorial explains how to:
- Install tobs in your Vultr Kubernetes Engine (VKE) cluster
- Use Prometheus to store your metrics in a long-term database
- Use Vultr Object Storage as the backup for your metrics database
- Use Grafana to visualize your metrics
- Access the database and perform SQL queries for complex analysis on metrics
Prerequisites
Create Vultr Kubernetes Cluster
- Create a new VKE cluster with at least two 4GB nodes. 2GB nodes are not recommended for the tobs stack.
- Download the configuration and configure kubectl on your local machine.
Create Vultr Object Storage
- Create a new Vultr Object Storage to use as an S3-compatible backup for TimescaleDB.
- Create a new bucket inside your Vultr Object Storage. The bucket name is demo-tobs in this tutorial.
Install tobs
Check your Kubernetes version.
$ kubectl get nodes
The result is like:
NAME                STATUS   ROLES    AGE     VERSION
node-02e926515d59   Ready    <none>   2m20s   v1.22.6
node-85a58c1750d1   Ready    <none>   2m17s   v1.22.6
Check the compatibility matrix to see which tobs version supports your Kubernetes version.
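If the cluster has many nodes, it can help to reduce the node list to the distinct versions before consulting the matrix. The following is a minimal sketch, not part of tobs: the function name node_versions is hypothetical, and it assumes the default kubectl get nodes table layout, where VERSION is the last column.

```shell
# Print the distinct kubelet versions found in `kubectl get nodes` output.
# Reads the table on stdin, skips the header row, and takes the last
# column (VERSION in the default, non-wide output).
node_versions() {
  awk 'NR > 1 && NF > 0 { print $NF }' | sort -u
}

# Usage (requires a configured kubectl):
#   kubectl get nodes | node_versions
```

More than one line of output means the nodes run mixed versions, in which case each version should be checked against the matrix.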
Install tobs with Helm.
Using tobs to install a full observability stack with OpenTelemetry support requires cert-manager. To install it, follow the cert-manager documentation.
cert-manager is not required when using tobs with OpenTelemetry support disabled.
The following command will install Kube-Prometheus, TimescaleDB, OpenTelemetry Operator, and Promscale into your Kubernetes cluster:
$ helm repo add timescale https://charts.timescale.com/
$ helm repo update
$ helm install --wait <release_name> timescale/tobs
> Note: The --wait flag is necessary for a successful installation because the tobs Helm chart can create OpenTelemetry custom resources only after opentelemetry-operator is up and running. This flag can be omitted when using tobs without OpenTelemetry support.
To deploy tobs on your Vultr Kubernetes cluster, you need to customize the configuration of the tobs stack. Run the following command to create a configuration file named my_values.yml:
$ helm show values timescale/tobs > my_values.yml
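The storage change described in the next step can also be scripted instead of made by hand. This is a minimal sketch under stated assumptions: the function name bump_min_storage is hypothetical, and it assumes the value appears in the file exactly as storage: 8Gi. It rewrites every occurrence, so review the result before installing.

```shell
# Replace every "storage: 8Gi" with "storage: 10Gi" in a values file.
# sed -i.bak edits in place and keeps the original as <file>.bak;
# this form works with both GNU and BSD sed.
bump_min_storage() {
  sed -i.bak 's/storage: 8Gi/storage: 10Gi/g' "$1"
}

# Usage:
#   bump_min_storage my_values.yml
#   grep -n 'storage:' my_values.yml   # review the changed lines
```

The backup file makes it easy to compare against the chart defaults with diff if the substitution touched more lines than expected.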
Edit my_values.yml with your text editor and replace storage: 8Gi with storage: 10Gi. This change is essential because Vultr Block Storage requires a volume size of at least 10GB.
Under timescaledb-single, you can change the storage volume from size: 150Gi to a smaller number to reduce resource cost.
Install the tobs stack with S3 backup with the following command. Enter the information for your Vultr Object Storage (bucket name, hostname, key, and secret key) when prompted.
$ helm upgrade --wait --install <release_name> --values my_values.yml timescale/tobs
Your output should look like this. When prompted for the S3 credentials, use the values from your Vultr Object Storage bucket.
WARNING: Using a generated self-signed certificate for TLS access to TimescaleDB.
This should only be used for development and demonstration purposes.
To use a signed certificate, use the "--tls-timescaledb-cert" and "--tls-timescaledb-key" flags when issuing the tobs install command.
Creating TimescaleDB tobs-certificate secret
Creating TimescaleDB tobs-credentials secret
We'll be asking a few questions about S3 buckets, keys, secrets and endpoints.
For background information, visit these pages:
Amazon Web Services:
- https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html
- https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html
Digital Ocean:
- https://developers.digitalocean.com/documentation/spaces/#aws-s3-compatibility
Google Cloud:
- https://cloud.google.com/storage/docs/migrating#migration-simple
What is the name of the S3 bucket? demo-tobs
What is the name of the S3 endpoint? (leave blank for default) ewr1.vultrobjects.com
What is the region of the S3 endpoint? (leave blank for default)
What is the S3 Key to use? NBWOK-redacted-1IMQSU
What is the S3 Secret to use? tdCV0vN-redacted-Kx3OIVy
Creating TimescaleDB tobs-pgbackrest secret
Installing The Observability Stack
The installation takes a few minutes to complete. You can check the progress with the kubectl get pods command. Keep in mind that you may see many CrashLoopBackOff statuses during the deployment.
The following output of kubectl get pods indicates a successful installation. If you have any errors, check the Troubleshooting section at the end of this tutorial.
NAME                                               READY   STATUS      RESTARTS        AGE
alertmanager-tobs-kube-prometheus-alertmanager-0   2/2     Running     0               2m42s
prometheus-tobs-kube-prometheus-prometheus-0       2/2     Running     0               2m42s
tobs-grafana-95df94dc9-qpnsm                       3/3     Running     4 (95s ago)     2m54s
tobs-grafana-db--1-g7pz7                           0/1     Completed   4               2m53s
tobs-kube-prometheus-operator-d65c55845-xjz6h      1/1     Running     0               2m54s
tobs-kube-state-metrics-56c568fdcc-6f8hp           1/1     Running     0               2m54s
tobs-prometheus-node-exporter-2v74v                1/1     Running     0               2m54s
tobs-prometheus-node-exporter-fj7x9                1/1     Running     0               2m54s
tobs-promlens-7f778cc958-wb9dj                     1/1     Running     0               2m54s
tobs-promscale-57497c97cf-htv2b                    1/1     Running     4 (119s ago)    2m54s
tobs-timescaledb-0                                 2/2     Running     3 (40s ago)     2m53s
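Because several pods restart while the stack settles, scanning the list by eye is error-prone. The following is a minimal sketch for filtering the pod list: the function name unhealthy_pods is hypothetical, and treating only Running and Completed as healthy is an assumption based on the successful output shown above.

```shell
# List pods whose STATUS column is neither Running nor Completed.
# Reads `kubectl get pods` table output on stdin, skips the header row,
# and prints "<pod name> <status>" for each pod that needs attention.
unhealthy_pods() {
  awk 'NR > 1 && NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (requires a configured kubectl):
#   kubectl get pods | unhealthy_pods
```

Empty output means every pod is in one of the two expected states. A pod that still shows CrashLoopBackOff after several minutes is a candidate for the steps in the Troubleshooting section.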
Configuration
All component configuration happens through the Helm values.yaml file. You can view the self-documenting default values.yaml in the project repository. See more documentation about individual configuration settings in the Helm chart docs.
Troubleshooting
Get all events:
$ kubectl get events --sort-by='.metadata.creationTimestamp'
Describe any specific pod to find the problem:
$ kubectl describe pod <pod name>
View logs of any pod:
$ kubectl logs <pod name>
The Promscale pod can have a CrashLoopBackOff error due to failed password authentication. Here are the steps to resolve the problem:
View the log of the Promscale pod. Replace the pod name with your pod name.
$ kubectl logs tobs-promscale-f984f8bf7-t4jts
You may see the following error:
password authentication failed for user \"postgres\" (SQLSTATE 28P01))
Get the password to access the database from the tobs-credentials secret. The command prints the value in base64-encoded form, which is how Kubernetes stores values in a secret's data field.
$ kubectl get secrets tobs-credentials -o jsonpath="{.data.PATRONI_SUPERUSER_PASSWORD}"
Edit the password stored inside the tobs-promscale secret, where the key is PROMSCALE_DB_PASSWORD.
$ kubectl edit secrets tobs-promscale
Delete the Promscale pod. Replace the pod name with your pod name.
$ kubectl delete pod tobs-promscale-f984f8bf7-jlxw8
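The two secret-handling steps above can also be combined into one non-interactive command, run before deleting the pod. This is a sketch rather than a tobs feature: the function names build_password_patch and sync_promscale_password are hypothetical, and it assumes the secret and key names used in this tutorial. It copies the base64-encoded value as-is, so no decoding is needed.

```shell
# Build a JSON merge patch that sets PROMSCALE_DB_PASSWORD to an
# already base64-encoded value (Secret .data values are base64-encoded,
# and the base64 alphabet needs no JSON escaping).
build_password_patch() {
  printf '{"data":{"PROMSCALE_DB_PASSWORD":"%s"}}' "$1"
}

# Copy the superuser password from the tobs-credentials secret into the
# tobs-promscale secret using `kubectl patch` with a merge patch.
sync_promscale_password() {
  encoded=$(kubectl get secret tobs-credentials \
    -o jsonpath='{.data.PATRONI_SUPERUSER_PASSWORD}') || return 1
  kubectl patch secret tobs-promscale --type merge \
    -p "$(build_password_patch "$encoded")"
}

# Usage (requires a configured kubectl):
#   sync_promscale_password
```

Patching avoids copying the password through a terminal and an editor, but the Promscale pod still has to be deleted afterward so that it restarts with the updated secret.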