
Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large volumes of data across multiple commodity servers. Its distributed architecture avoids single points of failure and enables horizontal scalability. Cassandra excels at write-heavy workloads and offers high write and read throughput, making it well suited to data-intensive applications. It also provides tunable consistency, so each operation can trade consistency against latency and availability as needed.
K8ssandra is an open-source project that simplifies the deployment and management of Apache Cassandra on Kubernetes. It includes the K8ssandra Operator, which automates tasks such as cluster provisioning, scaling, backups, and repairs.
This article explains how to deploy a multi-node Apache Cassandra cluster on a Kubernetes Engine using the K8ssandra Operator.
Prerequisites
Before you begin, you need to:
- Have access to a Kubernetes Engine cluster with at least 4 nodes and 4 GB RAM per node.
- Have access to an Ubuntu server as a non-root user with sudo privileges to use as the management workstation.
- Install kubectl on your workstation.
- Install Helm Package Manager on your workstation.
- Ensure you have your cluster Kube config file and have configured kubectl on your workstation to connect to the cluster.
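The prerequisites above can be checked before you continue. The following is a minimal sketch, assuming a POSIX shell on the workstation; `require_cmd` and `preflight` are hypothetical helper names introduced only for this illustration.

```shell
# Hypothetical preflight helpers; the function names are illustrative only.

# Return 0 if the named command is available, otherwise print a hint and return 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required command: $1" >&2
  return 1
}

# Check the tools this article relies on and report how many nodes kubectl can see.
preflight() {
  require_cmd kubectl || return 1
  require_cmd helm || return 1
  # --no-headers prints one line per node, so counting lines gives the node count.
  nodes=$(kubectl get nodes --no-headers | wc -l)
  echo "nodes visible to kubectl: $((nodes))"
}
```

After loading the functions into your shell, run `preflight`. A count below 4 suggests the cluster does not meet the node requirement listed above.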
Install Cassandra CLI (cqlsh)
The Cassandra CLI tool cqlsh is a Python-based command-line utility used to interact with Cassandra databases. Follow the steps below to install it.
Update the server package index.
console$ sudo apt update
Install Python, Pip, and the Python venv module.
console$ sudo apt install -y python3 python3-pip python3.12-venv
View the installed Python version.
console$ python3 --version
Your output should be similar to the one below:
Python 3.12.7
View the installed Pip version.
console$ pip --version
Your output should be similar to the one below:
pip 24.2 from /usr/lib/python3/dist-packages/pip (python 3.12)
Create a Python virtual environment.
console$ python3 -m venv cassandra
Activate the Python virtual environment.
console$ source cassandra/bin/activate
Install the latest version of the cqlsh command-line interface.
console$ pip install -U cqlsh
View the installed cqlsh version.
console$ cqlsh --version
Your output should be similar to the one below:
cqlsh 6.2.0
Install Cert-Manager
Cert-Manager is a Kubernetes operator that manages and issues TLS/SSL certificates within a cluster from trusted authorities such as Let's Encrypt. K8ssandra uses cert-manager to automate certificate management within Cassandra clusters. This includes creating the Java keystores and truststores from the issued certificates. Follow the steps in this section to install the cert-manager resources required by the K8ssandra Operator.
Using Helm, add the Cert-Manager Helm repository to your local repositories.
console$ helm repo add jetstack https://charts.jetstack.io
Update the local Helm charts index.
console$ helm repo update
Install Cert-Manager to your Kubernetes Engine cluster.
console$ helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --set crds.enabled=true
When successful, verify that all Cert-Manager resources are available in the cluster.
console$ kubectl get all -n cert-manager
Your output should be similar to the one below:
NAME                                           READY   STATUS    RESTARTS   AGE
pod/cert-manager-cainjector-686546c9f7-m9gp7   1/1     Running   0          43s
pod/cert-manager-d6746cf45-sjjs6               1/1     Running   0          43s
...

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cert-manager   ClusterIP   10.110.17.176   <none>        9402/TCP   44s
...

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager   1/1     1            1           43s
...

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-cainjector-686546c9f7   1         1         1       43s
...
Install the K8ssandra Operator
To manage Apache Cassandra clusters on Kubernetes, install the K8ssandra Operator using Helm. Follow the steps below to install the operator.
Add the K8ssandra operator repository to your Helm sources.
console$ helm repo add k8ssandra https://helm.k8ssandra.io/stable
Install the K8ssandra operator in your cluster.
console$ helm install k8ssandra-operator k8ssandra/k8ssandra-operator \
    --namespace k8ssandra-operator \
    --create-namespace
Wait a few minutes and view the cluster deployment to verify that the K8ssandra operator is available.
console$ kubectl -n k8ssandra-operator get deployment
Your output should look like the one below:
NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
k8ssandra-operator                 1/1     1            1           20s
k8ssandra-operator-cass-operator   1/1     1            1           20s
Verify that the K8ssandra operator pods are ready and running.
console$ kubectl get pods -n k8ssandra-operator
Your output should look like the one below:
NAME                                                READY   STATUS    RESTARTS   AGE
k8ssandra-operator-65b9c7c9c-km28b                  1/1     Running   0          46s
k8ssandra-operator-cass-operator-54845bc4f6-hsqds   1/1     Running   0          46s
Set Up a Multi-Node Apache Cassandra Cluster on Kubernetes Engine
Use the K8ssandra Operator to deploy a highly available Cassandra cluster on Kubernetes Engine. Follow the steps below to set up the Apache Cassandra cluster.
Check available StorageClasses in your cluster.
console$ kubectl get storageclass
Your output should look like the one below:
NAME                             PROVISIONER           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vultr-block-storage (default)    block.csi.vultr.com   Delete          Immediate           true                   6m24s
vultr-block-storage-hdd          block.csi.vultr.com   Delete          Immediate           true                   6m25s
vultr-block-storage-hdd-retain   block.csi.vultr.com   Retain          Immediate           true                   6m25s
...
Using a text editor such as nano, create a new manifest file cluster.yaml.
console$ nano cluster.yaml
Add the following contents to the file. Replace vultr-block-storage with the available StorageClass name in your cluster (as listed in the previous step).
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    serverVersion: "4.0.1"
    datacenters:
      - metadata:
          name: dc1
        size: 3
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: vultr-block-storage
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
        config:
          jvmOptions:
            heapSize: 512M
        stargate:
          size: 1
          heapSize: 256M
Save and close the file.
The above manifest file defines the Cassandra cluster configuration with the following values:
- Cassandra version: 4.0.1
- Three Cassandra nodes in the dc1 datacenter.
- The vultr-block-storage storage class with a 10 GiB persistent volume per Cassandra node.
- The Cassandra node JVM heap size is 512 MB.
- One Stargate node with a 256 MB JVM heap.
Apply the deployment to your cluster.
console$ kubectl apply -n k8ssandra-operator -f cluster.yaml
Wait for at least 15 minutes and view the cluster pods.
console$ kubectl get pods -n k8ssandra-operator --watch
Verify that all pods are ready and running similar to the output below:
NAME                                                    READY   STATUS    RESTARTS   AGE
demo-dc1-default-stargate-deployment-64747477d7-hfck9   1/1     Running   0          78s
demo-dc1-default-sts-0                                  2/2     Running   0          6m5s
demo-dc1-default-sts-1                                  2/2     Running   0          6m5s
demo-dc1-default-sts-2                                  2/2     Running   0          6m5s
k8ssandra-operator-65b9c7c9c-km28b                      1/1     Running   0          17m
k8ssandra-operator-cass-operator-54845bc4f6-hsqds       1/1     Running   0          17m
When all Cassandra database pods are ready, the Stargate Pod creation is initiated. Stargate provides a data gateway with REST, GraphQL, and Document APIs in front of the Cassandra database. The name of the Stargate Pod should be similar to: demo-dc1-default-stargate-deployment-xxxxxxxxx-xxxxx.
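As an alternative to watching the pod list by hand, `kubectl wait` can block until pods report the Ready condition. The following is a minimal sketch; `wait_for_pods_ready` is a hypothetical helper name, and the 900-second default simply mirrors the 15-minute wait suggested above.

```shell
# Hypothetical helper; the function name and defaults are illustrative only.
# Block until every pod that currently exists in the namespace is Ready,
# or until the timeout expires.
# Usage: wait_for_pods_ready [NAMESPACE] [TIMEOUT]
wait_for_pods_ready() {
  kubectl wait --for=condition=Ready pod --all \
    -n "${1:-k8ssandra-operator}" --timeout="${2:-900s}"
}
```

Note that `--all` only covers pods that exist when the command starts. Because the Stargate Pod is created after the Cassandra pods become ready, you may need to run `wait_for_pods_ready` a second time to cover it.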
Verify the Linked Block Storage PVCs for Cassandra Cluster Persistence
The K8ssandra operator deploys the Apache Cassandra pods through a StatefulSet to ensure stable network identities and persistent storage. Each pod in the StatefulSet runs one Cassandra node and is backed by its own Persistent Volume Claim (PVC) for durable data storage. This section helps you verify that the PVCs are provisioned through the Block Storage service for your Kubernetes Engine cluster.
Verify that the StatefulSet is available and ready in your cluster.
console$ kubectl get statefulset -n k8ssandra-operator
Your output should look like the one below:
NAME                   READY   AGE
demo-dc1-default-sts   3/3     7m14s
The above output confirms that all three Cassandra nodes are initialized and the StatefulSet is active.
Verify the available StorageClass.
console$ kubectl get sc vultr-block-storage
Your output should look like the one below:
NAME                            PROVISIONER           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vultr-block-storage (default)   block.csi.vultr.com   Delete          Immediate           true                   24m
List all Persistent Volume Claims (PVCs) across namespaces to confirm they are bound.
console$ kubectl get pvc --all-namespaces
Your output should look like the one below:
NAMESPACE            NAME                                 STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS          VOLUMEATTRIBUTESCLASS   AGE
k8ssandra-operator   server-data-demo-dc1-default-sts-0   Bound    pvc-a62852bae9d24dad   10Gi       RWO            vultr-block-storage   <unset>                 13m
k8ssandra-operator   server-data-demo-dc1-default-sts-1   Bound    pvc-cfb279b19d0c4a55   10Gi       RWO            vultr-block-storage   <unset>                 13m
k8ssandra-operator   server-data-demo-dc1-default-sts-2   Bound    pvc-b09e184f4d7741f6   10Gi       RWO            vultr-block-storage   <unset>                 13m
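The Bound check can also be scripted instead of read from the table. The following is a minimal sketch that uses a kubectl JSONPath template to print each PVC name with its phase; `pvc_phases` and `all_pvcs_bound` are hypothetical helper names introduced only for this illustration.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the name and phase of every PVC in the namespace, one per line.
# Usage: pvc_phases [NAMESPACE]
pvc_phases() {
  kubectl get pvc -n "${1:-k8ssandra-operator}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Succeed only if at least one PVC exists and every PVC is in the Bound phase.
# Usage: all_pvcs_bound [NAMESPACE]
all_pvcs_bound() {
  phases=$(pvc_phases "$1")
  [ -n "$phases" ] || return 1
  # grep -v selects lines that do NOT end in "Bound"; any such line is a failure.
  ! printf '%s\n' "$phases" | grep -qv 'Bound$'
}
```

For example, `all_pvcs_bound && echo "all PVCs bound"` prints the message only when no claim is still Pending.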
Create a Kubernetes Service To Access the Cassandra Cluster
To expose your Cassandra cluster externally and allow connections using the native Cassandra protocol (CQL) on port 9042, you must create a Kubernetes Service resource of type LoadBalancer. This will assign a public IP via Load Balancer integration.
Create a new service resource file service.yaml.
console$ nano service.yaml
Add the following contents to the file.
apiVersion: v1
kind: Service
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
    - port: 9042
      targetPort: 9042
  selector:
    app.kubernetes.io/name: cassandra
Save and close the file.
The above configuration defines a Kubernetes service of type LoadBalancer to access the Cassandra cluster on port 9042.
Apply the service in the k8ssandra-operator namespace.
console$ kubectl apply -n k8ssandra-operator -f service.yaml
Wait for at least 5 minutes for the cluster Load Balancer resource to deploy, then view the Cassandra service.
console$ kubectl get svc/cassandra -n k8ssandra-operator
Verify the IP Address in your External-IP value to use for accessing the cluster.
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
cassandra   LoadBalancer   10.103.92.169   192.0.2.1     9042:32444/TCP   2m12s
Note: If the External-IP value is <pending>, wait a few more minutes and re-run the command. After an IP is assigned, use this IP to connect to Cassandra with a CQL client such as cqlsh.
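Re-running the command until the address appears can be automated with a small polling loop. The following is a minimal sketch in POSIX shell; `wait_for`, `lb_ip`, and `has_lb_ip` are hypothetical helper names, and the attempt count and delay in the usage note are arbitrary assumptions.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Print the load balancer IP of the cassandra Service (empty while pending).
lb_ip() {
  kubectl get svc cassandra -n k8ssandra-operator \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}'
}

# Succeed only once an IP has been assigned.
has_lb_ip() {
  [ -n "$(lb_ip)" ]
}
```

For example, `wait_for 30 10 has_lb_ip && lb_ip` checks every 10 seconds for up to 30 attempts and prints the address once it is assigned.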
Test the Apache Cassandra Cluster
cqlsh is a command-line interface that allows users to connect to the Cassandra cluster. Follow the steps in this section to execute CQL (Cassandra Query Language) statements and perform database operations such as creating, modifying, and querying data.
Store your Cassandra service Load Balancer IP in the CASS_IP variable.
console$ CASS_IP=$(kubectl get svc cassandra -n k8ssandra-operator -o jsonpath="{.status.loadBalancer.ingress[*].ip}")
View the assigned Cassandra IP.
console$ echo $CASS_IP
Store the cluster access username in the CASS_USERNAME variable.
console$ CASS_USERNAME=$(kubectl get secret demo-superuser -n k8ssandra-operator -o=jsonpath='{.data.username}' | base64 --decode)
View the Cassandra username.
console$ echo $CASS_USERNAME
Store the cluster password in the CASS_PASSWORD variable.
console$ CASS_PASSWORD=$(kubectl get secret demo-superuser -n k8ssandra-operator -o=jsonpath='{.data.password}' | base64 --decode)
View the Cassandra password.
console$ echo $CASS_PASSWORD
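Kubernetes stores Secret values under `.data` in base64-encoded form, which is why the commands above pipe the result through `base64 --decode`. The following is a minimal sketch of a reusable wrapper for that pattern; `secret_field` is a hypothetical helper name introduced only for this illustration.

```shell
# Hypothetical helper; the function name is illustrative only.
# Print one decoded field of a Kubernetes Secret.
# Usage: secret_field SECRET_NAME KEY [NAMESPACE]
secret_field() {
  kubectl get secret "$1" -n "${3:-k8ssandra-operator}" \
    -o jsonpath="{.data.$2}" | base64 --decode
}
```

With the function loaded, `CASS_USERNAME=$(secret_field demo-superuser username)` and `CASS_PASSWORD=$(secret_field demo-superuser password)` set the same variables as the steps above.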
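Using cqlsh, log in to the Cassandra cluster using your variable values. The variables are quoted so that special characters in the password are passed through unchanged.
console$ cqlsh -u "$CASS_USERNAME" -p "$CASS_PASSWORD" "$CASS_IP" 9042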
Create a new keyspace demo in the Cassandra database.
demo-superuser@cqlsh> CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
Create a new table users in the demo keyspace.
demo-superuser@cqlsh> CREATE TABLE demo.users (id text primary key, name text, country text);
Insert records into the users table.
demo-superuser@cqlsh> BEGIN BATCH
    INSERT INTO demo.users (id, name, country) VALUES ('42', 'John Doe', 'UK');
    INSERT INTO demo.users (id, name, country) VALUES ('43', 'Joe Smith', 'US');
    APPLY BATCH;
Query the table data to view the stored values.
demo-superuser@cqlsh> SELECT * FROM demo.users;
Your output should be similar to the one below:
 id | country | name
----+---------+-----------
 43 |      US | Joe Smith
 42 |      UK |  John Doe

(2 rows)
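The same statements can be run without an interactive session by passing them to cqlsh with its `-e` (`--execute`) option. The following is a minimal sketch, assuming the CASS_IP, CASS_USERNAME, and CASS_PASSWORD variables from the earlier steps are still set in your shell; `run_cql` is a hypothetical helper name.

```shell
# Hypothetical helper; the function name is illustrative only.
# Execute a single CQL statement against the cluster and exit.
# Assumes CASS_USERNAME, CASS_PASSWORD, and CASS_IP are already set.
# Usage: run_cql 'CQL STATEMENT'
run_cql() {
  cqlsh -u "$CASS_USERNAME" -p "$CASS_PASSWORD" -e "$1" "$CASS_IP" 9042
}
```

For example, `run_cql 'SELECT * FROM demo.users;'` runs the query above and returns to the shell prompt, which is convenient in scripts.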
Conclusion
You have deployed an Apache Cassandra cluster on a Kubernetes Engine environment using the open-source K8ssandra operator. You configured persistent data storage with Block Storage and accessed the Cassandra cluster using the cqlsh CLI. For more information on advanced configuration and usage, refer to the official Cassandra documentation.