How to Install Apache Cassandra on Debian 12
Introduction
Apache Cassandra is a high-performance, open-source, distributed NoSQL database that handles large amounts of data with high availability and scalability. It offers more schema flexibility than traditional relational databases, efficiently manages structured and semi-structured data, and keeps operations simple by distributing data across nodes.
This article explains how to install and configure Apache Cassandra on Debian 12. You will create a new single-node Cassandra cluster and enable access to the Cassandra query language shell (CQLSH) to manage the database server.
Prerequisites
Before you begin:
- Deploy a Debian 12 instance on Vultr with at least 4GB RAM and enable the limited user login feature.
- Access the instance using SSH.
Install Apache Cassandra
Apache Cassandra is not available in the default package repositories on Debian 12 and requires the OpenJDK 11 package as a dependency to run. Follow the steps below to install all dependency packages and add the Cassandra repository sources using the APT package manager on your server.
Update the server's package information index.
$ sudo apt update
Open the /etc/apt/sources.list file.
$ sudo nano /etc/apt/sources.list
Add the following directive at the end of the file. It enables the Debian unstable repository, which provides the OpenJDK 11 package.
deb http://deb.debian.org/debian unstable main non-free contrib
Save and close the file.
Update the server's package information index to load the new repository.
$ sudo apt update
Install the OpenJDK 11 package.
$ sudo apt install openjdk-11-jdk
View the Java version.
$ java --version
Output:
openjdk 11.0.25-ea 2024-10-15
OpenJDK Runtime Environment (build 11.0.25-ea+5-post-Debian-1)
OpenJDK 64-Bit Server VM (build 11.0.25-ea+5-post-Debian-1, mixed mode, sharing)
Add the Apache Cassandra repository to your APT sources.
$ echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
Download the Apache Cassandra repository signing key.
$ sudo curl -o /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS
Update your server's package information index to apply the new Cassandra repository.
$ sudo apt update
Install Apache Cassandra.
$ sudo apt install cassandra
View the cassandra system service status and verify that it's running.
$ sudo systemctl status cassandra
Output:
● cassandra.service - LSB: distributed storage system for structured data
     Loaded: loaded (/etc/init.d/cassandra; generated)
     Active: active (running) since Fri 2024-09-27 12:40:18 UTC; 4s ago
       Docs: man:systemd-sysv-generator(8)
    Process: 11707 ExecStart=/etc/init.d/cassandra start (code=exited, status=0/SUCCESS)
      Tasks: 23 (limit: 4021)
     Memory: 1.1G
The Cassandra service is active and running on your server based on the above output.
Query the Cassandra cluster node status and verify it's up.
$ sudo nodetool status
Output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  249.74 KiB  16      100.0%            ab72ec62-84ff-4a52-a7aa-39b2f2c9a898  rack1
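The status check above can also be scripted. The following is a minimal Python sketch, not part of the Cassandra tooling, that scans text in the `nodetool status` format shown above and reports whether every node is in the `UN` (Up/Normal) state. The function names are hypothetical, and the sample text is copied from the output above.

```python
def parse_nodetool_status(text):
    """Return (code, address) tuples for each node row in `nodetool status` output."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows start with a two-letter code: U/D (up/down) then N/L/J/M (state).
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in "UD" and fields[0][1] in "NLJM"):
            nodes.append((fields[0], fields[1]))
    return nodes


def all_nodes_up_normal(text):
    """True only if at least one node is listed and every node is UN."""
    nodes = parse_nodetool_status(text)
    return bool(nodes) and all(code == "UN" for code, _ in nodes)


# Sample text copied from the output above.
sample = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  249.74 KiB  16      100.0%            ab72ec62-84ff-4a52-a7aa-39b2f2c9a898  rack1
"""

print(parse_nodetool_status(sample))
print(all_nodes_up_normal(sample))
```

To use it against a live node, you could pass in the text captured from `sudo nodetool status` instead of the sample string.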
Configure Apache Cassandra
Apache Cassandra runs in a single-node cluster mode unless you modify the /etc/cassandra/cassandra.yaml main configuration file to detect new nodes and listen for connection requests on a specific address. Follow the steps below to configure Apache Cassandra and set up a new cluster on your server.
Stop the Cassandra service.
$ sudo systemctl stop cassandra
Remove all data directories to clear the default Test Cluster files.
$ sudo rm -rf /var/lib/cassandra/*
Open the cassandra.yaml Cassandra configuration file using a text editor like nano.
$ sudo nano /etc/cassandra/cassandra.yaml
Find the cluster_name directive and replace Test Cluster with your desired cluster name like MyCluster.
cluster_name: 'MyCluster'
Find the seeds section and verify that it's set to your localhost node address 127.0.0.1:7000.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1:7000"
Cassandra uses seed nodes to bootstrap new nodes joining the cluster. Modify the seeds directive to include the IP addresses of other Cassandra nodes to connect to your cluster.
Find the listen_address directive and verify it's set to localhost.
listen_address: localhost
The listen_address directive sets the address Cassandra uses to communicate with other nodes in a cluster. localhost limits Cassandra to connections on the server itself. To let other nodes reach this node, enter your server's IP address as the listen_address value. Do not use 0.0.0.0 here, because the comments in cassandra.yaml note that this value is always wrong for listen_address.
Find the rpc_address directive and verify it's set to localhost.
rpc_address: localhost
Cassandra uses the rpc_address value to listen for client connections, such as CQLSH sessions, to the node.
Save and close the file.
Restart the Cassandra service to apply the configuration changes.
$ sudo systemctl restart cassandra
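To double-check the values you edited, you can read the top-level settings back out of the configuration file. The following Python sketch uses only the standard library and a simple line match rather than a full YAML parser, so it ignores nested keys, commented lines, and inline comments. The function name and the sample text are illustrative assumptions.

```python
import re


def read_top_level_settings(yaml_text, keys):
    """Collect simple `key: value` pairs that start at column 0 of a YAML file."""
    found = {}
    for line in yaml_text.splitlines():
        match = re.match(r"^([A-Za-z_]+):\s*(.*?)\s*$", line)
        if match and match.group(1) in keys:
            # Drop surrounding single or double quotes from the value.
            found[match.group(1)] = match.group(2).strip("'\"")
    return found


# Sample text modeled on the settings edited in this section.
sample = """\
cluster_name: 'MyCluster'
# listen_address: commented-out lines are ignored
listen_address: localhost
rpc_address: localhost
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1:7000"
"""

settings = read_top_level_settings(
    sample, {"cluster_name", "listen_address", "rpc_address"}
)
print(settings)
```

On the server, you could read the text from /etc/cassandra/cassandra.yaml instead of the sample string.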
Secure Apache Cassandra
Apache Cassandra does not require authentication by default, making it vulnerable to unauthorized access. Enable password-based authentication to ensure data confidentiality and secure access to the Cassandra console. Follow the steps below to modify the default Cassandra configuration and enable authentication for all database users.
Open the main Cassandra configuration file.
$ sudo nano /etc/cassandra/cassandra.yaml
Find the authenticator and authorizer directives, and change the values to PasswordAuthenticator and CassandraAuthorizer.
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
Save and close the file.
The above configuration enables password authentication and sets role-based access control.
Restart the Cassandra service to apply the configuration changes.
$ sudo systemctl restart cassandra
Log in to the Cassandra shell.
$ cqlsh -u cassandra -p cassandra
Create a new administrative user, such as admin, and set a secure password to use with Apache Cassandra.
cqlsh> CREATE USER admin WITH PASSWORD 'pswd12345' SUPERUSER;
Exit the Cassandra shell.
cqlsh> exit
Log in to the Cassandra shell as the new administrative user.
$ cqlsh -u admin -p pswd12345
Exit the Cassandra shell.
cqlsh> exit
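The password pswd12345 used above is only a placeholder. As one possible way to produce a stronger value, the following Python sketch generates a random password with the standard `secrets` module. The function name and the default length of 20 are arbitrary choices, and the alphabet is limited to letters and digits so that the result needs no escaping inside the single-quoted CQL string.

```python
import secrets
import string


def generate_password(length=20):
    """Build a random password from letters and digits using a CSPRNG."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


password = generate_password()
print(password)
```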
Perform Data Modeling Tasks in Apache Cassandra
Apache Cassandra uses a different data model than relational databases, built on the following key concepts:
- Keyspace: Similar to a database in relational systems.
- Table: Defines the structure of your data.
- Partition Key: Determines how Cassandra distributes data across nodes.
- Clustering Columns: Determines the order of data in a partition.
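To make these concepts concrete, the following toy Python model shows the two roles separately: a hash of the partition key picks the node that stores a row, and the clustering value orders rows inside each partition. This is only an illustration under simplifying assumptions. Cassandra's real partitioner and token ranges work differently, and the node names, the MD5-based hash, and the sample rows are all made up for the sketch.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owner_node(partition_key):
    """Map a partition key to a node with a stable hash (stand-in for the partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# partitions[node][partition_key] -> rows kept sorted by the clustering value
partitions = defaultdict(lambda: defaultdict(list))


def insert(partition_key, clustering_value, payload):
    rows = partitions[owner_node(partition_key)][partition_key]
    rows.append((clustering_value, payload))
    # The clustering column defines the row order inside a partition.
    rows.sort(key=lambda row: row[0])


insert("john_doe", "2024-09-27", "login")
insert("john_doe", "2024-09-25", "signup")
insert("jane_roe", "2024-09-26", "login")

for node, keys in sorted(partitions.items()):
    for key, rows in sorted(keys.items()):
        print(node, key, rows)
```

All rows that share a partition key land on the same node, which is why queries that filter on the partition key can be served without contacting every node.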
Cassandra uses CQL (Cassandra Query Language) and CQLSH (Cassandra Query Language Shell) to create and manage databases. Follow the steps below to access the Cassandra shell and perform common data modeling tasks.
Log in to the Cassandra shell as the administrative user you created earlier.
$ cqlsh -u admin -p pswd12345
Create a new keyspace, such as example_keyspace.
cqlsh> CREATE KEYSPACE example_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
A keyspace defines how Cassandra replicates its data across the nodes in a cluster. When creating a keyspace, configure a replication strategy such as SimpleStrategy for single-node clusters or NetworkTopologyStrategy for multi-node clusters.
Switch to the keyspace.
cqlsh> USE example_keyspace;
Create a new table such as users and define 3 columns.
cqlsh> CREATE TABLE users ( user_id UUID PRIMARY KEY, username TEXT, email TEXT );
Insert new data in the users table using the INSERT command.
cqlsh> INSERT INTO users (user_id, username, email) VALUES (uuid(), 'john_doe', 'john@example.com');
View the table data using the SELECT command.
cqlsh> SELECT * FROM users WHERE username = 'john_doe' ALLOW FILTERING;
Output:
 user_id                              | email            | username
--------------------------------------+------------------+----------
 5fbea013-f4b4-4911-becc-62ccb9b1e55b | john@example.com | john_doe
Update a record in the table using the UPDATE command. For example, update the user's email using the respective user_id value. Replace the UUID below with the user_id value from your own output, because uuid() generates a different value on each insert.
cqlsh> UPDATE users SET email = 'newemail@example.com' WHERE user_id = 5fbea013-f4b4-4911-becc-62ccb9b1e55b;
Delete a record from the users table using the DELETE command.
cqlsh> DELETE FROM users WHERE user_id = 5fbea013-f4b4-4911-becc-62ccb9b1e55b;
Query the table data again and verify that it's empty.
cqlsh> SELECT * FROM users WHERE username = 'john_doe' ALLOW FILTERING;
Output:
 user_id | email | username
---------+-------+----------
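The keyspace step above uses SimpleStrategy. The following Python sketch shows how the same CREATE KEYSPACE statement changes when you switch to NetworkTopologyStrategy, which takes a replica count per datacenter instead of a single replication factor. The helper name is hypothetical, the datacenter name datacenter1 is taken from the nodetool status output earlier in this article, and the function does not validate or escape its inputs.

```python
def create_keyspace_cql(name, replication_factor=1, datacenters=None):
    """Build a CREATE KEYSPACE statement for either replication strategy.

    datacenters: optional dict such as {"datacenter1": 3}. When given,
    NetworkTopologyStrategy is used; otherwise SimpleStrategy is used.
    """
    if datacenters:
        options = {"class": "NetworkTopologyStrategy", **datacenters}
    else:
        options = {"class": "SimpleStrategy", "replication_factor": replication_factor}

    def fmt(value):
        # CQL map literals quote strings and leave integers bare.
        return f"'{value}'" if isinstance(value, str) else str(value)

    body = ", ".join(f"'{key}': {fmt(value)}" for key, value in options.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"


print(create_keyspace_cql("example_keyspace"))
print(create_keyspace_cql("example_keyspace", datacenters={"datacenter1": 3}))
```

The first call reproduces the statement used in this section, and the second shows the multi-node form with three replicas in one datacenter.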
Back up and Restore Apache Cassandra Nodes
Backing up Cassandra nodes enables you to recover the data in case of database failure or data loss. The Cassandra nodetool utility allows you to take snapshots and incremental backups. Follow the steps below to back up and restore Apache Cassandra nodes using the nodetool utility.
Take a new snapshot of the example_keyspace keyspace you created earlier and store it as my_backup.
$ sudo nodetool snapshot --tag my_backup example_keyspace
Output:
Requested creating snapshot(s) for [all keyspaces] with snapshot name [my_backup] and options {skipFlush=false}
Snapshot directory: my_backup
Open the main Cassandra configuration file to enable incremental backups.
$ sudo nano /etc/cassandra/cassandra.yaml
Find the incremental_backups directive and change the value from false to true.
incremental_backups: true
Restart Cassandra to apply the backup configuration changes.
$ sudo systemctl restart cassandra
If you lose data, run the following command to restore it by copying the snapshot files back to the table's data directory.
$ sudo cp -R /var/lib/cassandra/data/example_keyspace/users-*/snapshots/my_backup/* /var/lib/cassandra/data/example_keyspace/users-*/
Log in to the Cassandra shell.
$ cqlsh -u admin -p pswd12345
Query the users table in the example_keyspace keyspace to verify the table data is available.
cqlsh> SELECT * FROM example_keyspace.users LIMIT 10;
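Before restoring, it can help to confirm where a snapshot actually lives. The following Python sketch lists snapshot directories that follow the layout used in the copy command above, which is keyspace, then table directory, then snapshots, then the tag. The function name is hypothetical, and the demo builds a temporary directory tree instead of touching /var/lib/cassandra/data.

```python
import tempfile
from pathlib import Path


def find_snapshot_dirs(data_root, keyspace, tag):
    """Return snapshot directories matching <keyspace>/<table>-*/snapshots/<tag>."""
    pattern = f"{keyspace}/*/snapshots/{tag}"
    return sorted(path for path in Path(data_root).glob(pattern) if path.is_dir())


# Demo against a throwaway directory tree that mimics the layout.
with tempfile.TemporaryDirectory() as root:
    snapshot = Path(root) / "example_keyspace" / "users-1234" / "snapshots" / "my_backup"
    snapshot.mkdir(parents=True)
    for path in find_snapshot_dirs(root, "example_keyspace", "my_backup"):
        print(path.relative_to(root))
```

On the server, you could call `find_snapshot_dirs("/var/lib/cassandra/data", "example_keyspace", "my_backup")` as a user with read access to the data directory.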
Troubleshoot Common Apache Cassandra Installation Errors
Apache Cassandra may display runtime and installation errors. Follow these steps to troubleshoot and fix common issues you may encounter when installing Apache Cassandra.
Connection Refused Error
View the Cassandra service status and verify it's running.
$ sudo systemctl status cassandra
Query the default Apache Cassandra port 9042 and verify that it's actively listening for incoming connections.
$ sudo ss -tulnp | grep 9042
View the Cassandra logs to find new entries and troubleshoot the error.
$ sudo tail -f /var/log/cassandra/system.log
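The port check above can also be done from a client machine, where `ss` on the server is not available. The following Python sketch attempts a TCP connection to the port and reports whether it succeeded. The function name and the 2-second timeout are arbitrary choices, and a successful connection only shows that something is listening on the port, not that it is Cassandra.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 9042 is the default Cassandra client port referenced above.
print(port_is_open("127.0.0.1", 9042))
```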
Authentication Failed Error
Ensure your username and password details are correct and run the following command to log in to the Cassandra shell.
$ cqlsh -u admin -p pswd12345
Query the main Cassandra configuration and verify password authentication is enabled.
$ sudo grep "authenticator:" /etc/cassandra/cassandra.yaml
Output:
authenticator: PasswordAuthenticator
Log in as the Cassandra super user.
$ cqlsh -u cassandra -p cassandra
Reset your target user's password. Replace admin with your actual Cassandra user.
cqlsh> ALTER USER admin WITH PASSWORD 'new_password';
Out-of-Memory Error
View the Cassandra memory usage.
$ nodetool info | grep "Heap Memory"
Open the cassandra-env.sh file to increase the heap size.
$ sudo nano /etc/cassandra/cassandra-env.sh
Find the following heap directives, uncomment them if they are commented out, and increase the Cassandra memory values.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
Save and close the file.
Restart Cassandra to apply changes.
$ sudo systemctl restart cassandra
If you receive the following memory error:
cassandra.service: Failed with result 'oom-kill'.
Upgrade your server plan and verify that it has at least 4GB RAM to run Cassandra.
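When MAX_HEAP_SIZE and HEAP_NEWSIZE are not set, the stock cassandra-env.sh script derives them from the server's memory and CPU count. The following Python sketch mirrors that calculation as it appears in the script shipped with recent Cassandra releases. Treat the formula as an assumption and check the calculate_heap_sizes function in your own cassandra-env.sh, since it may differ between versions. The Python function name is hypothetical.

```python
def default_heap_sizes_mb(system_memory_mb, cpu_cores):
    """Estimate the default heap sizes in MB, assuming the stock formula.

    max heap   = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    young gen  = min(100 MB per CPU core, 1/4 of the max heap)
    """
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size


# A 4 GB, 2-core server, matching the minimum size used in this article.
print(default_heap_sizes_mb(4096, 2))
# A larger 64 GB, 8-core server for comparison.
print(default_heap_sizes_mb(65536, 8))
```

Under this formula, a 4GB server gets roughly a 1GB heap by default, which leaves the rest of the memory for the operating system and off-heap structures.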
Slow Query Response Error
Check the Cassandra keyspace for large partitions. Replace keyspace_name with your actual keyspace and table_name with your target table.
$ sudo nodetool tablestats keyspace_name.table_name
Enable tracing to analyze query performance in the Cassandra shell.
cqlsh> TRACING ON;
cqlsh> SELECT * FROM keyspace_name.table_name WHERE ...;
cqlsh> TRACING OFF;
Change the compaction strategy to improve the query response rate.
cqlsh> ALTER TABLE keyspace_name.table_name WITH compaction = {'class': 'LeveledCompactionStrategy'};
Conclusion
You have installed Apache Cassandra on your Debian 12 server and performed database management tasks. You can integrate Cassandra with other nodes to create a multi-node cluster and set up applications to read and write data in the Cassandra database. For more information and advanced configuration options, visit the Apache Cassandra documentation.