
Introduction
Apache Cassandra is a high-performance, open-source, distributed NoSQL database that handles large amounts of data with high availability and scalability. It offers more flexibility than traditional relational databases because it does not rely on a rigid schema, and it efficiently manages structured, semi-structured, and unstructured data while distributing that data across nodes for operational simplicity.
This article explains how to install and configure Apache Cassandra on Debian 12. You will create a new single-node Cassandra cluster and enable access to the Cassandra query language shell (CQLSH) to manage the database server.
Prerequisites
Before you begin:
- Deploy a Debian 12 instance on Vultr with at least 4 GB RAM and enable the limited user login feature.
- Access the instance using SSH.
Install Apache Cassandra
Apache Cassandra is not available in the default package repositories on Debian 12 and requires the OpenJDK 11 package as a dependency to run. Follow the steps below to install all dependency packages and Cassandra repository sources using the APT package manager on your server.
Update the server's package information index.
$ sudo apt update
Open the /etc/apt/sources.list file.
$ sudo nano /etc/apt/sources.list
Add the following directive at the end of the file.
deb http://deb.debian.org/debian unstable main non-free contrib
Save and close the file.
Update the server's package information index to load the new repository.
$ sudo apt update
Install the OpenJDK 11 package.
$ sudo apt install openjdk-11-jdk
View the Java version.
$ java --version
Output:
openjdk 11.0.25-ea 2024-10-15
OpenJDK Runtime Environment (build 11.0.25-ea+5-post-Debian-1)
OpenJDK 64-Bit Server VM (build 11.0.25-ea+5-post-Debian-1, mixed mode, sharing)
Add the Apache Cassandra repository to your APT sources.
$ echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
Download the Apache Cassandra repository signing key.
$ sudo curl -o /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS
Update your server's package information index to apply the new Cassandra repository.
$ sudo apt update
Install Apache Cassandra.
$ sudo apt install cassandra
View the cassandra system service status and verify that it's running.
$ sudo systemctl status cassandra
Output:
● cassandra.service - LSB: distributed storage system for structured data
     Loaded: loaded (/etc/init.d/cassandra; generated)
     Active: active (running) since Fri 2024-09-27 12:40:18 UTC; 4s ago
       Docs: man:systemd-sysv-generator(8)
    Process: 11707 ExecStart=/etc/init.d/cassandra start (code=exited, status=0/SUCCESS)
      Tasks: 23 (limit: 4021)
     Memory: 1.1G
The Cassandra service is active and running on your server based on the above output.
Query the Cassandra cluster node status and verify it's up.
$ sudo nodetool status
Output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  249.74 KiB  16      100.0%            ab72ec62-84ff-4a52-a7aa-39b2f2c9a898  rack1
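The status check above can also be scripted. The following is a minimal sketch, assuming the `nodetool status` output keeps the layout shown above, where each node line starts with a two-letter status code. The `all_nodes_up` function name is hypothetical and not part of Cassandra.

```shell
# Succeeds only if every node line in `nodetool status` output reports UN
# (Up/Normal). Reads the output from standard input.
# all_nodes_up is a hypothetical helper name, not a Cassandra command.
all_nodes_up() {
    awk '
        # Node lines begin with a status code: U or D, then N, L, J, or M.
        $1 ~ /^[UD][NLJM]$/ { nodes++; if ($1 != "UN") bad++ }
        # Fail when no node lines were seen or any node is not UN.
        END { exit (nodes > 0 && bad == 0) ? 0 : 1 }
    '
}

# Example usage:
#   sudo nodetool status | all_nodes_up && echo "All nodes are Up/Normal"
```

The function only inspects the first column, so it should keep working if the other columns change width.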
Configure Apache Cassandra
Apache Cassandra runs in a single-node cluster mode unless you modify the /etc/cassandra/cassandra.yaml main configuration file to detect new nodes and listen for connection requests on a specific address. Follow the steps below to configure Apache Cassandra and set up a new cluster on your server.
Stop the Cassandra service.
$ sudo systemctl stop cassandra
Remove all data directories to clear the default Test Cluster files.
$ sudo rm -rf /var/lib/cassandra/*
Open the cassandra.yaml Cassandra configuration file using a text editor like nano.
$ sudo nano /etc/cassandra/cassandra.yaml
Find the cluster_name directive and replace Test Cluster with your desired cluster name like MyCluster.
cluster_name: 'MyCluster'
Find the seeds section and verify that it's set to your localhost node address 127.0.0.1:7000.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1:7000"
Cassandra uses seed nodes to bootstrap new nodes joining the cluster. Modify the seeds directive to include the IP addresses of other Cassandra nodes to connect to your cluster.
Find the listen_address directive and verify it's set to localhost.
listen_address: localhost
The listen_address directive sets the address Cassandra uses to communicate with other nodes in a cluster. The localhost value restricts Cassandra to connections from the server itself. Enter your server's IP address as the listen_address value to allow other nodes to connect. Do not use 0.0.0.0, which is not a valid listen_address value.
Find the rpc_address directive and verify it's set to localhost.
rpc_address: localhost
Cassandra uses the rpc_address value to listen for CQLSH client connections using remote procedure calls (RPC) to the node.
Save and close the file.
Restart the Cassandra service to apply the configuration changes.
$ sudo systemctl restart cassandra
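To confirm the values you set without reopening the editor, you can read them back from the file. The following is a rough sketch, assuming each directive sits on one top-level `key: value` line with no trailing comment. The `yaml_value` function is a hypothetical helper, not a Cassandra tool, and it does not parse nested YAML such as the seeds list.

```shell
# Print the value of a top-level "key: value" setting from a YAML file.
# $1 = key name, $2 = file path. Surrounding quotes are stripped.
# yaml_value is a hypothetical helper and not a full YAML parser.
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" \
        | sed "s/^['\"]//; s/['\"][[:space:]]*\$//" \
        | head -n 1
}

# Example usage:
#   yaml_value cluster_name /etc/cassandra/cassandra.yaml
#   yaml_value listen_address /etc/cassandra/cassandra.yaml
#   yaml_value rpc_address /etc/cassandra/cassandra.yaml
```

Commented-out directives are skipped because they begin with `#` rather than the key name.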
Secure Apache Cassandra
Apache Cassandra does not require authentication by default, which leaves it open to unauthorized access. Enable password-based authentication to protect your data and secure access to the Cassandra console. Follow the steps below to modify the default Cassandra configuration and enable authentication for all database users.
Open the main Cassandra configuration file.
$ sudo nano /etc/cassandra/cassandra.yaml
Find the authenticator and authorizer directives, and change the values to PasswordAuthenticator and CassandraAuthorizer.
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
Save and close the file.
The above configuration enables password authentication and sets role-based access control.
Restart the Cassandra service to apply the configuration changes.
$ sudo systemctl restart cassandra
Log in to the Cassandra shell.
$ cqlsh -u cassandra -p cassandra
Create a new administrative user, such as admin, and set a secure password to use with Apache Cassandra.
cqlsh> CREATE USER admin WITH PASSWORD 'pswd12345' SUPERUSER;
Exit the Cassandra shell.
cqlsh> exit
Log in to the Cassandra shell as the new administrative user.
$ cqlsh -u admin -p pswd12345
Exit the Cassandra shell.
cqlsh> exit
Perform Data Modeling Tasks in Apache Cassandra
Apache Cassandra uses a different data model compared to relational databases with the following key concepts:
- Keyspace: Similar to a database in relational systems.
- Table: Defines the structure of your data.
- Partition Key: Determines how Cassandra distributes data across nodes.
- Clustering Columns: Determines the order of data in a partition.
Cassandra uses the Cassandra Query Language (CQL) and the Cassandra Query Language Shell (CQLSH) to create and manage databases. Follow the steps below to access the Cassandra shell and perform common data modeling tasks.
Log in to the Cassandra shell as the administrative user you created earlier.
$ cqlsh -u admin -p pswd12345
Create a new keyspace, such as example_keyspace.
cqlsh> CREATE KEYSPACE example_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
A keyspace defines the replication strategy and is distributed across nodes in Apache Cassandra. Configure a replication strategy such as SimpleStrategy for single-node clusters or NetworkTopologyStrategy for multi-node clusters when creating keyspaces.
Switch to the keyspace.
cqlsh> USE example_keyspace;
Create a new table such as users and define 3 columns.
cqlsh> CREATE TABLE users ( user_id UUID PRIMARY KEY, username TEXT, email TEXT );
Insert new data in the users table using the INSERT command.
cqlsh> INSERT INTO users (user_id, username, email) VALUES (uuid(), 'john_doe', 'john@example.com');
View the table data using the SELECT command.
cqlsh> SELECT * FROM users WHERE username = 'john_doe' ALLOW FILTERING;
Output:
 user_id                              | email            | username
--------------------------------------+------------------+----------
 5fbea013-f4b4-4911-becc-62ccb9b1e55b | john@example.com | john_doe
Update a record in the table using the UPDATE command. For example, update the user's email using the respective user_id value.
cqlsh> UPDATE users SET email = 'newemail@example.com' WHERE user_id = 5fbea013-f4b4-4911-becc-62ccb9b1e55b;
Delete a record from the users table using the DELETE command.
cqlsh> DELETE FROM users WHERE user_id = 5fbea013-f4b4-4911-becc-62ccb9b1e55b;
Query the table data again and verify that it's empty.
cqlsh> SELECT * FROM users WHERE username = 'john_doe' ALLOW FILTERING;
Output:
 user_id                              | email            | username
--------------------------------------+------------------+----------
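The users table above has only a partition key. To illustrate the clustering column concept from the list of key concepts, here is a sketch that writes a schema file for a hypothetical user_events table. The table, its columns, and the `write_events_schema` function are assumptions for illustration only and are not part of the steps in this article.

```shell
# Write a CQL schema file to the path given in $1.
# user_events and its columns are hypothetical, illustrative names.
write_events_schema() {
    cat > "$1" <<'EOF'
-- user_id is the partition key: it decides which node stores the row.
-- event_time is a clustering column: it orders rows inside a partition.
CREATE TABLE IF NOT EXISTS example_keyspace.user_events (
    user_id UUID,
    event_time TIMESTAMP,
    event_type TEXT,
    PRIMARY KEY ((user_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
EOF
}

# Example usage:
#   write_events_schema /tmp/user_events.cql
#   cqlsh -u admin -f /tmp/user_events.cql
```

With this layout, a query that filters on user_id reads a single partition and returns the newest events first, so it does not need ALLOW FILTERING.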
Back up and Restore Apache Cassandra Nodes
Backing up Cassandra nodes enables you to recover the data in case of database failure or data loss. The Cassandra nodetool utility allows you to take snapshots and incremental backups. Follow the steps below to back up and restore Apache Cassandra nodes using the nodetool utility.
Take a new snapshot of the example_keyspace you created earlier and store it as my_backup.
$ sudo nodetool snapshot --tag my_backup example_keyspace
Output:
Requested creating snapshot(s) for [all keyspaces] with snapshot name [my_backup] and options {skipFlush=false}
Snapshot directory: my_backup
Open the main Cassandra configuration file to enable incremental backups.
$ sudo nano /etc/cassandra/cassandra.yaml
Find the incremental_backups directive and change the value from false to true.
incremental_backups: true
Restart Cassandra to apply the backup configuration changes.
$ sudo systemctl restart cassandra
Run the following command to copy the snapshot data to the keyspace directory in case of data loss.
$ sudo cp -R /var/lib/cassandra/data/example_keyspace/users-*/snapshots/my_backup/* /var/lib/cassandra/data/example_keyspace/users-*/
Log in to the Cassandra shell.
$ cqlsh -u admin -p pswd12345
Query the users table in the example_keyspace to verify the table data is available.
cqlsh> SELECT * FROM example_keyspace.users LIMIT 10;
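Before restoring, it can help to confirm where a snapshot lives on disk. The sketch below assumes the data directory layout used in the copy command above, with snapshots stored under each table directory in snapshots/, followed by the tag name. The `find_snapshot_dirs` function name is hypothetical.

```shell
# List every directory under data directory $1 that holds snapshot tag $2.
# find_snapshot_dirs is a hypothetical helper name.
find_snapshot_dirs() {
    find "$1" -type d -path "*/snapshots/$2" 2>/dev/null | sort
}

# Example usage:
#   sudo sh -c '. ./snapshot-helpers.sh; find_snapshot_dirs /var/lib/cassandra/data my_backup'
# (snapshot-helpers.sh is an assumed file name for wherever you save the function.)
```

The `nodetool listsnapshots` command gives a similar overview from Cassandra itself. If the restored rows do not appear in a running node after copying the files, running `nodetool refresh example_keyspace users` or restarting Cassandra should make the node load them.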
Troubleshoot Common Apache Cassandra Installation Errors
Apache Cassandra may display runtime and installation errors. Follow these steps to troubleshoot and fix common issues you may encounter installing Apache Cassandra.
Connection Refused Error
View the Cassandra service status and verify it's running.
$ sudo systemctl status cassandra
Query the default Apache Cassandra port 9042 and verify that it's actively listening for incoming connections.
$ sudo ss -tulnp | grep 9042
View the Cassandra logs to find new entries and troubleshoot the error.
$ sudo tail -f /var/log/cassandra/system.log
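When the log is busy, filtering for warnings and errors can be quicker than following the full output. This is a minimal sketch, assuming each log entry starts with its level (such as ERROR or WARN) followed by a space. The `recent_log_problems` function name is hypothetical.

```shell
# Print the most recent WARN and ERROR entries from a Cassandra log file.
# $1 = log file path, $2 = maximum number of lines (defaults to 20).
# recent_log_problems is a hypothetical helper name.
recent_log_problems() {
    grep -E '^(ERROR|WARN) ' "$1" | tail -n "${2:-20}"
}

# Example usage:
#   sudo sh -c '. ./log-helpers.sh; recent_log_problems /var/log/cassandra/system.log 10'
# (log-helpers.sh is an assumed file name for wherever you save the function.)
```

Stack trace lines that follow an ERROR entry do not start with a level, so they are not printed. Open the log at the reported timestamp to see the full trace.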
Authentication Failed Error
Ensure your username and password details are correct and run the following command to log in to the Cassandra shell.
$ cqlsh -u admin -p pswd12345
Query the main Cassandra configuration and verify password authentication is enabled.
$ sudo grep "authenticator:" /etc/cassandra/cassandra.yaml
Output:
authenticator: PasswordAuthenticator
Log in as the Cassandra super user.
$ cqlsh -u cassandra -p cassandra
Reset your target user's password. Replace admin with your actual Cassandra user.
cqlsh> ALTER USER admin WITH PASSWORD 'new_password';
Out-of-Memory Error
View the Cassandra memory usage.
$ nodetool info | grep "Heap Memory"
Open the cassandra-env.sh file to increase the heap size.
$ sudo nano /etc/cassandra/cassandra-env.sh
Find the following heap directives and increase the Cassandra memory values.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
Save and close the file.
Restart Cassandra to apply changes.
$ sudo systemctl restart cassandra
If you receive the following memory error:
cassandra.service: Failed with result 'oom-kill'.
Upgrade your server plan and verify that it has at least 4 GB RAM to run Cassandra.
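To judge whether a manual heap value is reasonable, it helps to know roughly what Cassandra would pick on its own. The sketch below assumes the default rule described in the comments of cassandra-env.sh in recent 4.x releases: take the larger of half the RAM capped at 1024 MB and a quarter of the RAM capped at 8192 MB. Treat that rule as an assumption and check it against your installed file. The `default_max_heap_mb` function name is hypothetical.

```shell
# Estimate the default maximum heap (in MB) for a given amount of RAM (in MB).
# Assumed rule: max(min(RAM/2, 1024), min(RAM/4, 8192)).
# default_max_heap_mb is a hypothetical helper name.
default_max_heap_mb() (
    ram_mb=$1
    half=$((ram_mb / 2))
    quarter=$((ram_mb / 4))
    # Cap each candidate at its limit.
    if [ "$half" -gt 1024 ]; then half=1024; fi
    if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
    # Pick the larger of the two candidates.
    if [ "$half" -gt "$quarter" ]; then
        echo "$half"
    else
        echo "$quarter"
    fi
)

# Example usage:
#   default_max_heap_mb 4096    # 4 GB server
#   default_max_heap_mb 16384   # 16 GB server
```

Under this rule a 4 GB server gets a heap of about 1 GB, so a manual MAX_HEAP_SIZE close to the total RAM of the server leaves little room for the operating system and may itself trigger the oom-kill error.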
Slow Query Response Error
Check the Cassandra keyspace for large partitions. Replace keyspace_name with your actual keyspace and table_name with your target table.
$ sudo nodetool tablestats keyspace_name.table_name
Enable tracing to analyze query performance in the Cassandra shell.
cqlsh> TRACING ON;
cqlsh> SELECT * FROM keyspace_name.table_name WHERE ...;
cqlsh> TRACING OFF;
Change the compaction strategy to improve the query response rate.
cqlsh> ALTER TABLE keyspace_name.table_name WITH compaction = {'class': 'LeveledCompactionStrategy'};
Conclusion
You have installed Apache Cassandra on your Debian 12 server and performed database management tasks. You can connect Cassandra to other nodes to create a multi-node cluster and set up applications to read and write data in the Cassandra database. For more information and advanced configuration options, visit the Apache Cassandra documentation.