How to Install Apache Cassandra on Debian 12

Updated on November 1, 2024

Introduction

Apache Cassandra is a high-performance, open-source distributed NoSQL database that handles large amounts of data with high availability and scalability. It offers flexibility beyond traditional relational databases: it does not depend on a rigid relational schema, manages structured, semi-structured, and unstructured data, and keeps operations simple by distributing data across nodes.

This article explains how to install and configure Apache Cassandra on Debian 12. You will create a new single-node Cassandra cluster and enable access to the Cassandra query language shell (CQLSH) to manage the database server.

Prerequisites

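Before you begin, make sure you have access to a Debian 12 server with at least 4 GB of RAM as a non-root user with sudo privileges.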

Install Apache Cassandra

Apache Cassandra is not available in the default package repositories on Debian 12 and requires the OpenJDK 11 package as a dependency. Debian 12 does not include OpenJDK 11 in its default repositories either, so the steps below add the Debian unstable repository first. Follow the steps below to install all dependency packages and add the Cassandra repository sources using the APT package manager on your server.

  1. Update the server's package information index.

    console
    $ sudo apt update
    
  2. Open the /etc/apt/sources.list file.

    console
    $ sudo nano /etc/apt/sources.list
    
  3. Add the following directive at the end of the file.

    ini
    deb http://deb.debian.org/debian unstable main non-free contrib
    

    Save and close the file.

  4. Update the server's package information index to load the new repository.

    console
    $ sudo apt update
    
  5. Install the OpenJDK 11 package.

    console
    $ sudo apt install openjdk-11-jdk
    
  6. View the Java version.

    console
    $ java --version
    

    Output:

    openjdk 11.0.25-ea 2024-10-15
    OpenJDK Runtime Environment (build 11.0.25-ea+5-post-Debian-1)
    OpenJDK 64-Bit Server VM (build 11.0.25-ea+5-post-Debian-1, mixed mode, sharing)

  7. Add the Apache Cassandra repository to your APT sources.

    console
    $ echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
    
  8. Download the Apache Cassandra repository signing key.

    console
    $ sudo curl -o /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS
    
  9. Update your server's package information index to apply the new Cassandra repository.

    console
    $ sudo apt update
    
  10. Install Apache Cassandra.

    console
    $ sudo apt install cassandra
    
  11. View the cassandra system service status and verify that it's running.

    console
    $ sudo systemctl status cassandra
    

    Output:

    ● cassandra.service - LSB: distributed storage system for structured data
        Loaded: loaded (/etc/init.d/cassandra; generated)
        Active: active (running) since Fri 2024-09-27 12:40:18 UTC; 4s ago
            Docs: man:systemd-sysv-generator(8)
        Process: 11707 ExecStart=/etc/init.d/cassandra start (code=exited, status=0/SUCCESS)
        Tasks: 23 (limit: 4021)
        Memory: 1.1G

    The Cassandra service is active and running on your server based on the above output.

  12. Query the Cassandra cluster node status and verify it's up.

    console
    $ sudo nodetool status
    

    Output:

    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack 
    UN  127.0.0.1  249.74 KiB  16      100.0%            ab72ec62-84ff-4a52-a7aa-39b2f2c9a898  rack1 
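
The status check in the last step can also be scripted. The following is a minimal sketch, assuming the report layout shown above, where each node line starts with a two-letter state code. The function name all_nodes_up is hypothetical and not part of Cassandra.

```shell
# Succeed only if every node line in a `nodetool status` report shows UN
# (Up/Normal). The report is read from standard input.
# all_nodes_up is a hypothetical helper name, not a Cassandra command.
all_nodes_up() {
    awk '
        # Node lines begin with a state code such as UN, DN, or UJ.
        /^[UD][NLJM][ \t]/ {
            nodes++
            if ($1 != "UN") bad++
        }
        END {
            # Fail when no node line was found or any node is not UN.
            if (nodes > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Example usage on the server:
#   sudo nodetool status | all_nodes_up && echo "All nodes are up"
```

The function only inspects text, so you can also run it against a saved report.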


Configure Apache Cassandra

Apache Cassandra runs in a single-node cluster mode unless you modify the /etc/cassandra/cassandra.yaml main configuration file to detect new nodes and listen for connection requests on a specific address. Follow the steps below to configure Apache Cassandra and set up a new cluster on your server.

  1. Stop the Cassandra service.

    console
    $ sudo systemctl stop cassandra
    
  2. Remove all data directories to clear the default Test Cluster files.

    console
    $ sudo rm -rf /var/lib/cassandra/*
    
  3. Open the cassandra.yaml Cassandra configuration file using a text editor like nano.

    console
    $ sudo nano /etc/cassandra/cassandra.yaml
    
    • Find the cluster_name directive and replace Test Cluster with your desired cluster name, such as MyCluster.

      yaml
      cluster_name: 'MyCluster'
      
    • Find the seeds section and verify that it's set to your localhost node address 127.0.0.1:7000.

      yaml
      seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
            - seeds: "127.0.0.1:7000"
      

      Cassandra uses seed nodes to bootstrap new nodes joining the cluster. Modify the seeds directive to include the IP addresses of other Cassandra nodes to connect to your cluster.

    • Find the listen_address directive and verify it's set to localhost.

      yaml
      listen_address: localhost
      

      The listen_address directive sets the address Cassandra uses to communicate with other nodes in a cluster. The localhost value limits Cassandra to connections from the server itself. Enter your server's IP address as the listen_address value to allow other nodes to reach this node. Do not use 0.0.0.0 because Cassandra does not accept a wildcard address for listen_address.

    • Find the rpc_address directive and verify it's set to localhost.

      yaml
      rpc_address: localhost
      

      The rpc_address (remote procedure call address) value sets the address on which Cassandra listens for client connections such as CQLSH sessions.

      Save and close the file.

  4. Restart the Cassandra service to apply the configuration changes.

    console
    $ sudo systemctl restart cassandra
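
To review the directives edited in this section without opening the editor, you can filter the configuration file. This is a minimal sketch; the function name show_cluster_settings is hypothetical, and it assumes each directive appears on its own line as in the examples above.

```shell
# Print the active (uncommented) lines for the directives edited above.
# The first argument is the path to cassandra.yaml.
# show_cluster_settings is a hypothetical helper name.
show_cluster_settings() {
    grep -E '^[[:space:]]*(-[[:space:]]*)?(cluster_name|seeds|listen_address|rpc_address):' "$1"
}

# Example usage on the server:
#   show_cluster_settings /etc/cassandra/cassandra.yaml
```

Commented lines that start with # are skipped, so the output reflects only the values Cassandra reads.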
    


Secure Apache Cassandra

Apache Cassandra does not authenticate clients by default, which leaves the database open to unauthorized access. Enable password-based authentication to ensure data confidentiality and secure access to the Cassandra console. Follow the steps below to modify the default Cassandra configuration and enable authentication for all database users.

  1. Open the main Cassandra configuration file.

    console
    $ sudo nano /etc/cassandra/cassandra.yaml
    
  2. Find the authenticator and authorizer directives, and change the values to PasswordAuthenticator and CassandraAuthorizer.

    yaml
    authenticator: PasswordAuthenticator
    authorizer: CassandraAuthorizer
    

    Save and close the file.

    The above configuration enables password authentication and sets role-based access control.

  3. Restart the Cassandra service to apply the configuration changes.

    console
    $ sudo systemctl restart cassandra
    
  4. Log in to the Cassandra shell using the default cassandra superuser account.

    console
    $ cqlsh -u cassandra -p cassandra
    
  5. Create a new administrative user, such as admin, and set a secure password to use with Apache Cassandra.

    sql
    cqlsh> CREATE USER admin WITH PASSWORD 'pswd12345' SUPERUSER;
    
  6. Exit the Cassandra shell.

    sql
    cqlsh> exit
    
  7. Log in to the Cassandra shell as the new administrative user.

    console
    $ cqlsh -u admin -p pswd12345
    
  8. Exit the Cassandra shell.

    sql
    cqlsh> exit
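
Cassandra can take some time to accept client connections after a restart, so a login attempt made immediately after the restart step may fail. The following is a minimal sketch of a retry helper for that case. The function name retry and the attempt and delay values are assumptions for illustration.

```shell
# Run a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# retry is a hypothetical helper; plain sh has no local variables, so the
# names below are prefixed to avoid clashing with your own variables.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_count=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up once the last allowed attempt has failed.
        if [ "$retry_count" -ge "$retry_attempts" ]; then
            return 1
        fi
        retry_count=$((retry_count + 1))
        sleep "$retry_delay"
    done
}

# Example usage on the server (12 attempts, 5 seconds apart):
#   retry 12 5 cqlsh -u cassandra -p cassandra -e "SHOW VERSION"
```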
    

Perform Data Modeling Tasks in Apache Cassandra

Apache Cassandra uses a different data model compared to relational databases with the following key concepts:

  • Keyspace: Similar to a database in relational systems.
  • Table: Defines the structure of your data.
  • Partition Key: Determines how Cassandra distributes data across nodes.
  • Clustering Columns: Determines the order of data in a partition.

Cassandra uses the Cassandra Query Language (CQL) and the Cassandra Query Language Shell (CQLSH) to create and manage databases. Follow the steps below to access the Cassandra shell and perform common data modeling tasks.

  1. Log in to the Cassandra shell as the administrative user you created earlier.

    console
    $ cqlsh -u admin -p pswd12345
    
  2. Create a new keyspace, such as example_keyspace.

    sql
    cqlsh> CREATE KEYSPACE example_keyspace 
           WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    

    A keyspace defines the replication strategy for its tables, and Cassandra distributes the keyspace data across nodes. Use SimpleStrategy for single-node or test clusters and NetworkTopologyStrategy for production or multi-datacenter clusters when creating keyspaces.

  3. Switch to the keyspace.

    sql
    cqlsh> USE example_keyspace;
    
  4. Create a new table such as users and define 3 columns.

    sql
    cqlsh> CREATE TABLE users (
           user_id UUID PRIMARY KEY,
           username TEXT,
           email TEXT
           );
    
  5. Insert new data in the users table using the INSERT command.

    sql
    cqlsh> INSERT INTO users (user_id, username, email) 
           VALUES (uuid(), 'john_doe', 'john@example.com');
    
  6. View the table data using the SELECT command.

    sql
    cqlsh> SELECT * FROM users WHERE username = 'john_doe' ALLOW FILTERING;
    

    Output:

    user_id                              | email            | username
    --------------------------------------+------------------+----------
    5fbea013-f4b4-4911-becc-62ccb9b1e55b | john@example.com | john_doe

  7. Update a record in the table using the UPDATE command. For example, update the user's email using the respective user_id value. Replace the example UUID with the user_id value from your SELECT output.

    sql
    cqlsh> UPDATE users 
           SET email = 'newemail@example.com' 
           WHERE user_id = 5fbea013-f4b4-4911-becc-62ccb9b1e55b;
    
  8. Delete a record from the users table using the DELETE command.

    sql
    cqlsh> DELETE FROM users 
           WHERE user_id = 5fbea013-f4b4-4911-becc-62ccb9b1e55b;
    
  9. Query the table data again and verify that it's empty.

    sql
    cqlsh> SELECT * FROM users WHERE username = 'john_doe' ALLOW FILTERING;
    

    Output:

    user_id                              | email            | username
    --------------------------------------+------------------+----------
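
The users table has a single-column primary key, so it has a partition key but no clustering columns. The following is a minimal sketch of a table that uses both concepts from the list at the start of this section. The table name user_events and its columns are hypothetical, and the function name write_events_schema is an assumption for illustration.

```shell
# Write a hypothetical table definition to a CQL file.
# user_id is the partition key and created_at is a clustering column that
# orders rows inside each partition, newest first.
# The first argument is the path of the CQL file to create.
write_events_schema() {
    cat > "$1" <<'CQL'
CREATE TABLE IF NOT EXISTS example_keyspace.user_events (
    user_id UUID,
    created_at TIMESTAMP,
    event TEXT,
    PRIMARY KEY ((user_id), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
CQL
}

# Example usage on the server:
#   write_events_schema /tmp/user_events.cql
#   cqlsh -u admin -p pswd12345 -f /tmp/user_events.cql
```

Keeping the statement in a file lets you review it before you run it against the keyspace.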

Back up and Restore Apache Cassandra Nodes

Backing up Cassandra nodes enables you to recover data after a database failure or data loss. The Cassandra nodetool utility allows you to take snapshots and use incremental backups. Follow the steps below to back up and restore Apache Cassandra nodes using the nodetool utility.

  1. Take a new snapshot of the example_keyspace you created earlier and tag it as my_backup.

    console
    $ sudo nodetool snapshot --tag my_backup example_keyspace
    

    Output:

    Requested creating snapshot(s) for [example_keyspace] with snapshot name [my_backup] and options {skipFlush=false}
    Snapshot directory: my_backup

  2. Open the main Cassandra configuration file to enable incremental backups.

    console
    $ sudo nano /etc/cassandra/cassandra.yaml
    

    Find the incremental_backups directive and change the value from false to true.

    yaml
    incremental_backups: true
    
  3. Restart Cassandra to apply the backup configuration changes.

    console
    $ sudo systemctl restart cassandra
    
  4. Run the following commands to copy the snapshot data back to the table directory in case of data loss, then load the restored SSTables without restarting the node.

    console
    $ sudo cp -R /var/lib/cassandra/data/example_keyspace/users-*/snapshots/my_backup/* /var/lib/cassandra/data/example_keyspace/users-*/
    $ sudo nodetool refresh example_keyspace users
    
  5. Log in to the Cassandra shell.

    console
    $ cqlsh -u admin -p pswd12345
    
  6. Query the users table in the example_keyspace to verify the table data is available.

    sql
    cqlsh> SELECT * FROM example_keyspace.users LIMIT 10;
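
The copy command in step 4 relies on the users-* pattern matching exactly one table directory. The following is a minimal sketch for listing the snapshot directories before you restore, so you can confirm which paths exist. The function name list_snapshot_dirs is hypothetical, and the default data path is the one used in the steps above.

```shell
# List every snapshot directory with the given tag under a data directory.
# Usage: list_snapshot_dirs <tag> [data-dir]
# list_snapshot_dirs is a hypothetical helper name.
list_snapshot_dirs() {
    find "${2:-/var/lib/cassandra/data}" -type d -path "*/snapshots/$1"
}

# Example usage on the server:
#   list_snapshot_dirs my_backup
```

You may need to run it from a shell that has permission to read the Cassandra data directory. If it prints more than one directory for the users table, copy from the specific directory you want instead of using the wildcard.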
    

Troubleshoot Common Apache Cassandra Installation Errors

Apache Cassandra may display runtime and installation errors. Follow the steps below to troubleshoot and fix common issues you may encounter when installing or running Apache Cassandra.

Connection Refused Error

  1. View the Cassandra service status and verify it's running.

    console
    $ sudo systemctl status cassandra
    
  2. Query the default Apache Cassandra port 9042 and verify that it's actively listening for incoming connections.

    console
    $ sudo ss -tulnp | grep 9042
    
  3. View the Cassandra logs to find new entries and troubleshoot the error.

    console
    $ sudo tail -f /var/log/cassandra/system.log
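
The port check in step 2 can be turned into a yes or no test for scripts. This is a minimal sketch, assuming the column layout printed by ss -tln, where the state is the first column and the local address and port is the fourth. The function name port_is_listening is hypothetical.

```shell
# Succeed if the given TCP port appears as a listening socket in the
# output of `ss -tln`, read from standard input.
# port_is_listening is a hypothetical helper name.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # The port is the text after the last colon of the local
            # address, which also covers IPv6 entries such as [::]:9042.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END {
            if (found) exit 0
            exit 1
        }
    '
}

# Example usage on the server:
#   ss -tln | port_is_listening 9042 && echo "Port 9042 is listening"
```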
    

Authentication Failed Error

  1. Ensure your username and password details are correct and run the following command to log in to the Cassandra shell.

    console
    $ cqlsh -u admin -p pswd12345
    
  2. Query the main Cassandra configuration and verify password authentication is enabled.

    console
    $ sudo grep "authenticator:" /etc/cassandra/cassandra.yaml
    

    Output:

    authenticator: PasswordAuthenticator

  3. Log in as the Cassandra superuser.

    console
    $ cqlsh -u cassandra -p cassandra
    
  4. Reset your target user's password. Replace admin with your actual Cassandra user.

    sql
    cqlsh> ALTER USER admin WITH PASSWORD 'new_password';
    

Out-of-Memory Error

  1. View the Cassandra memory usage.

    console
    $ nodetool info | grep "Heap Memory"
    
  2. Open the cassandra-env.sh file to increase the heap size.

    console
    $ sudo nano /etc/cassandra/cassandra-env.sh
    

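    Find the following heap directives, uncomment them if they start with #, and increase the Cassandra memory values. Set both values together and keep MAX_HEAP_SIZE below your server's total memory.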

    ini
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="800M"
    

    Save and close the file.

  3. Restart Cassandra to apply changes.

    console
    $ sudo systemctl restart cassandra
    

    If you receive the following memory error:

    cassandra.service: Failed with result 'oom-kill'.

    Upgrade your server plan and verify that it has at least 4GB RAM to run Cassandra.
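
When MAX_HEAP_SIZE is not set, the cassandra-env.sh script picks a heap size from the server's memory. The following is a minimal sketch of that default, assuming the heuristic described in the script's comments: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)). The function name default_max_heap_mb is hypothetical, so verify the rule against your installed cassandra-env.sh file before you rely on it.

```shell
# Estimate the default maximum heap size in MB for a given amount of
# system memory in MB, assuming the heuristic
# max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)).
# default_max_heap_mb is a hypothetical helper name.
default_max_heap_mb() {
    heap_half=$(( $1 / 2 ))
    heap_quarter=$(( $1 / 4 ))
    # Cap half of the memory at 1024 MB.
    if [ "$heap_half" -gt 1024 ]; then
        heap_half=1024
    fi
    # Cap a quarter of the memory at 8192 MB.
    if [ "$heap_quarter" -gt 8192 ]; then
        heap_quarter=8192
    fi
    # Print the larger of the two capped values.
    if [ "$heap_half" -gt "$heap_quarter" ]; then
        echo "$heap_half"
    else
        echo "$heap_quarter"
    fi
}

# Example usage on the server, reading total memory from free:
#   default_max_heap_mb "$(free -m | awk '/^Mem:/ {print $2}')"
```

Under this assumption a server with 4096 MB of memory gets a 1024 MB heap by default, which gives you a baseline to compare against the values you set manually.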

Slow Query Response Error

  1. Check the Cassandra keyspace for large partitions. Replace keyspace_name with your actual keyspace and table_name with your target table.

    console
    $ sudo nodetool tablestats keyspace_name.table_name
    
  2. Enable tracing to analyze query performance in the Cassandra shell.

    sql
    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM keyspace_name.table_name WHERE ...;
    cqlsh> TRACING OFF;
    
  3. Change the compaction strategy to improve the query response rate. LeveledCompactionStrategy generally suits read-heavy tables.

    sql
    cqlsh> ALTER TABLE keyspace_name.table_name 
           WITH compaction = {'class': 'LeveledCompactionStrategy'};
    

Conclusion

You have installed Apache Cassandra on your Debian 12 server and performed database management tasks. You can integrate Cassandra with other nodes to create a multi-node cluster and set up applications to read and write data in the Cassandra database. For more information and advanced configuration options, visit the Apache Cassandra documentation.