How to Implement Arrays with Vultr Managed Databases for PostgreSQL

Updated on January 3, 2023

Introduction

In the PostgreSQL server, the array data type allows you to define a column that holds comma-separated elements of similar, related data. Arrays are crucial when implementing database applications with tightly-coupled row data that require little or no relationships. For instance, you can use the PostgreSQL arrays in the following scenarios:

  • Storing multiple customers' phone numbers or email addresses.

  • Analyzing employees' salaries.

  • Storing questions and answers for online courses.

In all the above scenarios, creating lookup tables or other types of relationships would be a sign of database misdesign. This is because each record's data is totally unrelated to the other records. This guide shows you how to implement the array data type with a managed PostgreSQL database cluster on Ubuntu 20.04 server.

Prerequisites

To complete this tutorial:

1. Set Up a Sample Database

To understand how the PostgreSQL array data type works, you need to set up a sample database and a table on your managed PostgreSQL database cluster. The sample table stores customer's information. That is, their names, phone numbers, and savings accounts balances for the last three months. Because each customer can have multiple phone numbers and different bank balances during the period, you must implement the array data type to capture the information.

Follow the steps below to initialize the database structure:

  1. Establish an SSH connection to your server and run the following commands to install the postgresql-client package. This package provides access to the PostgreSQL server via the psql command-line tool.

     $ sudo apt update 
     $ sudo apt install -y postgresql-client
  2. Run the following psql command to connect to your managed PostgreSQL database cluster. Replace the value of h, -p, and -U with the correct host, port, and username for your PostgreSQL database server.

     $ psql -h SAMPLE_POSTGRESQL_DB_HOST_STRING.vultrdb.com -p 16751 -U vultradmin defaultdb
  3. Ensure you get the following password prompt. Key in your database password and press Enter to proceed.

     Password for user vultradmin:
  4. Create a sample microfinance database.

     defaultdb=> CREATE DATABASE microfinance;

    Output.

     CREATE DATABASE
  5. Connect to the new microfinance database.

     defaultdb=> \c microfinance;

    Output.

     You are now connected to database "microfinance" as user "vultradmin".
  6. Create a sample customers table. The customer_id column is the PRIMARY KEY in this table. The SERIAL keyword instructs the PostgreSQL server to assign a customer_id for new rows. The first_name and last_name columns store the customers' full names using the VARCHAR(50) data type. Because a customer can have multiple phone numbers, you're defining the phone column using the VARCHAR(50)[] (array) data type. Also, you're declaring a savings_balances column using the DECIMAL(17, 2)[] (array) data type.

     microfinance=# CREATE TABLE customers (
                        customer_id SERIAL PRIMARY KEY,
                        first_name VARCHAR(50),
                        last_name VARCHAR(50),
                        phone VARCHAR[],   
                        savings_balances DECIMAL(17, 2)[]             
                    );

    Output

     CREATE TABLE

With the sample database and table in place, proceed to the next step and learn how to work with the PostgreSQL array elements.

2. Accessing Array Elements In PostgreSQL

The PostgreSQL server supports different array functions. This step illustrates inserting data into the array columns, accessing array elements, and updating individual array elements.

2.1. Insert Data Into the Array Columns

PostgreSQL allows two methods for inserting data into array columns:

  1. Using curly braces ({sample_element_1, sample_element_2, sample_element_n}).

  2. Using the array constructor (ARRAY [sample_element_1, sample_element_2, sample_element_n]).

This guide implements both INSERT methods for demonstration purposes. Run the following commands to insert sample records into the customers table.

  • Curly braces INSERT statement:

      microfinance=# INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('JOHN', 'DOE', '{111423, 777849}', '{580.27, 1600.23, 4000.45}');
                     INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('MARY', 'SMITH', '{444888, 999321}', '{16000.45, 32000.89, 28000.73}');
                     INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('PETER', 'HENRY', '{765140, 895180}', '{1344.81, 2000.73, 3000.85}');
  • Array constructor INSERT statement:

      microfinance=# INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('ANN', 'JAMES', ARRAY[789566, 345689], ARRAY[9800.23, 7800.45, 4970.23]);
                     INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('FRANCIS', 'ALEX', ARRAY[132563, 145289], ARRAY[6500.68, 8800.63, 3563.24]);

    Output.

      ...
      INSERT 0 1
  1. Query the customers table to ensure the data is in place.

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        phone,
                        savings_balances
                    FROM customers; 
  2. Verify the query result below. As you can see, the customers table stores two phone numbers and three bank balances for each customer using arrays.

      customer_id | first_name | last_name |      phone      |       savings_balances
     -------------+------------+-----------+-----------------+------------------------------
                1 | JOHN       | DOE       | {111423,777849} | {580.27,1600.23,4000.45}
                2 | MARY       | SMITH     | {444888,999321} | {16000.45,32000.89,28000.73}
                3 | PETER      | HENRY     | {765140,895180} | {1344.81,2000.73,3000.85}
                4 | ANN        | JAMES     | {789566,345689} | {9800.23,7800.45,4970.23}
                5 | FRANCIS    | ALEX      | {132563,145289} | {6500.68,8800.63,3563.24}
     (5 rows)

2.2. Access PostgreSQL Array Elements

In the PostgreSQL server, you can access individual array elements and perform some queries on the array columns. Unlike other applications, PostgreSQL arrays use a one-based indexing system. That is, array indexes start at 1 instead of 0. For example, the phone array column has two elements. To access the individual array elements, you should use the following syntax.

phone[1] 
phone[1] 

To better understand the PostgreSQL one-based indexing system, run the following statements against the customers table:

  1. List the names of all customers and the first element from the phone (phone[1]) column.

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        phone[1] as phone_number
                    FROM customers; 

    Output.

      customer_id | first_name | last_name | phone_number
     -------------+------------+-----------+--------------
                1 | JOHN       | DOE       | 111423
                2 | MARY       | SMITH     | 444888
                3 | PETER      | HENRY     | 765140
                4 | ANN        | JAMES     | 789566
                5 | FRANCIS    | ALEX      | 132563
     (5 rows)
  2. Return all customers and their latest bank balance from the savings_balances column (savings_balances[3]).

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        savings_balances[3] as balance_cf
                    FROM customers; 

    Output.

      customer_id | first_name | last_name | balance_cf
     -------------+------------+-----------+------------
                1 | JOHN       | DOE       |    4000.45
                2 | MARY       | SMITH     |   28000.73
                3 | PETER      | HENRY     |    3000.85
                4 | ANN        | JAMES     |    4970.23
                5 | FRANCIS    | ALEX      |    3563.24
     (5 rows)

Apart from accessing individual array elements in the PostgreSQL server, you can access an array slice using the array's lower and upper bound. To do this, declare the array's lower and upper bound indexes inside two square brackets and separate the indexes by a colon.

[lower_bound_index:upper_bound_index]

If you omit the lower-bound or upper-bound index when declaring a slice, PostgreSQL uses the array's lower or upper limit index by default before executing the query.

To understand the above concepts, run the following queries:

  1. Retrieve the first two savings balances from the customers table starting from index 1 to index 2.

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        savings_balances[1:2] as bank_balances
                    FROM customers; 

    Output.

      customer_id | first_name | last_name |    bank_balances
     -------------+------------+-----------+---------------------
                1 | JOHN       | DOE       | {580.27,1600.23}
                2 | MARY       | SMITH     | {16000.45,32000.89}
                3 | PETER      | HENRY     | {1344.81,2000.73}
                4 | ANN        | JAMES     | {9800.23,7800.45}
                5 | FRANCIS    | ALEX      | {6500.68,8800.63}
     (5 rows)
  2. Repeat the same query, but this time around, omit the slice's lower-bound index.

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        savings_balances[:2] as bank_balances
                    FROM customers; 

    Output.

      customer_id | first_name | last_name |   bank_balances
     -------------+------------+-----------+---------------------
                1 | JOHN       | DOE       | {580.27,1600.23}
                2 | MARY       | SMITH     | {16000.45,32000.89}
                3 | PETER      | HENRY     | {1344.81,2000.73}
                4 | ANN        | JAMES     | {9800.23,7800.45}
                5 | FRANCIS    | ALEX      | {6500.68,8800.63}
     (5 rows)

2.3. Update Array Elements

To update an individual array element, you should use the following options:

  1. Recreate the entire array column's values.

  2. Specify the array index of each element when running an UPDATE command.

The following queries illustrate both of the above methods:

  1. Verify the record that you want to update. In this case, you want to update the latest bank balance of the first customer (customer_id #1).

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        savings_balances
                    FROM customers
                    WHERE customer_id = 1; 
    
     The latest bank balance from the output below is `4000.45`.
    
      customer_id | first_name | last_name |     savings_balances
     -------------+------------+-----------+--------------------------
                1 | JOHN       | DOE       | {580.27,1600.23,4000.45}
     (1 row)
  2. Update the customer's bank balance from 4000.45 to 4000.33 using either of the following methods:

    • Update only the last element (savings_balances[3]) in the savings_balances column by declaring the index of the element.

        microfinance=# UPDATE customers 
                           SET savings_balances[3] = 4000.33
                       WHERE customer_id = 1; 

      Output.

            UPDATE 1
    • Update the entire savings_balances column value for the target customer.

        microfinance=# UPDATE customers 
                           SET savings_balances = '{580.27,1600.23,4000.33}'
                       WHERE customer_id = 1; 

      Output.

        UPDATE 1

After working with PostgreSQL array elements, follow the next step to learn how to filter data with array search operators.

3. Search Array Columns

The PostgreSQL server allows different methods for specifying search criteria when filtering data on a specific array column, as illustrated below:

  1. Run the command below to check if both the first (phone[1]) and second elements (phone[2]) of the phone column match the 444888 keyword.

     microfinance=# SELECT
                        customer_id,
                        first_name,
                        last_name,
                        phone       
                    FROM customers
                    WHERE phone[1] = '444888'
                    OR phone[2] = '444888';  

    Output.

      customer_id | first_name | last_name |      phone
     -------------+------------+-----------+-----------------
        2 | MARY       | SMITH     | {444888,999321}
     (1 row)
  2. Use the ANY keyword to run a search query against an array column without specifying the individual array element. The following syntax is shorter than the previous syntax, and you should use it for large arrays.

     microfinance=# SELECT 
                        customer_id,
                        first_name,
                        last_name,
                        phone      
                    FROM customers 
                    WHERE '444888' = ANY (phone);

    Output.

      customer_id | first_name | last_name |      phone
     -------------+------------+-----------+-----------------
                2 | MARY       | SMITH     | {444888,999321}
     (1 row)
  3. Check all the customers' records where the saving_balances have increased between the first, second, and third months.

     microfinance=# SELECT 
                        customer_id,
                        first_name,
                        last_name,
                        savings_balances
                    FROM customers 
                    WHERE savings_balances[2] >  savings_balances[1]
                    AND   savings_balances[3] >  savings_balances[2];

    Output.

      customer_id | first_name | last_name |     savings_balances
     -------------+------------+-----------+---------------------------
                3 | PETER      | HENRY     | {1344.81,2000.73,3000.85}
                1 | JOHN       | DOE       | {580.27,1600.23,4000.33}
     (2 rows)
  4. Check all the customers' records where the saving_balances have decreased between the first, second, and third months.

     microfinance=# SELECT 
                        customer_id,
                        first_name,
                        last_name,
                        savings_balances
                    FROM customers 
                    WHERE savings_balances[3] < savings_balances[2]
                    AND   savings_balances[2] < savings_balances[1];

    Output.

      customer_id | first_name | last_name |     savings_balances
     -------------+------------+-----------+---------------------------
                4 | ANN        | JAMES     | {9800.23,7800.45,4970.23}
     (1 row)

You can also search array columns using the Generalized Inverted Index (GIN) operators. However, to ensure that PostgreSQL picks up a query plan with the lowest total running cost, you should create an index on the search column. Follow the steps below to complete a sample GIN search.

  1. Begin by creating an index on the phone column.

     microfinance=# CREATE INDEX idx_phone ON customers USING GIN(phone); 
  2. Search a customer's record that matches the phone number 345689 using the @> (contains) operator.

     microfinance=#  SELECT
                        customer_id,
                        first_name,
                        last_name
                    FROM customers 
                    WHERE phone @> '{"345689"}';

    Output.

      customer_id | first_name | last_name |      phone      |     savings_balances
     -------------+------------+-----------+-----------------+---------------------------
                4 | ANN        | JAMES     | {789566,345689} | {9800.23,7800.45,4970.23}
     (1 row)

4. Unnest Arrays

To execute complex array searches, the PostgreSQL server provides the UNNEST() function. This function converts individual array elements into a series of separate rows. For instance, execute the following query to retrieve the average savings from the customers table.

    microfinance=# SELECT 
                       customer_id,
                       AVG(tmp)
                   FROM customers, UNNEST(customers.savings_balances) tmp 
                   GROUP BY customer_id
                   ORDER BY customer_id;
Output.

     customer_id |          avg
    -------------+-----------------------
               1 | 2060.2766666666666667
               2 |    25334.023333333333
               3 | 2115.4633333333333333
               4 | 7523.6366666666666667
               5 | 6288.1833333333333333
    (5 rows)

Conclusion

This guide shows you how to work with the array data type on a managed PostgreSQL server. Implement the samples in this guide when working with tightly-coupled row data on your next PostgreSQL project.

Follow the links below to test more managed PostgreSQL database samples: