How to Implement Arrays with Vultr Managed Databases for PostgreSQL
Introduction
In the PostgreSQL server, the array data type allows you to define a column that holds comma-separated elements of similar, related data. Arrays are crucial when implementing database applications with tightly-coupled row data that require little or no relationships. For instance, you can use the PostgreSQL arrays in the following scenarios:
Storing multiple customers' phone numbers or email addresses.
Analyzing employees' salaries.
Storing questions and answers for online courses.
In all the above scenarios, creating lookup tables or other types of relationships would be a sign of database misdesign. This is because each record's data is totally unrelated to the other records. This guide shows you how to implement the array data type with a managed PostgreSQL database cluster on Ubuntu 20.04 server.
Prerequisites
To complete this tutorial:
Sign in to your Vultr account and locate your managed database cluster details. Then, under the Overview tab, find the Connection Details. This guide uses the following sample connection details:
username:
vultradmin
password:
EXAMPLE_POSTGRESQL_PASSWORD
host:
SAMPLE_POSTGRESQL_DB_HOST_STRING.vultrdb.com
port:
16751
1. Set Up a Sample Database
To understand how the PostgreSQL array data type works, you need to set up a sample database and a table on your managed PostgreSQL database cluster. The sample table stores customer's information. That is, their names, phone numbers, and savings accounts balances for the last three months. Because each customer can have multiple phone numbers and different bank balances during the period, you must implement the array data type to capture the information.
Follow the steps below to initialize the database structure:
Establish an
SSH
connection to your server and run the following commands to install thepostgresql-client
package. This package provides access to the PostgreSQL server via thepsql
command-line tool.$ sudo apt update $ sudo apt install -y postgresql-client
Run the following
psql
command to connect to your managed PostgreSQL database cluster. Replace the value ofh
,-p
, and-U
with the correct host, port, and username for your PostgreSQL database server.$ psql -h SAMPLE_POSTGRESQL_DB_HOST_STRING.vultrdb.com -p 16751 -U vultradmin defaultdb
Ensure you get the following password prompt. Key in your database password and press Enter to proceed.
Password for user vultradmin:
Create a sample
microfinance
database.defaultdb=> CREATE DATABASE microfinance;
Output.
CREATE DATABASE
Connect to the new
microfinance
database.defaultdb=> \c microfinance;
Output.
You are now connected to database "microfinance" as user "vultradmin".
Create a sample
customers
table. Thecustomer_id
column is thePRIMARY KEY
in this table. TheSERIAL
keyword instructs the PostgreSQL server to assign acustomer_id
for new rows. Thefirst_name
andlast_name
columns store the customers' full names using theVARCHAR(50)
data type. Because a customer can have multiple phone numbers, you're defining thephone
column using theVARCHAR(50)[]
(array) data type. Also, you're declaring asavings_balances
column using theDECIMAL(17, 2)[]
(array) data type.microfinance=# CREATE TABLE customers ( customer_id SERIAL PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), phone VARCHAR[], savings_balances DECIMAL(17, 2)[] );
Output
CREATE TABLE
With the sample database and table in place, proceed to the next step and learn how to work with the PostgreSQL array elements.
2. Accessing Array Elements In PostgreSQL
The PostgreSQL server supports different array functions. This step illustrates inserting data into the array columns, accessing array elements, and updating individual array elements.
2.1. Insert Data Into the Array Columns
PostgreSQL allows two methods for inserting data into array columns:
Using curly braces (
{sample_element_1, sample_element_2, sample_element_n}
).Using the array constructor (
ARRAY [sample_element_1, sample_element_2, sample_element_n]
).
This guide implements both INSERT
methods for demonstration purposes. Run the following commands to insert sample records into the customers
table.
Curly braces
INSERT
statement:microfinance=# INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('JOHN', 'DOE', '{111423, 777849}', '{580.27, 1600.23, 4000.45}'); INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('MARY', 'SMITH', '{444888, 999321}', '{16000.45, 32000.89, 28000.73}'); INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('PETER', 'HENRY', '{765140, 895180}', '{1344.81, 2000.73, 3000.85}');
Array constructor
INSERT
statement:microfinance=# INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('ANN', 'JAMES', ARRAY[789566, 345689], ARRAY[9800.23, 7800.45, 4970.23]); INSERT INTO customers (first_name, last_name, phone, savings_balances) VALUES ('FRANCIS', 'ALEX', ARRAY[132563, 145289], ARRAY[6500.68, 8800.63, 3563.24]);
Output.
... INSERT 0 1
Query the
customers
table to ensure the data is in place.microfinance=# SELECT customer_id, first_name, last_name, phone, savings_balances FROM customers;
Verify the query result below. As you can see, the
customers
table stores two phone numbers and three bank balances for each customer using arrays.customer_id | first_name | last_name | phone | savings_balances -------------+------------+-----------+-----------------+------------------------------ 1 | JOHN | DOE | {111423,777849} | {580.27,1600.23,4000.45} 2 | MARY | SMITH | {444888,999321} | {16000.45,32000.89,28000.73} 3 | PETER | HENRY | {765140,895180} | {1344.81,2000.73,3000.85} 4 | ANN | JAMES | {789566,345689} | {9800.23,7800.45,4970.23} 5 | FRANCIS | ALEX | {132563,145289} | {6500.68,8800.63,3563.24} (5 rows)
2.2. Access PostgreSQL Array Elements
In the PostgreSQL server, you can access individual array elements and perform some queries on the array columns. Unlike other applications, PostgreSQL arrays use a one-based indexing system. That is, array indexes start at 1
instead of 0
. For example, the phone
array column has two elements. To access the individual array elements, you should use the following syntax.
phone[1]
phone[1]
To better understand the PostgreSQL one-based indexing system, run the following statements against the customers
table:
List the names of all customers and the first element from the
phone
(phone[1]
) column.microfinance=# SELECT customer_id, first_name, last_name, phone[1] as phone_number FROM customers;
Output.
customer_id | first_name | last_name | phone_number -------------+------------+-----------+-------------- 1 | JOHN | DOE | 111423 2 | MARY | SMITH | 444888 3 | PETER | HENRY | 765140 4 | ANN | JAMES | 789566 5 | FRANCIS | ALEX | 132563 (5 rows)
Return all customers and their latest bank balance from the
savings_balances
column (savings_balances[3]
).microfinance=# SELECT customer_id, first_name, last_name, savings_balances[3] as balance_cf FROM customers;
Output.
customer_id | first_name | last_name | balance_cf -------------+------------+-----------+------------ 1 | JOHN | DOE | 4000.45 2 | MARY | SMITH | 28000.73 3 | PETER | HENRY | 3000.85 4 | ANN | JAMES | 4970.23 5 | FRANCIS | ALEX | 3563.24 (5 rows)
Apart from accessing individual array elements in the PostgreSQL server, you can access an array slice using the array's lower and upper bound. To do this, declare the array's lower and upper bound indexes inside two square brackets and separate the indexes by a colon.
[lower_bound_index:upper_bound_index]
If you omit the lower-bound or upper-bound index when declaring a slice, PostgreSQL uses the array's lower or upper limit index by default before executing the query.
To understand the above concepts, run the following queries:
Retrieve the first two savings balances from the
customers
table starting from index1
to index2
.microfinance=# SELECT customer_id, first_name, last_name, savings_balances[1:2] as bank_balances FROM customers;
Output.
customer_id | first_name | last_name | bank_balances -------------+------------+-----------+--------------------- 1 | JOHN | DOE | {580.27,1600.23} 2 | MARY | SMITH | {16000.45,32000.89} 3 | PETER | HENRY | {1344.81,2000.73} 4 | ANN | JAMES | {9800.23,7800.45} 5 | FRANCIS | ALEX | {6500.68,8800.63} (5 rows)
Repeat the same query, but this time around, omit the slice's lower-bound index.
microfinance=# SELECT customer_id, first_name, last_name, savings_balances[:2] as bank_balances FROM customers;
Output.
customer_id | first_name | last_name | bank_balances -------------+------------+-----------+--------------------- 1 | JOHN | DOE | {580.27,1600.23} 2 | MARY | SMITH | {16000.45,32000.89} 3 | PETER | HENRY | {1344.81,2000.73} 4 | ANN | JAMES | {9800.23,7800.45} 5 | FRANCIS | ALEX | {6500.68,8800.63} (5 rows)
2.3. Update Array Elements
To update an individual array element, you should use the following options:
Recreate the entire array column's values.
Specify the array index of each element when running an
UPDATE
command.
The following queries illustrate both of the above methods:
Verify the record that you want to update. In this case, you want to update the latest bank balance of the first customer (
customer_id #1
).microfinance=# SELECT customer_id, first_name, last_name, savings_balances FROM customers WHERE customer_id = 1; The latest bank balance from the output below is `4000.45`. customer_id | first_name | last_name | savings_balances -------------+------------+-----------+-------------------------- 1 | JOHN | DOE | {580.27,1600.23,4000.45} (1 row)
Update the customer's bank balance from
4000.45
to4000.33
using either of the following methods:Update only the last element (
savings_balances[3]
) in thesavings_balances
column by declaring the index of the element.microfinance=# UPDATE customers SET savings_balances[3] = 4000.33 WHERE customer_id = 1;
Output.
UPDATE 1
Update the entire
savings_balances
column value for the target customer.microfinance=# UPDATE customers SET savings_balances = '{580.27,1600.23,4000.33}' WHERE customer_id = 1;
Output.
UPDATE 1
After working with PostgreSQL array elements, follow the next step to learn how to filter data with array search operators.
3. Search Array Columns
The PostgreSQL server allows different methods for specifying search criteria when filtering data on a specific array column, as illustrated below:
Run the command below to check if both the first (
phone[1]
) and second elements (phone[2]
) of thephone
column match the444888
keyword.microfinance=# SELECT customer_id, first_name, last_name, phone FROM customers WHERE phone[1] = '444888' OR phone[2] = '444888';
Output.
customer_id | first_name | last_name | phone -------------+------------+-----------+----------------- 2 | MARY | SMITH | {444888,999321} (1 row)
Use the
ANY
keyword to run a search query against an array column without specifying the individual array element. The following syntax is shorter than the previous syntax, and you should use it for large arrays.microfinance=# SELECT customer_id, first_name, last_name, phone FROM customers WHERE '444888' = ANY (phone);
Output.
customer_id | first_name | last_name | phone -------------+------------+-----------+----------------- 2 | MARY | SMITH | {444888,999321} (1 row)
Check all the customers' records where the
saving_balances
have increased between the first, second, and third months.microfinance=# SELECT customer_id, first_name, last_name, savings_balances FROM customers WHERE savings_balances[2] > savings_balances[1] AND savings_balances[3] > savings_balances[2];
Output.
customer_id | first_name | last_name | savings_balances -------------+------------+-----------+--------------------------- 3 | PETER | HENRY | {1344.81,2000.73,3000.85} 1 | JOHN | DOE | {580.27,1600.23,4000.33} (2 rows)
Check all the customers' records where the
saving_balances
have decreased between the first, second, and third months.microfinance=# SELECT customer_id, first_name, last_name, savings_balances FROM customers WHERE savings_balances[3] < savings_balances[2] AND savings_balances[2] < savings_balances[1];
Output.
customer_id | first_name | last_name | savings_balances -------------+------------+-----------+--------------------------- 4 | ANN | JAMES | {9800.23,7800.45,4970.23} (1 row)
You can also search array columns using the Generalized Inverted Index (GIN
) operators. However, to ensure that PostgreSQL picks up a query plan with the lowest total running cost, you should create an index on the search column. Follow the steps below to complete a sample GIN
search.
Begin by creating an index on the
phone
column.microfinance=# CREATE INDEX idx_phone ON customers USING GIN(phone);
Search a customer's record that matches the phone number
345689
using the@>
(contains) operator.microfinance=# SELECT customer_id, first_name, last_name FROM customers WHERE phone @> '{"345689"}';
Output.
customer_id | first_name | last_name | phone | savings_balances -------------+------------+-----------+-----------------+--------------------------- 4 | ANN | JAMES | {789566,345689} | {9800.23,7800.45,4970.23} (1 row)
4. Unnest Arrays
To execute complex array searches, the PostgreSQL server provides the UNNEST()
function. This function converts individual array elements into a series of separate rows. For instance, execute the following query to retrieve the average savings from the customers
table.
microfinance=# SELECT
customer_id,
AVG(tmp)
FROM customers, UNNEST(customers.savings_balances) tmp
GROUP BY customer_id
ORDER BY customer_id;
Output.
customer_id | avg
-------------+-----------------------
1 | 2060.2766666666666667
2 | 25334.023333333333
3 | 2115.4633333333333333
4 | 7523.6366666666666667
5 | 6288.1833333333333333
(5 rows)
Conclusion
This guide shows you how to work with the array data type on a managed PostgreSQL server. Implement the samples in this guide when working with tightly-coupled row data on your next PostgreSQL project.
Follow the links below to test more managed PostgreSQL database samples: