How to Perform Full-Text Search on a PostgreSQL Database
Introduction
Discover how to Perform Full-Text Search on a PostgreSQL Database, enabling efficient retrieval of relevant information from extensive text datasets with advanced indexing and search features.
A PostgreSQL full-text search allows you to find records that match certain keywords with a high degree of accuracy. This query technique is fast because it implements indices, advanced query functions, and rules. In a production environment, you should use full-text search to address limitations posed by pattern matching search expressions.
For instance, in an e-commerce website, a customer searches for Party Dress. A pattern matching search algorithm can't return records with partial matches like Party Ready Dress. Therefore, pattern matches are inaccurate and slow because they don't implement indices. PostgreSQL full-text search addresses these problems. It provides flexibility, accuracy, and speed to find records and rank results according to relevance.
To accomplishment this, PostgreSQL uses the to_tsvector() and the to_tsquery() functions. This guide walks you through the process of implementing a full-text search in your PostgreSQL database.
Prerequisites
To follow along with this guide, you require:
1. Set Up a Sample Database
Follow the steps below to set up a sample database, create a table, and insert some test records.
SSH to your Linux server as a non-root sudo user.
Log in to the PostgreSQL server as the postgres user.
$ sudo -u postgres psql
Enter the password and press Enter to proceed.
Create a sample e_commerce database.
postgres=# CREATE DATABASE e_commerce;
Output.
CREATE DATABASE
Switch to the new e_commerce database.
postgres=# \connect e_commerce;
Output.
You are now connected to database "e_commerce" as user "postgres".
Create a products table. This table stores the products' details including the unique product_id, product_name, and retail_price. Use the SERIAL keyword on the PRIMARY KEY column to allow the PostgreSQL server to assign new product_ids automatically for new records.
e_commerce=# CREATE TABLE products ( product_id SERIAL PRIMARY KEY, product_name VARCHAR (50), retail_price NUMERIC(5,2) );
Output.
CREATE TABLE
Insert sample data into the products table.
e_commerce=# INSERT INTO products(product_name, retail_price) VALUES ('BUTTERFLY DECORATIONS', 35.20); INSERT INTO products(product_name, retail_price) VALUES ('DELL WIRELESS MOUSE', 25.50); INSERT INTO products(product_name, retail_price) VALUES ('RIB MEAT', 4.89); INSERT INTO products(product_name, retail_price) VALUES ('LOGITECH WIRELESS MOUSE', 16.50); INSERT INTO products(product_name, retail_price) VALUES ('WIRELESS USB DONGLE FOR DELL', 4.50); INSERT INTO products(product_name, retail_price) VALUES ('FRUITWORLD WATERMELON', 3.25); INSERT INTO products(product_name, retail_price) VALUES ('HP WIRELESS MOUSE', 25.50); INSERT INTO products(product_name, retail_price) VALUES ('RIBENA JUICE', 7.70); INSERT INTO products(product_name, retail_price) VALUES ('DELL WIRELESS KEYBOARD', 25.50);
Output.
... INSERT 0 1
Ensure the sample data is in place by running a SELECT statement against the products table.
e_commerce=# SELECT product_id, product_name, retail_price FROM products;
Output.
product_id | product_name | retail_price ------------+------------------------------+-------------- 1 | BUTTERFLY DECORATIONS | 35.20 2 | DELL WIRELESS MOUSE | 25.50 3 | RIB MEAT | 4.89 4 | LOGITECH WIRELESS MOUSE | 16.50 5 | WIRELESS USB DONGLE FOR DELL | 4.50 6 | FRUITWORLD WATERMELON | 3.25 7 | HP WIRELESS MOUSE | 25.50 8 | RIBENA JUICE | 7.70 9 | DELL WIRELESS KEYBOARD | 25.50 (9 rows)
2. The Basic Syntax of to_tsvector() and to_tsquery() Functions
The power of the PostgreSQL full-text searches lies in the to_tsvector() and the to_tsquery() functions. Here is a summary of how each function works.
2.1. The to_tsvector() Function
This function calls an internal PostgreSQL parser to break down a document or column values into individual words(tokens). The function further reduces the tokens to lexemes(base words) so that a word like coming converts to come and mangoes becomes mango. In the end, the to_tsvector() function returns a tsvector list containing all base words and their positions with stop words like a and the stripped.
Run the to_tsvector() function against the phrase, "The big boys are coming to eat the mangoes after their fathers accepted their apologies" to see how it works.
e_commerce=# SELECT to_tsvector('The big boys are coming to eat the mangoes after their fathers accepted their apologies');
The above to_tsvector() command outputs the tsvector list below. All words are now stemmed to their base terms and stop words like the, are, to, and after are now excluded.
'accept':13 'apolog':15 'big':2 'boy':3 'come':5 'eat':7 'father':12 'mango':9 (1 row)
2.2. The to_tsquery() Function
The to_tsquery() function converts a keyword or keyphrase to normalized tokens. In other words, the to_tsquery() function compiles a search term into a structure that the PostgreSQL server can understand when locating documents/records in a tsvector list.
With a basic knowledge of how the two full-text search functions work, you can now test them in your database.
3. Implement the PostgreSQL Full-text Search Functions
When performing a full-text searches on a PostgreSQL database, you must implement the to_tsvector() and to_tsquery() functions together with a match operator @@.
The match operator returns a boolean value (either t or f) when you run a keyword search (tsquery) against a tsvector document.
Run a regular search using the keyword rib against your products table before implementing the full-text functions.
e_commerce=# SELECT * FROM products WHERE product_name like '%RIB%';
The regular search output below shows two records: the RIB MEAT and RIBENA JUICE. However, the above search method is not intelligent enough to know that RIB is unrelated to RIBENA JUICE.
product_id | product_name | retail_price ------------+--------------+-------------- 3 | RIB MEAT | 4.89 8 | RIBENA JUICE | 7.70 (2 rows)
Perform the same search again using the to_tsvector and to_tsquery() functions.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('RIB');
From the output below, it's clear that the full-text search is intelligent enough to know that RIBENA JUICE is not related to RIB and, therefore, returns only one record. That is the RIB MEAT.
product_id | product_name | retail_price ------------+--------------+-------------- 3 | RIB MEAT | 4.89 (1 row)
Perform another full-text search using the MOUSE keyword.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('MOUSE');
The above query returns three records, as shown below.
product_id | product_name | retail_price ------------+-------------------------+-------------- 2 | DELL WIRELESS MOUSE | 25.50 4 | LOGITECH WIRELESS MOUSE | 16.50 7 | HP WIRELESS MOUSE | 25.50 (3 rows)
4. Use the PostgreSQL Full-text Search Operators
To offer more flexibility when searching large recordsets, the PostgreSQL database server supports different search operators as detailed below.
4.1. The & Operator
Use the AND/& operator to return only the records/documents containing all the & separated list of keywords.
Run the query below to return all DELL WIRELESS products.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('DELL & WIRELESS');
The query above returns the following output.
product_id | product_name | retail_price ------------+------------------------------+-------------- 2 | DELL WIRELESS MOUSE | 25.50 5 | WIRELESS USB DONGLE FOR DELL | 4.50 9 | DELL WIRELESS KEYBOARD | 25.50 (3 rows)
4.2. The | Operator
Use the OR/| operator to return records matching one or more search keywords.
Execute the following query to return all products from the LOGITECH and HP brands.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('LOGITECH | HP');
The query above returns the following result.
product_id | product_name | retail_price ------------+-------------------------+-------------- 4 | LOGITECH WIRELESS MOUSE | 16.50 7 | HP WIRELESS MOUSE | 25.50 (2 rows)
4.3. The ! Operator
Use the NOT/! operator to exclude records containing a particular keyword.
Run the following statement to exclude all DELL products from the search result.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('!DELL');
The output below displays all products without the DELL products.
product_id | product_name | retail_price ------------+-------------------------+-------------- 1 | BUTTERFLY DECORATIONS | 35.20 3 | RIB MEAT | 4.89 4 | LOGITECH WIRELESS MOUSE | 16.50 6 | FRUITWORLD WATERMELON | 3.25 7 | HP WIRELESS MOUSE | 25.50 8 | RIBENA JUICE | 7.70 (6 rows)
4.4. The '' Operator
Use the double quotation mark '' operator to return records that match the exact search term.
Run the command below to search for the phrase DELL WIRELESS KEYBOARD.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('''DELL WIRELESS KEYBOARD''');
From the output below, only one record exactly matches the DELL WIRELESS KEYBOARD keyword.
product_id | product_name | retail_price ------------+------------------------+-------------- 9 | DELL WIRELESS KEYBOARD | 25.50 (1 row)
4.5. Combine Multiple Full-text Search Operators
You may combine multiple search operators in a single query to perform more complex full-text queries.
Run the query below to return the WIRELESS MOUSE products from the DELL or HP companies only.
e_commerce=# SELECT * FROM products WHERE to_tsvector(product_name) @@ to_tsquery('(DELL | HP) & (WIRELESS & MOUSE)');
The query returns the two products below.
product_id | product_name | retail_price ------------+---------------------+-------------- 2 | DELL WIRELESS MOUSE | 25.50 7 | HP WIRELESS MOUSE | 25.50 (2 rows)
5. Speed Up Full-text Searches with Pre-Computed Vectors
In a PostgreSQL database, the full-text functions work fast because the tsvector data type implements indices. The only difficult task that the database server does during the search is computing the vectors. To speed up operations when working with large datasets, a good practice is to pre-compute the vectors and store them alongside your records in a separate column.
With this approach, the database server computes the vectors only once unless you update the existing records. This speeds up the full-text search operations by a large margin. Follow the steps below to add pre-computed vectors into your products table.
Add a TSVECTOR data type column to the products table and name it product_name_tokens.
e_commerce=# ALTER TABLE products ADD COLUMN product_name_tokens TSVECTOR;
Output.
ALTER TABLE
Update the records in the products table to include new vector information by converting the product names using the statement to_tsvector(p.product_name).
e_commerce=# UPDATE products p SET product_name_tokens = to_tsvector(p.product_name) FROM products p2;
Output.
UPDATE 9
Query the products table to make sure you've computed the vectors successfully.
e_commerce=# SELECT product_id, product_name, retail_price, product_name_tokens FROM products;
The output below shows the records together with the pre-computed vectors. Press Q to proceed.
product_id | product_name | retail_price | product_name_tokens ------------+------------------------------+--------------+----------------------------------------- 1 | BUTTERFLY DECORATIONS | 35.20 | 'butterfli':1 'decor':2 2 | DELL WIRELESS MOUSE | 25.50 | 'dell':1 'mous':3 'wireless':2 3 | RIB MEAT | 4.89 | 'meat':2 'rib':1 4 | LOGITECH WIRELESS MOUSE | 16.50 | 'logitech':1 'mous':3 'wireless':2 5 | WIRELESS USB DONGLE FOR DELL | 4.50 | 'dell':5 'dongl':3 'usb':2 'wireless':1 6 | FRUITWORLD WATERMELON | 3.25 | 'fruitworld':1 'watermelon':2 7 | HP WIRELESS MOUSE | 25.50 | 'hp':1 'mous':3 'wireless':2 8 | RIBENA JUICE | 7.70 | 'juic':2 'ribena':1 9 | DELL WIRELESS KEYBOARD | 25.50 | 'dell':1 'keyboard':3 'wireless':2 (9 rows)
Run a query against the TSVECTOR data type column (product_name_tokens) to search for the keyword RIBENA.
e_commerce=# SELECT product_id, product_name, retail_price FROM products WHERE product_name_tokens @@ to_tsquery('RIBENA');
As you can see from the output below, your query works as expected without the need to compute the vectors during the search operation.
product_id | product_name | retail_price ------------+--------------+-------------- 8 | RIBENA JUICE | 7.70 (1 row)
Conclusion
In this guide, you've used the PostgreSQL inbuilt functions to implement full-text searches in a sample database. You're also able to use basic full-text search operators like &, |, and !. Towards the end, you've learned how to save time by pre-computing vectors when working with large datasets. Use the knowledge in this guide to power your next PostgreSQL search engine project.