How to Calculate Distances with PostgreSQL

Updated on January 3, 2023
How to Calculate Distances with PostgreSQL header image

Introduction

Many modern applications need to calculate distances - between two people, between a business and a person, and so on. Locations are typically represented as latitude-longitude pairs (lat/long coordinates). To do calculations involving distances, these are some approaches you can take (and their consequences):

  • Get the location coordinates from the database, and use an application layer function to compute the distance. This approach is easy to set up but makes it difficult to run many kinds of distance-based queries. Consider an example where you want to find restaurants within a 5-kilometer radius of an individual's current position. To do this, the application fetches the list of all restaurants and then finds the ones within the radius.
  • Write a PL/SQL function to compute the distance within the database. Because the data is processed at the database layer, this approach is operationally efficient. But setting up a PL/SQL function is time-consuming and error-prone. The next two approaches offer the same level of efficiency at a lower setup cost.
  • Use a module like PostGIS. PostGIS is a fully featured Geographic Information Systems (GIS) module. Its use is overkill when you only need to calculate distances.
  • Use the earthdistance module built into PostgreSQL. This module provides an accessible approach to calculating distances, and it suffices for most use cases. It also makes it possible to write distance-based queries.

Prerequisites

This is a hands-on guide. To benefit from it, you are expected to have prior practical experience with PostgreSQL. You are also expected to already have PostgreSQL running either on a standalone server or as an instance on Vultr Managed Databases for PostgreSQL.

It is assumed you have access to the PostgreSQL command line and you are comfortable issuing SQL commands. It is helpful to know how Common Table Expressions (CTEs) work in PostgreSQL - the examples use CTEs.

Extensions are available only in the database where they are loaded. Before starting, check that you are connected to the database in which you want to use the module. To connect to a specific database, use:

=# \c DATABASE_NAME

Note that =# refers to the PostgreSQL psql prompt.

By default, a standalone PostgreSQL installation connects to the database postgres as the user postgres. On Vultr Managed Databases for PostgreSQL, the default database is defaultdb, and the default user is vultradmin. Following this guide without explicitly connecting to a specific database will load the extension(s) into the default database. This is not a problem, only something to be aware of.

Compatibility

The instructions in this guide are tested on PostgreSQL version 14.5 running on Ubuntu 22.04 and Vultr Managed Databases for PostgreSQL. They should be compatible with all recent versions of PostgreSQL on all recent Operating Systems.

How to Install

The earthdistance module needs the cube module as a dependency. cube is also a built-in extension; you do not need to install it. Load the cube module as an extension:

=# CREATE EXTENSION cube ; 

Now load the earthdistance extension:

=# CREATE EXTENSION earthdistance ;

Check that both extensions are loaded and ready to use:

=# SELECT * FROM pg_extension ;

The list of extensions should include cube and earthdistance.

Create Test Data

Get Location Data

A typical consumer-facing web or mobile application gets the location data of the user in different ways:

  • The device, via its GPS sensors, gets the lat/long coordinates. A service like the Location API from Google Play Services or Apple's Core Location API handles this within the app. The app then shares the coordinates with the server.
  • The user enters an address. A service like Google's Geocoding API takes in the address and outputs its geographical coordinates.
  • The user application uses a service like Google's Geolocation API to get the coordinates.

For the examples in this section, get the locations using either Google Maps or a service like Itilog. On Google Maps, right-click any location to see its coordinates (latitude and longitude) in the pop-up context menu. Left-click the shown coordinates to copy them to the clipboard. On Itilog, you can type out a specific address or click a location on the map to see its coordinates in the box above. Itilog is based on Google APIs.

Create Table

Create a table to store information about places:

CREATE TABLE places (id SERIAL PRIMARY KEY, city VARCHAR (20) UNIQUE, lat FLOAT8, long FLOAT8, UNIQUE (lat, long)) ;

The table has 4 columns - an automatically generated ID field, the name of a city, and its lat/long coordinates. The uniqueness constraints in the query above are not related to the earthdistance module. They are imposed only to sanitize the data in a production database. You can skip the uniqueness constraints for testing.

Create a table with information about people:

CREATE TABLE people (id SERIAL PRIMARY KEY, name VARCHAR (20), lat FLOAT8, long FLOAT8 ) ;

Having two tables allows you to test how to make distance-based queries across different tables.

Insert Test Data

Insert a few rows with the names of places and their lat/long coordinates:

INSERT INTO places (city, lat, long) VALUES ('London', 51.5073219, -0.1276474), ('Edinburgh', 55.9533456, -3.1883749), ('Paris', 48.8588897, 2.320041), ('Amsterdam', 52.3727598, 4.8936041), ('Washington', 38.8950368, -77.0365427), ('Delhi', '28.6138954', '77.2090057'), ('Sydney', -33.8698439, 151.2082848), ('Null Island', 0, 0), ('North Pole', 90, 0), ('South Pole', -90, 0) ;

Insert a few rows in the people table, corresponding to people living in different places:

INSERT INTO people (name, lat, long) VALUES ('Londoner', 51.5073219, -0.1276474), ('Edinburgher', 55.9533456, -3.1883749), ('Parisian', 48.8588897, 2.320041), ('Amsterdammer', 52.3727598, 4.8936041), ('Washingtonian', 38.8950368, -77.0365427), ('Delhite', '28.6138954', '77.2090057'), ('Sydneysider', -33.8698439, 151.2082848) ;

How It Works

The earthdistance module

The earthdistance module is packaged as a PostgreSQL extension. It assumes the earth to be a perfect sphere. It represents points on the globe in a three-dimensional grid system. The origin of this 3D grid is at the center of the earth. Each point on the 3D grid is represented as a set of (x, y, z) coordinates. The X-Y plane contains the equator, and the Z-axis joins the poles. This 3D grid system is handled by the cube module.

Distance between points is measured along the surface of the earth - this is the great circle distance. Geographically, locations are represented by latitude and longitude pairs. Latitudes and longitudes are stored as double-precision floating point numbers (float8).

Coordinates to Cubes

Points on the earth are naturally represented as lat/long coordinates. But the earthdistance module represents points in a 3D grid. The distance-calculating functions of the module are also based on points in a 3D system. To convert the location of a point from the lat/long coordinate system to the 3D grid system, use the ll_to_earth function.

SELECT 
city, lat, long, 
    ll_to_earth(lat, long) AS earth_xyz
FROM places ;

The above query shows the latitude and longitude of each place, and its location in the 3D grid system (as computed by the ll_to_earth function). The 3D grid location is represented by a set of (x, y, z) numbers. Observe the 3D grid coordinates of Null Island, the North Pole, and the South Pole. They correspond to (R, 0, 0), (0, 0, R), and (0, 0, -R), respectively, where R is the radius of the earth. Note that a zero coordinate in the 3D grid might not be represented as exactly 0 but as a very small number of the order of 10 to the power of -10.

Caveat

In reality, the earth is not a perfect sphere. It is slightly flattened at the poles, and it bulges out at the equator. The earthdistance module neglects this and assumes the earth to be a perfect sphere. If your application needs highly accurate distances that take into account the earth's actual curvature (for example, if you want to calculate airline routes over the poles), use the PostGIS module. For most casual applications, the builtin earthdistance module is sufficient.

Use the earth() function to see the radius of the earth (as per the module's models) in meters:

SELECT earth();

Distance-based Queries on the Same Table

Distances are computed using the earth_distance function. This function takes in as input two points (represented in the 3D grid system) on the surface of the earth. Therefore, the lat/long coordinates are first converted to (x, y, z) coordinates using the ll_to_earth function.

Distance Between Two Places

In the following CTE, the first and second sub-queries get the details (lat/long coordinates and name) of two cities.

WITH 
place1 AS (
    SELECT lat, long, city
    FROM places 
    WHERE city = 'Delhi'
),
place2 AS (
    SELECT lat, long, city
    FROM places
    WHERE city = 'Paris'
)
SELECT
    place1.city AS place1,
    place2.city AS place2,

    earth_distance(
        ll_to_earth(place1.lat, place1.long),
        ll_to_earth(place2.lat, place2.long)
    ) AS distance_meters

FROM place1, place2 ;

The above query returns the distance between Delhi and Paris in meters. Notice that the calculated distance is a floating point number.

Conversion of Units

Round off the distance by casting it to the INTEGER data type. Convert it to kilometers by dividing it by 1000. To convert to miles, divide by 1609.344.

WITH 
place1 AS (
    SELECT lat, long, city
    FROM places 
    WHERE city = 'Delhi'
),
place2 AS (
    SELECT lat, long, city
    FROM places
    WHERE city = 'Paris'
)
SELECT
    place1.city AS place1,
    place2.city AS place2,

    (earth_distance(
        ll_to_earth(place1.lat, place1.long),
        ll_to_earth(place2.lat, place2.long)
    -- typecasting and converting to miles
    )/1609.344)::integer AS distance_miles

FROM place1, place2 ;

The above query shows the Delhi-Paris distance in miles. Pay attention to the order of operations and the parentheses - divide the distance in meters by 1609.344, then cast the result to INTEGER.

Places within a Radius

To find all places within a certain radius of another place, add a distance-based WHERE condition to the SELECT clause:

WITH 
place1 AS (
    SELECT lat, long, city
    FROM places 
    WHERE city = 'London'
),
place2 AS (
    SELECT lat, long, city
    FROM places
    WHERE city <> 'London'
)
SELECT
    place1.city AS place1,
    place2.city AS place2,

    earth_distance(
        ll_to_earth(place1.lat, place1.long),
        ll_to_earth(place2.lat, place2.long)
    )::integer/1000 AS distance_kilometers

FROM place1, place2
WHERE
    earth_distance(
        ll_to_earth(place1.lat, place1.long),
        ll_to_earth(place2.lat, place2.long)
    )/1000 < 1000
ORDER BY distance_kilometers ;

This query finds all places within 1000 kilometers of London. Note the inequality condition in the second sub-query.

Distance-based Queries on Different Tables

The last section showed how to run distance-based queries on the same table. In many cases, you need to run the query across multiple tables. For example, if you have one table for places and another for people, and you want to find all people within some radius of a place.

All People within a Radius

The overall structure of this query is similar to earlier ones. One of the sub-queries is used to fetch details from the people table. The other sub-query gets information from the places table. The query below finds all people within a 1000-kilometer radius of London.

WITH
place AS (
    SELECT lat, long, city
    FROM places 
    WHERE city = 'London'
),
person AS (
    SELECT lat, long, name
    FROM people
)
SELECT
    place.city AS place,
    person.name AS person,
    earth_distance(
        ll_to_earth(place.lat, place.long),
        ll_to_earth(person.lat, person.long)
    )::integer/1000 AS distance_kilometers

FROM place, person
WHERE 
    earth_distance(
        ll_to_earth(place.lat, place.long),
        ll_to_earth(person.lat, person.long)
    )::integer/1000 < 1000
ORDER BY distance_kilometers ;

Conclusion

This article discusses how to use the earthdistance module of PostgreSQL to calculate distances. The official documentation has more information about the module and some additional functions. For more advanced geographical functions, use the PostGIS module. In practice, you parametrize the queries to receive inputs from the application programming language. For example, when using Python, you might use either an ORM like SQLAlchemy or a driver like Psycopg to interface with the database.