Software Development

Fast geolocation query with PostGIS

If you need to to add a spatial information querying in your application, PostGIS is one of the most popular solution. Simply because it is well-proven, open source, and running on top of everyone’s favorite open source SQL database — PostgreSQL.

PostGIS is a spatial database extender for PostgreSQL object-relational database.

postgis.net

In this post, I will share a short example of a Python console application that load hospitals locations (214 entries from OSM) from PostGIS-enabled PostgreSQL database. We are going to get the sample data from the awesome OpenStreetMap through https://overpass-turbo.eu/

The prerequisites of these tutorial are:

  • Ubuntu 20.04 LTS (adjust the install commands if you’re using other version/distro)
  • Python 3.8 (but it seems 3.x will be fine)
  • Psycopg2 (pip install psycopg2) (you may need to install some dependencies)
  • PostgresSQL 12 (sudo apt install postgresql)
  • PostGIS 3 (sudo apt install postgis)

Getting the Data

This time we will use hospitals data in Jakarta, Indonesia (my hometown!). Particularly, we want hospitals that have name and capacity attributes. To do that, simply open https://overpass-turbo.eu/ and copy-paste OverpassQL query below and hit Run button. Finally click Export > Data > download/copy as GeoJSON. A file named export.geojson will be downloaded.

[out:json][timeout:25];
(
  way["building"="hospital"]["name"]["capacity:persons"](-6.449729258880717,106.57699584960938,-6.079107721736656,107.0624542236328);
);
out center;
Result visualization in overpass-turbo.eu

Load Data to PostGIS

To store coordinates/points in PostgreSQL, we need to create a column with geometry data type. This is a SQL script to create table for our needs:

CREATE TABLE locations (
    id varchar(255) NOT NULL,
    name varchar(255),
    capacity varchar(255),
    addr varchar(255),
    phone varchar(100),
    opening_hours varchar(100),
    website varchar(100),
    coords geometry,
    CONSTRAINT locations_pkey PRIMARY KEY (id)
)

Additionally, we want to create index to speed up search:

CREATE INDEX locations_coords_index ON locations USING GIST(coords)

Now, we can write a simple Python script and load the data from export.geojson:

import configparser
import psycopg2
import json
import sys
from io import StringIO

"""
overpass_to_postgis.ini example:

[default]
host = localhost
user = postgres
password = postgres
database = respondor
overpass_file_path = /mnt/data/Downloads/export.geojson
"""

config = configparser.ConfigParser()
config.read('overpass_to_postgis.ini')

class get_db_connection:
    def __init__(self, dbconf: dict):
        self.host = dbconf['host']
        self.database = dbconf['database']
        self.user = dbconf['user']
        self.password = dbconf['password']

    def __enter__(self):
        self.conn = psycopg2.connect(
            host=self.host,
            database=self.database,
            user=self.user,
            password=self.password)
        self.cur = self.conn.cursor()
        return self.cur

    def __exit__(self, type, value, traceback):
        self.conn.commit()
        self.cur.close()
        self.conn.close()

def safe_get(map, key):
    val = None
    if key in map:
        val = map[key]
    return str(val)

# setup database
default_config = config['default']
table_name = default_config['table']
with get_db_connection(default_config) as cur:
    # read json overpass
    s = StringIO()
    with open(default_config['overpass_file_path']) as f:
        data = json.load(f)
        for feat in data['features']:
            coord_str = ' '.join(map(str, feat['geometry']['coordinates']))
            props = feat['properties']
            values = [
                safe_get(props, '@id'),
                safe_get(props, 'name:en'),
                safe_get(props, 'capacity:persons'),
                safe_get(props, 'addr:full'),
                safe_get(props, 'phone'),
                safe_get(props, 'opening_hours'),
                safe_get(props, 'website'),
                f'POINT({coord_str})',
            ]
            s.write('\t'.join(values)+'\n')
    s.seek(0)
    
    # setup database
    cur.execute(f'DROP TABLE IF EXISTS {table_name}')
    cur.execute(f"""
    CREATE TABLE {table_name} (
        id varchar(255) NOT NULL,
        name varchar(255),
        capacity varchar(255),
        addr varchar(255),
        phone varchar(100),
        opening_hours varchar(100),
        website varchar(100),
        coords geometry,
        CONSTRAINT {table_name}_pkey PRIMARY KEY (id)
    )
    """
    )
    cur.execute(f'CREATE INDEX {table_name}_coords_index ON {table_name} USING GIST(coords)')
    cur.copy_from(s, table_name, null='None')

Query the Data

Since we have our data ready, we can run interesting queries using the geometry column.

For example, we can query distance (in meter) of the hospitals from certain location (longitude=106.8249641, latitude=-6.1753871) by using ST_DistanceSphere() function from PostGIS:

SELECT 
	id, 
	name, 
	ST_DistanceSphere(coords, 'POINT(106.8249641 -6.1753871)') as distance_meter, 
	concat(st_y(coords), ', ', st_x(coords)) as lat_long
FROM locations 
ORDER BY distance_meter ASC
LIMIT 10

The result we get will be something like this (in CSV):

"id","name","distance_meter","lat_long"
"way/496643326","Rumah Sakit THT Bedah Prof. Nizar",1150.93433842,"-6.1712414, 106.8154247"
"way/492535915","RSUD Tarakan",1669.53197806,"-6.1716313, 106.8103422"
"way/485813973","Rumah Sakit Abdi Waluyo",1676.06387318,"-6.1898132, 106.8293587"
"way/485813873","Rumah Sakit YPK Mandiri",1799.67005592,"-6.1913149, 106.8278537"
"way/485992308","Rumah Sakit Menteng Mitra Afia",1958.22172341,"-6.1868538, 106.8384083"
"way/492945012","Rumah Sakit Umum Kecamatan Tanah Abang",2090.46726163,"-6.1909239, 106.8143169"
"way/156919546","Rumah Sakit Ibu dan Anak Aries",2575.8240355,"-6.1576321, 106.8099989"
"way/121523126","Raden Saleh Medical Center",2730.37582545,"-6.1892694, 106.8453365"
"way/482346485","Rumah Sakit Mata JEC",2895.65675616,"-6.1986006, 106.8368349"
"way/496038045","Gedung Adminstrasi dan Pendidikan RSAB Harapan Kita",2924.49581687,"-6.1840605, 106.7999897"

But if we don’t really care about the actual value of the distance, this approach is not the most efficient as indicated in the document:

The trouble with this approach is that it forces the database to calculate the distance between the query geometry and every feature in the table of candidate features, then sort them all. For a large table of candidate features, it is not a reasonable approach. One way to improve performance is to add an index constraint to the search

https://postgis.net/workshops/postgis-intro/knn.html

One of the approach that we can use to find the nearest location efficiently is by using <-> operator (distance between box centers). This method utilizes the index that we created earlier:

SELECT 
	id, 
	name, 
	concat(st_y(coords), ', ', st_x(coords)) as lat_long
FROM locations 
ORDER BY coords <-> 'POINT(106.8249641 -6.1753871)'::geometry ASC LIMIT 10;

The results are exactly but supposed to be faster in case of large table. Unfortunately, since our table only has 214 rows, we can’t observe the difference in speed:

"id","name","lat_long"
"way/496643326","Rumah Sakit THT Bedah Prof. Nizar","-6.1712414, 106.8154247"
"way/485813973","Rumah Sakit Abdi Waluyo","-6.1898132, 106.8293587"
"way/492535915","RSUD Tarakan","-6.1716313, 106.8103422"
"way/485813873","Rumah Sakit YPK Mandiri","-6.1913149, 106.8278537"
"way/485992308","Rumah Sakit Menteng Mitra Afia","-6.1868538, 106.8384083"
"way/492945012","Rumah Sakit Umum Kecamatan Tanah Abang","-6.1909239, 106.8143169"
"way/156919546","Rumah Sakit Ibu dan Anak Aries","-6.1576321, 106.8099989"
"way/121523126","Raden Saleh Medical Center","-6.1892694, 106.8453365"
"way/482346485","Rumah Sakit Mata JEC","-6.1986006, 106.8368349"
"way/496038045","Gedung Adminstrasi dan Pendidikan RSAB Harapan Kita","-6.1840605, 106.7999897"

It’s fun, right? ?

The features above are only the tip of the iceberg of PostGIS. There are a lot of other awesome features that can be explored. But hopefully this post can be a useful introduction to PostGIS. Enjoy!

0 0 votes
Article Rating
yohanes.gultom@gmail.com

Share
Published by
yohanes.gultom@gmail.com

Recent Posts

Get Unverified SSL Certificate Expiry Date with Python

Getting verified SSL information with Python (3.x) is very easy. Code examples for it are…

3 years ago

Spring Data Couchbase 4 Multibuckets in Spring Boot 2

By default, Spring Data Couchbase implements single-bucket configuration. In this default implementation, all POJO (Plain…

4 years ago

Firebase Auth Emulator with Python

Last year, Google released Firebase Auth Emulator as a new component in Firebase Emulator. In…

4 years ago

Google OIDC token generation/validation

One of the authentication protocol that is supported by most of Google Cloud services is…

4 years ago

Auto speech-to-text (Indonesian) with AWS Transcribe and Python

Amazon Web Service Transcribe provides API to automatically convert an audio speech file (mp3/wav) into…

5 years ago

Splitting video with ffmpeg and Python

I had a project to build a simple website that split uploaded video into parts…

5 years ago