If you need to to add a spatial information querying in your application, PostGIS is one of the most popular solution. Simply because it is well-proven, open source, and running on top of everyone’s favorite open source SQL database — PostgreSQL.
PostGIS is a spatial database extender for PostgreSQL object-relational database.
postgis.net
In this post, I will share a short example of a Python console application that load hospitals locations (214 entries from OSM) from PostGIS-enabled PostgreSQL database. We are going to get the sample data from the awesome OpenStreetMap through https://overpass-turbo.eu/
The prerequisites of these tutorial are:
- Ubuntu 20.04 LTS (adjust the install commands if you’re using other version/distro)
- Python 3.8 (but it seems 3.x will be fine)
- Psycopg2 (
pip install psycopg2
) (you may need to install some dependencies) - PostgresSQL 12 (
sudo apt install postgresql
) - PostGIS 3 (
sudo apt install postgis
)
Getting the Data
This time we will use hospitals data in Jakarta, Indonesia (my hometown!). Particularly, we want hospitals that have name
and capacity
attributes. To do that, simply open https://overpass-turbo.eu/ and copy-paste OverpassQL query below and hit Run button. Finally click Export > Data > download/copy as GeoJSON. A file named export.geojson will be downloaded.
[out:json][timeout:25]; ( way["building"="hospital"]["name"]["capacity:persons"](-6.449729258880717,106.57699584960938,-6.079107721736656,107.0624542236328); ); out center;
Load Data to PostGIS
To store coordinates/points in PostgreSQL, we need to create a column with geometry data type. This is a SQL script to create table for our needs:
CREATE TABLE locations ( id varchar(255) NOT NULL, name varchar(255), capacity varchar(255), addr varchar(255), phone varchar(100), opening_hours varchar(100), website varchar(100), coords geometry, CONSTRAINT locations_pkey PRIMARY KEY (id) )
Additionally, we want to create index to speed up search:
CREATE INDEX locations_coords_index ON locations USING GIST(coords)
Now, we can write a simple Python script and load the data from export.geojson
:
import configparser import psycopg2 import json import sys from io import StringIO """ overpass_to_postgis.ini example: [default] host = localhost user = postgres password = postgres database = respondor overpass_file_path = /mnt/data/Downloads/export.geojson """ config = configparser.ConfigParser() config.read('overpass_to_postgis.ini') class get_db_connection: def __init__(self, dbconf: dict): self.host = dbconf['host'] self.database = dbconf['database'] self.user = dbconf['user'] self.password = dbconf['password'] def __enter__(self): self.conn = psycopg2.connect( host=self.host, database=self.database, user=self.user, password=self.password) self.cur = self.conn.cursor() return self.cur def __exit__(self, type, value, traceback): self.conn.commit() self.cur.close() self.conn.close() def safe_get(map, key): val = None if key in map: val = map[key] return str(val) # setup database default_config = config['default'] table_name = default_config['table'] with get_db_connection(default_config) as cur: # read json overpass s = StringIO() with open(default_config['overpass_file_path']) as f: data = json.load(f) for feat in data['features']: coord_str = ' '.join(map(str, feat['geometry']['coordinates'])) props = feat['properties'] values = [ safe_get(props, '@id'), safe_get(props, 'name:en'), safe_get(props, 'capacity:persons'), safe_get(props, 'addr:full'), safe_get(props, 'phone'), safe_get(props, 'opening_hours'), safe_get(props, 'website'), f'POINT({coord_str})', ] s.write('\t'.join(values)+'\n') s.seek(0) # setup database cur.execute(f'DROP TABLE IF EXISTS {table_name}') cur.execute(f""" CREATE TABLE {table_name} ( id varchar(255) NOT NULL, name varchar(255), capacity varchar(255), addr varchar(255), phone varchar(100), opening_hours varchar(100), website varchar(100), coords geometry, CONSTRAINT {table_name}_pkey PRIMARY KEY (id) ) """ ) cur.execute(f'CREATE INDEX {table_name}_coords_index ON {table_name} USING GIST(coords)') cur.copy_from(s, table_name, null='None')
Query the Data
Since we have our data ready, we can run interesting queries using the geometry column.
For example, we can query distance (in meter) of the hospitals from certain location (longitude=106.8249641, latitude=-6.1753871)
by using ST_DistanceSphere() function from PostGIS:
SELECT id, name, ST_DistanceSphere(coords, 'POINT(106.8249641 -6.1753871)') as distance_meter, concat(st_y(coords), ', ', st_x(coords)) as lat_long FROM locations ORDER BY distance_meter ASC LIMIT 10
The result we get will be something like this (in CSV):
"id","name","distance_meter","lat_long" "way/496643326","Rumah Sakit THT Bedah Prof. Nizar",1150.93433842,"-6.1712414, 106.8154247" "way/492535915","RSUD Tarakan",1669.53197806,"-6.1716313, 106.8103422" "way/485813973","Rumah Sakit Abdi Waluyo",1676.06387318,"-6.1898132, 106.8293587" "way/485813873","Rumah Sakit YPK Mandiri",1799.67005592,"-6.1913149, 106.8278537" "way/485992308","Rumah Sakit Menteng Mitra Afia",1958.22172341,"-6.1868538, 106.8384083" "way/492945012","Rumah Sakit Umum Kecamatan Tanah Abang",2090.46726163,"-6.1909239, 106.8143169" "way/156919546","Rumah Sakit Ibu dan Anak Aries",2575.8240355,"-6.1576321, 106.8099989" "way/121523126","Raden Saleh Medical Center",2730.37582545,"-6.1892694, 106.8453365" "way/482346485","Rumah Sakit Mata JEC",2895.65675616,"-6.1986006, 106.8368349" "way/496038045","Gedung Adminstrasi dan Pendidikan RSAB Harapan Kita",2924.49581687,"-6.1840605, 106.7999897"
But if we don’t really care about the actual value of the distance, this approach is not the most efficient as indicated in the document:
The trouble with this approach is that it forces the database to calculate the distance between the query geometry and every feature in the table of candidate features, then sort them all. For a large table of candidate features, it is not a reasonable approach. One way to improve performance is to add an index constraint to the search
https://postgis.net/workshops/postgis-intro/knn.html
One of the approach that we can use to find the nearest location efficiently is by using <->
operator (distance between box centers). This method utilizes the index that we created earlier:
SELECT id, name, concat(st_y(coords), ', ', st_x(coords)) as lat_long FROM locations ORDER BY coords <-> 'POINT(106.8249641 -6.1753871)'::geometry ASC LIMIT 10;
The results are exactly but supposed to be faster in case of large table. Unfortunately, since our table only has 214 rows, we can’t observe the difference in speed:
"id","name","lat_long" "way/496643326","Rumah Sakit THT Bedah Prof. Nizar","-6.1712414, 106.8154247" "way/485813973","Rumah Sakit Abdi Waluyo","-6.1898132, 106.8293587" "way/492535915","RSUD Tarakan","-6.1716313, 106.8103422" "way/485813873","Rumah Sakit YPK Mandiri","-6.1913149, 106.8278537" "way/485992308","Rumah Sakit Menteng Mitra Afia","-6.1868538, 106.8384083" "way/492945012","Rumah Sakit Umum Kecamatan Tanah Abang","-6.1909239, 106.8143169" "way/156919546","Rumah Sakit Ibu dan Anak Aries","-6.1576321, 106.8099989" "way/121523126","Raden Saleh Medical Center","-6.1892694, 106.8453365" "way/482346485","Rumah Sakit Mata JEC","-6.1986006, 106.8368349" "way/496038045","Gedung Adminstrasi dan Pendidikan RSAB Harapan Kita","-6.1840605, 106.7999897"
It’s fun, right? ?
The features above are only the tip of the iceberg of PostGIS. There are a lot of other awesome features that can be explored. But hopefully this post can be a useful introduction to PostGIS. Enjoy!