Setting up DuckDB for Geospatial Analysis

Overview

DuckDB is an in-process analytical database built to query large datasets efficiently. For geospatial (GIS) work, two features stand out: its spatial extension, which adds support for spatial data and spatial functions, and its native support for Parquet, a columnar storage file format that DuckDB can query directly from disk. In this introduction, we'll explore DuckDB's features and capabilities in the context of GIS and Parquet data.

The following Parquet dataset is used in this tutorial:
Parquet Dataset

In this tutorial, we will set up DuckDB and install the extensions it needs for geospatial analysis. The tutorial is divided into four steps:

  1. Installing DuckDB
  2. Installing the Parquet and spatial extensions
  3. Loading the Parquet and spatial extensions
  4. Reading a GeoParquet file from disk directly in the DuckDB terminal

1. Installing DuckDB

  1. Visit https://duckdb.org/docs/installation/ and download the zip archive for your system. The page detects your operating system automatically.
  2. Once downloaded, extract the archive. In my case, I am using Pop!_OS, a Linux distribution. Open a terminal (or command prompt) in the extracted directory and, on Linux, start DuckDB with the ./duckdb command. On Windows, you can run duckdb.exe directly. This starts a DuckDB session in which you can run any valid DuckDB query.

2. Installing the Parquet and spatial extensions

To install the Parquet and spatial extensions, run install spatial; install parquet; in the DuckDB terminal. This downloads the extensions and stores them locally, so you only need to do it once per DuckDB installation.
The spatial extension gives DuckDB its geospatial capabilities: it can read the geometry stored in GeoParquet files and provides a wide range of functions for spatial operations. The Parquet extension is needed to load Parquet files from disk, although most DuckDB builds already ship with it bundled.

3. Loading the Parquet and spatial extensions

To load the extensions, run load spatial; load parquet; in the DuckDB terminal. Loading applies only to the current session, so repeat these commands each time you start DuckDB.

4. Reading a GeoParquet file from disk directly in the DuckDB terminal

Use the following command to read the Parquet file into a table: CREATE TABLE biketrip AS SELECT * FROM 'path to parquet file'; This reads the file into a table named biketrip, which you can then use for further DuckDB queries. If you started DuckDB without a database file, the session runs in memory, so the table only lasts until you exit. Note that GeoParquet files store geometry as WKB (Well-Known Binary).

Before running spatial queries, you first have to convert the geometry column from WKB into DuckDB's geometry type. Use the ST_GeomFromWKB(geom column) function to do this.
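To make the WKB step more concrete, the sketch below decodes a single 2D WKB Point in plain Python with only the standard library. It illustrates the binary layout that ST_GeomFromWKB parses (one byte-order byte, a 4-byte geometry type, then the coordinates as 8-byte doubles); it is not a replacement for that function. It handles only Point geometries, and the helper names parse_wkb_point and make_wkb_point are hypothetical.

```python
import struct


def parse_wkb_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB Point into (x, y). Only geometry type 1 (Point) is handled."""
    byte_order = wkb[0]  # 0 = big-endian (XDR), 1 = little-endian (NDR)
    if byte_order not in (0, 1):
        raise ValueError("invalid WKB byte-order flag")
    endian = "<" if byte_order == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)  # 4-byte unsigned type code
    if geom_type != 1:
        raise ValueError(f"expected Point (type 1), got type {geom_type}")
    x, y = struct.unpack_from(endian + "dd", wkb, 5)  # two 8-byte doubles after the header
    return x, y


def make_wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D Point as little-endian WKB: 1 + 4 + 8 + 8 = 21 bytes."""
    return struct.pack("<BIdd", 1, 1, x, y)


# POINT(1 2) as little-endian WKB, written out in hex
print(parse_wkb_point(bytes.fromhex("0101000000000000000000f03f0000000000000040")))  # (1.0, 2.0)
```

A real GeoParquet geometry column can also hold lines, polygons, and multi-part geometries, which is why the tutorial relies on ST_GeomFromWKB rather than hand-written parsing.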

You are now set up to perform geospatial analysis in DuckDB with the Parquet and spatial extensions. Feel free to explore and analyze your spatial data!

I hope this tutorial gives you a good foundation. If you would like tutorials on other GIS topics or have any questions, please send an email to contact@spatial-dev.guru.
