
Overview
DuckDB is an in-process analytical database built to query large datasets efficiently. For geospatial (GIS) work it is notable for two things: its spatial extension, which adds geometry types and spatial functions, and its ability to read Parquet, a columnar storage file format, directly. In this introduction, we’ll explore DuckDB’s features and capabilities in the context of GIS and Parquet data.
The following Parquet dataset is used for this tutorial:
Parquet Dataset
In this tutorial, we will set up DuckDB and install its spatial extension for geospatial analysis. The tutorial is divided into 4 steps:
- Installing DuckDB
- Installing parquet and spatial extensions
- Loading parquet and spatial extensions
- Reading a GeoParquet file from a directory directly in the DuckDB terminal
1. Installing DuckDB
- Visit https://duckdb.org/docs/installation/ and download the zip archive for your system. The page automatically detects your operating system.

- Once downloaded, extract the archive. In my case, I am on a Pop!_OS Linux system. Go to the extracted directory and open a terminal/command prompt in that path. On Linux, start DuckDB with the ./duckdb command. On Windows, you can open the extracted executable directly. This starts a DuckDB session in which you can run any valid DuckDB query.

2. Installing parquet and spatial extensions
To install the parquet and spatial extensions, run install spatial; install parquet; in the DuckDB terminal.
The parquet extension is required to load a Parquet dataset from a directory. The spatial extension gives DuckDB its geospatial capabilities: it can read GeoParquet files and provides various spatial functions for performing spatial operations.
3. Loading parquet and spatial extensions
To load the parquet and spatial extensions, run load spatial; load parquet; in the DuckDB terminal. This makes both extensions available in the current session.
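Steps 2 and 3 can be run together in one session. The sketch below repeats the install and load commands and then checks the result with the duckdb_extensions() table function. That last query is not part of the original steps and is included only as an optional sanity check.

```sql
-- Step 2: download the extensions (normally needed only once per installation).
INSTALL spatial;
INSTALL parquet;

-- Step 3: load the extensions into the current session.
LOAD spatial;
LOAD parquet;

-- Optional check (an addition to the tutorial's steps):
-- list the install/load status of the two extensions.
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name IN ('spatial', 'parquet');
```

If both extensions report installed and loaded as true, the session is ready for the next step.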
4. Reading a GeoParquet file from a directory directly in the DuckDB terminal
Use the following command to read the Parquet file: CREATE TABLE biketrip AS SELECT * FROM 'path to parquet file'; The GeoParquet file stores its geometry as WKB (Well-Known Binary). The command reads the Parquet file into a table named biketrip that you can use in later queries. If DuckDB was started without a database file, as in step 1, the table is held in memory and is gone when the session ends.
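As a sketch of this step, assume a hypothetical file named biketrip.parquet in the directory where DuckDB was started. The file name is a placeholder for illustration, so substitute the real path to your own file.

```sql
-- 'biketrip.parquet' is a placeholder path, not the tutorial's actual file name.
CREATE TABLE biketrip AS
SELECT * FROM 'biketrip.parquet';

-- Inspect the column names and types; the name of the geometry column
-- depends on how the file was written.
DESCRIBE biketrip;

-- Preview a few rows to confirm the data loaded.
SELECT * FROM biketrip LIMIT 5;
```

DESCRIBE is a convenient way to find out what the geometry column is called and how DuckDB typed it before writing any spatial query.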
To execute a spatial query, you first have to parse the geometry from WKB into DuckDB's geometry type. Use the ST_GeomFromWKB(geom column) function to do this.
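The parsing step might look like the sketch below. The first query needs no file: it builds a point with arbitrary example coordinates, encodes it as WKB, and parses it back. The second query runs against the biketrip table and assumes the geometry column is named geom, which is a guess made for illustration; check the actual name in your file.

```sql
-- Self-contained round trip: point -> WKB -> geometry -> text.
-- The coordinates are arbitrary example values.
SELECT ST_AsText(ST_GeomFromWKB(ST_AsWKB(ST_Point(77.59, 12.97)))) AS parsed;

-- On the tutorial table: 'geom' is an assumed column name.
-- ST_AsText prints the parsed geometry in a readable WKT form.
SELECT ST_AsText(ST_GeomFromWKB(geom)) AS geometry_wkt
FROM biketrip
LIMIT 5;
```

Depending on the DuckDB version, the geometry column of a GeoParquet file may already be read as a GEOMETRY type once the spatial extension is loaded. In that case the ST_GeomFromWKB call is unnecessary and the column can be passed to spatial functions directly.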
Now, you are set up to perform geospatial analysis using DuckDB with Parquet and spatial extensions. Feel free to explore and analyze your spatial data efficiently!
I hope this tutorial gives you a good foundation. If you would like tutorials on other GIS topics or have any questions, please send an email to contact@spatial-dev.guru.
