Geospatial Analysis using DuckDB

EmailTwitterLinkedInFacebookWhatsAppShare
Geospatial Analysis using DuckDB

This tutorial will guide you through a step-by-step geospatial analysis of bike sharing data using DuckDB.

Prerequisites:

  • DuckDB installed and accessible.
  • A sample dataset containing bike sharing data in Parquet format.
  • Basic understanding of SQL and geospatial concepts.

To setup DuckDB along with spatial extension, please follow this tutorial Setting up DuckDB for GeoSpatial Analysis.

Following parquet dataset is used for this Tutorial
Parquet Dataset

1. Load extensions and parquet data:

The provided code is loading spatial and parquet data and creating a table named biketrip with additional columns for transformed pickup and dropoff points. Here’s an explanation for each part of the code:

1. Load Spatial and Parquet Data

LOAD spatial; LOAD parquet;

These commands are likely loading necessary spatial functions and libraries (spatial) and enabling support for reading Parquet files (parquet). This ensures that DuckDB can handle spatial data and read data in the Parquet format.

2. Create biketrip Table

Explanation:

  • CREATE TABLE biketrip AS: This creates a new table named biketrip.
  • SELECT *: This selects all columns from the source table.
  • st_transform(st_point(...), 'EPSG:4326', 'EPSG:3857') AS pickup_point: This transforms the latitude and longitude columns of the starting station into a geometry point in the Web Mercator projection (EPSG:3857).
  • st_transform(st_point(...), 'EPSG:4326', 'EPSG:3857') AS dropoff_point: Similarly, this transforms the latitude and longitude columns of the ending station into a geometry point in the Web Mercator projection (EPSG:3857).
  • FROM 'data/202003-citibike-tripdata.parquet': This specifies the source Parquet file from which the data is loaded.

Note:

  • Make sure to replace the actual path of your Parquet file in the FROM clause.
  • The projection used here is Web Mercator (EPSG:3857), but you may choose a different one depending on your analysis requirements and visualization tools.

This code essentially creates a new table (biketrip) with transformed spatial points for pickup and dropoff locations, allowing for further geospatial analysis using the Web Mercator projection.

2. Identify popular start and end stations:

The provided SQL query retrieves a list of bike start stations/end stations and the count of trips originating from each station. It groups the data by the start station name/end station name, counts the trips for each station, and presents the result in descending order based on the trip count. This query helps identify the most popular bike start/end stations.

3. Explore trip patterns by user type

The SQL query selects the user type (usertype) and calculates the average trip duration and average distance traveled for each user type. It groups the data by user type, providing insights into the typical ride durations and distances for different user categories.

4. Analyzing Efficiency: Trip Duration vs. Linear Distance

In this query:

  • ST_DISTANCE(pickup_point, dropoff_point) AS linear_distance calculates the linear distance between pickup and dropoff points using the spatial distance function.
  • tripduration / linear_distance AS efficiency_ratio calculates the efficiency ratio by dividing the trip duration by the linear distance.

This will help you assess how efficient the trips are in terms of travel time compared to the linear distance. A higher efficiency ratio could indicate more direct routes or faster travel times, while a lower ratio may suggest less efficient routes or longer travel times. Remember that negative differences may indicate issues with the data or specific cases that need further investigation.

5. Analyzing Bike Trips Within a 500-Meter Radius

The provided SQL query selects all columns from the biketrip table for bike trips whose pickup points fall within a 500-meter radius of a specified geographical location. Here’s an explanation of the code:

Explanation:

  • SELECT Clause: Retrieves all columns (*) from the biketrip table.
  • FROM Clause: Specifies the source table (biketrip) from which the data is being queried.
  • WHERE Clause: Filters rows based on the condition specified in the ST_WITHIN function.
  • ST_WITHIN Function: Checks if the pickup point is within the specified geometry.
  • ST_BUFFER Function: Creates a buffer around a specified point to define the search radius. In this case, it uses the ST_TRANSFORM function to convert the specified point from EPSG:4326 to EPSG:3857 (Web Mercator) and then creates a 500-meter buffer around it.

Purpose:

The query allows you to focus on bike trips whose pickup points are within a 500-meter radius of the specified geographical location (latitude 40.71146364, longitude -74.00552427). This analysis can provide insights into the distribution and patterns of bike trips around the given area.

I hope this tutorial will create a good foundation for you. If you want tutorials on another GIS topic or you have any queries, please send an email at contact@spatial-dev.guru.

2 thoughts on “Geospatial Analysis using DuckDB”

Leave a ReplyCancel reply

Discover more from Spatial Dev Guru

Subscribe now to keep reading and get access to the full archive.

Continue reading

Discover more from Spatial Dev Guru

Subscribe now to keep reading and get access to the full archive.

Continue reading