GeoSpatial Analysis using GeoPandas in python

GeoSpatial Analysis using GeoPandas in python

In this tuorial, we will do geospatial analysis using python on UK Accidents and UK Administrative datasets.

Geopandas is a library for working with geospatial data in Python. It builds on the popular Pandas library and extends its functionality to handle geospatial data. Some of the key features of Geopandas include:

Handling of shapefiles, geojson, and other geospatial data formats
Integration with other popular Python libraries such as Matplotlib and Scikit-learn
Easy handling and manipulation of geospatial data, including operations like merging, grouping, and transforming data
Support for spatial join operations, spatial indexing, and other geospatial analysis tasks

Here are some examples of tasks you can perform using Geopandas:

Visualizing and plotting geospatial data on maps
Performing spatial join operations to combine data from different sources
Calculating summary statistics for spatial data, such as mean, median, or standard deviation
Performing spatial analysis tasks, such as proximity analysis or spatial clustering

First, we will set up conda environment and install required python GIS packages for this task. Make sure conda is installed on your system. Use following commands to create a conda environment and to install python libraries.

(base) geoknight@pop-os:~$conda create -n spatial-dev.guru python=3.10
(base) geoknight@pop-os:~$conda activate spatial-dev.guru
(spatial-dev.guru) geoknight@pop-os:~$conda install -c conda-forge geopandas
(spatial-dev.guru) geoknight@pop-os:~$conda install -c conda-forge pygeos

(base) geoknight@pop-os:~$conda create -n spatial-dev.guru python=3.10

(base) geoknight@pop-os:~$conda activate spatial-dev.guru

(spatial-dev.guru) geoknight@pop-os:~$conda install -c conda-forge geopandas

(spatial-dev.guru) geoknight@pop-os:~$conda install -c conda-forge pygeos

Geospatial analysis using GeoPandas on UK Accidents and UK Administrative datasets

Click here to get source code and datasets used in this tutorial.

The below code reads in two datasets: a dataset of UK accidents, and a shapefile of administrative regions in the UK. The following steps are performed on the accidents dataset:

Convert the longitude and latitude columns to numeric data types.
Convert the longitude and latitude columns into Point geometries.
Set the coordinate reference system (CRS) to EPSG:4326 (WGS 84) for the accidents dataset.
Filter out any invalid geometries or empty geometries.
Convert the CRS to EPSG:3857 (Web Mercator) for the accidents dataset.

1. A sample point is then created and a buffer of 1000 meters is created around it. The code then filters the accidents dataset to find all accidents within this buffer.

2. Next, the shapefile of UK administrative regions is read in and its CRS is also converted to EPSG:3857. A spatial join is then performed between the accidents dataset and the administrative regions shapefile, resulting in a merged GeoDataFrame. This GeoDataFrame contains all the accidents point with region based value from UK administrative regions. Using spatial join, ADMIN_REGI column is added to uk_accidents data set.

3. Finally, the merged GeoDataFrame is grouped by the administrative region and the number of accidents in each region is counted. The result is a series of the number of accidents in each administrative region.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

#Read datasets
uk_accidents = pd.read_csv('data/uk_accidents/Accident_Information.tar.xz', low_memory=False)


# Convert Long Lat into numeric type
uk_accidents['Longitude'] = pd.to_numeric(uk_accidents['Longitude'])
uk_accidents['Latitude'] = pd.to_numeric(uk_accidents['Latitude'])


#  Convert Longitude Latitude into Point Geometry
uk_accidents = gpd.GeoDataFrame(uk_accidents, geometry = gpd.points_from_xy(x=uk_accidents['Longitude'], y=uk_accidents['Latitude']))
uk_accidents = uk_accidents.set_crs('EPSG:4326')

# Remove invalid geometries if any
uk_accidents = uk_accidents[uk_accidents.is_valid]
uk_accidents = uk_accidents[~uk_accidents.is_empty]
uk_accidents = uk_accidents.to_crs('EPSG:3857')


# Perform spatial queries, such as finding accidents within a certain distance from a specific location:
# Creating a sample point
point = Point((-13411.1, 6711284.0))

# Creating a buffer of 1 KM around the point
buffer = point.buffer(1000)

within_buffer = uk_accidents[uk_accidents.within(buffer)]


# Load the administrative regions shapefile
uk_admins = gpd.read_file('data/uk_admin_region/uk_admins_region.shp')
uk_admins = uk_admins.to_crs('EPSG:3857')

# Perform a spatial join between the accidents GeoDataFrame and the administrative regions shapefile
merged = gpd.sjoin(uk_accidents, uk_admins, how="inner", predicate='within')

# Group the merged GeoDataFrame by the administrative region and count the number of accidents in each region
grouped = merged.groupby('ADMIN_REGI').count()['Accident_Index']

import pandas as pd

import geopandas as gpd

from shapely.geometry import Point

#Read datasets

uk_accidents = pd.read_csv('data/uk_accidents/Accident_Information.tar.xz', low_memory=False)

# Convert Long Lat into numeric type

uk_accidents['Longitude'] = pd.to_numeric(uk_accidents['Longitude'])

uk_accidents['Latitude'] = pd.to_numeric(uk_accidents['Latitude'])

# Convert Longitude Latitude into Point Geometry

uk_accidents = gpd.GeoDataFrame(uk_accidents, geometry = gpd.points_from_xy(x=uk_accidents['Longitude'], y=uk_accidents['Latitude']))

uk_accidents = uk_accidents.set_crs('EPSG:4326')

# Remove invalid geometries if any

uk_accidents = uk_accidents[uk_accidents.is_valid]

uk_accidents = uk_accidents[~uk_accidents.is_empty]

uk_accidents = uk_accidents.to_crs('EPSG:3857')

# Perform spatial queries, such as finding accidents within a certain distance from a specific location:

# Creating a sample point

point = Point((-13411.1, 6711284.0))

# Creating a buffer of 1 KM around the point

buffer = point.buffer(1000)

within_buffer = uk_accidents[uk_accidents.within(buffer)]

# Load the administrative regions shapefile

uk_admins = gpd.read_file('data/uk_admin_region/uk_admins_region.shp')

uk_admins = uk_admins.to_crs('EPSG:3857')

# Perform a spatial join between the accidents GeoDataFrame and the administrative regions shapefile

merged = gpd.sjoin(uk_accidents, uk_admins, how="inner", predicate='within')

# Group the merged GeoDataFrame by the administrative region and count the number of accidents in each region

grouped = merged.groupby('ADMIN_REGI').count()['Accident_Index']

I hope this tutorial will create a good foundation for you. If you want tutorials on another GIS topic or you have any queries, please send an email at contact@spatial-dev.guru.

Geospatial analysis using GeoPandas on UK Accidents and UK Administrative datasets

Click here to get source code and datasets used in this tutorial.

Related Posts

Leave a ReplyCancel reply

Discover more from Spatial Dev Guru

Discover more from Spatial Dev Guru