In this tuorial, we will do geospatial analysis using python on UK Accidents and UK Administrative datasets.
Geopandas is a library for working with geospatial data in Python. It builds on the popular Pandas library and extends its functionality to handle geospatial data. Some of the key features of Geopandas include:
- Handling of shapefiles, geojson, and other geospatial data formats
- Integration with other popular Python libraries such as Matplotlib and Scikit-learn
- Easy handling and manipulation of geospatial data, including operations like merging, grouping, and transforming data
- Support for spatial join operations, spatial indexing, and other geospatial analysis tasks
Here are some examples of tasks you can perform using Geopandas:
- Visualizing and plotting geospatial data on maps
- Performing spatial join operations to combine data from different sources
- Calculating summary statistics for spatial data, such as mean, median, or standard deviation
- Performing spatial analysis tasks, such as proximity analysis or spatial clustering
First, we will set up conda environment and install required python GIS packages for this task. Make sure conda is installed on your system. Use following commands to create a conda environment and to install python libraries.
1 2 3 4 |
(base) geoknight@pop-os:~$conda create -n spatial-dev.guru python=3.10 (base) geoknight@pop-os:~$conda activate spatial-dev.guru (spatial-dev.guru) geoknight@pop-os:~$conda install -c conda-forge geopandas (spatial-dev.guru) geoknight@pop-os:~$conda install -c conda-forge pygeos |
Geospatial analysis using GeoPandas on UK Accidents and UK Administrative datasets
Click here to get source code and datasets used in this tutorial.
The below code reads in two datasets: a dataset of UK accidents, and a shapefile of administrative regions in the UK. The following steps are performed on the accidents dataset:
- Convert the longitude and latitude columns to numeric data types.
- Convert the longitude and latitude columns into Point geometries.
- Set the coordinate reference system (CRS) to EPSG:4326 (WGS 84) for the accidents dataset.
- Filter out any invalid geometries or empty geometries.
- Convert the CRS to EPSG:3857 (Web Mercator) for the accidents dataset.
1. A sample point is then created and a buffer of 1000 meters is created around it. The code then filters the accidents dataset to find all accidents within this buffer.
2. Next, the shapefile of UK administrative regions is read in and its CRS is also converted to EPSG:3857. A spatial join is then performed between the accidents dataset and the administrative regions shapefile, resulting in a merged GeoDataFrame. This GeoDataFrame contains all the accidents point with region based value from UK administrative regions. Using spatial join, ADMIN_REGI column is added to uk_accidents data set.
3. Finally, the merged GeoDataFrame is grouped by the administrative region and the number of accidents in each region is counted. The result is a series of the number of accidents in each administrative region.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
import pandas as pd import geopandas as gpd from shapely.geometry import Point #Read datasets uk_accidents = pd.read_csv('data/uk_accidents/Accident_Information.tar.xz', low_memory=False) # Convert Long Lat into numeric type uk_accidents['Longitude'] = pd.to_numeric(uk_accidents['Longitude']) uk_accidents['Latitude'] = pd.to_numeric(uk_accidents['Latitude']) # Convert Longitude Latitude into Point Geometry uk_accidents = gpd.GeoDataFrame(uk_accidents, geometry = gpd.points_from_xy(x=uk_accidents['Longitude'], y=uk_accidents['Latitude'])) uk_accidents = uk_accidents.set_crs('EPSG:4326') # Remove invalid geometries if any uk_accidents = uk_accidents[uk_accidents.is_valid] uk_accidents = uk_accidents[~uk_accidents.is_empty] uk_accidents = uk_accidents.to_crs('EPSG:3857') # Perform spatial queries, such as finding accidents within a certain distance from a specific location: # Creating a sample point point = Point((-13411.1, 6711284.0)) # Creating a buffer of 1 KM around the point buffer = point.buffer(1000) within_buffer = uk_accidents[uk_accidents.within(buffer)] # Load the administrative regions shapefile uk_admins = gpd.read_file('data/uk_admin_region/uk_admins_region.shp') uk_admins = uk_admins.to_crs('EPSG:3857') # Perform a spatial join between the accidents GeoDataFrame and the administrative regions shapefile merged = gpd.sjoin(uk_accidents, uk_admins, how="inner", predicate='within') # Group the merged GeoDataFrame by the administrative region and count the number of accidents in each region grouped = merged.groupby('ADMIN_REGI').count()['Accident_Index'] |
I hope this tutorial will create a good foundation for you. If you want tutorials on another GIS topic or you have any queries, please send an email at contact@spatial-dev.guru.