Polygonize Raster using Rioxarray and GeoPandas

Polygonize raster using Rioxrray and GeoPandas

Rioxarray is python library which combines xarray and rasterio library for raster analysis. Xarray is open source and python packages that makes working with labelled multi dimensional array very efficient. RasterIO is open source and python package based on gdal to read/write and analyze raster data. Rioxarray extends xarray with rasterio accessor.

One of the advantage of xarray is that it can be integrated with Dask for parallel computing in local machines as well as in cluster machines. Well, there are lot of applications of rioxarray/xarray in raster domain. We will discuss those applications in another tutorials.

Raster Polygonize

In this tutorial, we will use rioxarray and geopandas to polygonize the raster the data. Code is given at the end of tutorials. Basically, we will have 4 steps to polygonize the raster data.

Read raster file using rioxarray.
Compute xy coordinates of centroids of every pixel.
Create buffer polygon for every xy coordinates based on pixel size.
Perform union on all the polygons

Lets discuss the the code that we have written to polygonize the raster. The code is given at the end. Please refer there when I mentioned any line number.

Import essential python packages that is needed for this task. Import numpy, pandas, geopandas, rioxarray and shapely.
Read the raster data using rioxarray and store it in dem variable. See line number 14 in below code snippet. In our case, we are reading DEM file that represent the elevation profile of a area of interest. The height and width of the raster used is 1413 and 1099 respectively. UTM projection of raster used is EPSG:26718. And pixel is size10 meter by 10 meter(10×10). It has only single band raster. Pixel value represent elevation height in meters.
When you read the raster data, rioxarray will automatically compute the x-axis and y-axis coordinates of raster. Only thing we need to is create pair of xy coordinates. We will use numpy meshgrid to create the pairs. See the below image for reference and line number 16 in below code snippet.
At this stage, we have centroid xy coordinates of every pixel and elevation values of every pixel in x,y and elevation variables respectively. These all variables are numpy array and they have same dimension as of raster data that is 1413x1099(height and width) These all are 2 dimensional numpy array.
x,y and elevation variables are 2-dimensional array. We will use flatten method to make them 1-dimensional array. See line number 17 in below code snippet. Then, we will convert these variables to pandas frame. We will not vectorize all the pixels here. We will only vectorize those pixels which has values more than 170 meters. We will filter the newly created pandas frame based on this 170 threshold. See line number from 20-22 in below code snippet.
Now we have only those pixel coordinates in pandas frame whose values are greater than 170. We will convert this pandas frame to Geodataframe based on xy coordinates. See line number 23.
So far, we have GeoDataframe of xy coordinates whose geometry is of point type. But our end result should be a polygon representing the those pixels which has a values greater than 170. To convert this point data to polygon, we will create buffer square of 5 meters for every point. And we are using 5 meters buffer because pixel size is 10 meter. This will create a polygon geometries of 10 by 10 meter for every point. See line number from 23-25.
We will use multiprocessing to perform union on chunks of geometries for parallel computing . Here we are using size of 10000 as chunk size. We have performed union for every 10000 geometries. And then we will perform total union on these unioned geometries and finally save then to shapefile on disk. See line number from 28-26 in below code snippet.

import numpy as np
import pandas as pd
import geopandas as gpd
import rioxarray as rx
from multiprocessing import Pool, cpu_count
from shapely.ops import unary_union


def union(x):
    return unary_union(x)

if __name__ == "__main__":
    print("Reading raster...")
    dem = rx.open_rasterio('abbottstown_pa/abbottstown_pa.dem')
    x, y, elevation = dem.x.values, dem.y.values, dem.values
    x, y = np.meshgrid(x, y)
    x, y, elevation = x.flatten(), y.flatten(), elevation.flatten()

    print("Converting to GeoDataFrame...")
    dem_pd = pd.DataFrame.from_dict({'elevation': elevation, 'x': x, 'y': y})
    dem_threshold = 170
    dem_pd = dem_pd[dem_pd['elevation'] > dem_threshold]
    dem_vector = gpd.GeoDataFrame(geometry=gpd.GeoSeries.from_xy(dem_pd['x'], dem_pd['y'], crs=dem.rio.crs))
    dem_vector = dem_vector.buffer(5, cap_style=3)
    dem_vector = dem_vector.to_crs('EPSG:4326')
    geom_arr = []

    # Converting GeoSeries to list of geometries
    geoms = list(dem_vector)

    # Converting geometries list to nested list of geometries
    for i in range(0, len(geoms), 10000):
        geom_arr.append(geoms[i:i+10000])

    # Creating multiprocessing pool to perform union operation of chunks of geometries
    with Pool(cpu_count()) as p:
        geom_union = p.map(union, geom_arr)

    # Perform union operation on returned unioned geometries
    total_union = unary_union(geom_union)

    # Creating GeoDataFrame for total_union
    union_vector_gdf = gpd.GeoDataFrame(geometry=gpd.GeoSeries(total_union))

    # Saving GeoDataFrame to shapefile
    union_vector_gdf.to_file(f"""dem_{dem_threshold}.shp""", crs='EPSG:4326')

import numpy as np

import pandas as pd

import geopandas as gpd

import rioxarray as rx

from multiprocessing import Pool, cpu_count

from shapely.ops import unary_union

def union(x):

return unary_union(x)

if __name__ == "__main__":

print("Reading raster...")

dem = rx.open_rasterio('abbottstown_pa/abbottstown_pa.dem')

x, y, elevation = dem.x.values, dem.y.values, dem.values

x, y = np.meshgrid(x, y)

x, y, elevation = x.flatten(), y.flatten(), elevation.flatten()

print("Converting to GeoDataFrame...")

dem_pd = pd.DataFrame.from_dict({'elevation': elevation, 'x': x, 'y': y})

dem_threshold = 170

dem_pd = dem_pd[dem_pd['elevation'] > dem_threshold]

dem_vector = gpd.GeoDataFrame(geometry=gpd.GeoSeries.from_xy(dem_pd['x'], dem_pd['y'], crs=dem.rio.crs))

dem_vector = dem_vector.buffer(5, cap_style=3)

dem_vector = dem_vector.to_crs('EPSG:4326')

geom_arr = []

# Converting GeoSeries to list of geometries

geoms = list(dem_vector)

# Converting geometries list to nested list of geometries

for i in range(0, len(geoms), 10000):

geom_arr.append(geoms[i:i+10000])

# Creating multiprocessing pool to perform union operation of chunks of geometries

with Pool(cpu_count()) as p:

geom_union = p.map(union, geom_arr)

# Perform union operation on returned unioned geometries

total_union = unary_union(geom_union)

# Creating GeoDataFrame for total_union

union_vector_gdf = gpd.GeoDataFrame(geometry=gpd.GeoSeries(total_union))

# Saving GeoDataFrame to shapefile

union_vector_gdf.to_file(f"""dem_{dem_threshold}.shp""", crs='EPSG:4326')

Polygonize Raster using Rioxarray and GeoPandas

Raster Polygonize

We also offer freelancing services. Please email us at contact@spatial-dev.guru for any query.

Leave a ReplyCancel reply

Raster Polygonize

We also offer freelancing services. Please email us at contact@spatial-dev.guru for any query.

Related Posts

Leave a ReplyCancel reply

Discover more from Spatial Dev Guru

Discover more from Spatial Dev Guru