Rioxarray is python library which combines xarray and rasterio library for raster analysis. Xarray is open source and python packages that makes working with labelled multi dimensional array very efficient. RasterIO is open source and python package based on gdal to read/write and analyze raster data. Rioxarray extends xarray with rasterio accessor.

One of the advantage of xarray is that it can be integrated with Dask for parallel computing in local machines as well as in cluster machines. Well, there are lot of applications of rioxarray/xarray in raster domain. We will discuss those applications in another tutorials.

### Raster Polygonize

In this tutorial, we will use rioxarray and geopandas to polygonize the raster the data. Code is given at the end of tutorials. Basically, we will have 4 steps to polygonize the raster data.

- Read raster file using rioxarray.
- Compute xy coordinates of centroids of every pixel.
- Create buffer polygon for every xy coordinates based on pixel size.
- Perform union on all the polygons

Lets discuss the the code that we have written to polygonize the raster. The code is given at the end. Please refer there when I mentioned any line number.

- Import essential python packages that is needed for this task. Import numpy, pandas, geopandas, rioxarray and shapely.
- Read the raster data using rioxarray and store it in
**dem**variable. See line number 14 in below code snippet. In our case, we are reading**DEM**file that represent the elevation profile of a area of interest. The height and width of the raster used is**1413 and 1099**respectively. UTM projection of raster used is**EPSG:26718**. And pixel is**size10 meter by 10 meter(10×10)**. It has only single band raster. Pixel value represent elevation height in meters. - When you read the raster data, rioxarray will automatically compute the x-axis and y-axis coordinates of raster. Only thing we need to is create pair of xy coordinates. We will use numpy meshgrid to create the pairs. See the below image for reference and line number 16 in below code snippet.
- At this stage, we have centroid xy coordinates of every pixel and elevation values of every pixel in
**x,y and elevation variables**respectively. These all variables are numpy array and they have same dimension as of raster data that is**1413**x**1099**(height and width) These all are 2 dimensional numpy array. **x,y and elevation variables**are 2-dimensional array. We will use**flatten**method to make them 1-dimensional array. See line number 17 in below code snippet. Then, we will convert these variables to pandas frame. We will not vectorize all the pixels here. We will only vectorize those pixels which has values more than 170 meters. We will filter the newly created pandas frame based on this 170 threshold. See line number from 20-22 in below code snippet.- Now we have only those pixel coordinates in pandas frame whose values are greater than 170. We will convert this pandas frame to Geodataframe based on xy coordinates. See line number 23.
- So far, we have GeoDataframe of xy coordinates whose geometry is of point type. But our end result should be a polygon representing the those pixels which has a values greater than 170. To convert this point data to polygon, we will create buffer square of 5 meters for every point. And we are using 5 meters buffer because pixel size is 10 meter. This will create a polygon geometries of 10 by 10 meter for every point. See line number from 23-25.
- We will use multiprocessing to perform union on chunks of geometries for parallel computing . Here we are using size of 10000 as chunk size. We have performed union for every 10000 geometries. And then we will perform total union on these unioned geometries and finally save then to shapefile on disk. See line number from 28-26 in below code snippet.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
import numpy as np import pandas as pd import geopandas as gpd import rioxarray as rx from multiprocessing import Pool, cpu_count from shapely.ops import unary_union def union(x): return unary_union(x) if __name__ == "__main__": print("Reading raster...") dem = rx.open_rasterio('abbottstown_pa/abbottstown_pa.dem') x, y, elevation = dem.x.values, dem.y.values, dem.values x, y = np.meshgrid(x, y) x, y, elevation = x.flatten(), y.flatten(), elevation.flatten() print("Converting to GeoDataFrame...") dem_pd = pd.DataFrame.from_dict({'elevation': elevation, 'x': x, 'y': y}) dem_threshold = 170 dem_pd = dem_pd[dem_pd['elevation'] > dem_threshold] dem_vector = gpd.GeoDataFrame(geometry=gpd.GeoSeries.from_xy(dem_pd['x'], dem_pd['y'], crs=dem.rio.crs)) dem_vector = dem_vector.buffer(5, cap_style=3) dem_vector = dem_vector.to_crs('EPSG:4326') geom_arr = [] # Converting GeoSeries to list of geometries geoms = list(dem_vector) # Converting geometries list to nested list of geometries for i in range(0, len(geoms), 10000): geom_arr.append(geoms[i:i+10000]) # Creating multiprocessing pool to perform union operation of chunks of geometries with Pool(cpu_count()) as p: geom_union = p.map(union, geom_arr) # Perform union operation on returned unioned geometries total_union = unary_union(geom_union) # Creating GeoDataFrame for total_union union_vector_gdf = gpd.GeoDataFrame(geometry=gpd.GeoSeries(total_union)) # Saving GeoDataFrame to shapefile union_vector_gdf.to_file(f"""dem_{dem_threshold}.shp""", crs='EPSG:4326') |