
In this tutorial, we’ll walk through how to process and save raster chunks in parallel using Xarray and Dask. This technique is particularly useful when working with large raster datasets where chunking and parallel processing can significantly improve efficiency.
Prerequisites
Before we begin, ensure you have the following Python libraries installed:
- xarray
- dask
- rioxarray (for saving rasters)
You can install them using pip:
    pip install xarray dask rioxarray
Step 1: Import Libraries
    import xarray as xr
    import dask
    import rioxarray  # registers the .rio accessor used when saving rasters
Xarray is great for working with labeled multi-dimensional arrays, and Dask provides parallel computing capabilities to process large datasets efficiently.
Step 2: Read the Raster Dataset
    # Read raster data
    ds = xr.open_dataset('cdnh43e_v3r1/study_area.tif')
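Here, we load the raster dataset using open_dataset. Xarray reads GeoTIFF files through the rasterio engine that rioxarray provides (you can pass engine="rasterio" to make this explicit), and it exposes the pixel values as a variable named band_data, which the save function in Step 4 relies on.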
Step 3: Define Chunk Sizes
    # Add chunk information
    ds_chunk = ds.chunk({'x': 100, 'y': 100})
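Chunking divides the dataset into smaller, manageable pieces backed by Dask arrays. In this case, we specify a chunk size of 100×100 pixels; chunks along the right and bottom edges are smaller when the raster dimensions are not multiples of 100.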
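To get a feel for how many chunks (and therefore output files) a given chunk size produces, the layout can be worked out with plain arithmetic. The sketch below is only an illustration: the helper name chunk_layout and the 250×430 raster size are made up for the example and do not come from the dataset used in this tutorial.

```python
def chunk_layout(length, chunk):
    """Return the chunk lengths along one dimension; the last one may be shorter."""
    full, rest = divmod(length, chunk)
    return (chunk,) * full + ((rest,) if rest else ())


# Hypothetical raster: 250 rows (y) by 430 columns (x), chunked 100 x 100
y_chunks = chunk_layout(250, 100)
x_chunks = chunk_layout(430, 100)

# One block per combination of a y chunk and an x chunk
n_blocks = len(y_chunks) * len(x_chunks)

print("y chunks:", y_chunks)
print("x chunks:", x_chunks)
print("number of blocks:", n_blocks)
```

On a real dataset you can compare this with what ds_chunk.chunks reports.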
Step 4: Save Chunks as TIFF Files
Define a function to save each chunk as an individual TIFF file:
    # Function to save each chunk as a TIFF file
    def save_chunk(chunk, block_id):
        _id = next(block_id)  # Take the next unique block ID
        if _id != 0:
            print(f"ID: {_id}, Size: {chunk.sizes}")
        # Save the chunk as a raster file
        chunk.band_data.rio.to_raster(f"files/{_id}.tiff")
        return chunk
The save_chunk function:
- Takes a chunk and a unique block ID generator as inputs.
- Saves the chunk as a raster file using rio.to_raster.
- Optionally prints metadata like chunk size.
Step 5: Create a Block ID Iterator
    # Iterator for creating block IDs
    block_id = iter(range(1000000000000))  # Generates unique IDs for chunks
This iterator assigns a unique ID to each chunk. You can adjust the range as needed.
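Because the chunks are processed in parallel, several workers may ask for the next ID at nearly the same time, and which chunk receives which ID depends on execution order. If you want the uniqueness guarantee to be explicit, one option is a small lock-protected counter that can be passed in place of the range iterator. This is a sketch, not part of the original script: the class name BlockCounter is hypothetical, and it assumes Dask's thread-based scheduler, where all tasks share the same Python object (a lock cannot be shipped to separate worker processes).

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class BlockCounter:
    """Iterator that hands out consecutive integers to one caller at a time."""

    def __init__(self, start=0):
        self._count = itertools.count(start)  # unbounded, no upper limit to pick
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock ensures two threads never receive the same value
        with self._lock:
            return next(self._count)


# Demonstration with plain threads standing in for parallel chunk tasks
counter = BlockCounter()
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(lambda _: next(counter), range(1000)))

print(len(ids), "IDs issued,", len(set(ids)), "unique")
```

Since save_chunk only calls next(block_id), an instance of this class could be used wherever block_id is used.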
Step 6: Process and Save Chunks in Parallel
Use Dask’s map_blocks to apply the save_chunk function to each chunk:
    # Parallel processing with Dask
    ds_chunk.map_blocks(save_chunk, args=[block_id], template=ds_chunk).compute()
Here’s how it works:
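- map_blocks applies the save_chunk function to each chunk.
- args=[block_id] passes the ID iterator to save_chunk as its second argument.
- template=ds_chunk tells Xarray that the output has the same structure as the input dataset, so it does not have to infer it.
- .compute() triggers the Dask computation, processing all chunks in parallel.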
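If the "apply a function to every block" idea is new to you, the following standard-library analogy may help. It is not how Xarray or Dask are implemented; it only mimics the pattern on a small nested list, with a thread pool standing in for the Dask scheduler. The names split_blocks and describe are invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(grid, size):
    """Yield (row_offset, col_offset, block) for each size x size tile of a 2D list."""
    for r in range(0, len(grid), size):
        for c in range(0, len(grid[0]), size):
            yield r, c, [row[c:c + size] for row in grid[r:r + size]]


def describe(item):
    """Stand-in for save_chunk: report each block's offset and shape."""
    r, c, block = item
    return (r, c, len(block), len(block[0]))


# A tiny 4-row by 5-column "raster", tiled into 2 x 2 blocks
grid = [[y * 5 + x for x in range(5)] for y in range(4)]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(describe, split_blocks(grid, 2)))

for row_offset, col_offset, height, width in results:
    print(f"block at ({row_offset}, {col_offset}) has shape {height} x {width}")
```

Note how the blocks in the last column are narrower than the rest, just as edge chunks of a raster can be smaller than the requested chunk size.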
Step 7: Run the Script
Run the script, and it will process the raster in chunks, saving each as an individual TIFF file in the files directory. Create the files directory before running, since the script writes into it but does not create it.
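One way to make sure the output folder is in place is to create it at the top of the script. The snippet below is a minimal sketch: prepare_output_dir is a hypothetical helper, and the demonstration uses a temporary directory so it can be run anywhere without touching your project folder.

```python
import tempfile
from pathlib import Path


def prepare_output_dir(path):
    """Create the output directory (and any parents) if it does not exist yet."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


# Demonstration inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    out_dir = prepare_output_dir(Path(tmp) / "files")
    created = out_dir.is_dir()
    target = out_dir / "0.tiff"  # where a chunk with ID 0 would be written
    print("directory ready:", created)
    print("example output file name:", target.name)
```

In the tutorial script, calling prepare_output_dir("files") before the map_blocks line would serve the same purpose.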
Full Source Code:
    # import libraries
    import xarray as xr
    import dask
    import rioxarray  # registers the .rio accessor used when saving rasters

    # Read data
    ds = xr.open_dataset('cdnh43e_v3r1/study_area.tif')

    # Add chunk info
    ds_chunk = ds.chunk({'x': 100, 'y': 100})

    # Save chunk as tiff
    def save_chunk(chunk, block_id):
        _id = next(block_id)
        if _id != 0:
            print("ID: ", _id, "Size: ", chunk.sizes)
        chunk.band_data.rio.to_raster(f"files/{_id}.tiff")
        return chunk

    # iterator for creating block id of chunks
    block_id = iter(range(1000000000000))

    # parallel processing with dask
    ds_chunk.map_blocks(save_chunk, args=[block_id], template=ds_chunk).compute()
Key Advantages
- Efficient Processing: Dask schedules the per-chunk tasks in parallel, leveraging multi-core CPUs.
- Chunk Management: The chunk size controls how much data each task handles at once.
- Scalability: The same workflow applies to larger rasters, where it simply produces more chunks.
Output
After execution, you’ll have TIFF files saved in the files directory, each representing one chunk of the original raster: 100×100 pixels, or smaller for chunks on the right and bottom edges.
Conclusion
Using Xarray and Dask for parallel processing of raster data is a powerful technique for handling large datasets. By dividing the raster into manageable chunks and processing them in parallel, you can save significant time and computational resources.
Feel free to adapt this workflow to your specific needs, such as modifying chunk sizes or incorporating additional processing steps!
I hope this tutorial gives you a good foundation. If you would like tutorials on another GIS topic or have any questions, please send an email to contact@spatial-dev.guru.
