Data I/O and GISData container
This module provides consolidated I/O for preprocessed MOBIDIC data and a container class for managing the complete preprocessed dataset.
Overview
After preprocessing GIS data, processing the river network, and optionally processing reservoirs, MOBIDICpy consolidates all data into a single GISData object that can be saved to and loaded from disk. This approach:
- Simplifies data management: Single object contains all preprocessed data
- Ensures consistency: Grids, network, and metadata stay synchronized
- Enables caching: Save expensive preprocessing results for reuse
- Facilitates sharing: Package preprocessed data for model runs
Classes
GISData container
Container for preprocessed GIS data.
This class holds all preprocessed spatial data including grids, river network, reservoirs, and hillslope-reach mapping. It provides methods to save/load consolidated preprocessed data.
Attributes:
| Name | Type | Description |
|---|---|---|
| grids | dict[str, ndarray] | Dictionary of 2D NumPy arrays containing raster data |
| metadata | dict[str, Any] | Dictionary containing grid metadata (transform, CRS, resolution, etc.) |
| network | GeoDataFrame | GeoDataFrame with the processed river network |
| reservoirs | Optional[Reservoirs] | Reservoirs object with reservoir data (optional) |
| hillslope_reach_map | ndarray | 2D array mapping each cell to its downstream reach |
| config | MOBIDICConfig | MOBIDIC configuration used for preprocessing |
Source code in mobidic/preprocessing/preprocessor.py
__init__(grids, metadata, network, hillslope_reach_map, config, reservoirs=None)
Initialize GISData container.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grids | dict[str, ndarray] | Dictionary of 2D NumPy arrays with grid data | required |
| metadata | dict[str, Any] | Dictionary with grid metadata | required |
| network | GeoDataFrame | Processed river network GeoDataFrame | required |
| hillslope_reach_map | ndarray | 2D array with reach assignments | required |
| config | MOBIDICConfig | MOBIDIC configuration | required |
| reservoirs | Optional[Reservoirs] | Optional Reservoirs object | None |
load(gisdata_path, network_path, reservoirs_path=None) (classmethod)
Load preprocessed data from files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gisdata_path | str \| Path | Path to gridded data file (NetCDF) | required |
| network_path | str \| Path | Path to river network file (GeoParquet) | required |
| reservoirs_path | Optional[str \| Path] | Optional path to reservoirs file (GeoParquet) | None |
Returns:
| Type | Description |
|---|---|
| GISData | GISData object with loaded data |
save(gisdata_path, network_path, reservoirs_path=None)
Save preprocessed data to files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gisdata_path | str \| Path | Path to save gridded data (NetCDF format) | required |
| network_path | str \| Path | Path to save river network (GeoParquet format) | required |
| reservoirs_path | Optional[str \| Path] | Optional path to save reservoirs (GeoParquet format) | None |
Functions
Save and load functions
save_gisdata(gisdata, output_path)
Save preprocessed gridded data to a NetCDF file.
This function saves all grids (DTM, flow direction, soil parameters, etc.), metadata, and hillslope-reach mapping to a single NetCDF file using xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gisdata | GISData | GISData object containing preprocessed data | required |
| output_path | str \| Path | Path to output NetCDF file | required |
Examples:
>>> from mobidic import run_preprocessing, load_config
>>> config = load_config("Arno.yaml")
>>> gisdata = run_preprocessing(config)
>>> from mobidic.preprocessing.io import save_gisdata
>>> save_gisdata(gisdata, "Arno_gisdata.nc")
Source code in mobidic/preprocessing/io.py
load_gisdata(gisdata_path, network_path)
Load preprocessed GIS data from NetCDF and GeoParquet files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gisdata_path | str \| Path | Path to gridded data NetCDF file | required |
| network_path | str \| Path | Path to river network GeoParquet file | required |
Returns:
| Type | Description |
|---|---|
| GISData | GISData object containing loaded data |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If either file does not exist |
Examples:
>>> from mobidic.preprocessing.io import load_gisdata
>>> gisdata = load_gisdata("Arno_gisdata.nc", "Arno_network.parquet")
save_network(network, output_path, format='parquet')
Save the processed river network to a file.
Saves the river network to either GeoParquet (default, recommended) or Shapefile format. GeoParquet is more efficient, has no field name limitations, and preserves data types better.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| network | GeoDataFrame | Processed river network GeoDataFrame | required |
| output_path | str \| Path | Path to output file | required |
| format | str | Output format, either 'parquet' (default) or 'shapefile' | 'parquet' |
Raises:
| Type | Description |
|---|---|
| ValueError | If format is not supported |
Examples:
>>> from mobidic import process_river_network
>>> network = process_river_network("river_network.shp")
>>> from mobidic.preprocessing.io import save_network
>>> save_network(network, "river_network.parquet") # Default: parquet
>>> save_network(network, "river_network.shp", format="shapefile") # Shapefile
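The "no field name limitations" point matters for this network: Shapefile attributes live in a dBASE (.dbf) table, which caps field names at 10 characters, so longer names are truncated on write. The sketch below is a hypothetical helper (not part of MOBIDICpy) that lists which columns would be affected; the column names are the ones listed under Network validation on this page.

```python
from collections.abc import Iterable

# dBASE (.dbf) tables used by Shapefiles cap field names at 10 characters.
DBF_MAX_FIELD_NAME = 10


def fields_truncated_by_shapefile(columns: Iterable[str]) -> list[str]:
    """Return column names longer than the Shapefile field-name limit.

    The 'geometry' column is skipped because it is written to the .shp
    file itself, not stored as a dBASE attribute.
    """
    return [
        name
        for name in columns
        if name != "geometry" and len(name) > DBF_MAX_FIELD_NAME
    ]


# Required network columns, as listed under "Network validation"
columns = [
    "mobidic_id", "geometry", "upstream_1", "upstream_2", "downstream",
    "strahler_order", "calc_order", "length_m", "width_m",
]
print(fields_truncated_by_shapefile(columns))  # ['strahler_order']
```

With a real network you would pass network.columns; an empty result means that at least the column names survive format="shapefile" unchanged.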
load_network(network_path)
Load a processed river network from a GeoParquet file.
Loads river network data from GeoParquet format only (the recommended format). For loading shapefiles, use geopandas.read_file() directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| network_path | str \| Path | Path to network GeoParquet file (.parquet) | required |
Returns:
| Type | Description |
|---|---|
| GeoDataFrame | GeoDataFrame with river network |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If network file does not exist |
| ValueError | If file is not a .parquet file |
Examples:
>>> from mobidic.preprocessing.io import load_network
>>> network = load_network("river_network.parquet")
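The two errors documented for load_network amount to a path check like the one below. This is a minimal sketch with pathlib, assuming the existence check runs before the suffix check; check_parquet_path is a hypothetical name, and the actual implementation may order or word its checks differently.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def check_parquet_path(path: str | Path) -> Path:
    """Validate a GeoParquet input path (hypothetical helper).

    Raises FileNotFoundError if the file is missing and ValueError if
    its suffix is not .parquet, mirroring the documented errors.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Network file not found: {path}")
    if path.suffix.lower() != ".parquet":
        raise ValueError(f"Expected a .parquet file, got: {path.name}")
    return path


# Demonstration with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "network.parquet"
    good.touch()
    print(check_parquet_path(good).name)  # network.parquet
    shp = Path(tmp) / "network.shp"
    shp.touch()
    try:
        check_parquet_path(shp)
    except ValueError as err:
        print(err)  # Expected a .parquet file, got: network.shp
```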
save_reservoirs(reservoirs, output_path, format='parquet')
Save reservoir data to a file.
Saves reservoir data to GeoParquet format, including all reservoir metadata, stage-storage curves, regulation curves, and spatial information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reservoirs | Reservoirs | Reservoirs object containing reservoir data | required |
| output_path | str \| Path | Path to output file | required |
| format | str | Output format (only 'parquet' is supported) | 'parquet' |
Raises:
| Type | Description |
|---|---|
| ValueError | If format is not supported |
Examples:
>>> from mobidic import process_reservoirs
>>> reservoirs = process_reservoirs(...)
>>> from mobidic.preprocessing.io import save_reservoirs
>>> save_reservoirs(reservoirs, "reservoirs.parquet")
load_reservoirs(input_path)
Load reservoir data from a file.
Loads reservoir data from GeoParquet format, including all reservoir metadata, stage-storage curves, regulation curves, and spatial information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input_path
|
str | Path
|
Path to reservoirs Parquet file |
required |
Returns:
| Type | Description |
|---|---|
| Reservoirs | Reservoirs object with loaded data |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If reservoir file does not exist |
| ValueError | If file is not a .parquet file |
Examples:
>>> from mobidic.preprocessing.io import load_reservoirs
>>> reservoirs = load_reservoirs("reservoirs.parquet")
Usage examples
Example 1: complete preprocessing workflow
```python
from mobidic import load_config, run_preprocessing, GISData

# Load configuration
config = load_config("config.yaml")

# Run preprocessing
gisdata = run_preprocessing(config)

# Save preprocessed data (including reservoirs if configured)
gisdata.save(
    gisdata_path="output/gisdata.nc",
    network_path="output/network.parquet",
    reservoirs_path="output/reservoirs.parquet",  # Optional
)

# Later, load preprocessed data
loaded_gisdata = GISData.load(
    gisdata_path="output/gisdata.nc",
    network_path="output/network.parquet",
    reservoirs_path="output/reservoirs.parquet",  # Optional
)

# Access components
print(f"Grid variables: {list(loaded_gisdata.grids.keys())}")
print(f"Network reaches: {len(loaded_gisdata.network)}")
print(f"Grid shape: {loaded_gisdata.metadata['shape']}")
if loaded_gisdata.reservoirs is not None:
    print(f"Reservoirs: {len(loaded_gisdata.reservoirs)}")
```
Example 2: working with GISData
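```python
import numpy as np
from affine import Affine
from mobidic import GISData, load_config, process_river_network

config = load_config("config.yaml")
nrows, ncols = 100, 100
# Grid data (random values, for illustration only)
grids = {
    "dtm": np.random.rand(nrows, ncols),
    "flow_dir": np.random.randint(1, 9, (nrows, ncols)),
    "ks": np.random.rand(nrows, ncols) * 10,
}
# Grid metadata: 100 m cells, upper-left corner at (600000, 4900000)
metadata = {
    "shape": (nrows, ncols),
    "resolution": 100.0,
    "crs": "EPSG:32632",
    "transform": Affine(100.0, 0.0, 600000.0, 0.0, -100.0, 4900000.0),
}
# Processed river network (GeoDataFrame)
network = process_river_network("data/river_network.shp")
# Placeholder hillslope-reach map: every cell assigned to reach 0
hillslope_reach_map = np.zeros((nrows, ncols), dtype=int)
# All constructor arguments except `reservoirs` are required
gisdata = GISData(grids=grids, metadata=metadata, network=network,
                  hillslope_reach_map=hillslope_reach_map, config=config)
# Save
gisdata.save("output/gisdata.nc", "output/network.parquet")
```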
Example 3: saving network only
```python
from mobidic import process_river_network, save_network, load_network

# Process network
network = process_river_network(
    shapefile_path="data/river_network.shp",
    join_single_tributaries=True,
)

# Save as GeoParquet (recommended)
save_network(network, "output/network.parquet", format="parquet")

# Or save as Shapefile
save_network(network, "output/network.shp", format="shapefile")

# Load network later
loaded_network = load_network("output/network.parquet")
```
Example 4: working with reservoirs
```python
from mobidic import GISData, process_reservoirs, save_reservoirs, load_reservoirs

# Load previously preprocessed grids and network (see Example 1)
gisdata = GISData.load("output/gisdata.nc", "output/network.parquet")

# Process reservoir data from shapefiles and CSVs
reservoirs = process_reservoirs(
    res_shape_path="reservoirs/reservoirs.shp",
    stage_storage_path="reservoirs/stage_storage.csv",
    regulation_curves_path="reservoirs/regulation_curves.csv",
    regulation_schedule_path="reservoirs/regulation_schedule.csv",
    initial_volumes_path="reservoirs/initial_volumes.csv",  # Optional
    grid_transform=gisdata.metadata["transform"],
    grid_shape=gisdata.metadata["shape"],
    network=gisdata.network,
)

# Save reservoirs to GeoParquet
save_reservoirs(reservoirs, "output/reservoirs.parquet")

# Load reservoirs later
loaded_reservoirs = load_reservoirs("output/reservoirs.parquet")

# Access reservoir data
for reservoir in loaded_reservoirs:
    print(f"Reservoir {reservoir.id}: {reservoir.name}")
    print(f"  Basin pixels: {len(reservoir.basin_pixels)}")
    print(f"  Inlet reaches: {reservoir.inlet_reaches}")
    print(f"  Outlet reach: {reservoir.outlet_reach}")
    print(f"  Initial volume: {reservoir.initial_volume} m³")
    print(f"  Stage-storage curve: {len(reservoir.stage_storage_curve)} points")

# Include in GISData
gisdata.reservoirs = reservoirs
gisdata.save(
    gisdata_path="output/gisdata.nc",
    network_path="output/network.parquet",
    reservoirs_path="output/reservoirs.parquet",
)
```
File formats
NetCDF for grid data
Grid data is saved in NetCDF4 format with:
Structure:
- Each grid variable is a 2D data variable with (y, x) dimensions
- Coordinate variables for x and y
- Comprehensive metadata (CRS, transform, resolution)

Compression:
- zlib compression (level 4 by default)
- Chunking optimized for spatial access patterns

Advantages:
- Self-describing format with embedded metadata
- Efficient compression for large grids
- CF-compliant for interoperability
- Supports NaN for nodata values
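With xarray, per-variable compression and chunking are requested through the encoding argument of Dataset.to_netcdf. The sketch below builds such a mapping with zlib at level 4 for each (y, x) grid. The 256-cell tile size is an assumption for illustration, not MOBIDICpy's actual chunking scheme, and netcdf_encoding is a hypothetical helper.

```python
def netcdf_encoding(grid_shapes: dict[str, tuple[int, int]],
                    complevel: int = 4,
                    tile: int = 256) -> dict[str, dict]:
    """Build an xarray/netCDF4 `encoding` mapping for 2D (y, x) grids.

    Each variable gets zlib compression at `complevel` and chunks of at
    most `tile` x `tile` cells, clipped to the grid size because chunk
    sizes may not exceed the dimension lengths.
    """
    encoding = {}
    for name, (ny, nx) in grid_shapes.items():
        encoding[name] = {
            "zlib": True,
            "complevel": complevel,
            "chunksizes": (min(ny, tile), min(nx, tile)),
        }
    return encoding


enc = netcdf_encoding({"dtm": (100, 100), "ks": (1200, 800)})
print(enc["dtm"])  # {'zlib': True, 'complevel': 4, 'chunksizes': (100, 100)}
print(enc["ks"])   # {'zlib': True, 'complevel': 4, 'chunksizes': (256, 256)}
```

Given an xarray.Dataset ds whose data variables are the grids, the mapping would be used as ds.to_netcdf("gisdata.nc", encoding=enc).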
GeoParquet for network data
River network data is saved in GeoParquet format with:
Advantages:
- Very fast read/write performance
- Excellent compression ratios
- Preserves all attribute types (int, float, list, etc.)
- Native support for complex geometries
- Column-oriented storage for efficient queries

Requirements:
- Requires the pyarrow package: pip install pyarrow

Fallback:
- If pyarrow is not available, the Shapefile format can be used instead
- Shapefile has limitations (attribute names, data types)
GeoParquet for reservoir data
Reservoir data is saved in GeoParquet format with:
Structure:
- One row per reservoir with all associated data
- Polygon geometry for the reservoir basin
- Stage-storage curve as a nested DataFrame
- Regulation curves and schedules as dictionaries
- Basin pixels as a list of linear indices
- Inlet/outlet reach references
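Basin pixels are stored as linear indices into the grid. This page does not state whether the indices are row-major (C order, the NumPy default) or column-major (Fortran/MATLAB style), so the sketch below shows both conversions with numpy.unravel_index; the grid shape and index values are illustrative, and the ordering is an assumption to verify against your data.

```python
import numpy as np

grid_shape = (4, 5)           # (nrows, ncols), illustrative
basin_pixels = [0, 6, 7, 13]  # linear indices, illustrative

# Row-major (C order): index = row * ncols + col
rows_c, cols_c = np.unravel_index(basin_pixels, grid_shape, order="C")

# Column-major (Fortran order): index = col * nrows + row
rows_f, cols_f = np.unravel_index(basin_pixels, grid_shape, order="F")

# Boolean basin mask, assuming C order
mask = np.zeros(grid_shape, dtype=bool)
mask[rows_c, cols_c] = True

print(rows_c.tolist(), cols_c.tolist())  # [0, 1, 1, 2] [0, 1, 2, 3]
print(rows_f.tolist(), cols_f.tolist())  # [0, 2, 3, 1] [0, 1, 1, 3]
print(int(mask.sum()))                   # 4
```

The same four indices land on different cells under the two orderings, which is why the convention has to match whatever wrote the file.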
Advantages:
- Efficient storage of nested data structures
- Preserves Python data types (lists, dicts, DataFrames)
- Fast read/write performance
- Column-oriented storage for selective loading
Special handling:
- Dictionary keys are stored as zero-padded strings ("000", "001", etc.) for Parquet compatibility
- DataFrames are serialized to Parquet bytes and stored as binary columns
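A minimal sketch of the zero-padded key convention, assuming non-negative integer keys formatted to three digits as in the "000", "001" example; pad_keys and unpad_keys are hypothetical helper names and the dictionary values are placeholders. A side benefit of padding is that the string keys sort in the same order as the original integers.

```python
def pad_keys(d: dict[int, object], width: int = 3) -> dict[str, object]:
    """Convert integer keys to zero-padded strings, e.g. 1 -> "001"."""
    return {f"{k:0{width}d}": v for k, v in d.items()}


def unpad_keys(d: dict[str, object]) -> dict[int, object]:
    """Invert pad_keys: "001" -> 1."""
    return {int(k): v for k, v in d.items()}


schedule = {0: "a", 1: "b", 10: "c"}  # placeholder values
padded = pad_keys(schedule)
print(padded)                       # {'000': 'a', '001': 'b', '010': 'c'}
print(sorted(["10", "2"]))          # ['10', '2']: unpadded strings sort lexically
print(sorted(padded))               # ['000', '001', '010']
print(unpad_keys(padded) == schedule)  # True
```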
Data consistency
The GISData class ensures consistency between components:
Grid validation
All grids must have the same shape:
```python
import numpy as np

# `gisdata` is an existing GISData object with 100 x 100 grids (see Example 2)
gisdata.grids["dtm"] = np.zeros((100, 100))
gisdata.grids["ks"] = np.zeros((100, 150))  # Different shape: will raise an error on save
```
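The rule above can be sketched as a loop that compares every grid against metadata["shape"] and stops at the first mismatch. The function name and the use of ValueError are assumptions; this page only says that an error is raised on save.

```python
import numpy as np


def validate_grid_shapes(grids: dict[str, np.ndarray],
                         shape: tuple[int, int]) -> None:
    """Raise ValueError if any grid's shape differs from `shape`."""
    expected = tuple(shape)
    for name, grid in grids.items():
        if grid.shape != expected:
            raise ValueError(
                f"Grid '{name}' has shape {grid.shape}, expected {expected}"
            )


grids = {"dtm": np.zeros((100, 100)), "ks": np.zeros((100, 150))}
try:
    validate_grid_shapes(grids, (100, 100))
except ValueError as err:
    print(err)  # Grid 'ks' has shape (100, 150), expected (100, 100)
```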
Metadata requirements
Required metadata fields:
- shape: Tuple (nrows, ncols)
- resolution: Float or tuple (x_res, y_res)
- crs: String (e.g., “EPSG:32632”)
- transform: Affine transform object
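A sketch of checking the four required fields listed above. check_metadata is a hypothetical helper that collects problems instead of raising; it only checks that transform is present, and the type rules follow the list on this page rather than MOBIDICpy's own validation.

```python
from numbers import Real

REQUIRED_METADATA = ("shape", "resolution", "crs", "transform")


def check_metadata(metadata: dict) -> list[str]:
    """Return a list of problems with a GISData-style metadata dict."""
    problems = [f"missing '{key}'" for key in REQUIRED_METADATA
                if key not in metadata]

    shape = metadata.get("shape")
    if shape is not None and not (
        isinstance(shape, tuple) and len(shape) == 2
        and all(isinstance(n, int) and n > 0 for n in shape)
    ):
        problems.append("'shape' must be (nrows, ncols) with positive ints")

    res = metadata.get("resolution")
    if res is not None:
        values = res if isinstance(res, tuple) else (res,)
        if len(values) not in (1, 2) or not all(isinstance(v, Real) for v in values):
            problems.append("'resolution' must be a number or (x_res, y_res)")

    crs = metadata.get("crs")
    if crs is not None and not isinstance(crs, str):
        problems.append("'crs' must be a string such as 'EPSG:32632'")
    return problems


print(check_metadata({"shape": (100, 100), "resolution": 100.0, "crs": "EPSG:32632"}))
# ["missing 'transform'"]
```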
Network validation
The network must be a GeoDataFrame with specific required columns:
- mobidic_id: Integer reach identifiers
- geometry: LineString geometries
- upstream_1, upstream_2, downstream: Topology references
- strahler_order, calc_order: Ordering information
- length_m, width_m: Geometric parameters
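A minimal sketch of the column check, using the names listed above; missing_network_columns is a hypothetical helper. With a real GeoDataFrame you would pass network.columns, and the geometry requirement could additionally be checked through the GeoSeries geom_type attribute.

```python
REQUIRED_NETWORK_COLUMNS = {
    "mobidic_id", "geometry",
    "upstream_1", "upstream_2", "downstream",
    "strahler_order", "calc_order",
    "length_m", "width_m",
}


def missing_network_columns(columns) -> list[str]:
    """Return the required network columns absent from `columns`, sorted."""
    return sorted(REQUIRED_NETWORK_COLUMNS - set(columns))


print(missing_network_columns(["mobidic_id", "geometry", "length_m"]))
# ['calc_order', 'downstream', 'strahler_order', 'upstream_1', 'upstream_2', 'width_m']
```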
Spatial reference handling
CRS representation
The CRS is stored in multiple places:
- NetCDF grids: as the global attribute crs (WKT or EPSG code)
- GeoDataFrame: native CRS property
- Metadata dict: as a string for convenience
These should all be consistent and are validated on save/load.
Affine transform
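The affine transform maps pixel (column, row) coordinates to map coordinates in the grid's CRS:
```python
from affine import Affine

# Example: 100 m resolution, upper-left origin at (600000, 4900000)
transform = Affine(100.0, 0.0, 600000.0,
                   0.0, -100.0, 4900000.0)

# Transform pixel (row, col) to (x, y); note the (col, row) operand order
row, col = 10, 20
x, y = transform * (col, row)  # upper-left corner of the pixel: (602000.0, 4899000.0)
```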
The transform is stored in the metadata dict and embedded in NetCDF grid variables.
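The Affine product above expands to two linear equations, x = a·col + b·row + c and y = d·col + e·row + f. The sketch below writes them out without the affine package so the arithmetic is visible, adds the half-pixel offset to get cell centres, and inverts the north-up case (b = d = 0) to go from coordinates back to a pixel. pixel_to_xy and xy_to_pixel are hypothetical helper names.

```python
import math


def pixel_to_xy(a, b, c, d, e, f, row, col, center=True):
    """Apply affine coefficients (a, b, c, d, e, f) to a pixel.

    With center=True the half-pixel offset gives the cell centre;
    otherwise the upper-left corner of the cell is returned.
    """
    off = 0.5 if center else 0.0
    x = a * (col + off) + b * (row + off) + c
    y = d * (col + off) + e * (row + off) + f
    return x, y


def xy_to_pixel(a, c, e, f, x, y):
    """Invert a north-up transform (b = d = 0); return (row, col)."""
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)
    return row, col


# Same transform as above: 100 m cells, upper-left corner at (600000, 4900000)
coef = (100.0, 0.0, 600000.0, 0.0, -100.0, 4900000.0)
print(pixel_to_xy(*coef, row=10, col=20, center=False))  # (602000.0, 4899000.0)
print(pixel_to_xy(*coef, row=10, col=20))                # (602050.0, 4898950.0)
print(xy_to_pixel(100.0, 600000.0, -100.0, 4900000.0, 602050.0, 4898950.0))  # (10, 20)
```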
Integration with preprocessing
The preprocessing workflow automatically creates and populates GISData:
```python
from mobidic import load_config, run_preprocessing

config = load_config("config.yaml")

# This function internally:
# 1. Creates GISData object
# 2. Reads all raster files specified in config
# 3. Processes river network
# 4. Computes hillslope-reach mapping
# 5. Processes reservoirs (if configured)
# 6. Populates grids, network, reservoirs, and metadata
gisdata = run_preprocessing(config)

# Save for later use
gisdata.save(config.paths.gisdata, config.paths.network)
```
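The overview lists caching as one reason to save preprocessed data. One way to use save/load for that is to rebuild only when the cached files are missing. The sketch below is generic and hypothetical (load_or_build is not part of MOBIDICpy), and it only checks that the files exist: it does not detect a stale cache after the configuration or inputs change.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(paths: list[str | Path],
                  load: Callable[[], T],
                  build_and_save: Callable[[], T]) -> T:
    """Call `load` if every cached file exists, else `build_and_save`."""
    if all(Path(p).exists() for p in paths):
        return load()
    return build_and_save()


# Demonstration with a throwaway cache file
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "gisdata.nc"

    def build():
        cache.write_text("cached")  # stands in for an expensive save
        return "built"

    first = load_or_build([cache], load=lambda: "loaded", build_and_save=build)
    second = load_or_build([cache], load=lambda: "loaded", build_and_save=build)
    print(first, second)  # built loaded
```

With MOBIDICpy, load could be lambda: GISData.load(config.paths.gisdata, config.paths.network), and build_and_save a function that calls run_preprocessing(config) followed by gisdata.save(config.paths.gisdata, config.paths.network).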