River network processing

The river network processing module provides a toolkit for building and analyzing river network topology, including Strahler ordering, reach joining, and routing parameter calculation.

Overview

River network processing is a required preprocessing step that:

Builds network topology from a shapefile geometry
Enforces binary tree structure (maximum 2 upstream tributaries per reach)
Computes Strahler stream ordering
Optionally joins single-tributary reaches to simplify the network
Calculates hydraulic and routing parameters (width, lag time, storage coefficient)
Determines optimal calculation order for routing algorithms
Exports processed networks for use in simulations

Functions

River network processing

`mobidic.preprocessing.river_network.process_river_network(shapefile_path, join_single_tributaries=True, routing_params=None)`

Process river network shapefile to create a complete network structure.

This function reads a river network shapefile, builds the network topology, enforces binary tree structure (max 2 upstream tributaries per reach), computes Strahler stream ordering, optionally joins single-tributary reaches, and calculates routing parameters.

Parameters:

Name	Type	Description	Default
`shapefile_path`	`str \| Path`	Path to river network shapefile	required
`join_single_tributaries`	`bool`	If True, joins reaches with single tributaries (default: True)	`True`
`routing_params`	`dict \| None`	Dictionary with routing parameters: - wcel: Wave celerity [m/s] (default: 5.0) - Br0: Width of 1st order streams [m] (default: 1.0) - NBr: Width-order exponent (default: 1.5) - n_Man: Manning coefficient [s/m^(1/3)] (default: 0.03)	`None`

Returns:

Type	Description
`GeoDataFrame`	GeoDataFrame with processed network including topology and routing parameters

Raises:

Type	Description
`FileNotFoundError`	If shapefile does not exist
`ValueError`	If network is invalid

Examples:

>>> network = process_river_network(
...     "river_network.shp",
...     routing_params={"wcel": 3.0, "Br0": 1.0, "NBr": 1.5, "n_Man": 0.03}
... )

Source code in mobidic/preprocessing/river_network.py

def process_river_network(
    shapefile_path: str | Path,
    join_single_tributaries: bool = True,
    routing_params: dict | None = None,
) -> gpd.GeoDataFrame:
    """Process river network shapefile to create a complete network structure.

    This function reads a river network shapefile, builds the network topology,
    enforces binary tree structure (max 2 upstream tributaries per reach),
    computes Strahler stream ordering, optionally joins single-tributary reaches,
    and calculates routing parameters.

    Args:
        shapefile_path: Path to river network shapefile
        join_single_tributaries: If True, joins reaches with single tributaries (default: True)
        routing_params: Dictionary with routing parameters:
            - wcel: Wave celerity [m/s] (default: 5.0)
            - Br0: Width of 1st order streams [m] (default: 1.0)
            - NBr: Width-order exponent (default: 1.5)
            - n_Man: Manning coefficient [s/m^(1/3)] (default: 0.03)

    Returns:
        GeoDataFrame with processed network including topology and routing parameters

    Raises:
        FileNotFoundError: If shapefile does not exist
        ValueError: If network is invalid

    Examples:
        >>> network = process_river_network(
        ...     "river_network.shp",
        ...     routing_params={"wcel": 3.0, "Br0": 1.0, "NBr": 1.5, "n_Man": 0.03}
        ... )
    """
    logger.info(f"Processing river network from {shapefile_path}")

    # Read shapefile
    gdf = gpd.read_file(shapefile_path)

    # Build network topology
    logger.debug("Building network topology")
    network = _build_network_topology(gdf)

    # Enforce binary tree structure (max 2 upstream tributaries per reach)
    logger.debug("Enforcing binary tree structure")
    network = _enforce_binary_tree(network)

    # Compute Strahler ordering
    logger.debug("Computing Strahler stream order")
    network = _compute_strahler_order(network)

    # Optionally join single-tributary reaches
    if join_single_tributaries:
        logger.debug("Joining single-tributary reaches")
        network = _join_single_tributaries(network)

    # Calculate routing parameters
    if routing_params is None:
        routing_params = {"wcel": 5.0, "Br0": 1.0, "NBr": 1.5, "n_Man": 0.03}

    logger.debug("Calculating routing parameters")
    network = _calculate_routing_parameters(network, routing_params)

    # Compute calculation order (topological order)
    logger.debug("Computing calculation order")
    network = _compute_calculation_order(network)

    logger.success(f"River network processed successfully: {len(network)} reaches")

    return network

River network I/O

`mobidic.preprocessing.io.save_network(network, output_path, format='parquet')`

Save processed river network to file.

Saves the river network to either GeoParquet (default, recommended) or Shapefile format. GeoParquet is more efficient, has no field name limitations, and preserves data types better.

Parameters:

Name	Type	Description	Default
`network`	`GeoDataFrame`	Processed river network GeoDataFrame	required
`output_path`	`str \| Path`	Path to output file	required
`format`	`str`	Output format, either ‘parquet’ (default) or ‘shapefile’	`'parquet'`

Raises:

Type	Description
`ValueError`	If format is not supported

Examples:

>>> from mobidic import process_river_network
>>> network = process_river_network("river_network.shp")
>>> from mobidic.preprocessing.io import save_network
>>> save_network(network, "river_network.parquet")  # Default: parquet
>>> save_network(network, "river_network.shp", format="shapefile")  # Shapefile

Source code in mobidic/preprocessing/io.py

def save_network(network: gpd.GeoDataFrame, output_path: str | Path, format: str = "parquet") -> None:
    """Save processed river network to file.

    Saves the river network to either GeoParquet (default, recommended) or Shapefile format.
    GeoParquet is more efficient, has no field name limitations, and preserves data types better.

    Args:
        network: Processed river network GeoDataFrame
        output_path: Path to output file
        format: Output format, either 'parquet' (default) or 'shapefile'

    Raises:
        ValueError: If format is not supported

    Examples:
        >>> from mobidic import process_river_network
        >>> network = process_river_network("river_network.shp")
        >>> from mobidic.preprocessing.io import save_network
        >>> save_network(network, "river_network.parquet")  # Default: parquet
        >>> save_network(network, "river_network.shp", format="shapefile")  # Shapefile
    """
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    if format == "parquet":
        logger.info(f"Saving river network to GeoParquet: {output_path}")
        network.to_parquet(output_path)
        logger.success(f"Network saved to GeoParquet: {output_path}")
        logger.debug(f"File size: {output_path.stat().st_size / 1024:.2f} KB")
    elif format == "shapefile":
        import warnings

        logger.info(f"Saving river network to Shapefile: {output_path}")
        # Suppress shapefile field name truncation warnings (10 char limit is expected)
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message=".*Column names longer than 10 characters.*")
            warnings.filterwarnings("ignore", message=".*Normalized/laundered field name.*")
            network.to_file(output_path)
        logger.success(f"Network saved to Shapefile: {output_path}")
        logger.debug(f"File size: {output_path.stat().st_size / 1024:.2f} KB")
    else:
        raise ValueError(f"Unsupported format: {format}. Use 'parquet' or 'shapefile'")

`mobidic.preprocessing.io.load_network(network_path)`

Load processed river network from GeoParquet file.

Loads river network data from GeoParquet format only (the recommended format). For loading shapefiles, use geopandas.read_file() directly.

Parameters:

Name	Type	Description	Default
`network_path`	`str \| Path`	Path to network GeoParquet file (.parquet)	required

Returns:

Type	Description
`GeoDataFrame`	GeoDataFrame with river network

Raises:

Type	Description
`FileNotFoundError`	If network file does not exist
`ValueError`	If file is not a .parquet file

Examples:

>>> from mobidic.preprocessing.io import load_network
>>> network = load_network("river_network.parquet")

Source code in mobidic/preprocessing/io.py

def load_network(network_path: str | Path) -> gpd.GeoDataFrame:
    """Load processed river network from GeoParquet file.

    Loads river network data from GeoParquet format only (the recommended format).
    For loading shapefiles, use geopandas.read_file() directly.

    Args:
        network_path: Path to network GeoParquet file (.parquet)

    Returns:
        GeoDataFrame with river network

    Raises:
        FileNotFoundError: If network file does not exist
        ValueError: If file is not a .parquet file

    Examples:
        >>> from mobidic.preprocessing.io import load_network
        >>> network = load_network("river_network.parquet")
    """
    network_path = Path(network_path)

    if not network_path.exists():
        raise FileNotFoundError(f"Network file not found: {network_path}")

    if network_path.suffix != ".parquet":
        raise ValueError(
            f"Only .parquet files supported. Got: {network_path.suffix}. "
            "Use geopandas.read_file() to load other formats."
        )

    logger.info(f"Loading river network from GeoParquet: {network_path}")
    network = gpd.read_parquet(network_path)
    logger.success(f"River network loaded: {len(network)} reaches")

    return network

Examples

Complete river network processing

from mobidic import process_river_network, save_network

# Process river network from shapefile
network = process_river_network(
    shapefile_path="data/river_network.shp",
    join_single_tributaries=True,
    routing_params={
        "wcel": 2.5,  # Celerity: 2.5 m/s
        "Br0": 5.0,   # Base width: 5 m
        "NBr": 0.5,   # Width exponent
        "n_Man": 0.03  # Manning coefficient
    }
)

# Inspect the processed network
print(f"Total reaches: {len(network)}")
print(f"Max Strahler order: {network['strahler_order'].max()}")
print(f"Total length: {network['length_m'].sum() / 1000:.1f} km")

# Export to GeoParquet (recommended)
save_network(network, "output/network.parquet", format="parquet")

# Or export to Shapefile (for backward compatibility)
save_network(network, "output/network.shp", format="shapefile")

Loading a processed Network

from mobidic import load_network

# Load previously processed network
network = load_network("output/network.parquet")

# Access network properties
terminal_reaches = network[network['downstream'].isna()]
print(f"Number of outlets: {len(terminal_reaches)}")

headwater_reaches = network[network['strahler_order'] == 1]
print(f"Number of headwater streams: {len(headwater_reaches)}")

Analyzing river network topology

from mobidic import process_river_network
import pandas as pd

network = process_river_network("data/river_network.shp")

# Get summary statistics by Strahler order
summary = network.groupby('strahler_order').agg({
    'length_m': ['count', 'sum', 'mean'],
    'width_m': 'mean',
    'lag_time_s': 'mean'
})

print(summary)

# Find reaches with multiple upstream tributaries
junctions = network[network['upstream_2'].notna()]
print(f"Number of junctions: {len(junctions)}")

# Trace upstream from a specific reach
def get_upstream_network(network, mobidic_id):
    """Recursively find all upstream reaches."""
    upstream = []
    reach = network[network['mobidic_id'] == mobidic_id].iloc[0]

    for us_field in ['upstream_1', 'upstream_2']:
        us_id = reach[us_field]
        if pd.notna(us_id):
            upstream.append(us_id)
            upstream.extend(get_upstream_network(network, us_id))

    return upstream

# Get all reaches upstream of reach 100
upstream_reaches = get_upstream_network(network, 100)
print(f"Reaches upstream of reach 100: {len(upstream_reaches)}")

River network schema

Processed networks contain the following fields:

Field	Type	Description
`mobidic_id`	int	Internal reach identifier (0-indexed for topology)
`geometry`	LineString	Reach geometry
`upstream_1`	int/NaN	First upstream reach ID (references mobidic_id)
`upstream_2`	int/NaN	Second upstream reach ID if junction (references mobidic_id)
`downstream`	int/NaN	Downstream reach ID (references mobidic_id, NaN for outlets)
`strahler_order`	int	Strahler stream order (1 = headwater)
`calc_order`	int	Calculation order for routing (lower first)
`length_m`	float	Reach length in meters
`width_m`	float	Channel width in meters
`lag_time_s`	float	Lag time in seconds (used as K in routing)
`n_manning`	float	Manning roughness coefficient

Note: All original shapefile fields are preserved in the output. The mobidic_id field is added for internal topology management and is separate from any original ID fields in the shapefile.

Strahler ordering

The Strahler stream order is a hierarchical classification system:

First-order streams (order 1): Headwater streams with no upstream tributaries
Higher-order streams: When two streams of order n join, the result is order n+1
Unequal joining: When streams of different orders join, the downstream reach inherits the higher order

Example network:

    1 ─┐
       ├─ 2 ─┐
    1 ─┘     │
             ├─ 3 ─┐
          2 ─┘     │
                   ├─ 4 (outlet)
                3 ─┘

Routing parameters

Calculated parameters for channel routing:

Channel width: B = Br0 × order^NBr
Uses power-law relationship with Strahler order
Br0: Base width for first-order streams
NBr: Width scaling exponent
Lag time: τ = L / wcel
Time for a flow disturbance to travel the reach length
L: Reach length (m)
wcel: Wave celerity (m/s)
Storage coefficient: K = 0.5 × exp(-L/10000)
Dimensionless attenuation factor
Decreases exponentially with reach length
Manning coefficient: n_manning
Roughness parameter for friction calculations
Can be specified or estimated from channel properties

Processing pipeline

The river network processing follows these steps in order:

Read shapefile: Load river network geometry and attributes
Build topology: Identify upstream/downstream connections by matching reach endpoints
Enforce binary tree: Add fictitious short reaches for nodes with >2 tributaries
Compute Strahler order: Assign hierarchical stream orders
Join single tributaries (optional): Merge linear reach sequences
Calculate routing parameters: Compute width, lag time, storage coefficient
Determine calculation order: Set optimal processing sequence for routing

Binary tree enforcement

Networks may have junctions where more than 2 reaches join. MOBIDIC requires binary trees (maximum 2 upstream tributaries per reach) for its routing algorithms. The _enforce_binary_tree() function:

Identifies nodes with >2 upstream reaches
Adds fictitious short reaches (0.1m length) to split non-binary junctions
Updates topology to maintain network connectivity

Example: A 3-way junction (reaches A, B, C joining at node N flowing to reach D):

Before:              After:
  A ─┐                 A ─┐
  B ─┼─ N ─> D         B ─┼─ N1 ─> F ─> N ─> D
  C ─┘                 C ─┘         (fictitious)

This step typically adds a small number of fictitious reaches.

River network topology

Supported Structures

Binary trees: Maximum 2 upstream tributaries per reach (enforced automatically)
Multiple outlets: Supports networks with more than one terminal reach
Disconnected networks: Warns if multiple unconnected sub-networks exist

Topology Validation

The processing pipeline validates:

Reach connectivity (no dangling reaches)
Binary tree structure (automatically enforced)
Circular references (detects loops)
Duplicate reach IDs

Reach Joining

When join_single_tributaries=True:

Identifies reaches with exactly one upstream and one downstream reach
Merges these into the downstream reach
Recalculates length, routing parameters, and geometry
Simplifies the network while preserving topology

This is useful for: - Reducing computational load - Focusing on hydrologically significant junctions - Matching observational data at specific locations

Note: Joining is applied after binary tree enforcement, so some fictitious reaches may be merged.

File Formats

GeoParquet (Recommended)

Pros: Fast I/O, efficient compression, preserves all data types, widely supported
Cons: Requires pyarrow (default) or fastparquet
Use: Default format for processed networks

Shapefile (Legacy)

Pros: Universal compatibility, GIS software support
Cons: Slower I/O, field name limitations (10 chars), limited data types
Use: When compatibility with legacy tools is required

Notes

All lengths are in meters, times in seconds
Coordinate reference system (CRS) is preserved from input shapefile
Original shapefile fields are preserved; mobidic_id is added for topology
Geometry must be LineString (not MultiLineString)
Network must be topologically connected (warns if not)
Fictitious reaches (added during binary tree enforcement) contain only MOBIDIC-specific fields