Skip to content

River network processing

The river network processing module provides a toolkit for building and analyzing river network topology, including Strahler ordering, reach joining, and routing parameter calculation.

Overview

River network processing is a required preprocessing step that:

  • Builds network topology from a shapefile geometry
  • Enforces binary tree structure (maximum 2 upstream tributaries per reach)
  • Computes Strahler stream ordering
  • Optionally joins single-tributary reaches to simplify the network
  • Calculates hydraulic and routing parameters (width, lag time, storage coefficient)
  • Determines optimal calculation order for routing algorithms
  • Exports processed networks for use in simulations

Functions

River network processing

Process river network shapefile to create a complete network structure.

This function reads a river network shapefile, builds the network topology, enforces binary tree structure (max 2 upstream tributaries per reach), computes Strahler stream ordering, optionally joins single-tributary reaches, and calculates routing parameters.

Parameters:

Name Type Description Default
shapefile_path str | Path

Path to river network shapefile

required
join_single_tributaries bool

If True, joins reaches with single tributaries (default: True)

True
routing_params dict | None

Dictionary with routing parameters: - wcel: Wave celerity [m/s] (default: 5.0) - Br0: Width of 1st order streams [m] (default: 1.0) - NBr: Width-order exponent (default: 1.5) - n_Man: Manning coefficient [s/m^(1/3)] (default: 0.03)

None

Returns:

Type Description
GeoDataFrame

GeoDataFrame with processed network including topology and routing parameters

Raises:

Type Description
FileNotFoundError

If shapefile does not exist

ValueError

If network is invalid

Examples:

>>> network = process_river_network(
...     "river_network.shp",
...     routing_params={"wcel": 3.0, "Br0": 1.0, "NBr": 1.5, "n_Man": 0.03}
... )
Source code in mobidic/preprocessing/river_network.py
def process_river_network(
    shapefile_path: str | Path,
    join_single_tributaries: bool = True,
    routing_params: dict | None = None,
) -> gpd.GeoDataFrame:
    """Process river network shapefile to create a complete network structure.

    This function reads a river network shapefile, builds the network topology,
    enforces binary tree structure (max 2 upstream tributaries per reach),
    computes Strahler stream ordering, optionally joins single-tributary reaches,
    and calculates routing parameters.

    Args:
        shapefile_path: Path to river network shapefile
        join_single_tributaries: If True, joins reaches with single tributaries (default: True)
        routing_params: Dictionary with routing parameters:
            - wcel: Wave celerity [m/s] (default: 5.0)
            - Br0: Width of 1st order streams [m] (default: 1.0)
            - NBr: Width-order exponent (default: 1.5)
            - n_Man: Manning coefficient [s/m^(1/3)] (default: 0.03)

    Returns:
        GeoDataFrame with processed network including topology and routing parameters

    Raises:
        FileNotFoundError: If shapefile does not exist
        ValueError: If network is invalid

    Examples:
        >>> network = process_river_network(
        ...     "river_network.shp",
        ...     routing_params={"wcel": 3.0, "Br0": 1.0, "NBr": 1.5, "n_Man": 0.03}
        ... )
    """
    logger.info(f"Processing river network from {shapefile_path}")

    # Read shapefile
    gdf = gpd.read_file(shapefile_path)

    # Build network topology
    logger.debug("Building network topology")
    network = _build_network_topology(gdf)

    # Enforce binary tree structure (max 2 upstream tributaries per reach)
    logger.debug("Enforcing binary tree structure")
    network = _enforce_binary_tree(network)

    # Compute Strahler ordering
    logger.debug("Computing Strahler stream order")
    network = _compute_strahler_order(network)

    # Optionally join single-tributary reaches
    if join_single_tributaries:
        logger.debug("Joining single-tributary reaches")
        network = _join_single_tributaries(network)

    # Calculate routing parameters
    if routing_params is None:
        routing_params = {"wcel": 5.0, "Br0": 1.0, "NBr": 1.5, "n_Man": 0.03}

    logger.debug("Calculating routing parameters")
    network = _calculate_routing_parameters(network, routing_params)

    # Compute calculation order (topological order)
    logger.debug("Computing calculation order")
    network = _compute_calculation_order(network)

    logger.success(f"River network processed successfully: {len(network)} reaches")

    return network

River network I/O

Save processed river network to file.

Saves the river network to either GeoParquet (default, recommended) or Shapefile format. GeoParquet is more efficient, has no field name limitations, and preserves data types better.

Parameters:

Name Type Description Default
network GeoDataFrame

Processed river network GeoDataFrame

required
output_path str | Path

Path to output file

required
format str

Output format, either ‘parquet’ (default) or ‘shapefile’

'parquet'

Raises:

Type Description
ValueError

If format is not supported

Examples:

>>> from mobidic import process_river_network
>>> network = process_river_network("river_network.shp")
>>> from mobidic.preprocessing.io import save_network
>>> save_network(network, "river_network.parquet")  # Default: parquet
>>> save_network(network, "river_network.shp", format="shapefile")  # Shapefile
Source code in mobidic/preprocessing/io.py
def save_network(network: gpd.GeoDataFrame, output_path: str | Path, format: str = "parquet") -> None:
    """Save processed river network to file.

    Saves the river network to either GeoParquet (default, recommended) or Shapefile format.
    GeoParquet is more efficient, has no field name limitations, and preserves data types better.

    Args:
        network: Processed river network GeoDataFrame
        output_path: Path to output file
        format: Output format, either 'parquet' (default) or 'shapefile'

    Raises:
        ValueError: If format is not supported

    Examples:
        >>> from mobidic import process_river_network
        >>> network = process_river_network("river_network.shp")
        >>> from mobidic.preprocessing.io import save_network
        >>> save_network(network, "river_network.parquet")  # Default: parquet
        >>> save_network(network, "river_network.shp", format="shapefile")  # Shapefile
    """
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    if format == "parquet":
        logger.info(f"Saving river network to GeoParquet: {output_path}")
        network.to_parquet(output_path)
        logger.success(f"Network saved to GeoParquet: {output_path}")
        logger.debug(f"File size: {output_path.stat().st_size / 1024:.2f} KB")
    elif format == "shapefile":
        import warnings

        logger.info(f"Saving river network to Shapefile: {output_path}")
        # Suppress shapefile field name truncation warnings (10 char limit is expected)
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message=".*Column names longer than 10 characters.*")
            warnings.filterwarnings("ignore", message=".*Normalized/laundered field name.*")
            network.to_file(output_path)
        logger.success(f"Network saved to Shapefile: {output_path}")
        logger.debug(f"File size: {output_path.stat().st_size / 1024:.2f} KB")
    else:
        raise ValueError(f"Unsupported format: {format}. Use 'parquet' or 'shapefile'")

Load processed river network from GeoParquet file.

Loads river network data from GeoParquet format only (the recommended format). For loading shapefiles, use geopandas.read_file() directly.

Parameters:

Name Type Description Default
network_path str | Path

Path to network GeoParquet file (.parquet)

required

Returns:

Type Description
GeoDataFrame

GeoDataFrame with river network

Raises:

Type Description
FileNotFoundError

If network file does not exist

ValueError

If file is not a .parquet file

Examples:

>>> from mobidic.preprocessing.io import load_network
>>> network = load_network("river_network.parquet")
Source code in mobidic/preprocessing/io.py
def load_network(network_path: str | Path) -> gpd.GeoDataFrame:
    """Load processed river network from GeoParquet file.

    Loads river network data from GeoParquet format only (the recommended format).
    For loading shapefiles, use geopandas.read_file() directly.

    Args:
        network_path: Path to network GeoParquet file (.parquet)

    Returns:
        GeoDataFrame with river network

    Raises:
        FileNotFoundError: If network file does not exist
        ValueError: If file is not a .parquet file

    Examples:
        >>> from mobidic.preprocessing.io import load_network
        >>> network = load_network("river_network.parquet")
    """
    network_path = Path(network_path)

    if not network_path.exists():
        raise FileNotFoundError(f"Network file not found: {network_path}")

    if network_path.suffix != ".parquet":
        raise ValueError(
            f"Only .parquet files supported. Got: {network_path.suffix}. "
            "Use geopandas.read_file() to load other formats."
        )

    logger.info(f"Loading river network from GeoParquet: {network_path}")
    network = gpd.read_parquet(network_path)
    logger.success(f"River network loaded: {len(network)} reaches")

    return network

Examples

Complete river network processing

from mobidic import process_river_network, save_network

# Process river network from shapefile
network = process_river_network(
    shapefile_path="data/river_network.shp",
    join_single_tributaries=True,
    routing_params={
        "wcel": 2.5,  # Celerity: 2.5 m/s
        "Br0": 5.0,   # Base width: 5 m
        "NBr": 0.5,   # Width exponent
        "n_Man": 0.03  # Manning coefficient
    }
)

# Inspect the processed network
print(f"Total reaches: {len(network)}")
print(f"Max Strahler order: {network['strahler_order'].max()}")
print(f"Total length: {network['length_m'].sum() / 1000:.1f} km")

# Export to GeoParquet (recommended)
save_network(network, "output/network.parquet", format="parquet")

# Or export to Shapefile (for backward compatibility)
save_network(network, "output/network.shp", format="shapefile")

Loading a processed Network

from mobidic import load_network

# Load previously processed network
network = load_network("output/network.parquet")

# Access network properties
terminal_reaches = network[network['downstream'].isna()]
print(f"Number of outlets: {len(terminal_reaches)}")

headwater_reaches = network[network['strahler_order'] == 1]
print(f"Number of headwater streams: {len(headwater_reaches)}")

Analyzing river network topology

from mobidic import process_river_network
import pandas as pd

network = process_river_network("data/river_network.shp")

# Get summary statistics by Strahler order
summary = network.groupby('strahler_order').agg({
    'length_m': ['count', 'sum', 'mean'],
    'width_m': 'mean',
    'lag_time_s': 'mean'
})

print(summary)

# Find reaches with multiple upstream tributaries
junctions = network[network['upstream_2'].notna()]
print(f"Number of junctions: {len(junctions)}")

# Trace upstream from a specific reach
def get_upstream_network(network, mobidic_id):
    """Recursively find all upstream reaches."""
    upstream = []
    reach = network[network['mobidic_id'] == mobidic_id].iloc[0]

    for us_field in ['upstream_1', 'upstream_2']:
        us_id = reach[us_field]
        if pd.notna(us_id):
            upstream.append(us_id)
            upstream.extend(get_upstream_network(network, us_id))

    return upstream

# Get all reaches upstream of reach 100
upstream_reaches = get_upstream_network(network, 100)
print(f"Reaches upstream of reach 100: {len(upstream_reaches)}")

River network schema

Processed networks contain the following fields:

Field Type Description
mobidic_id int Internal reach identifier (0-indexed for topology)
geometry LineString Reach geometry
upstream_1 int/NaN First upstream reach ID (references mobidic_id)
upstream_2 int/NaN Second upstream reach ID if junction (references mobidic_id)
downstream int/NaN Downstream reach ID (references mobidic_id, NaN for outlets)
strahler_order int Strahler stream order (1 = headwater)
calc_order int Calculation order for routing (lower first)
length_m float Reach length in meters
width_m float Channel width in meters
lag_time_s float Lag time in seconds (used as K in routing)
n_manning float Manning roughness coefficient

Note: All original shapefile fields are preserved in the output. The mobidic_id field is added for internal topology management and is separate from any original ID fields in the shapefile.

Strahler ordering

The Strahler stream order is a hierarchical classification system:

  1. First-order streams (order 1): Headwater streams with no upstream tributaries
  2. Higher-order streams: When two streams of order n join, the result is order n+1
  3. Unequal joining: When streams of different orders join, the downstream reach inherits the higher order

Example network:

    1 ─┐
       ├─ 2 ─┐
    1 ─┘     │
             ├─ 3 ─┐
          2 ─┘     │
                   ├─ 4 (outlet)
                3 ─┘

Routing parameters

Calculated parameters for channel routing:

  • Channel width: B = Br0 × order^NBr
  • Uses power-law relationship with Strahler order
  • Br0: Base width for first-order streams
  • NBr: Width scaling exponent

  • Lag time: τ = L / wcel

  • Time for a flow disturbance to travel the reach length
  • L: Reach length (m)
  • wcel: Wave celerity (m/s)

  • Storage coefficient: K = 0.5 × exp(-L/10000)

  • Dimensionless attenuation factor
  • Decreases exponentially with reach length

  • Manning coefficient: n_manning

  • Roughness parameter for friction calculations
  • Can be specified or estimated from channel properties

Processing pipeline

The river network processing follows these steps in order:

  1. Read shapefile: Load river network geometry and attributes
  2. Build topology: Identify upstream/downstream connections by matching reach endpoints
  3. Enforce binary tree: Add fictitious short reaches for nodes with >2 tributaries
  4. Compute Strahler order: Assign hierarchical stream orders
  5. Join single tributaries (optional): Merge linear reach sequences
  6. Calculate routing parameters: Compute width, lag time, storage coefficient
  7. Determine calculation order: Set optimal processing sequence for routing

Binary tree enforcement

Networks may have junctions where more than 2 reaches join. MOBIDIC requires binary trees (maximum 2 upstream tributaries per reach) for its routing algorithms. The _enforce_binary_tree() function:

  • Identifies nodes with >2 upstream reaches
  • Adds fictitious short reaches (0.1m length) to split non-binary junctions
  • Updates topology to maintain network connectivity

Example: A 3-way junction (reaches A, B, C joining at node N flowing to reach D):

Before:              After:
  A ─┐                 A ─┐
  B ─┼─ N ─> D         B ─┼─ N1 ─> F ─> N ─> D
  C ─┘                 C ─┘         (fictitious)

This step typically adds a small number of fictitious reaches.

River network topology

Supported Structures

  • Binary trees: Maximum 2 upstream tributaries per reach (enforced automatically)
  • Multiple outlets: Supports networks with more than one terminal reach
  • Disconnected networks: Warns if multiple unconnected sub-networks exist

Topology Validation

The processing pipeline validates:

  • Reach connectivity (no dangling reaches)
  • Binary tree structure (automatically enforced)
  • Circular references (detects loops)
  • Duplicate reach IDs

Reach Joining

When join_single_tributaries=True:

  1. Identifies reaches with exactly one upstream and one downstream reach
  2. Merges these into the downstream reach
  3. Recalculates length, routing parameters, and geometry
  4. Simplifies the network while preserving topology

This is useful for: - Reducing computational load - Focusing on hydrologically significant junctions - Matching observational data at specific locations

Note: Joining is applied after binary tree enforcement, so some fictitious reaches may be merged.

File Formats

  • Pros: Fast I/O, efficient compression, preserves all data types, widely supported
  • Cons: Requires pyarrow (default) or fastparquet
  • Use: Default format for processed networks

Shapefile (Legacy)

  • Pros: Universal compatibility, GIS software support
  • Cons: Slower I/O, field name limitations (10 chars), limited data types
  • Use: When compatibility with legacy tools is required

Notes

  • All lengths are in meters, times in seconds
  • Coordinate reference system (CRS) is preserved from input shapefile
  • Original shapefile fields are preserved; mobidic_id is added for topology
  • Geometry must be LineString (not MultiLineString)
  • Network must be topologically connected (warns if not)
  • Fictitious reaches (added during binary tree enforcement) contain only MOBIDIC-specific fields