Skip to content

GIS data I/O

The GIS I/O module provides functions for reading geospatial data in various formats, including raster (GeoTIFF) and vector (Shapefile) files.

Overview

MOBIDICpy uses industry-standard libraries for geospatial data handling:

  • Rasterio for raster data (GeoTIFF)
  • GeoPandas for vector data (Shapefiles, GeoJSON, etc.)

All functions provide comprehensive logging and error handling, converting nodata values to NaN for consistent numerical processing.

Functions

Raster I/O

Read raster grid file and return data array with corner coordinates. This function replicates the behavior of MATLAB’s function for GeoTIFF files.

Reads raster files in GeoTIFF (.tif, .tiff) format. Returns the data array and corner coordinates adjusted to cell centers.

Parameters:

Name Type Description Default
gridname str | Path

Path to the grid file (.tif, .tiff).

required

Returns:

Type Description
dict[str, Any]

Dictionary containing: - ‘data’: 2D numpy array with raster values (NaN for nodata) - ‘xllcorner’: X coordinate of lower-left corner (cell center) - ‘yllcorner’: Y coordinate of lower-left corner (cell center) - ‘cellsize’: Cell size in map units

Raises:

Type Description
FileNotFoundError

If the grid file does not exist.

ValueError

If the file format is not supported.

RuntimeError

If there are errors reading the file.

Notes
  • Corner coordinates are adjusted to cell centers (0.5 * cellsize offset)
  • Data is flipped vertically to match MATLAB convention
  • Very small values (< -1e32) are converted to NaN

Examples:

>>> result = grid_to_matrix('elevation.tif')
>>> data = result['data']
>>> print(f"Shape: {data.shape}, Resolution: {result['cellsize']}m")
Source code in mobidic/preprocessing/gis_reader.py
def grid_to_matrix(gridname: str | Path) -> dict[str, Any]:
    """Read raster grid file and return data array with corner coordinates. This function
    replicates the behavior of MATLAB's function for GeoTIFF files.

    Reads raster files in GeoTIFF (.tif, .tiff)
    format. Returns the data array and corner coordinates adjusted to cell centers.

    Args:
        gridname: Path to the grid file (.tif, .tiff).

    Returns:
        Dictionary containing:
            - 'data': 2D numpy array with raster values (NaN for nodata)
            - 'xllcorner': X coordinate of lower-left corner (cell center)
            - 'yllcorner': Y coordinate of lower-left corner (cell center)
            - 'cellsize': Cell size in map units

    Raises:
        FileNotFoundError: If the grid file does not exist.
        ValueError: If the file format is not supported.
        RuntimeError: If there are errors reading the file.

    Notes:
        - Corner coordinates are adjusted to cell centers (0.5 * cellsize offset)
        - Data is flipped vertically to match MATLAB convention
        - Very small values (< -1e32) are converted to NaN

    Examples:
        >>> result = grid_to_matrix('elevation.tif')
        >>> data = result['data']
        >>> print(f"Shape: {data.shape}, Resolution: {result['cellsize']}m")
    """
    gridname = Path(gridname)
    suffix_lower = gridname.suffix.lower()

    logger.info(f"Reading grid file: {gridname}")

    # GeoTIFF format
    if suffix_lower in [".tif", ".tiff"]:
        if not gridname.exists():
            logger.error(f"File not found: {gridname}")
            raise FileNotFoundError(f"File not found: {gridname}")

        try:
            with rasterio.open(gridname) as src:
                # Read data and convert to float
                matgr = src.read(1).astype(float)

                # Get nodata value and convert to NaN
                nodata = src.nodata
                if nodata is not None:
                    matgr[matgr == nodata] = np.nan

                # Get spatial information
                transform = src.transform
                cellsize = transform[0]  # Assuming square pixels
                xllcorner = src.bounds.left
                yllcorner = src.bounds.bottom

                # Get CRS
                crs = src.crs

        except Exception as e:
            logger.error(f"Error reading GeoTIFF file {gridname}: {e}")
            raise RuntimeError(f"Error reading GeoTIFF file: {e}") from e

        # Flip vertically (to match MATLAB convention)
        matgr = np.flipud(matgr)

        # Filter nodata values (e.g., -3.4e38)
        matgr[matgr < -1e32] = np.nan

        # Adjust to cell center
        xllcorner += 0.5 * cellsize
        yllcorner += 0.5 * cellsize

        logger.info(f"GeoTIFF read: shape={matgr.shape}, cellsize={cellsize}")

        return {
            "data": matgr,
            "xllcorner": xllcorner,
            "yllcorner": yllcorner,
            "cellsize": cellsize,
            "crs": crs,
        }

    else:
        logger.error(f"Unsupported file format: {suffix_lower}")
        raise ValueError(f"Unsupported file format: {suffix_lower}. Must be .tif or .tiff")

Vector I/O

Read a shapefile and return a GeoDataFrame.

Parameters:

Name Type Description Default
filepath str | Path

Path to the shapefile (.shp).

required
crs str | None

Optional CRS to reproject the data. If None, keeps original CRS.

None

Returns:

Type Description
GeoDataFrame

GeoDataFrame containing the shapefile data.

Raises:

Type Description
FileNotFoundError

If the shapefile does not exist.

Exception

If the file cannot be read by geopandas.

Examples:

>>> river_network = read_shapefile("example/shp/Arno_river_network.shp")
>>> reach_ids = river_network['REACH_ID']
Source code in mobidic/preprocessing/gis_reader.py
def read_shapefile(
    filepath: str | Path,
    crs: str | None = None,
) -> gpd.GeoDataFrame:
    """Read a shapefile and return a GeoDataFrame.

    Args:
        filepath: Path to the shapefile (.shp).
        crs: Optional CRS to reproject the data. If None, keeps original CRS.

    Returns:
        GeoDataFrame containing the shapefile data.

    Raises:
        FileNotFoundError: If the shapefile does not exist.
        Exception: If the file cannot be read by geopandas.

    Examples:
        >>> river_network = read_shapefile("example/shp/Arno_river_network.shp")
        >>> reach_ids = river_network['REACH_ID']
    """
    filepath = Path(filepath)

    # Check if file exists
    if not filepath.exists():
        logger.error(f"Shapefile not found: {filepath}")
        raise FileNotFoundError(f"Shapefile not found: {filepath}")

    logger.info(f"Reading shapefile: {filepath}")

    try:
        # Read shapefile
        gdf = gpd.read_file(filepath)

        logger.debug(f"Shapefile loaded: {len(gdf)} features, CRS={gdf.crs}")

        # Reproject if requested
        if crs is not None:
            logger.debug(f"Reprojecting from {gdf.crs} to {crs}")
            gdf = gdf.to_crs(crs)

        logger.success(f"Successfully read shapefile: {len(gdf)} features, CRS={gdf.crs}")

        return gdf

    except Exception as e:
        logger.error(f"Failed to read shapefile {filepath}: {e}")
        raise RuntimeError(f"Failed to read shapefile {filepath}: {e}") from e

Examples

Reading a Raster

from mobidic import grid_to_matrix
import numpy as np

# Read a Digital Terrain Model
dtm = grid_to_matrix("path/to/dtm.tif")

# Access raster data and metadata
print(f"Shape: {dtm['data'].shape}")
print(f"Cell size: {dtm['cellsize']} m")
print(f"Lower-left corner: ({dtm['xllcorner']}, {dtm['yllcorner']})")
print(f"CRS: {dtm['crs']}")

# Access the data array (nodata converted to NaN)
elevation = dtm['data']
mean_elevation = np.nanmean(elevation)
print(f"Mean elevation: {mean_elevation:.2f} m")

Reading a shapefile

from mobidic import read_shapefile

# Read river network shapefile
network = read_shapefile("path/to/river_network.shp")

# Access as GeoDataFrame
print(f"Number of reaches: {len(network)}")
print(f"CRS: {network.crs}")
print(network.head())

# Reproject to a different CRS
network_utm = read_shapefile(
    "path/to/river_network.shp",
    crs="EPSG:32632"  # WGS84 / UTM zone 32N
)

Return values

Raster data

grid_to_matrix() returns a dictionary with:

  • data (numpy.ndarray): 2D array of raster values, nodata converted to NaN (flipped vertically to match MATLAB convention)
  • xllcorner (float): X coordinate of lower-left corner (cell center)
  • yllcorner (float): Y coordinate of lower-left corner (cell center)
  • cellsize (float): Cell size in map units
  • crs (rasterio.crs.CRS): Coordinate reference system

Vector data

read_shapefile() returns a geopandas.GeoDataFrame with geometry and attribute columns.

Error handling

Both functions provide detailed error messages and logging:

  • File not found: FileNotFoundError with clear message
  • Invalid format: RasterioIOError or RuntimeError with details
  • CRS issues: Warnings when CRS is missing or reprojection fails
  • Logging: All operations logged using loguru (DEBUG, INFO, SUCCESS, ERROR levels)