Source code for symfluence.core.config.config_loader

# SPDX-License-Identifier: GPL-3.0-or-later
# Copyright (C) 2024-2026 SYMFLUENCE Team <dev@symfluence.org>

"""
Configuration loading, normalization, and validation for SYMFLUENCE.

This module provides the core configuration loading pipeline that transforms raw YAML
configuration files into validated, type-safe configuration dictionaries. It handles
key normalization, backwards compatibility via aliases, type coercion, environment
variable overrides, and user-friendly validation error messages.

Configuration Flow:
    1. **Load**: Read YAML configuration file from disk
    2. **Normalize**: Apply key aliases and type coercion (normalize_config)
    3. **Override**: Apply environment variable overrides (_load_env_overrides)
    4. **Validate**: Validate against Pydantic schema (validate_config)
    5. **Use**: Configuration ready for SYMFLUENCE components

Key Functionality:
    Normalization (normalize_config):
        - Converts all keys to uppercase for consistency
        - Applies alias mappings for backwards compatibility
        - Coerces string values to appropriate types (bool, int, float, list)
        - Handles legacy CONFLUENCE naming → SYMFLUENCE

    Validation (validate_config):
        - Checks for required fields (8 mandatory keys)
        - Validates field types using Pydantic models
        - Validates enum values (literal choices)
        - Provides detailed error messages with suggestions

    Type Coercion (_coerce_value):
        - Booleans: 'true', 'yes', '1' → True; 'false', 'no', '0' → False
        - None: 'none', 'null', '' → None
        - Numbers: Automatic int/float detection
        - Lists: Comma-separated strings → list of items
        - Pass-through: Other types unchanged

    Environment Overrides (_load_env_overrides):
        - Reads SYMFLUENCE_* environment variables
        - Strips prefix and normalizes keys
        - Applies type coercion
        - Overrides file-based configuration

    Error Formatting (_format_validation_error):
        - Groups errors by type (missing, invalid, other)
        - Suggests similar field names for typos (fuzzy matching)
        - Shows expected vs actual values
        - Links to documentation and templates

Alias Mapping:
    Normalization aliases are defined in ``legacy_aliases.NORMALIZATION_ALIASES``
    and provide backwards compatibility::

        GR_SPATIAL → GR_SPATIAL_MODE
        OPTIMISATION_METHODS → OPTIMIZATION_METHODS (UK → US spelling)
        OPTIMISATION_TARGET → OPTIMIZATION_TARGET
        OPTIMIZATION_ALGORITHM → ITERATIVE_OPTIMIZATION_ALGORITHM
        CONFLUENCE_DATA_DIR → SYMFLUENCE_DATA_DIR (legacy name)
        CONFLUENCE_CODE_DIR → SYMFLUENCE_CODE_DIR
        DOMAIN_DISCRETIZATION → SUB_GRID_DISCRETIZATION

Required Configuration Fields:
    Core:
        - SYMFLUENCE_DATA_DIR: Data directory path
        - SYMFLUENCE_CODE_DIR: Code directory path
        - DOMAIN_NAME: Basin/domain identifier
        - EXPERIMENT_ID: Experiment/run identifier

    Temporal:
        - EXPERIMENT_TIME_START: Simulation start time (YYYY-MM-DD HH:MM)
        - EXPERIMENT_TIME_END: Simulation end time (YYYY-MM-DD HH:MM)

    Spatial:
        - DOMAIN_DEFINITION_METHOD: Delineation method (lumped/TBL/distribute)
        - SUB_GRID_DISCRETIZATION: Discretization approach (lumped/elevation/...)

    Model:
        - HYDROLOGICAL_MODEL: Model name (SUMMA/FUSE/GR/...)
        - FORCING_DATASET: Forcing dataset name (ERA5/CONUS404/...)

Environment Variable Support:
    All configuration keys can be overridden via environment variables using
    the SYMFLUENCE_ prefix::

        export SYMFLUENCE_DOMAIN_NAME="test_basin"
        export SYMFLUENCE_EXPERIMENT_ID="run_001"
        export SYMFLUENCE_HYDROLOGICAL_MODEL="SUMMA"

    Environment variables are:
    - Normalized using same rules as file-based config
    - Type-coerced automatically
    - Applied after file loading (highest precedence)

Validation Error Handling:
    When validation fails, the module provides structured error messages::

        ======================================================================
        Configuration Validation Failed
        ======================================================================

        Missing Required Fields:
        ----------------------------------------------------------------------
          ✗ DOMAIN_NAME

          Tip: Use 'symfluence config list' to see available templates

        Invalid Field Values:
        ----------------------------------------------------------------------
          ✗ HYDROLOGICAL_MODEL: Input should be 'SUMMA', 'FUSE', ...
            Expected: One of ['SUMMA', 'FUSE', 'GR', 'HYPE', 'NGEN', ...]
            Got: summa

        Possible Typos (Did you mean?):
        ----------------------------------------------------------------------
          'DOMAINNAME' → 'DOMAIN_NAME'
          'FORCINGDATASET' → 'FORCING_DATASET'

        ======================================================================

Usage Example:
    Basic configuration loading::

        >>> from symfluence.core.config.config_loader import (
        ...     normalize_config, validate_config
        ... )
        >>> import yaml
        >>>
        >>> # Load raw config from YAML
        >>> with open('config.yaml') as f:
        ...     raw_config = yaml.safe_load(f)
        >>>
        >>> # Normalize keys and values
        >>> normalized = normalize_config(raw_config)
        >>>
        >>> # Validate and get type-safe config
        >>> config = validate_config(normalized)
        >>>
        >>> # Use in SYMFLUENCE
        >>> domain_name = config['DOMAIN_NAME']
        >>> model = config['HYDROLOGICAL_MODEL']

    Type coercion examples::

        >>> from symfluence.core.config.config_loader import _coerce_value
        >>>
        >>> _coerce_value('true')
        True
        >>> _coerce_value('3.14')
        3.14
        >>> _coerce_value('1,2,3,4')
        ['1', '2', '3', '4']
        >>> _coerce_value('none')
        None

    Alias normalization::

        >>> from symfluence.core.config.config_loader import _normalize_key
        >>>
        >>> _normalize_key('gr_spatial')
        'GR_SPATIAL_MODE'
        >>> _normalize_key('confluence_data_dir')
        'SYMFLUENCE_DATA_DIR'

Integration:
    This module is used by:
    - CLI commands: config validation before workflow execution
    - Project initialization: Template-based configuration setup
    - Configuration manager: Runtime config access and modification
    - All preprocessors: Require validated configuration

    The module integrates with:
    - core.config.models.SymfluenceConfig: Pydantic schema definition
    - core.config.defaults.ModelDefaults/ForcingDefaults: Model/forcing-specific overrides
    - core.exceptions.ConfigurationError: Custom exception types

Error Recovery:
    Common configuration errors and fixes:

    1. Missing field:
       Error: "Missing required configuration keys: DOMAIN_NAME"
       Fix: Add DOMAIN_NAME: "my_basin" to config.yaml

    2. Invalid enum value:
       Error: "HYDROLOGICAL_MODEL: Input should be 'SUMMA', 'FUSE', ..."
       Fix: Use exact case-sensitive value (e.g., SUMMA not summa)

    3. Type mismatch:
       Error: "EXPERIMENT_TIME_START: Input should be a valid datetime"
       Fix: Use format YYYY-MM-DD HH:MM (e.g., 2015-01-01 00:00)

    4. Typo in key:
       Suggestion: 'DOMAINNAME' → 'DOMAIN_NAME'
       Fix: Use suggested key with underscore

Notes:
    - All keys are normalized to UPPERCASE for consistency
    - Type coercion is best-effort; validation catches type errors
    - Environment variables override file-based configuration
    - Alias mapping ensures backwards compatibility with legacy configs
    - Validation uses Pydantic for type safety and schema enforcement
    - Error messages include actionable suggestions and documentation links

See Also:
    - core.config.models.SymfluenceConfig: Pydantic configuration schema
    - core.config.defaults: ModelDefaults/ForcingDefaults helpers
    - core.config.config_manager.ConfigManager: Configuration access interface
    - core.exceptions.ConfigurationError: Configuration-related exceptions
    - cli.commands.config_commands: CLI configuration management commands
"""
from __future__ import annotations

import os
from difflib import get_close_matches
from typing import Any, Dict

from pydantic import ValidationError

from symfluence.core.config.legacy_aliases import NORMALIZATION_ALIASES
from symfluence.core.config.models import SymfluenceConfig


[docs] def normalize_config(config: Dict[str, Any]) -> Dict[str, Any]: """ Normalize configuration keys using aliases and perform type coercion. Args: config: Dictionary of configuration settings Returns: New dictionary with normalized keys and coerced values """ normalized = {} for k, v in config.items(): norm_key = _normalize_key(k) normalized[norm_key] = _coerce_value(v) return normalized
[docs] def validate_config(config: Dict[str, Any]) -> Dict[str, Any]: """ Validate configuration using Pydantic model. Args: config: Dictionary of configuration settings Returns: Validated configuration dictionary Raises: ValueError: If configuration is invalid """ required_fields = [ 'SYMFLUENCE_DATA_DIR', 'SYMFLUENCE_CODE_DIR', 'DOMAIN_NAME', 'EXPERIMENT_ID', 'EXPERIMENT_TIME_START', 'EXPERIMENT_TIME_END', 'DOMAIN_DEFINITION_METHOD', 'SUB_GRID_DISCRETIZATION', 'HYDROLOGICAL_MODEL', 'FORCING_DATASET', ] missing = [key for key in required_fields if not config.get(key)] if missing: raise ValueError(f"Missing required configuration keys: {', '.join(missing)}") try: # We filter out None values to let Pydantic defaults/validators handle them # or raise errors for required fields clean_config = {k: v for k, v in config.items() if v is not None} model = SymfluenceConfig(**clean_config) return model.model_dump() except ValidationError as e: # Format error with actionable suggestions error_msg = _format_validation_error(e, config) raise ValueError(error_msg) from e
def _load_env_overrides() -> Dict[str, Any]: """ Load configuration overrides from environment variables. """ env_overrides = {} prefix = "SYMFLUENCE_" for env_key, env_value in os.environ.items(): if env_key.startswith(prefix): config_key = env_key[len(prefix):] norm_key = _normalize_key(config_key) env_overrides[norm_key] = _coerce_value(env_value) return env_overrides def _normalize_key(key: str) -> str: key_upper = key.upper() return NORMALIZATION_ALIASES.get(key_upper, key_upper) def _coerce_value(value: Any) -> Any: """Helper to attempt basic coercion for values.""" if not isinstance(value, str): return value stripped = value.strip() lower = stripped.lower() if lower in ('true', 'yes', '1'): return True if lower in ('false', 'no', '0'): return False if lower in ('none', 'null', ''): return None # Try number try: if "." in stripped: return float(stripped) return int(stripped) except ValueError: pass # Handle comma-separated lists if "," in stripped: return [item.strip() for item in stripped.split(",")] return stripped def _format_validation_error(error: ValidationError, config: Dict[str, Any]) -> str: """ Format Pydantic ValidationError with helpful suggestions. Args: error: Pydantic ValidationError config: Configuration dict that failed validation Returns: Formatted error message with suggestions """ error_lines = ["=" * 70] error_lines.append("Configuration Validation Failed") error_lines.append("=" * 70) missing_fields = [] invalid_values = [] other_errors = [] # Get all valid field names from the model valid_fields = set(SymfluenceConfig.model_fields.keys()) for err in error.errors(): field_name = str(err['loc'][0]) if err['loc'] else 'unknown' error_type = err['type'] error_msg = err['msg'] if error_type == 'missing': missing_fields.append(field_name) elif 'literal' in error_type.lower() or 'type' in error_type.lower(): invalid_values.append((field_name, error_msg, err.get('ctx', {}))) else: other_errors.append((field_name, error_msg)) # Format missing fields if missing_fields: error_lines.append("\nMissing Required Fields:") error_lines.append("-" * 70) for field in missing_fields: error_lines.append(f" ✗ {field}") error_lines.append("") error_lines.append(" Tip: Use 'symfluence config list' to see available templates") # Format invalid values with suggestions if invalid_values: error_lines.append("\nInvalid Field Values:") error_lines.append("-" * 70) for field, msg, ctx in invalid_values: error_lines.append(f" ✗ {field}: {msg}") # Add expected values if available in context if 'expected' in ctx: error_lines.append(f" Expected: {ctx['expected']}") # Add actual value if provided in config if field in config: error_lines.append(f" Got: {config[field]}") # Format other validation errors if other_errors: error_lines.append("\nValidation Errors:") error_lines.append("-" * 70) for field, msg in other_errors: error_lines.append(f" ✗ {field}: {msg}") if field in config: error_lines.append(f" Current value: {config[field]}") # Check for potential typos in config keys config_keys = set(k.upper() for k in config.keys()) unknown_keys = config_keys - valid_fields if unknown_keys: suggestions = {} for unknown in unknown_keys: matches = get_close_matches(unknown, valid_fields, n=3, cutoff=0.6) if matches: suggestions[unknown] = matches if suggestions: error_lines.append("\nPossible Typos (Did you mean?):") error_lines.append("-" * 70) for wrong_key, correct_options in suggestions.items(): options_display = ", ".join([f"'{opt}'" for opt in correct_options]) error_lines.append(f" '{wrong_key}' → {options_display}") # Add helpful footer error_lines.append("") error_lines.append("=" * 70) error_lines.append("For configuration help:") error_lines.append(" • List templates: symfluence config list") error_lines.append( " • Example configs: src/symfluence/resources/config_templates/examples/*_tutorial.yaml" ) error_lines.append(" • Docs: https://github.com/CH-Earth/SUMMA") error_lines.append("=" * 70) return "\n".join(error_lines)