This wetland vegetation map of the Great Cumbung Swamp was produced using a machine learning-based classification framework that integrates multi-source satellite and terrain data with a cluster-guided training approach (Wen et al., 2025).
Inputs and training data
Inputs included Sentinel-1 synthetic aperture radar (SAR) time series, Sentinel-2 optical time series, and hydro-morphological variables derived from a gap-filled 5 m LiDAR digital elevation model (DEM) and a hydrologically enforced Shuttle Radar Topography Mission (SRTM) DEM.
To capture the high spatial and seasonal variability of wetland vegetation, K-means clustering was used to guide sample selection. Clusters were reviewed by an expert vegetation ecologist against high-resolution aerial and drone imagery, topographic context, and existing field data, and then assigned to plant community types (PCTs) where appropriate. The verified clusters formed the basis of the training dataset for a Random Forest classifier that used 48 predictors (spectral, temporal, structural, terrain). Model outputs were produced at three hierarchical class levels: NSW Vegetation Formations (L1: 9 classes), Functional groups (L2: 14 classes) and PCTs (L3: 23 classes).
Post-processing and manual edits
Following classification, model outputs were post-processed to enhance spatial coherence while preserving hydrologically meaningful patches. Steps included edge-aware smoothing and progressive gap-filling/merging of patches below class-specific minimum mapping units (MMU): < 0.1 ha for non-woody wetland PCTs and < 0.2 ha for woody wetland PCTs. Outputs were then manually edited by an expert vegetation ecologist to resolve residual artefacts and boundary issues.
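The MMU step can be illustrated with a minimal sketch. It is not the post-processing code used for this product: the merge rule (reassign a small patch to its most common neighbouring class, in a single pass), the 10 m pixel size, the 4-connectivity and the class codes are all assumptions made for illustration, and the edge-aware smoothing and progressive iteration described above are not reproduced.

```python
from collections import Counter, deque

PIXEL_AREA_HA = 0.01  # assumption: 10 m x 10 m pixels (0.01 ha each)


def _neighbours(y, x, rows, cols):
    """Yield the in-bounds 4-connected neighbours of cell (y, x)."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < rows and 0 <= nx < cols:
            yield ny, nx


def find_patches(grid):
    """Return [(class_code, [(row, col), ...]), ...] for 4-connected patches."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            code = grid[r][c]
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one patch
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in _neighbours(y, x, rows, cols):
                    if not seen[ny][nx] and grid[ny][nx] == code:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            patches.append((code, cells))
    return patches


def merge_small_patches(grid, mmu_ha):
    """Reassign patches below their class MMU to the most common adjacent class.

    mmu_ha maps class code -> MMU in hectares; classes without an entry
    are left untouched. Neighbour classes are read from the input grid.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for code, cells in find_patches(grid):
        threshold = mmu_ha.get(code)
        if threshold is None or len(cells) * PIXEL_AREA_HA >= threshold:
            continue
        cell_set = set(cells)
        adjacent = Counter(
            grid[ny][nx]
            for y, x in cells
            for ny, nx in _neighbours(y, x, rows, cols)
            if (ny, nx) not in cell_set
        )
        if adjacent:
            new_code = adjacent.most_common(1)[0][0]
            for y, x in cells:
                out[y][x] = new_code
    return out


# Hypothetical class codes: 1 = a woody wetland class, 2 = a non-woody wetland class.
MMU_HA = {1: 0.2, 2: 0.1}
example = [[1] * 5 for _ in range(5)]
example[2][2] = 2  # a single 0.01 ha pixel, below the 0.1 ha MMU
cleaned = merge_small_patches(example, MMU_HA)
```

In this toy grid the isolated pixel is absorbed by the surrounding class, while the larger patch (0.24 ha under the assumed pixel size) exceeds its 0.2 ha threshold and is left unchanged.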
Model accuracy assessment
The following metrics describe raw model performance (before post-processing and manual editing) at each class level, as reported in Wen et al. (2025) for an internal independent test set. Metrics include Overall Accuracy (OA), Cohen's Kappa (κ) and Matthews Correlation Coefficient (MCC):
- NSW Vegetation Formations (L1): OA ≈ 97 %, κ ≈ 0.96, MCC ≈ 0.96;
- Functional (L2): OA ≈ 94 %, κ ≈ 0.93, MCC ≈ 0.93;
- PCTs (L3): OA ≈ 93 %, κ ≈ 0.91, MCC ≈ 0.89
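For readers unfamiliar with these three metrics, the sketch below shows how each is derived from a multi-class confusion matrix using the standard definitions. The function name and the 3 x 3 matrix are hypothetical and purely illustrative; the values are not taken from Wen et al. (2025).

```python
import math


def accuracy_metrics(cm):
    """Return (OA, kappa, MCC) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    true_totals = [sum(cm[i]) for i in range(n)]                       # row sums
    pred_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums
    cross = sum(t * p for t, p in zip(true_totals, pred_totals))

    # Overall Accuracy: proportion of samples on the diagonal.
    oa = correct / total

    # Cohen's kappa: agreement corrected for chance agreement.
    expected = cross / total**2
    kappa = (oa - expected) / (1 - expected) if expected < 1 else 0.0

    # Multi-class Matthews Correlation Coefficient.
    denom = math.sqrt(
        (total**2 - sum(t * t for t in true_totals))
        * (total**2 - sum(p * p for p in pred_totals))
    )
    mcc = (correct * total - cross) / denom if denom else 0.0
    return oa, kappa, mcc


# Hypothetical confusion matrix (rows = reference class, columns = predicted class).
toy_cm = [
    [50, 2, 0],
    [3, 40, 2],
    [0, 1, 52],
]
oa, kappa, mcc = accuracy_metrics(toy_cm)
```

Because κ and MCC both discount agreement expected by chance, they are typically lower than OA for the same matrix, as in the values listed above.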
 
Class hierarchy
Labels were assigned at PCT level using the NSW BioNet Vegetation Classification (https://vegetation.bionet.nsw.gov.au/) and then aligned to the NSW framework’s Vegetation Class and Formation levels (https://www.environment.nsw.gov.au/topics/animals-and-plants/biodiversity/nsw-bionet/the-nsw-vegetation-classification-framework). For water management reporting, each wetland PCT was aligned to a Monitoring, Evaluation and Reporting (MER) Functional Group consistent with the Lachlan Long-Term Water Plan (LTWP) (https://www.environment.nsw.gov.au/sites/default/files/lachlan-long-term-water-plan).
Key fields dictionary
‘PCT_ID’ (PCT Code); ‘PCT_Desc’ (PCT Name); ‘Veg_Class’ (NSW Vegetation Class); ‘Veg_Format’ (NSW Vegetation Formation); ‘MER_FG’ (MER Functional Group for LTWP reporting); ‘Hectares’ (polygon area); ‘DN’ (classifier code); and ‘Functional’ (model-specific functional group per Wen et al., 2025). Context classes (‘Bare ground’, ‘Cleared/Disturbed’, ‘Open water’, ‘Dam’) are included for completeness and accuracy assessment.
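As an illustration of how these fields might be used for reporting, the sketch below totals polygon area by functional group. The field names match the dictionary above, but the records, PCT codes, group names and the helper function are hypothetical placeholders rather than values from the dataset.

```python
from collections import defaultdict

# Hypothetical attribute records, one per polygon (placeholder codes and names).
records = [
    {"PCT_ID": "X1", "MER_FG": "FG-A", "Hectares": 12.5},
    {"PCT_ID": "X2", "MER_FG": "FG-A", "Hectares": 7.5},
    {"PCT_ID": "X3", "MER_FG": "FG-B", "Hectares": 3.25},
]


def area_by_group(rows, group_field="MER_FG", area_field="Hectares"):
    """Sum polygon area for each distinct value of group_field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[area_field]
    return dict(totals)


totals_by_fg = area_by_group(records)
```

The same helper could be pointed at ‘Veg_Class’ or ‘Veg_Format’ by changing group_field, assuming the attribute table has been read into a list of dictionaries.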
Intended use
Baseline for environmental water planning, MER reporting under the LTWP, conservation management, and long-term monitoring at landscape and site scales. Not intended for statutory site assessment without targeted field verification.
Input data limitations
Cloud, inundation state and sensor geometry may influence satellite image quality and contribute to classification error; LiDAR and ancillary datasets may differ in acquisition date from satellite inputs. Localised errors in source DEMs or orthophotos can propagate to terrain-derived predictors.
Validation scope
The above model accuracy metrics are from internal hold-out testing (80/20 train-test split) and repeated cross-validation of the expert-labelled dataset in Wen et al. (2025). A withheld, ground-based validation dataset collected independently of model training will be used to validate the final post-processed and edited map product; those results will be provided in future versions to supplement the raw model accuracy values for reporting purposes. Users requiring statutory-grade evidence should conduct targeted field verification.
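A minimal sketch of an 80/20 hold-out split is given below for orientation. Stratifying by class, the fixed random seed, the function name and the placeholder labels are assumptions made for illustration; the exact sampling design is described in Wen et al. (2025) and is not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets, class by class.

    Every class contributes roughly test_fraction of its samples
    (at least one) to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labelled samples: 50, 30 and 20 samples of three placeholder classes.
labels = ["class_a"] * 50 + ["class_b"] * 30 + ["class_c"] * 20
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps rare classes represented in the test set, which matters when per-class accuracy is reported.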
Versioning
This version is v1.0 (release date: 2025-10-30). Results are versioned; identified errors will be corrected in subsequent releases with an accompanying changelog.
Acknowledgements
This mapping project was funded by the NSW Water for the Environment Program.
Related publication
Wen, L., Ryan, S., Powell, M., and Ling, J.E. (2025). From Clusters to Communities: Enhancing Wetland Vegetation Mapping Using Unsupervised and Supervised Synergy. Remote Sensing, 17(13): 2279. https://doi.org/10.3390/rs17132279