Evaluation of Consistency and Reliability Among Four Brain Segmentation Tools for Cortical Thickness Measurement


Emekli E., Emekli E., Demirel B. C., Toprak U.

JOURNAL OF NEUROSCIENCE METHODS, vol. 2026, pp. 1-19, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 2026
  • Publication Date: 2026
  • DOI: 10.1016/j.jneumeth.2026.110758
  • Journal Name: JOURNAL OF NEUROSCIENCE METHODS
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), BIOSIS, EMBASE, MEDLINE
  • Page Numbers: pp. 1-19
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

Background

Accurate assessment of cortical thickness is crucial for understanding structural brain changes in both clinical and research contexts. With the increasing use of automated segmentation tools in neuroimaging, differences in the methodologies and performance of these tools have become a critical consideration.

New method

This study evaluates the consistency and interchangeability of cortical thickness measurements obtained from four widely used automated segmentation software platforms: FreeSurfer (FS), FastSurfer (FaS), BrainSuite (BS), and volBrain (volB). Forty healthy adults underwent T1-weighted MRI, and cortical thickness was assessed in 32 bilateral regions using each tool. Statistical analyses included the Friedman test, Wilcoxon signed-rank test, and intraclass correlation coefficient (ICC).
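The statistical pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis code: the simulated cortical-thickness values, the per-tool biases, and the choice of the ICC(2,1) form (two-way random effects, absolute agreement, single rater) are all assumptions, since the abstract does not specify which ICC variant was used.

```python
# Hedged sketch of the abstract's statistical pipeline: Friedman omnibus
# test, pairwise Wilcoxon signed-rank follow-ups, and an ICC estimate.
# All data below are simulated; tool biases are illustrative assumptions.
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon


def icc_2_1(data: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data: (n_subjects, k_raters) matrix of measurements.
    """
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)  # per-subject means
    col_means = data.mean(axis=0)  # per-tool means
    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters/tools
    sse = np.sum((data - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


rng = np.random.default_rng(42)
# 40 subjects x 4 tools (mm); shared "true" thickness induces correlation
true_thickness = rng.normal(2.5, 0.2, size=40)
tools = {name: true_thickness + rng.normal(bias, 0.05, size=40)
         for name, bias in [("FS", 0.0), ("FaS", 0.02),
                            ("BS", 0.10), ("volB", -0.08)]}

# Omnibus test across the four tools
stat, p = friedmanchisquare(*tools.values())
print(f"Friedman chi2={stat:.2f}, p={p:.4f}")

# Pairwise follow-up comparisons
for a, b in combinations(tools, 2):
    _, pw = wilcoxon(tools[a], tools[b])
    print(f"{a} vs {b}: Wilcoxon p={pw:.4f}")

# Inter-software agreement across all four tools
mat = np.column_stack(list(tools.values()))
print(f"ICC(2,1)={icc_2_1(mat):.3f}")
```

In practice this would be run per cortical region (64 region-hemisphere combinations here), with multiple-comparison correction applied to the pairwise Wilcoxon p-values.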

Results

Significant differences in cortical thickness measurements were observed across all software packages. FS and FaS demonstrated the greatest consistency. FS measurements showed no correlation with any other software, while FaS showed moderate agreement with volB and BS. Seven cortical regions lacked any inter-software consistency. Intra-software reliability was high for FS, FaS, and BS, whereas volB demonstrated moderate consistency.

Comparison with existing methods

Previous studies have typically focused on single-software pipelines or pairwise comparisons. Our approach provides a broader, systematic evaluation across four major segmentation tools, highlighting methodological discrepancies that are not fully documented in the existing literature.

Conclusions

These findings underscore the methodological risks of using different segmentation software interchangeably, particularly for sensitive morphometric metrics such as cortical thickness. The results emphasize the need for increased methodological transparency and standardization in neuroimaging studies to ensure reproducibility and comparability of findings.