Comparative forensic analysis of acidic soil pollution using artificial neural network based on next-generation sequencing and terminal restriction fragment length polymorphism
Strong acids can severely disrupt soil geochemistry and directly damage microbial communities through toxicity, pH reduction, corrosion, and oxidative stress. With increasing awareness of acid contamination in soils, this study aimed to identify the pollution source (HCl, HF, HNO3, or H2SO4) by analyzing 16S rRNA gene profiles of acidophilic microorganisms. Upon acid exposure, soil pH rapidly declined to between 1.8 and 2.0. Next-generation sequencing (NGS) and terminal restriction fragment length polymorphism (T-RFLP) analyses revealed a reduction in Proteobacteria and a corresponding increase in acidophilic Firmicutes. Clustering analysis showed that microbial community structure differed by acid type, and T-RFLP data separated the groups more clearly than NGS data. However, accurate identification of the specific contaminant remained challenging. A machine learning model based on artificial neural networks achieved 94.4 percent accuracy in predicting the acid type from species-level NGS data. When applied to T-RFLP data, the model reached 86.9 percent accuracy, comparable to the performance obtained with genus- and family-level NGS classifications. Augmenting the T-RFLP dataset further improved model accuracy. These findings demonstrate that integrating machine learning with molecular microbial profiling is a promising approach for monitoring acidic soil contamination and identifying its source.
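As a rough illustration of the classification step, the sketch below trains a small feed-forward neural network to predict the acid type from relative-abundance profiles. It is not the study's model: the network size, learning rate, number of taxa, and the synthetic data generator are all assumptions made for this example, and the profiles are made up so that each acid enriches one hypothetical taxon.

```python
import math
import random

ACIDS = ["HCl", "HF", "HNO3", "H2SO4"]  # class labels named in the abstract
N_TAXA = 6                               # hypothetical number of taxa or T-RF peaks


def synthetic_profile(label, rng):
    """Made-up relative-abundance profile; the taxon at index `label` is enriched."""
    counts = [rng.uniform(0.5, 1.5) for _ in range(N_TAXA)]
    counts[label] += 4.0 + rng.uniform(0.0, 1.0)
    total = sum(counts)
    return [c / total for c in counts]  # normalize so the profile sums to 1


def make_dataset(n_per_class, rng):
    data = [(synthetic_profile(k, rng), k)
            for k in range(len(ACIDS)) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyMLP:
    """One tanh hidden layer and a softmax output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Cross-entropy gradient with respect to the logits: p - onehot(y).
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate into the hidden layer before w2 is modified.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) * (1.0 - h[j] ** 2)
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j in range(len(h)):
                self.w2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i in range(len(x)):
                self.w1[j][i] -= lr * g * x[i]
            self.b1[j] -= lr * g

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train, test = make_dataset(30, rng), make_dataset(10, rng)
model = TinyMLP(N_TAXA, 8, len(ACIDS), rng)
for epoch in range(200):
    for x, y in train:
        model.train_step(x, y, lr=0.3)
test_accuracy = accuracy(model, test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.2f}")
print("predicted acid for first test profile:", ACIDS[model.predict(test[0][0])])
```

Because the synthetic classes are deliberately easy to separate, the accuracy printed here says nothing about the 94.4 and 86.9 percent figures reported for real NGS and T-RFLP data; the sketch only shows the shape of the input (one abundance vector per sample) and of the output (one predicted acid label).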