Synthetic data-driven prediction and interpretation of biomethane production in direct interspecies electron transfer-stimulated anaerobic digestion process
Direct interspecies electron transfer (DIET) stimulation has emerged as an effective strategy to enhance methane production in anaerobic digestion (AD). However, reliable prediction and interpretation of DIET-AD performance remain challenging because the process is nonlinear and experimental datasets are limited. In this study, we propose an integrated, synthetic data-assisted machine learning (ML) framework to predict methane enhancement in magnetite-mediated DIET-AD systems, combining data augmentation, robust modeling, and experimental validation. The synthetic dataset was generated by a generative adversarial network (GAN) and rigorously validated through statistical, principal component, and clustering analyses and the Kolmogorov-Smirnov test to ensure consistency with the original experimental data. Multiple ML models, including ensemble-based, neural network-based, and attention-based algorithms, were systematically evaluated. Among them, an attention-based algorithm showed superior performance and robustness, achieving an average coefficient of determination (R²) of 0.92, root mean squared error (RMSE) of 6.30, and mean absolute error (MAE) of 4.04 on test data. Explainable artificial intelligence analysis using SHapley Additive exPlanations (SHAP) indicated that substrate concentration and magnetite dosage were the dominant drivers of methane enhancement. Experimental biomethane potential (BMP) tests further confirmed model reliability, with observed enhancements of 5.1% and 15.8% falling within the predicted 95% confidence intervals of 3.5-6.6% and 15.6-21.3%, respectively. This framework therefore offers a practical, transferable strategy for robust prediction, interpretation, and experimental guidance of nonlinear bioprocesses under data-scarce conditions.
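The three test-set metrics reported above (R², RMSE, and MAE) follow their standard definitions, and a minimal sketch of how they can be computed from paired observed and predicted values is given here. The function name `regression_metrics` and the sample numbers are hypothetical and purely illustrative; they are not the study's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Illustrative values only (hypothetical, not data from the study)
observed = [5.0, 10.0, 15.0, 20.0]
predicted = [6.0, 9.0, 16.0, 19.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}, RMSE={rmse:.2f}, MAE={mae:.2f}")
```

RMSE and MAE carry the units of the predicted quantity, whereas R² is dimensionless, so the three values are best read together rather than compared with each other.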