Air quality modeling is critical for understanding PM2.5 sources and pollution dynamics, and it provides a scientific basis for regulatory and policy applications. This study presents a comparative evaluation of three chemical transport models (CTMs), CMAQ, WRF-Chem, and GEOS-Chem, for simulating surface-level PM2.5 and its chemical composition, including secondary aerosols, in Northeast Asia across the seasons of 2019. All three CTMs reproduced the broad spatial and temporal patterns of PM2.5 reasonably well, supported by shared anthropogenic emissions and consistent meteorological inputs. However, notable discrepancies among the models likely resulted from a combination of factors: differences in chemical mechanisms; potentially outdated or inconsistent emission inventories for carbonaceous, sulfur, and nitrogenous compounds; and the presence or absence of process modules such as pcSOA in CMAQ and wet scavenging in WRF-Chem. CMAQ showed the most balanced performance, particularly in Korea, accurately simulating PM2.5 mass and chemical components with realistic seasonal variability in secondary aerosols. WRF-Chem, with its online coupling of meteorology and atmospheric chemistry, effectively simulated temporal variability but showed an unusual overestimation of PM2.5 in summer. GEOS-Chem captured long-range transport and background concentrations of PM2.5 associated with biomass burning and dust but was limited in resolving urban-scale variability and detailed dust processes; these results are specific to the model configurations used in this study. Our findings highlight distinct model behaviors and emphasize the importance of matching model characteristics to the specific research or policy objective. Improving emission inventories, refining chemical and physical process representations, and advancing multi-model approaches may enhance model performance and support both scientific and policy objectives.