A comprehensive assessment of empirical potentials for carbon materials

Carbon materials and their unique properties have been extensively studied by molecular dynamics, thanks to the wide range of available carbon bond order potentials (CBOPs). Recently, with the increase in popularity of machine learning (ML), potentials such as Gaussian approximation potential (GAP), trained using ML, can accurately predict results for carbon. However, selecting the right potential is crucial as each performs differently for different carbon allotropes, and these differences can lead to inaccurate results. This work compares the widely used CBOPs and the GAP-20 ML potential with density functional theory results, including lattice constants, cohesive energies, defect formation energies, van der Waals interactions, thermal stabilities, and mechanical properties for different carbon allotropes. We find that GAP-20 can more accurately predict the structure, defect properties, and formation energies for a variety of crystalline phase carbon compared to CBOPs. Importantly, GAP-20 can simulate the thermal stability of C 60 and the fracture of carbon nanotubes and graphene accurately, where CBOPs struggle. However, similar to CBOPs, GAP-20 is unable to accurately account for van der Waals interactions. Despite this, we find that GAP-20 outperforms all CBOPs assessed here and is at present the most suitable potential for studying thermal and mechanical properties for pristine and defective carbon.

Density functional theory (DFT) models the electronic structure of materials and has the advantage of high precision, but it is too computationally expensive for large-scale molecular dynamics (MD) simulations. Conversely, carbon bond order potentials (CBOPs) can simulate large-scale systems over time, by using empirical potentials, but with lower accuracy. Therefore, significant research has been devoted to developing empirical potentials for carbon, which can simulate large-scale systems while maintaining high accuracy. [25][26][27][28][29][30][31][32][33][34] In 1988, Tersoff was the first to propose a multi-body potential model for carbon materials employing a bond order formalism, namely, the Tersoff potential, 25 which gives a highly accurate description of the structure and energy of amorphous carbon. Brenner et al. developed reactive empirical bond order potentials Brenner-I (REBO-I) and Brenner-II (REBO-II) based on the Tersoff potential with the addition of torsional interactions. 26,30 Both Tersoff and REBO potentials are good at representing covalent bonds but ignore long-range interactions, such as van der Waals interaction. Later, Stuart et al. added a Lennard-Jones (LJ) term to the REBO potential to make up for the missing long-range interaction, creating the adaptive intermolecular reactive bond order (AIREBO) potential. 28 However, AIREBO exhibits an unphysically divergent powerlaw repulsion when under pressure, which was solved by O'Connor et al., who used quantum chemistry to fit a Morse potential instead of a LJ potential. 33 Furthermore, Lindsay and Broido optimized the Tersoff and REBO parameter sets to improve the lattice dynamics and phonon dispersion accuracy, considerably improving the calculated thermal transport properties for graphene and CNTs compared to experiments. 32 Los and Fasolino built a long-range bond-order potential (LCBOP) for carbon, 31 which gives a good description of the diamond to the graphite phase transformation. van Duin et al. developed the ReaxFF potential to study largescale reactive chemical systems for hydrocarbons, 29 and in 2015, Srinivasan et al. reparameterized the original ReaxFF potential to create ReaxFF C2013 , which more accurately describes condensed carbon phases. 34 Justo et al. developed the environment-dependent interatomic potential (EDIP) for silicon as an improvement over the Tersoff potential. 35 Marks and Jiang et al. extended EDIP for carbon and silicon-carbon systems. 27,36 CBOPs have been used to explore the mechanical properties of carbon materials ever since they were developed. However, some CBOPs have been shown to cause a non-physical force enhancement in tensile test simulations, and a 1.92-2.0 Å C-C cutoff was suggested to avoid this phenomenon. [37][38][39][40] He et al. 38 carried out MD simulations on the Stone-Wales (SW) defect in graphene and revealed that defect orientation would dominate the mechanical properties. Song et al. 39 explored the fracture behavior of polycrystalline graphene via MD simulation, showing that the fracture first occurs at the grain boundary. Xu et al. 40 performed MD simulations on defective graphene and found that graphene with a large number of vacancy defects exhibits super-ductility.
In recent years, machine learning (ML) has been applied to the development of interatomic potentials. [41][42][43][44][45][46][47][48][49][50][51][52] Unlike the CBOPs mentioned above, machine learning interatomic potentials (ML-IAPs) do not rely on fixed mathematical expressions but instead learn the mathematical representation of the potential energy surface (PES) via training, which allows for more accurate predictions of energy, forces, and so on. For example, Deringer and Csányi built a Gaussian approximation potential (GAP) for liquid and amorphous carbon that showed close-to-DFT accuracy. 43 Wen and Tadmor employed a neural network to model short-range interactions and a theoretically motivated analytical term to model long-range dispersion in their hybrid neural network potential (hNN-Grx) for multilayer graphene, which can be used to study thermal conductivity in monolayer graphene and friction in bilayer graphene. 49 Wen and Tadmor also developed a set of Dropout Uncertainty Neural Network (DUNN) potentials for carbon to predict uncertainty for static (e.g., energy) and dynamical (e.g., phonon dispersion) properties. 51 Hedman et al. used deep learning to train neural network potentials on the nine-carbon allotrope dataset (CA-9), which reproduced ab initio results with high accuracy. 52 Finally, Rowe et al. developed the GAP-17 potential for graphene using the GAP model 46 and, most recently, the GAP-20 potential for various crystalline phase carbon and amorphous carbon, 50 including dispersion corrections and long-range interactions. Although current ML-IAPs have shown excellent accuracy in predicting static and dynamic properties of carbon allotropes, there is still a clear lack of testing related to the prediction of mechanical properties.
Although previous work has compared classic potential and DFT, they are limited to amorphous carbon and elastic properties of graphene. 53,54 Thus, in this work, we compare GAP-20 and widely used CBOPs (Tersoff, mo-Tersoff, AIREBO, mo-AIREBO, ReaxFF C2013 , EDIP, and LCBOP) with DFT in terms of lattice constants, cohesive energies, defect properties, van der Waals (vdW) interaction, thermal stability, and mechanical properties for the carbon allotropes diamond, graphite, graphene, and CNTs. As a result of our comprehensive testing, we propose the most suitable potential for simulating crystalline phases of carbon.

MODELS AND COMPUTATIONAL METHOD
The crystalline phases of carbon studied in this work are shown in Fig. 1. We employ periodic boundary conditions (PBCs) for all systems, with 15 Å of vacuum separating the graphene sheets in the z-direction and at least 10 Å of vacuum perpendicular to the (10,0) and (5,5) CNT axis to prevent self-interaction. C 60 is placed in a cubic simulation box with a side length of 20 Å.

Classical potentials
We select several widely used and representative potentials for testing, summarized in Table I. mo-AIREBO is the modified AIREBO where the C-C bond cutoff is set to 2.0 Å, and this form is mostly used to simulate the fracture behavior of carbon materials as mentioned above.

Simulation details
All simulations with the potentials listed in Table I were performed using the large-scale atomic/molecular massively parallel simulator (LAMMPS). 55 Lattice relaxation was performed on the PBC direction of all models with the conjugate gradient (CG) algorithm, and the convergence of energy and force is set to 10 −10 eV and 10 −8 eV/Å, respectively. In the tensile test, tensile strain is applied in the axial direction of the CNT and the armchair and zigzag directions of graphene, and the cell stress is obtained after relaxation. The fully relaxed structure is then taken as the initial structure for the next strain step. Here, the effective thickness of graphene and the CNT wall is considered to be 3.34 Å to get effective area S eff = l 0 × 3.34Å 2 and effective stress σ e f f = σ bulk × S 0 S e f f , where l 0 represents the graphene edge length and circumference of carbon  nanotubes and S 0 is the area of the simulation box perpendicular to the stretching direction. 18 In the thermal stability test, the Newton equations of motion were integrated using the Verlet algorithm with a time step of 0.5 fs. The temperature was set and maintained at 100-5000 K using the Nosé-Hoover thermostat in the NVT ensemble. 56,57 The entire system is simulated for 10 ps where the potential energy for the last 5 ps is collected and analyzed. All DFT calculations were carried out using the Vienna ab initio simulation package (VASP). 58,59 The generalized gradient approximation in the Perdew-Burke-Ernzerhof functional is used for the exchange-correlation function. 60 DFT-D3 was also introduced to ensure that the van der Waals was correctly described. 61 The planewave energy cutoff is 400 and 600 eV for structure optimization and van der Waals interaction calculation, respectively (the detailed plane-wave energy cutoff test is summarized in Table S1). The SCF convergence criterion was set to 10 −4 eV and relaxation was performed until the forces acting on the atoms was smaller than 10 −2 eV/Å. The Brillouin zone was represented by a Γ centered kpoint mesh with a grid spacing of 2π × 0.04 Å −1 . Furthermore, a 10 ps long ab initio molecular dynamics simulation, with a time step of 1.0 fs, in the NVT ensemble 56,62 was employed to test the thermal stability of C 60 .

Lattice constant
We begin by employing the potentials summarized in Table I to predict the lattice constants of CNTs, graphene, graphite, and diamond. The calculated lattice constants are presented in Table II. To quantify the potentials relative to the DFT, we calculate the relative error and the accuracy grade, PLC.
As per the results in Table II, most of the potentials tested can accurately determine lattice constants of the carbon allotropes tested. GAP-20, LCBOP, and ReaxFF C2013 optimized CNTs and graphene with an error of less than 0.5% relative to DFT. However, for the diamond lattice, GAP-20 and ReaxFF C2013 are not as accurate as the Tersoff, AIREBO, LCBOP, and EDIP potentials. The lattice constant in the c direction of graphite, dc, represents the double interlayer distance in bulk graphite and is mainly affected by long-range interactions. Here, relaxation with ReaxFF C2013 results in a value close to that of PBE-D3, indicating that the equilibrium position of the long-range interaction is similar, while GAP-20, mo-AIREBO, and AIREBO agree within ∼3%. Compared to the other lattice constants, the choice of exchange-correlation functional plays a larger role, using optB88-vdW 50 results in a dc of 6.65 Å, higher than 6.51 Å of PBE-D3 and closer to the values obtained by GAP-20 and the AIREBO potentials rather than ReaxFF C2013 . Absent values for the Tersoff potentials and EDIP are due to them not containing van der Waals (vdW) interactions; hence, they should not be used alone to optimize graphite. Many previous works have used additional terms such as Lennard-Jones or Kolmogorov-Crespi (K-C) to solve the problem of vdW interactions. 65 In general, all potentials can predict the lattice constant of crystalline carbon materials very well. GAP-20 and ReaxFF C2013 are the best of the ones tested here with overall accuracies of 99.20% and 99.66%, respectively.

Cohesive energy and formation energy
The cohesive energy describes the aggregation of atoms to form a crystal. It is also a key parameter for phase transition simulations, where its magnitude affects the difficulty of a phase transition. In this work, the cohesive energy per atom, Ec, is calculated by the following equation: where E is the total energy of crystalline phase carbon, Eatom is the energy of a single carbon atom in vacuum, and n is the number of carbon atoms in the crystalline phase. The cohesive energy per atom calculated by Eq. (1) for all potentials listed in Table I is summarized in Fig. 2(a). The experimentally measured cohesive energy of graphite and diamond is −7.374 and −7.346 eV, respectively, 66 and for most of the carbon allotropes tested here, the cohesive energy calculated using the CBOPs is around −7.4 eV. ReaxFF C2013 predicts, however, cohesive energies of around −5.0 eV due to the high single atom energy of Eatom = −2.06 eV, which is much larger than that of DFT. Compared to the CBOPs, the cohesive energies obtained using GAP-20 are much closer to the DFT results, giving it a high accuracy of about 95%.
To compare the accuracy in energy for the different potentials, we calculated the formation energy of carbon allotropes relative to graphite, where E is the total energy of the carbon allotrope, E graphite is the energy of a single atom in graphite, and n is the number of carbon atoms in the allotrope. As shown in Fig. 2(b), the formation energy, as calculated by Eq. (2), has the following order for most potentials: graphite < graphene < diamond < (10,0) CNT < (5,5) CNT < C 60 , in agreement with previous work. 50 Graphene and graphite have similar thermodynamically stable phases, where the formation energy of graphene (0.05 eV per atom) is lowest compared to diamond, C 60 , and the CNTs. Due to the lack of long-range interaction, CBOPs, such as Tersoff and EDIP, cannot correctly distinguish between graphite and graphene. For diamond, the GAP-20 result (0.12 eV) is closest to DFT (0.09 eV), while all other potentials (except mo-Tersoff) underestimate the formation energy of diamond. mo-Tersoff shows an abnormally high formation energy of 1.4 eV per atom.
For the (5,5) and (10,0) CNTs, the GAP-20 and ReaxFF C2013 results agree very well with DFT, both around 0.18 and 0.23 eV, respectively. The other potentials underestimate the value compared to DFT. EDIP calculates E f to be almost 0 eV for both CNTs, which is most likely because it was parameterized to describe Si-C systems, not one-dimensional carbon. For C 60 , the formation energy calculated by LCBOP, ReaxFFC 2013 , and GAP-20 is around 0.4 eV, close to the DFT value of 0.43 eV. EDIP also underestimates the formation energy of C 60 by 0.2 eV per atom, while Tersoff, mo-Tersoff, and AIREBO overestimate it.
In general, the CBOPs tested here overestimate the cohesive energy of carbon allotropes, while GAP-20 is in good agreement with DFT. Although some CBOPs, such as LCBOP (for C 60 ) and ReaxFF C2013 (for CNTs and graphene), can predict formation energy accurately, they exhibit an inability to generalize as they are specifically designed for a particular system or set of systems.

Defect formation energy and structures
Carbon materials, especially low-dimensional ones, often have defects as a result of their synthesis process. 67,68 With the advancement of simulation methods, it is possible to explore defect formation and healing mechanisms, [69][70][71] but it is important to have an accurate potential for the application. Therefore, in this part, we compare the defect formation energy and geometry of different defects using the potentials in Table I. The defect formation energy, E df , is obtained using the following equation: where E defect is the energy of the defective structure, Eatom is the energy of a single atom in perfect graphene, E perfect is the total energy of the defect-free structure, and m is the number of the introduced (positive) or removed (negative) atoms.

ARTICLE scitation.org/journal/apm
A 10 × 6 × 1 supercell of graphene is used to eliminate the defect self-interaction, and for each perfect structure, the lattice constants and atomic degrees of freedom are fully relaxed until the structure optimization reaches convergence. For the defective structures, only the atomic degrees of freedom are relaxed until convergence is reached.
Single vacancy (SV) is the most common point defect in graphene, where one atom in the graphene lattice is missing. It has been observed in experiments via STM 72 and TEM. 73 The structure and formation energy of the SV defect, relaxed with different potentials, are shown in Fig. 3 and Fig. S1. The formation energy for one SV defect obtained by DFT is 7.74 eV, which agrees well with the 7.7 eV obtained using the optB88-vdW functional report by Rowe et al. 50 AIREBO and LCBOP result in 7.65 and 7.60 eV, respectively, and the GAP-20 and EDIP are also within 1 eV of DFT, all lower than the reference value. Surprisingly, mo-Tersoff significantly underestimates the SV formation energy, resulting in 0.53 eV, and ReaxFF C2013 overestimated it by 2.3 eV, resulting in 10.2 eV. Thus, ReaxFF C2013 and mo-Tersoff are not suitable for describing the SV defects in graphene.
The geometry and formation energy are important properties for defects. To quantify the difference in structure, consider the symmetry of the SV defect, and we select three atom distances (d 1 , d 2 , and d 3 ) shown in Fig. 3(a) to characterize the defect geometry. The results are summarized in Table III and show that d 3 (2.46 Å for the defect-free structure) varies between 2.5 and 2.6 Å for most potentials, indicating that the three adjacent atoms do not undergo large displacement after one atom is removed. In contrast, d 3 for ReaxFF C2013 and mo-Tersoff is larger than 2.9 Å. This noticeable difference causes the vacancy by the missing atom to become more triangular.
To measure the accuracy of the potentials compared to DFT, we combine the formation energy and geometry to estimate the  accuracy of a single vacancy defect relative to DFT, P SV . In Table III, the formation energy and geometry are weighted equally at 50%.
Here, AIREBO and LCBOP show very high P SV of 98.92% and 98.10%, respectively, followed by GAP-20, Tersoff, and EDIP around 90%, while ReaxFF C2013 and mo-Tersoff have a lower P SV . Double vacancy (DV) defect is the removal of two adjacent atoms. As for the SV defect, we show the defect geometry in Fig. 4 and Fig. S2 and summarize the formation energy and the atom distance around the defect in Table IV. As shown in Figs. 4(a) and 4(b), after optimization by DFT and GAP-20, the DV defect will form a 5-8-5 structure where the length of d 3 in the pentagon is close to 1.89 Å. For the other potentials, the length of d 3 is greater than 2.3 Å, which exceeds the C-C bond length. Except for d 3 , the bond lengths of d 1 and d 2 for the CBOPs are close to the bond length obtained via DFT. For the DV defect, the formation energy obtained by DFT is 7.55 eV, which is lower than for the SV defects. However, the formation energy obtained by most potentials, except mo-Tersoff, is higher than the DFT reference value.
The accuracy of the potentials for the DV defect, P DV (evaluated with the same criteria as for the SV defect), is presented in Table IV.
Here, AIREBO and LCBOP, which perform best for the SV defect, cannot accurately model the DV defect nor can any other potential apart from GAP-20. Achieving a high accuracy of 96.70% as it reproduces the 5-8-5 structure obtained with DFT, the key relationship between energy and structure implies that only GAP-20 is sufficient among the potentials assessed here to model DVs in graphene.
Stone-Wales defects, unlike SV and DV defects, have no missing atoms. Instead, the graphene lattice has been reconstructed such that one C-C bond is rotated by 90 ○ , which results in four hexagons  V. Defect formation energy, atomic distance, and C-C-C bond angle for the Stone-Wales defect heptagon. Here, the atomic distances d 1 and d 2 and the bond angle θ characterize the defect geometry and P SW is the accuracy of a Stone-Wales defect.
Atomic transformed into two pentagons and two heptagons (5-7-7-5), as shown in Fig. 5 and Fig. S3. The formation energy E df , bond lengths d 1 and d 2 , and angle θ, respectively, as well as the accuracy relative to DFT P SW are presented in Table V.
The pentagons and heptagons predicted by DFT are like regular polygons, and the SW formation energy is 4.99 eV. Potentials such as GAP-20 and EDIP are close to DFT both in terms of structure and formation energy. However, mo-AIREBO fails to optimize the 5-7-7-5 structure and returns a highly overestimated formation energy, E f = 27.94 eV, despite mo-AIREBO and AIREBO performing very similarly for the SV and DV defects. mo-AIREBO is unable to describe the SW defect due to the modification of the C-C bond cutoff, making it unsuitable for modeling significant changes in the atomic environment.
Among the potentials compared to DFT, GAP-20 and LCBOP perform the best, while EDIP and ReaxFF C2013 maintain accuracies above 90%. Although the Tersoff potential accurately predicts the SW defect structure, it overestimates the formation energy.

Ad-atom(s)
The hexagonal structure of graphene is very stable, and the insertion of additional carbon atoms is energetically expensive. The formation energy and structural information of a single ad-atom defect are presented in Table VI. A single carbon atom may be embedded at the bridge or top position, as shown in Figs. 6(a) and 6(b). All potentials optimize the ad-atom to be embedded at the bridge position, except for AIREBO. The formation energy obtained via DFT is 6.54 eV, while the other potentials range from 4.21 to 9.79 eV. Surprisingly, mo-AIREBO achieved a high accuracy of 92.03%.
Adding two carbon atoms to graphene is more complex since they will first form a dimer and then be embedded in a hexagon to form a new 7-5-5-7 structure, with a change in the graphene curvature, as shown in Figs. 6(d) and 6(e) and Fig. S4. After optimization with DFT, an ad-dimer formation energy of 6.18 eV is obtained, with GAP-20 resulting in a value of 5.91 eV. GAP-20 also reproduces TABLE VII. Defect formation energy, atomic distances, and C-C-C bond angles around the ad-dimer defect. Here, the three atomic distances d 1 , d 2 , and d 3 and the two bond angles θ 1 and θ 2 characterize the defect geometry and P add-2 is the accuracy of an ad-dimer defect.

Atom distance (Å)
Angle (deg) Interestingly, AIREBO reproduces the structure of the ad-atom and ad-dimer defects but underestimates the formation energy. While mo-AIREBO cannot reproduce the structures but more accurately matches the energies calculated by DFT. The formation energy and structure of the ad-dimer defects are summarized in Table VII.
We have shown that CBOPs can describe some but not all defects well. For example, Tersoff hasa 93% accuracy in SV defects, but it performs poorly in describing the larger structural changes associated with SW, resulting in only 48.82% accuracy. The reason for this is that CBOPs often only consider crystal properties at the beginning of their development, with parameters obtained for defect-free structures. Thus, there will be large errors in describing the properties of defects. On the contrary, the training set for GAP-20 contains defect-free and defective structures; therefore, it can more accurately describe the structure and formation energy of defects.

ARTICLE
scitation.org/journal/apm different edges will affect the growth of graphene. [74][75][76] Here, we define the edge energy, E edge , as the energy density of the edge given by the following equation: where Ecut is the energy of the graphene or CNTs with the exposed edge, E perfect is the energy of the periodic structure, and l is the length (perimeter) of the exposed edge. All edge energies for graphene and CNTs calculated using Eq. (4) are summarized in Table VIII. The armchair edge energy, E AC edge = 1.01 eV/Å, for graphene is lower than the zigzag edge energy, E ZZ edge = 1.33 eV/Å, as predicted by DFT and consistent with the results obtained by Baran et al., 77 Li et al., 78 and Hedman and Larsson 79,80 Among the potentials, only GAP-20 and ReaxFF C2013 successfully predict this trend (E AC edge < E ZZ edge ). EDIP, LCBOP, ReacFF C2013 , and GAP-20 are within 5% of the DFT value for the armchair edge energy, but only GAP-20 is within 10% of the DFT calculated zigzag edge energy, showing that a high accuracy in energy is needed to correctly describe the relationship between armchair and zigzag edges. The edges of (10,0) and (5,5) CNTs are zigzag and armchair, respectively, where E (5,5) edge < E (10,0) edge as predicted by DFT and consistent with the trend of edge energies in graphene. Only GAP-20 and AIREBO can accurately reproduce this trend for the (5,5) and (10,0) edge energies. For the (5,5) edge energy, EDIP predicts the same value as DFT, while most potentials slightly overestimate it, except for Tersoff and ReaxFF C2013 (which give higher values) and mo-Tersoff  For all graphene and CNT edges examined, GAP-20 has an accuracy of 89.83%, while Tersoff, AIREBO, and LCBOP also perform impressively, slightly worse than GAP-20 but still with more than 80% accurate.

van der Waals interaction
The van der Waals interaction dominates the mobility of stacked graphene layers where the direction and difficulty of slip depend on the potential energy surface (PES). Here, the vdW interaction for bi-layer graphene, E vdW , is calculated with the following equation: where E bi is the energy of bi-layer graphene, E 0 is the energy of a single graphene sheet, and N is the number of atoms in the bi-layer system. Both AA stacking and AB stacking are considered, and the vdW interaction for different inter-layer distances, d, is plotted in Fig. 7(a). In the AIREBO potential, the vdW interaction is modeled by a Lennard-Jones term, which leads to an exaggerated inter-layer interaction in the repulsive region and an underestimation in the attractive region. LCBOP also underestimates the attractive part but describes the repulsive part more accurate than AIREBO, especially for the AB stacking. Among the CBOPs, the curve of ReaxFF C2013 in Fig. 7(a) is always below the DFT curve, indicating that it underestimates the repulsive interaction and overestimates the attractive interaction. Notably, the GAP-20 potential has two local minima for both the AA and AB stacking, as shown in Fig. 7(a), which is far from the optB88-vdw results used in the GAP-20 training set, indicating that the potential may have been overfitted during training. Figure 7(b) shows the energy difference between AA stacking and AB stacking, exhibits a continuous decrease for DFT and CBOPs, while GAP-20 shows a non-continuous decrease due to the two-local minima. In addition, the E AA−AB obtained by the CBOPs drops to almost 0 meV at 3.2 Å. This small energy difference indicates that the CBOPs can no longer distinguish between AA stacking and AB stacking in bilayer graphene after this distance compared to PBE-D3 and optB88-vdw, which fall to 0 meV after 4 Å.
The calculated PES for bilayer graphene is reported in Figs. 7(c)-7(e) and Fig. S5 for a fixed interlayer distance of 3.4 Å. It is worth noting that when using the current DFT-D3 method for PES scanning, the global minimum is distributed around AB stacking. The shape of the PES is reproduced with the CBOPs, but with an order of magnitude lower energy barrier. A lower energy barrier makes the layers in graphite more prone to sliding. Although GAP-20 produces a barrier height similar to DFT, the shape of the PES is very different compared to DFT and the CBOPs. The sliding path between AB stacking and AA stacking has a higher barrier, which causes the layers in graphite to be locked. The CBOPs tested here benefit from an added long-range interaction, but they are still relatively inaccurate and cannot distinguish between AA stacking and AB stacking in bilayer graphene beyond 3.2 Å separation. Although

ARTICLE
scitation.org/journal/apm the training set used for the GAP-20 potential was created using optB88-vdw, the trained potential cannot accurately reproduce the PES of bi-layer graphene present here.

Thermal stability
MD simulations with temperature ramping were performed on C 60 to compare the temperature responses between the potentials and DFT. The potential energy per atom has a two-stage relationship with temperature, as shown in Fig. 8(a). First, the potential energy increases linearly with temperature and then abruptly rises, indicating the breaking of C-C bonds and the simultaneous release of energy. The temperature at which this occurs is considered as the melting point and is summarized in Fig. 8(b). As seen in Fig. 8(a), mo-Tersoff exhibits an abnormal relationship where the potential energy abruptly falls instead of rising, indicating that energy is gained when C-C bonds are broken, a clearly unphysical behavior, making it unsuitable for thermal studies. DFT predicts that C 60 remains intact at 4500 K and decomposes at 5000 K, with a slightly deformed structure at 4500 K, as shown in Fig. 8(c). Most potentials assessed here predict C 60 melting at 4000 K, with mo-AIREBO and mo-Tersoff being unsuitable for thermal studies due to changes in their original parameters causing premature melting. From a structural point of view, mo-AIREBO and mo-Tersoff are very similar, showing C 60 decomposing to amorphous carbon at 2500 and 2000 K, respectively. As the temperature rises, the structure turns to a long carbon chain at 3000 K. GAP-20 results in C 60 melting and forming carbon chains above 4500 K. The atomic structures for the rest of the CBOPs are given in Fig. S6.

Mechanical properties
Here, we employ GAP-20 and CBOPs to perform uniaxial tensile tests on graphene and (5,5) and (10,0) CNTs to screen the most suitable potential for modeling mechanical properties compared to DFT. We consider the strain energy, calculated via where Es and E 0 are the potential energy with and without the strain and n is the number of atoms in the structure. Tensile strain is applied along the axial direction of (5,5) and (10,0) CNTs, as shown in Figs. 9(a) and 9(d). From the DFT obtained stress-strain curve of the (5,5) CNT in Fig. 9(b), a fracture strength of 90 GPa at 33% strain is observed. GAP-20 most closely reproduces this behavior, making the (5,5) CNT completely fracture at 26% strain with a strength of 120 GPa, close to the theoretical and experimental results reported by Zhang et al. 81 and Lourie et al. 82 The AIREBO potential exhibits two stages in the (5,5) tensile test; before 30% strain, the stress-strain curve is similar to DFT. Then, the stress unphysically increases until the tube fractures and the same phenomenon can be seen for the other CBOPs (dashed lines). Changing the C-C bond cutoff like in the mo-AIREBO results in the normal fracture behavior. As seen in Fig. 9(e), the mechanical properties of the (10,0) tube are like that of the (5,5) tube where the facture strain predicted by DFT and GAP-20 is around 20%. For both the (5,5) and (10,0) tubes, mo-AIREBO shows a slow decrease in the stress before fracture, indicating the breaking of individual C-C bonds. At the point of fracture, the strain energy reaches a maximum and then abruptly decreases, as shown in Figs. 9(c) and 9(f), indicating that the structure of the CNT has been destroyed. Snapshots of the CNT after fracture are plotted in Fig. 10, showing that the CBOPs cannot accurately reproduce the DFT fracture results. Unlike the CBOPs, GAP-20 can accurately describe the C-C bond break and fracture of the CNT into halves without overstretching the CNT, as observed for CBOPs.
The tensile test procedure for graphene is slightly different than for the CNTs. Here, the zigzag (armchair) direction will be relaxed when the strain is acting along on the armchair (zigzag) direction to ensure uniaxial load, as shown in Fig. 11(a). When the strain acts in the armchair direction, Figs. 11(b) and 11(c), DFT predicts fracture at 25% strain with a strength of 115 GPa, which agrees well with previous published results (130 and 121 GPa for experimental and DFT results, respectively). 18,83 Similar to the CNT tensile tests, all CBOPs aside from mo-AIREBO show an unphysical increase in stress before fracture and cannot accurately reproduce the mechanical behavior of graphene. GAP-20 agrees well with DFT, with a minor overestimation of the fracture strength, while mo-AIREBO overestimates it and predicts fracture to occur at almost 40% strain along the armchair direction rather than 25%.   Figs. 12(a) and 12(b), revealing that the CBOPs cause graphene to be over-stretched, thus overestimating the ductility and incorrectly preventing fracture. Tersoff and mo-Tersoff result in a structural transformation and deformation, while ReaxFF C2013 sees the formation of multiple defects. In contrast, GAP-20 caused the graphene to undergo brittle fracture at a reasonable strain, consistent with DFT, while mo-AIREBO only reproduced a DFT-like behavior for the uniaxial tensile test in the armchair direction of graphene.
To evaluate the accuracy of the predicted mechanical properties for the various potentials, the fracture strain, fracture strength, critical strain energy, and Young's modulus are summarized in Tables IX and X. One of the reasons why CBOPs cannot model the fracture behavior correctly is due to the C-C bond cutoff, which defines the extent of the covalent bond. This causes C-C bonds that should be broken during the tensile test to still maintain a covalent interaction. We demonstrate that CBOPs cannot correctly describe the breaking of C-C bonds, which leads to unphysical results. However, they still predict mechanical properties such as Young's modulus, within the elastic deformation region, close to previous experimental results (1.0 TPa) and theoretical predictions (1.04 TPa) for graphene. [83][84][85][86][87] The GAP-20 training set contains structures placed under varying degrees of stress, which is sufficient for the trained potential to learn the bond-breaking process. Furthermore, mo-AIREBO and GAP-20 produce results similar to DFT in the elastic and plastic regions, making them suitable for simulating mechanical properties, especially the fracture behavior. However, in combination with the defects and thermal stability test results shown here, GAP-20 is the more suitable potential  for simulating mechanical properties of the defect system at high temperatures.

Accumulated accuracy of the potentials
For each property tested in this work, we plot the accuracy of each CBOP and GAP-20 in Fig. 13 and compare their overall   Fig. 14. Clearly, GAP-20 performs better than all CBOPs studied here, especially in the simulation of cohesive energy, defects, thermal stability, and mechanical properties. Among them, the accurate prediction of mechanical properties is lacking for CBOPs, and GAP-20 makes up for this deficiency with high accuracy.
From our tests, we find that all potentials can predict the crystal structure, but only GAP-20 is accurate within 95% of the DFTcalculated cohesive energy. In reproducing the formation energy and optimizing the structures of common graphene point defects (SV, DV, and SW), only GAP-20 and LCBOP reach 90% accuracy compared to DFT. Notably, mo-AIREBO cannot accurately obtain the 5-7-7-5 structure and significantly overestimates the formation energy of SW defects. For the ad-atom defect, the accuracy of GAP-20, LCBOP, and ReaxFF C2013 is higher than 85%. Although mo-AIREBO most accurately deals with the ad-atom defect, it cannot reproduce the induced curvature graphene from the ad-dimer defect such as EDIP and mo-Tersoff. GAP-20 can accurately predict the edge energies of graphene and the (5,5) CNT and agrees that E edge AC < E edge ZZ , but like the rest of the CBOPs, it underestimates the edge energy of the (10,0) CNT. In the thermal stability test, mo-Tersoff and mo-AIREBO underestimate the melting point of C 60 significantly to 2000 K, while most of the potentials observe C 60 withstand temperatures above 3000 K, with GAP-20 best reproducing the potential energy increase with temperature. We find that neither the CBOPs nor GAP-20 can describe van der Waals interactions correctly. CBOPs underestimate the interaction energy by an order of magnitude compared to DFT, and GAP-20 cannot reproduce the shape of the potential energy surface obtained with DFT, although it produces comparable energy barriers. Furthermore, we find that CBOPs cannot accurately reproduce the fracture behavior of graphene and CNTs. In addition, while reducing the C-C cutoff can remedy this, it then renders them unable to predict defects and thermal properties. In contrast, GAP-20 can correctly describe bond breakage, defects, and structures at high temperatures, making it suitable to accurately model the mechanical properties of carbon materials with defects and under high temperatures. Our results show that machine learning is a promising method for constructing universally applicable potentials for simulating and predicting material properties with much higher accuracy compared to traditional potentials that are restricted to specific applications due to their more limited parameterization.

SUPPLEMENTARY MATERIAL
See the supplementary material for the complete atomic structures of defective graphene, PES of bilayer graphene, and stress-strain curves of CNT and graphene when using different CBOPs.