dpdata: A Scalable Python Toolkit for Atomistic Machine Learning Data Sets

Author(s)
Zeng, Jinzhe; Peng, Xingliang; Zhuang, Yong-Bin; Wang, Haidi; Yuan, Fengbo; Zhang, Duo; Liu, Renxi; Wang, Yingze; Tuo, Ping; Zhang, Yuzhi; Chen, Yixiao; Li, Yifan; Nguyen, Cao Thang; Huang, Jiameng; Peng, Anyang; Rynik, Marian; Xu, Wei-Hong; Zhang, Zezhong; Zhou, Xu-Yuan; Chen, Tao; Fan, Jiahao; Jiang, Wanrun; Li, Bowen; Li, Denan; Li, Haoxi; Liang, Wenshuo; Liao, Ruihao; Liu, Liping; Luo, Chenxing; Ward, Logan; Wan, Kaiwei; Wang, Junjie; Xiang, Pan; Zhang, Chengqian; Zhang, Jinchao; Zhou, Rui; Zhu, Jia-Xin; Zhang, Linfeng; Wang, Han
Issued Date
2025-11
DOI
10.1021/acs.jcim.5c01767
URI
https://scholarworks.unist.ac.kr/handle/201301/91389
Fulltext
https://pubs.acs.org/doi/10.1021/acs.jcim.5c01767?src=getftr&utm_source=clarivate&getft_integrator=clarivate
Citation
JOURNAL OF CHEMICAL INFORMATION AND MODELING, v.65, no.21, pp.11497 - 11504
Abstract
Seamless management of atomistic data sets is a critical prerequisite for the successful development and deployment of machine learning potentials (MLPs). Here, we present dpdata, an open-source Python library designed to streamline every aspect of MLP data handling. Built upon a flexible, plugin-based architecture, dpdata supports reading, writing, and converting between a broad range of file formats, from popular quantum-chemistry packages and molecular-dynamics engines to specialized MLP frameworks. Users may define custom data types, formats, drivers, and minimizers, enabling effortless extension to emerging software. Key utilities include automated train-test splitting, coordinate perturbation for active learning, outlier-energy removal, Delta-learning data set generation, error-metric computation, and unit conversion. Through efficient NumPy-backed storage and system-level operations, dpdata achieves significant memory savings and inference speedups over configuration-by-configuration tools such as ASE. We also highlight its practical impact: dpdata has been used across published studies for format conversion, data storage, and coordinate perturbation, and in other projects for data processing.
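Two of the utilities named in the abstract, train-test splitting and outlier-energy removal, can be illustrated with a minimal standard-library sketch. This is not dpdata's API: the function names `split_train_test` and `remove_energy_outliers`, their parameters, and the mean/standard-deviation outlier rule are all assumptions chosen for illustration, and frames are represented only by their indices and energies.

```python
import random
import statistics


def split_train_test(n_frames, test_fraction=0.2, seed=0):
    """Hypothetical helper: split frame indices into train and test sets.

    Returns two sorted, disjoint lists of indices covering range(n_frames).
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must lie in [0, 1]")
    indices = list(range(n_frames))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = round(n_frames * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def remove_energy_outliers(energies, threshold=3.0):
    """Hypothetical helper: return indices of frames to keep.

    A frame is kept when its energy lies within `threshold` population
    standard deviations of the mean energy (an illustrative rule only).
    """
    mean = statistics.fmean(energies)
    std = statistics.pstdev(energies)
    if std == 0.0:
        # All energies are identical, so nothing can be an outlier.
        return list(range(len(energies)))
    return [i for i, e in enumerate(energies) if abs(e - mean) <= threshold * std]


if __name__ == "__main__":
    # Made-up energies; the last frame is deliberately far from the rest.
    energies = [-10.0, -10.1, -9.9, -10.05, -9.95, 50.0]
    kept = remove_energy_outliers(energies, threshold=2.0)
    train, test = split_train_test(len(kept), test_fraction=0.2, seed=0)
    print("kept frames:", kept)
    print("train positions:", train, "test positions:", test)
```

With very small data sets a single extreme value inflates the standard deviation, so the threshold in the demo is set lower than the default; a real workflow would choose it from the energy distribution of the data at hand.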
Publisher
AMER CHEMICAL SOC
ISSN
1549-9596
Keyword
MOLECULAR-DYNAMICS
