Masked pre-training has been widely demonstrated as an effective method for learning useful representations from vast amounts of unlabeled data. Following its success in natural language processing and computer vision, masked time series pre-training has been proposed to extend this approach to time series analysis. However, existing methods have primarily focused on adapting approaches from other fields and have often overlooked the inherent characteristics of time series data, which differ significantly from those of other data types. Specifically, time series data structurally contain (a) cross-time dependencies and (b) cross-channel dependencies, as well as (c) noise and variability introduced during data collection, which hinder the accurate learning of these structural characteristics. This thesis addresses these challenges by proposing masked time series pre-training models tailored to these unique characteristics of time series data.

To capture cross-time dependencies within time series, I propose ST-MTM, a masked time series pre-training model with seasonal-trend decomposition. By incorporating a decomposition architecture into both the masking and representation learning methods, ST-MTM effectively learns representations of time series components by disentangling the distinct temporal variations of each component. Extensive evaluations on real-world benchmarks demonstrate ST-MTM’s superior performance in time series forecasting, where capturing intricate temporal dependencies is crucial.

To capture cross-channel dependencies in multivariate time series, I propose ShuffleMTM, a simple yet innovative masked time series pre-training model that captures cross-channel dependency through shuffled series. Specifically, ShuffleMTM adaptively incorporates the dependency structure across channel patches through a patch shuffling method and a dependency bridge layer, thereby learning both cross-channel and cross-time dependencies. Through rigorous experiments on time series forecasting and classification, ShuffleMTM performs on par with or better than state-of-the-art baselines and effectively captures both dependencies in multivariate time series. In addition, ShuffleMTM’s pre-training architecture improves the downstream performance of a time series foundation model, highlighting its potential to strengthen the cross-channel modeling capacity of large-scale, pre-trained time series models.

Lastly, to address the inherent noise and variability arising from time series data collection, I develop a noise- and variability-robust pre-training framework and introduce NERVE, a biosignal time series foundation model. As biosignals naturally contain considerable environmental noise and variability, this framework enhances resilience to these factors, improving generalization performance across diverse downstream tasks.

The masked time series pre-training models developed in this thesis comprehensively tackle a broad spectrum of challenges, encompassing both the intrinsic structural characteristics of time series and the exogenous factors that hinder their effective learning. The broad scope of this thesis, which extends from single-dataset learning to large-scale pre-training for enhanced generalization, offers compelling evidence of the effectiveness and fundamental contribution of masked time series pre-training across a wide range of time series problems.
Publisher: Ulsan National Institute of Science and Technology