KAN4TSF: Are KAN and KAN-based models
Effective for Time Series Forecasting
Abstract
Time series forecasting is a crucial task that predicts the future values of variables
based on historical data. Time series forecasting techniques have been developing
in parallel with the machine learning community, from early statistical learning
methods to current deep learning methods. Although existing methods have made
significant progress, they still suffer from two challenges. The mathematical
theory of mainstream deep learning-based methods does not establish a clear
relation between network sizes and fitting capabilities, and these methods often lack
interpretability. To this end, we introduce the Kolmogorov-Arnold Network (KAN)
into time series forecasting research, which has better mathematical properties
and interpretability. First, we propose the Reversible Mixture of KAN experts
(RMoK) model, which is a KAN-based model for time series forecasting. RMoK
uses a mixture-of-experts structure to assign variables to KAN experts. Then, we
compare performance, integration, and speed between RMoK and various baselines
on real-world datasets, and the experimental results show that RMoK achieves the
best performance in most cases. Through visualization, we also find a relationship between temporal
feature weights and data periodicity, which roughly explains RMoK's mechanism. Thus, we conclude
that KAN and KAN-based models (RMoK) are effective for time series forecasting. Code is available at
https://github.com/2448845600/KAN4TSF.
1 Introduction
Time series forecasting (TSF) is the task of using historical data to predict future states of variables.
This research area includes a broad scope of applications, such as financial investment, weather
forecasting, traffic estimation, and health management Bi et al. [2023], Gao et al. [2023], Savcisens
et al. [2024], Han et al. [2024a]. The machine learning community’s progress has long inspired time
series forecasting technology: the popularity of early statistical learning methods gave rise to SVR
and ARIMA, while the development of deep learning introduced the MLP and Transformer into time
series forecasting. At present, various time series forecasting methods cover almost all deep learning
network architectures, such as RNN, CNN, Transformer, and MLP Nie et al. [2023], Wu et al. [2023],
Han et al. [2024b]. The forecasting models derived from different network architectures have their
own advantages in forecasting performance, running speed, and resource usage.
Although deep learning-based models have made notable progress in time series forecasting, there
are still several challenges. The universal approximation theorem (UAT), which is the mathematical
foundation of most mainstream forecasting models, cannot provide a guarantee on the necessary
network sizes (depths and widths) to approximate a predetermined continuous function with specific
Preprint. Under review.
arXiv:2408.11306v1 [cs.LG] 21 Aug 2024
accuracy. And this theory can only achieve an approximation rather than a representation. The
limitations of UAT have become the sword of Damocles hanging over time series forecasting.
Furthermore, the prediction mechanism of existing models is a black box, resulting in a lack of
interpretability. Such opaque methods are poorly suited to tasks with a low tolerance for errors,
such as medicine, law, and finance.
Kolmogorov-Arnold Network (KAN) Liu et al. [2024a], which is based on the Kolmogorov-Arnold
representation theorem (KART), has become a novel approach to solving the above challenges. On the
one hand, KART proves that a multivariate continuous function can be represented as a combination
of finite univariate continuous functions. This theorem establishes the relationship between network
size and input shape under the premise of representation. On the other hand, KAN offers a pruning
strategy that simplifies the trained KAN into a set of symbolic functions, enabling the analysis
of specific modules’ mechanisms, thereby significantly enhancing the network’s interpretability.
In addition, KAN’s function fitting idea is consistent with the properties of time series, such as
periodicity and trend, which is conducive to embedding prior knowledge into the network structure
and improving the performance of the network.
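Concretely, KART states that every multivariate continuous function on a bounded domain decomposes into sums and compositions of univariate continuous functions: for any continuous f : [0,1]^n → R,

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),
```

with continuous univariate inner functions φ_{q,p} and outer functions Φ_q. The fixed bound of 2n + 1 outer terms and n(2n + 1) inner functions is what ties the size of the representation directly to the input dimension n, in contrast to UAT's unbounded width requirement.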
Despite being a relatively recent proposal, KAN, which employs trainable 1D B-spline functions to
convert incoming signals, has already sparked numerous efforts to improve or broaden its capabilities.
Some studies propose KAN’s variants which replace the B-splines with Chebyshev polynomials SS
[2024], wavelet functions Bozorgasl and Chen [2024], Jacobi polynomials Aghaei [2024], ReLU
functions Qiu et al. [2024], etc., to accelerate training speed and improve network performance.
Other studies introduce KAN with existing popular network structures for various applications. For
example, ConvKAN Bodner et al. [2024] and GraphKAN Zhang and Zhang [2024], Xu et al. [2024]
are proposed for image processing and graph processing. In summary, KANs have been extensively
empirically studied in vision and language Azam and Akhtar [2024], Yu et al. [2024]. However,
existing studies lack a KAN-based model that considers time series domain knowledge, making it
impossible to verify whether KAN is effective in time series forecasting.
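To make the layer idea concrete, below is an illustrative sketch (our own, not code from any cited work) of a single KAN "edge": instead of a scalar weight, each connection carries a learnable univariate function. Here it is parameterized with Chebyshev polynomials in the style of the Chebyshev-based variants; the function name and the tanh input squashing are our assumptions.

```python
import numpy as np

def chebyshev_edge(x, coeffs):
    """One KAN 'edge': a learnable univariate function, here parameterized
    by Chebyshev polynomials of the first kind (a simplified sketch of the
    Chebyshev-variant idea, not any paper's actual implementation).

    x      : (N,) input values
    coeffs : (K,) learnable coefficients, one per polynomial degree
    """
    x = np.tanh(x)  # squash inputs into the Chebyshev domain [-1, 1]
    # Recurrence: T_0(x) = 1, T_1(x) = x, T_k(x) = 2x*T_{k-1}(x) - T_{k-2}(x)
    T = [np.ones_like(x), x]
    for _ in range(2, len(coeffs)):
        T.append(2 * x * T[-1] - T[-2])
    basis = np.stack(T[: len(coeffs)], axis=-1)  # (N, K) polynomial features
    return basis @ coeffs                        # (N,) edge output

rng = np.random.default_rng(0)
y = chebyshev_edge(rng.normal(size=8), rng.normal(size=4))  # shape (8,)
```

A full KAN layer would hold one such coefficient vector per input-output pair and sum the edge outputs per output unit; training fits the coefficients by gradient descent.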
To this end, we aim to propose a KAN-based model for the time series forecasting task and evaluate its
effectiveness from four perspectives: performance, integration, speed, and interpretability. First, we
propose the Reversible Mixture of KAN Experts model (RMoK), a KAN-based time series forecasting
model that uses multiple KAN variants as experts and a gating network to adaptively assign variables
to specific experts for prediction. RMoK is implemented as a single-layer network because we
hope it can achieve performance comparable to, and interpretability better than, existing methods. Then,
we use a unified training and evaluation setting to compare the performance of RMoK and current
popular baselines on seven real-world datasets. The experimental results show that RMoK achieves
state-of-the-art (SOTA) performance in most cases. Subsequently, we conduct a comprehensive
empirical study on KAN-based models, including the comparison between KAN and Linear, the
effect of integrating KANs with the Transformer, and the speed of the KAN-based models. Finally,
we discuss the interpretability of RMoK using the example of weather prediction. We visualize the
weights of temporal features at different time steps in KAN and find the correlation between the
weight distribution and the periodicity of the data.
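The routing idea behind RMoK can be sketched as follows. This is a hypothetical NumPy illustration of a gated mixture of experts over variables, with plain linear maps standing in for the KAN-variant experts; all shapes, names, and the gating form are our assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rmok_forward(x, gate_w, experts):
    """Sketch of the RMoK idea: each variable's history is scored by a
    gating network, every expert produces a forecast, and the forecasts
    are mixed per variable by the gate weights.

    x       : (C, T) one sample, C variables with T historical steps
    gate_w  : (T, E) gating parameters producing per-variable expert scores
    experts : list of E callables, each mapping (C, T) -> (C, P)
    """
    gates = softmax(x @ gate_w, axis=-1)             # (C, E) mixture weights
    outs = np.stack([f(x) for f in experts], axis=-1)  # (C, P, E)
    return (outs * gates[:, None, :]).sum(axis=-1)   # (C, P) mixed forecast

# Toy "experts": linear maps standing in for KAN variants (Wav-KAN, etc.)
rng = np.random.default_rng(0)
C, T, P, E = 3, 12, 4, 2
Ws = [rng.normal(size=(T, P)) * 0.1 for _ in range(E)]
experts = [lambda x, W=W: x @ W for W in Ws]
yhat = rmok_forward(rng.normal(size=(C, T)), rng.normal(size=(T, E)), experts)
```

The key design point this sketch captures is that routing happens per variable, so each series can be handled by the expert whose basis functions suit its periodicity or trend.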
To sum up, the contributions of this work include:
• To the best of our knowledge, this is the first work that comprehensively discusses the
effectiveness of the booming KANs for time series forecasting.
• To validate our claims, we propose the Reversible Mixture of KAN Experts model, which
uses a single layer of the mixture of KAN experts to keep a balance between performance
and interpretability.
• We fairly compare the performance between RMoK and baselines on seven real-world
datasets, and the experimental results show that RMoK achieves the best performance in
most cases. And we also conduct a comprehensive empirical study on KAN-based models
about integration and speed.
• We mine the relationship between time feature weights and data periodicity through
visualization, which roughly explains the mechanism of RMoK.
In summary, compared with the baselines in terms of performance, integration, speed, and
interpretability, we conclude that KAN is effective in time series forecasting.
Figure 1: The computational process of Linear and KAN layers under a certain input and output
dimension. (Panels labeled: Linear, KAN.)
2 Problem Definition
In multivariate time series forecasting, we are given historical data X = [X_1, · · · , X_T] ∈ R^{T×C},
where T is the number of historical time steps and C is the number of variates. The time series
forecasting task is to predict Y = [X_{T+1}, · · · , X_{T+P}] ∈ R^{P×C} over the future P time steps.
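As a toy illustration of these shapes, a naive last-value baseline (our own example, not a method from this paper) maps a history X ∈ R^{T×C} to a forecast Ŷ ∈ R^{P×C}:

```python
import numpy as np

# Toy instance of the problem definition: T historical steps of C variates
# are mapped to P future steps. Repeating the last observed row is the
# classic "naive" forecasting baseline, used here only to show the shapes.
T, P, C = 96, 24, 7
X = np.random.randn(T, C)              # history  X ∈ R^{T×C}
Y_hat = np.repeat(X[-1:], P, axis=0)   # forecast Ŷ ∈ R^{P×C}
```

Any forecasting model discussed in this paper, RMoK included, is a function from the (T, C) history to a (P, C) prediction.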
3 Related Work
3.1 Time-Series Forecasting Models
Although Transformer-based methods have almost become the standard in CV and NLP, various
network architectures (such as the Transformer, CNN, and MLP) still compete with each other in
time series forecasting.