### Some basics on the Kalman filter

If you want to read about Time Series Lab in a practical setting, then please go to our **Case studies** section, using the Case studies link below. But if you are interested in an introduction to the methodology of our time series analysis and forecasting, then stay here and continue reading.

### Finding a signal in the time series

You may have heard of Nate Silvers' famous book

The Signal and the Noise: Why Most Predictions Fail – but Some Don't.

**Signal** and **noise** are key concepts in time series analysis and forecasting. The signal in a time series represents the information of interest for the analyst. We aim to extract the signal as accurately as we can from a time series because the signal facilitates decision-making and forecasting. In general, the signal identifies the structural dynamic dependencies in the time series. These structures lead to the formulation of an optimal forecast function. Noise on the other hand is just... well noise. Noise is what remains from the time series after we have accounted for the signal. The statistical formulation of the signal and the noise in the time series is the most important task for a time series analyst. As the signal usually is subject to different dynamic features, it is often decomposed into different components of interest such as trend, seasonal and cycle. In this approach, we obtain the **Structural Time Series Model** which is set up in terms of these directly interpretable components.

**Components with a direct interpretation**

An example is the forecasting of electricity demand. The signal of a time series of electricity demand often contains strong seasonal and cyclical patterns. Energy demand is not constant over time. On the contrary, it is a time-varying process that is affected by hour of day, day of week, and the four seasons in a year. When we formulate the signal accurately with multiple seasonal and cyclical components, we can forecast the energy demand with much more precision, compared to a model that does not take these effects into account.

The next question is: "how do we decompose a time series into signal and noise?". The short answer is "we apply a filter". The more elaborate answer is provided below.

### Structural time series models

The **Structural Time Series Model** allows the explicit modelling of the trend, seasonal and error term, together with other relevant components for a time series at hand. It aims to present the stylised facts of a time series in terms of its different dynamic features which are represented as **unobserved components**. In the seminal book of Harvey, A. (1989) it is stated as follows: The statistical formulation of the trend component in a structural model needs to be flexible enough to allow it to respond to general changes in the direction of the series. A trend is not seen as a deterministic function of time about which the series is constrained to move for ever more. In a similar way the seasonal component must be flexible enough to respond to changes in the seasonal pattern. A structural time series model therefore needs to be set up in such a way that its components are stochastic; in other words, they are regarded as being driven by random disturbances.

A framework that is flexible enough to handle the above requirements is the **State Space model**.

The unobserved components can be extracted with the use of **filters**. More specifically, Time Series Lab - State Space Edition adopts the celebrated Kalman filter which is widely seen as a highly effective tool for analysing time series in fields such as economics, finance, engineering, medicine, and many more.

### State Space models and the Kalman filter

For an extensive discussion of **State Space Models** and its accompanying **Kalman filter** we refer to the books of Harvey (1989) and Durbin and Koopman (2012).

The aim of the State Space Model is to provide a unifying specification of linear time series models. The central part of the model is the **state vector** that contains all relevant dynamic features of the time series. The state vector evolves over time by means of a linear dynamic system of equations. For example, consider the picture below for a state vector that contains trend (with level and slope, or "growth") and seasonal components.

The next step is to establish the optimal move from the **state** at time *t* to the one at time *t+1*. For example, in the case of daily data, assume that time *t* occurs on a Monday. Based on historical data, we have determined the trend and seasonal effects for this Monday. Hence, we have an estimate of the signal (trend plus seasonal) available for time *t*. When the data point for this Monday has been observed, we can establish how close the estimated signal has been. We can measure the prediction error as a result. The next challenge is to determine an estimate (or a prediction) of the seasonal effect for the next day, which is then for a Tuesday. It makes sense to adjust the earlier estimate for Tuesday based on the most recent prediction error, although the error is for a Monday. But how to make this adjustment? How to weight the current estimate and the prediction error of the day before? Here the Kalman filter comes in to play: it provides the optimal weights for this updating step, or for this transition step from Monday to Tuesday (and next, from Tuesday to Wednesday, etc.).

The State Space Model is a very powerful tool which opens the way for the handling of a wide range of time series models. Once a model has been put in state space form, the Kalman filter can be applied straightforwardly. It also opens up other estimation algorithms for prediction, filtering, smoothing, and forecasting. It also solves issues related to missing entries in the time series.

The essence of the Kalman filter is summarised by Harvey, A. (1989) as follows:
The Kalman filter rests on the assumption that the disturbances and initial state vector are Normally distributed. A standard result on the multivariate Normal distribution is then used to show how it is possible to calculate recursively the distribution of the state, conditional on the information set at time t, for all t from 1 to the end of the time series. These conditional distributions are themselves Normal and hence are completely specified by their means and covariance matrices. It is these quantities which the Kalman filter computes. It can be shown that the mean of the conditional distribution of the state is an optimal estimator of the state, in the sense that it minimises the mean square error (MSE). When the Normality assumption is dropped, there is no longer any guarantee that the Kalman filter will give the conditional mean of the state vector. However, it is still an optimal estimator in the sense that it minimises the mean square error within the class of all linear estimators.

### Linear estimator

We can conclude that the Kalman filter produces optimal estimates of the signal and its components for a class of **linear** time series models, including the **Structural Time Series Model**. So what if we need a **non-linear** estimator? We encounter these situations in for example volatility modelling or the modelling of count data (number of goals scored by a football team, number of earthquakes in a region, or number of Covid-19 cases to name a few), but many more situations can arise. Time Series Lab offers a powerful alternative for non-linear time series modelling and the methodology that is developed for this class of estimators is described in the next article.

# Bibliography

### References

Durbin, J. and Koopman, S. J. (2012). Time series analysis by state space methods. *Oxford university press*.

Harvey, A. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. *Cambridge: Cambridge University Press*. doi:10.1017/CBO9781107049994