Time series analysis and data-driven model discrimination beyond usual assumptions: ideas, methods and examples - Illia Horenko

Abstract

Due to the intrinsically multiscale nature of many real-life processes, the results of data analysis in many application areas may be inherently biased by the implicit assumptions imposed by the data analysis methods. For example, Bayesian learning approaches impose strong implicit a priori assumptions such as Gaussianity (for Gaussian Mixture Models) and homogeneous Markovianity/independence (for Hidden Markov Models and related approaches such as Bayesian Mixture Models).
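One concrete consequence of homogeneous Markovianity: in a Markov chain with a constant transition matrix P, the time spent in state i before leaving is geometrically distributed with mean 1/(1 - P[i, i]). A hidden-state model with a time-invariant transition matrix therefore fixes the shape of the persistence statistics in advance. The minimal Python sketch below checks this numerically for an arbitrarily chosen two-state matrix; the function names and the matrix are illustrative assumptions, not part of the talk.

```python
import numpy as np

def expected_dwell_times(P):
    """Mean residence time per state of a homogeneous Markov chain.

    At each step the chain stays in state i with probability P[i, i], so the
    residence time is geometric with mean 1 / (1 - P[i, i]).
    """
    P = np.asarray(P, dtype=float)
    return 1.0 / (1.0 - np.diag(P))

def empirical_dwell_times(P, n_steps=200_000, seed=0):
    """Simulate the chain and average the observed run lengths per state."""
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    cum = np.cumsum(P, axis=1)             # row-wise CDFs for sampling the next state
    u = np.random.default_rng(seed).random(n_steps)
    states = np.empty(n_steps, dtype=int)
    states[0] = 0
    for t in range(1, n_steps):
        nxt = int(np.searchsorted(cum[states[t - 1]], u[t], side="right"))
        states[t] = min(nxt, K - 1)        # guard against round-off in the last CDF entry
    # Split the trajectory into runs of constant state; drop the last (censored) run
    starts = np.r_[0, np.flatnonzero(np.diff(states)) + 1]
    lengths = np.diff(np.r_[starts, n_steps])[:-1]
    run_states = states[starts][:-1]
    return np.array([lengths[run_states == k].mean() for k in range(K)])

P = [[0.9, 0.1],
     [0.2, 0.8]]   # illustrative transition matrix
print(expected_dwell_times(P))   # geometric means: about 10 and 5 steps
print(empirical_dwell_times(P))  # simulated means, close to the values above
```

If observed dwell times in a real record are, for example, heavy-tailed or regime-dependent, no choice of a single constant P can reproduce them. This is one way the homogeneity assumption can bias the inferred model.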

More generally, all standard learning approaches involve, in some form, implicit a priori assumptions about the memory in the data, about the probabilistic model formulation (usually in the form of a parametric probability distribution function and/or a parametric stochastic model), about some form of sequential statistical independence (in time and space), and about stationarity/homogeneity (i.e., the assumption that the inferred probabilistic model parameters, e.g. the mean value and the variance, do not change significantly over the whole available data set). Through the bias they induce, these a priori method assumptions may also cause problems when different model descriptions of the same data are compared, i.e. at the step of statistical model discrimination (e.g., through standard cross-validation approaches).
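A small synthetic illustration of the stationarity bias: if the mean of a series jumps halfway through but a single mean and variance are fitted to the whole record, the global variance absorbs the between-regime spread, 1 + 0.5 * 0.5 * 3² = 3.25 instead of 1, and the global mean of about 1.5 describes neither regime. The Python sketch below assumes Gaussian noise and a known switching time purely for illustration; the data are generated, not taken from any application.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Hypothetical record: unit-variance noise whose mean jumps from 0 to 3 halfway through
x = np.r_[rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n)]

# Stationary description: a single mean and variance for the whole record
global_mean, global_var = x.mean(), x.var()

# Regime-wise description, assuming the switching time is known
seg_means = [x[:n].mean(), x[n:].mean()]
seg_vars = [x[:n].var(), x[n:].var()]

print(global_mean, global_var)  # roughly 1.5 and 3.25
print(seg_means, seg_vars)      # roughly [0, 3] and [1, 1]
```

In practice the switching time is unknown, which is why it has to be inferred jointly with the model parameters rather than assumed.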

Generic scenarios in which these implicit assumptions of available methods may be violated for realistic processes will be discussed from a mathematical perspective, and some alternative approaches to numerical data modeling and model discrimination beyond the standard probabilistic methodology will be described. The main ideas behind the non-stationary and non-parametric time series analysis framework (combining concepts from functional analysis, partial differential equations, statistics, information theory and high-performance computing) that allows one to go beyond these implicit assumptions of standard tools will be explained.

A brief general overview of published real-life applications of the resulting FEM-BV framework (Finite Element Model of data analysis with Bounded Variation of model parameters) will be given, showing examples from:

(i) geosciences (analysis of climate/ocean/atmosphere data),
(ii) bio-sciences and bio-informatics (identification of metastable spatial configurations of biological molecules, unsupervised identification of coding and non-coding regions in DNA),
(iii) sociology (non-stationary factor analysis of political preferences of German voters),
(iv) economics (non-stationary risk minimization for investment portfolios),
(v) engineering (analysis and compression of turbulent flow data).
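The FEM-BV idea can be caricatured as follows: the data are described by a few local models, and each time step is affiliated with them through time-dependent affiliation functions whose total variation is bounded, which enforces persistence without assuming a stationary or Markovian switching process. The Python sketch below is only a simplified illustration in that spirit, not the published algorithm. It assumes scalar data, K constant-mean models with squared-distance error (a k-means-like choice), and hard 0/1 affiliations. It also replaces the hard bound on the variation with a fixed penalty per switch, so that the affiliation step becomes a Viterbi-like dynamic programme. The function names and the `penalty` parameter are hypothetical.

```python
import numpy as np

def assign_with_switch_penalty(cost, penalty):
    """Piecewise-constant hard affiliations via dynamic programming.

    cost[t, k] is the model error of observation t under model k.
    Minimises sum_t cost[t, labels[t]] + penalty * (number of label switches).
    """
    T, K = cost.shape
    acc = cost[0].copy()                   # best accumulated cost ending in each model
    back = np.zeros((T, K), dtype=int)     # back-pointers for the traceback
    for t in range(1, T):
        best_prev = int(np.argmin(acc))
        switch_cost = acc[best_prev] + penalty
        stay = acc <= switch_cost          # staying is at least as cheap as switching in
        back[t] = np.where(stay, np.arange(K), best_prev)
        acc = np.where(stay, acc, switch_cost) + cost[t]
    labels = np.empty(T, dtype=int)
    labels[-1] = int(np.argmin(acc))
    for t in range(T - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels

def fem_bv_kmeans_sketch(x, K, penalty, n_iter=20):
    """Alternate affiliation and parameter updates for scalar data x."""
    x = np.asarray(x, dtype=float)
    theta = np.quantile(x, (np.arange(K) + 0.5) / K)   # deterministic initial model means
    for _ in range(n_iter):
        cost = (x[:, None] - theta[None, :]) ** 2       # squared-distance model error
        labels = assign_with_switch_penalty(cost, penalty)
        for k in range(K):
            if np.any(labels == k):                     # keep theta[k] if model k is unused
                theta[k] = x[labels == k].mean()
    return labels, theta

# Synthetic example: one regime switch at t = 150 plus a single outlier at t = 60
t = np.arange(300)
x = np.r_[np.zeros(150), 5 * np.ones(150)] + 0.3 * np.sin(t)
x[60] = 5.0
labels, theta = fem_bv_kmeans_sketch(x, K=2, penalty=20.0)
print(np.count_nonzero(np.diff(labels)), theta)  # one switch; means near 0 and 5
```

With `penalty=0.0` the same call reduces to plain k-means-style labelling, and the isolated outlier becomes its own one-step segment (three switches instead of one). The penalty trades model error against persistence, which is the role the variation bound plays in the constrained formulation.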

Review paper on FEM-BV time series analysis methodology:

Ph. Metzner, L. Putzig and I. Horenko
Analysis of persistent non-stationary time series and applications
Communications in Applied Mathematics and Computational Science, 7(2), 175-229, 2012