Hydroclimatological Modeling Using Data Mining And Chaos Theory
Abstract
Land–atmosphere interactions and the coupling between climate and land surface hydrological processes have gained increasing interest in the recent past. Improved knowledge of hydroclimatology and the global hydrological cycle, with its terrestrial and atmospheric feedbacks, has led to the use of climate variables and atmospheric teleconnections in modeling hydrological processes such as rainfall and runoff. Numerous statistical and dynamical models employing different combinations of predictor variables and mathematical equations have been developed for this purpose. The relevance of a predictor variable is usually measured through the observed linear correlation between the predictor and the predictand. However, many climatic predictor variables are found to change their relationship with the predictand over time, which demands that these variables be replaced. The unsatisfactory performance of both statistical and dynamical models calls for a more reliable method of assessing the dependency between climatic variables and hydrologic processes, one that takes into account the nonlinear causal relationships and the instability arising from these nonlinear interactions.
The most obvious cause of limited predictability, even in a perfect model with high-resolution observations, is the nonlinearity of hydrological systems [Bloschl and Zehe, 2005]. This is mainly due to the chaotic nature of the weather and its sensitivity to initial conditions [Lorenz, 1963], which restricts the predictability of day-to-day weather to only a few days or weeks.
The present thesis develops association rules to extract the causal relationships between climatic variables and rainfall, and uses a time series data mining algorithm to unearth the frequent predictor patterns that precede extreme episodes of rainfall. The inherent nonlinearity and uncertainty due to the chaotic nature of hydrologic processes (rainfall and runoff) are modeled through a nonlinear prediction method. Methodologies are developed to increase the predictability and reduce the predictive uncertainty of chaotic hydrologic series.
A data mining algorithm that uses the concept of minimal occurrences with constraints and time lags is applied to discover association rules between extreme rainfall events and climatic indices. The algorithm treats only the extreme events as target episodes (consequents), separating them from the far more frequent normal episodes, and finds their time-lagged relationships with the climatic indices, which are treated as the antecedents. Association rules are generated for all five homogeneous regions of India (as defined by the Indian Institute of Tropical Meteorology) and for All India, using data from 1960 to 1982. Analysis of the rules shows that strong relationships exist between the extreme rainfall events and the chosen climatic indices, i.e., Darwin Sea Level Pressure (DSLP), North Atlantic Oscillation (NAO), Nino 3.4 and Sea Surface Temperature (SST) values. Validation with data for the period 1983 to 2005 shows that most of the rules recur. For some rules that do not recur exactly, the same combinations of indices appear during the validation period, with slight variations in the representative classes taken by the indices.
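The idea of a time-lagged rule with a climatic index pattern as antecedent and an extreme rainfall event as consequent can be illustrated with a much simpler counting scheme than the minimal-occurrence algorithm used in the thesis. The sketch below is only an illustration under stated assumptions: the function name `lagged_rules`, the class labels, the thresholds and the toy data are all hypothetical, and a single fixed lag is used in place of the constrained time windows of the actual algorithm.

```python
from collections import Counter

def lagged_rules(index_classes, extreme, lag, min_support=2, min_confidence=0.5):
    """Mine rules of the form 'index pattern at t - lag -> extreme rainfall at t'.

    index_classes: one dict per time step, e.g. {"NAO": "high", "DSLP": "low"}
    extreme: one bool per time step, True where rainfall is an extreme event
    Returns (pattern, support count, confidence) tuples, best rules first.
    """
    antecedent_count = Counter()  # how often each index pattern occurs
    joint_count = Counter()       # how often it is followed by an extreme event
    for t in range(lag, len(extreme)):
        pattern = tuple(sorted(index_classes[t - lag].items()))
        antecedent_count[pattern] += 1
        if extreme[t]:
            joint_count[pattern] += 1
    rules = []
    for pattern, n_joint in joint_count.items():
        confidence = n_joint / antecedent_count[pattern]
        if n_joint >= min_support and confidence >= min_confidence:
            rules.append((pattern, n_joint, confidence))
    return sorted(rules, key=lambda r: (-r[2], -r[1]))

# Hypothetical toy data (not from the thesis): six time steps, two indices.
index_classes = [
    {"NAO": "high", "DSLP": "low"},
    {"NAO": "low", "DSLP": "low"},
    {"NAO": "high", "DSLP": "low"},
    {"NAO": "low", "DSLP": "high"},
    {"NAO": "high", "DSLP": "low"},
    {"NAO": "low", "DSLP": "low"},
]
extreme = [False, True, False, True, False, True]

rules = lagged_rules(index_classes, extreme, lag=1)
for pattern, support, confidence in rules:
    print(dict(pattern), "-> extreme rainfall", support, confidence)
```

Validating a rule on a later period, as described above, would amount to recomputing its support and confidence on the held-out years and checking whether it still passes the thresholds.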
Several recent studies have examined the value of treating rainfall as a chaotic system rather than a stochastic one in order to better understand the underlying dynamics. However, an important limitation of these approaches is their dependence on a single method for identifying the chaotic nature and the parameters involved. In the present study, an attempt is made to identify chaos using several techniques and to examine the behaviour of daily rainfall series in different regions. Daily rainfall data for the period 1955 to 2000 from three regions with contrasting characteristics (mainly in the spatial area covered), the Malaprabha river basin, the Mahanadi river basin and All India, are used for the study. Autocorrelation and mutual information methods are used to determine the delay time for the phase space reconstruction. The optimum embedding dimension is determined using the correlation dimension, the false nearest neighbour algorithm and nonlinear prediction methods. The low embedding dimensions obtained from these methods indicate the existence of low-dimensional chaos in the three rainfall series considered. The correlation dimension method is repeated on the phase-randomized series and on the first derivative of the data series to check for pseudo low-dimensional chaos [Osborne and Provenzale, 1989]. The positive Lyapunov exponents obtained confirm the exponential divergence of the trajectories and hence the unpredictability. A surrogate data test is also carried out to further confirm the nonlinear structure of the rainfall series.
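As a rough illustration of two of the steps named above, the following sketch reconstructs a phase space by the method of delays and evaluates the correlation sum, whose scaling with radius underlies the correlation dimension. It is a minimal sketch under stated assumptions, not the procedure of the thesis: the logistic map stands in for a rainfall series, and the function names, the radii and the two-point slope estimate are illustrative choices.

```python
import math

def delay_embed(series, m, tau):
    """Delay vectors [x(t), x(t + tau), ..., x(t + (m - 1) * tau)]."""
    n = len(series) - (m - 1) * tau
    return [[series[i + j * tau] for j in range(m)] for i in range(n)]

def correlation_sum(vectors, r):
    """Fraction of distinct vector pairs closer than r in the maximum norm."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(vectors[i], vectors[j])) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))

# Stand-in chaotic series (assumption): logistic map x -> 4 x (1 - x).
series = [0.3]
for _ in range(299):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))

vectors = delay_embed(series, m=2, tau=1)
r_small, r_large = 0.05, 0.2
c_small = correlation_sum(vectors, r_small)
c_large = correlation_sum(vectors, r_large)

# Crude two-point estimate of the scaling exponent d(log C) / d(log r).
slope = math.log(c_large / c_small) / math.log(r_large / r_small)
print("C(r_small) =", c_small, "C(r_large) =", c_large, "slope =", slope)
```

In practice the slope would be estimated over a whole scaling region of radii and repeated for increasing embedding dimension `m`; saturation of the slope with `m` is what suggests a low-dimensional attractor.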
The limit on predictability in a chaotic system arises mainly from its sensitivity to infinitesimal changes in the initial conditions and from the inability of the model to reveal the underlying dynamics of the system. In the present study, an attempt is made to quantify these uncertainties and thereby improve the predictability by adopting a nonlinear ensemble prediction. A range of plausible parameters is used to generate an ensemble of rainfall predictions for each year of the period 1996 to 2000 separately, using the data up to the preceding year. To analyze the sensitivity to initial conditions, predictions are started from two different months of the year, namely the beginning of January and the beginning of June. The reasonably good predictions obtained indicate the efficiency of the nonlinear prediction method for predicting the rainfall series. In addition, the rank probability skill score and the rank histograms show that the ensembles generated are reliable, with good spread and skill. A comparison of the results for the three regions indicates that, although all three series are chaotic, spatial averaging over a large area can increase the dimension and improve the predictability, thus destroying the chaotic nature.
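The ensemble idea described above, one prediction per plausible combination of embedding dimension and delay time, can be sketched with a simple nearest-neighbour local approximation. This is an illustrative sketch only: the averaging of neighbour successors, the neighbour count `k`, the parameter ranges and the logistic-map test series are assumptions, not the settings used in the thesis.

```python
def local_predict(series, m, tau, k=3):
    """One-step-ahead forecast by local approximation: average the
    successors of the k nearest neighbours of the latest delay vector."""
    span = (m - 1) * tau
    last = len(series) - 1

    def vec(end):
        # Delay vector ending at index `end`, most recent value first.
        return [series[end - j * tau] for j in range(m)]

    target = vec(last)
    candidates = []
    for end in range(span, last):  # end < last, so series[end + 1] exists
        dist = max(abs(a - b) for a, b in zip(vec(end), target))
        candidates.append((dist, series[end + 1]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)

def ensemble_forecast(series, dims, delays, k=3):
    """One forecast per (embedding dimension, delay time) combination."""
    return [local_predict(series, m, tau, k) for m in dims for tau in delays]

# Stand-in chaotic series (assumption): logistic map x -> 4 x (1 - x).
series = [0.3]
for _ in range(199):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))

history, observed = series[:-1], series[-1]
ensemble = ensemble_forecast(history, dims=[2, 3, 4], delays=[1, 2])
print("ensemble range:", min(ensemble), "to", max(ensemble))
print("observed value:", observed)
```

The spread of `ensemble` gives a simple picture of the uncertainty attributable to the choice of reconstruction parameters. Counting where the observation ranks among the sorted ensemble members, over many forecasts, is what a rank histogram summarizes.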
The predictability of the chaotic daily rainfall series is improved by utilizing information from various climatic indices and adopting a multivariate nonlinear ensemble prediction. Daily rainfall data of the Malaprabha river basin, India, for the period 1955 to 2000 are used for the study. A multivariate phase space is generated from a climate data set of 16 variables. Any redundancy in this atmospheric data set is removed by principal component analysis (PCA), which reduces it to 8 principal components (PCs). The resulting multivariate series (rainfall along with the 8 PCs) is found to exhibit a low-dimensional chaotic nature with dimension 10. Nonlinear prediction is carried out using the univariate series (rainfall alone) and the multivariate series for different combinations of embedding dimensions and delay times. The uncertainty in initial conditions is thus addressed by reconstructing the phase space using different combinations of parameters. The ensembles generated from multivariate predictions are found to be better than those from univariate predictions. In other words, the uncertainty in predictions is reduced, and the predictability improved, by adopting multivariate nonlinear ensemble prediction. The restriction on the predictability of a chaotic series can thus be reduced by quantifying the uncertainty in the initial conditions and by including other variables that may influence the system.
Even though the sensitivity to initial conditions limits the predictability of chaotic systems, a prediction algorithm capable of resolving the fine structure of the chaotic attractor can reduce the prediction uncertainty to some extent. All the traditional chaotic prediction methods are based on local models, since these methods model the sudden divergence of the trajectories with different local functions. Conceptually, global models are ineffective in modeling the highly unstable structure of the chaotic attractor [Sivakumar et al., 2002a]. This study focuses on combining a local learning wavelet analysis (decomposition) model with a global feedforward neural network model and on its implementation for phase space prediction of a chaotic streamflow series. The daily streamflow series at Basantpur station in the Mahanadi basin, India, is found to exhibit a chaotic nature with dimension varying from 5 to 7. The uncertainty in future predictions is quantified by creating an ensemble of predictions with the wavelet network, using a range of plausible embedding dimensions and delay times. Compared with the traditional local approximation approach, the total predictive uncertainty in the streamflow is reduced for different lead times when it is modeled with wavelet networks. The localization property of wavelets, which use different dilation and translation parameters, helps in capturing most of the statistical properties of the observed data. The need to bring together the characteristics of both local and global approaches to model the unstable yet ordered chaotic attractor is clearly demonstrated.
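The localization property mentioned above can be seen in even the simplest wavelet decomposition. The sketch below applies one level of the orthonormal Haar transform to a short series containing a single spike. The choice of the Haar wavelet, the function names and the toy flow values are assumptions made for illustration; the abstract does not state which wavelet family the wavelet network uses, and the neural network stage is not shown.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input).
    Returns (approximation, detail) coefficient lists."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_step, recovering the original samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Hypothetical flow values with one sudden peak (not data from the thesis).
flow = [10.0, 10.0, 10.0, 10.0, 50.0, 12.0, 10.0, 10.0]
approx, detail = haar_step(flow)
restored = haar_inverse(approx, detail)

print("approximation:", approx)
print("detail:", detail)
```

Only the detail coefficient covering the pair of samples around the peak is nonzero, while the approximation coefficients carry the smoothed level of the series. This separation of local fluctuations from the broader signal is the property a wavelet-based model exploits.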