A New Mathematical Framework for Regional Frequency Analysis of Floods

Basu, Bidroha

Date

2018-06-19

Abstract

Reliable estimates of design flood quantiles are often necessary at sparsely gauged or ungauged target locations in river basins for various applications in water resources engineering. Developing effective methods for this task has been a long-standing challenge in hydrology for over five decades. Hydrologists often consider regional flood frequency analysis (RFFA) approaches that involve (i) use of a regionalization approach to delineate a homogeneous group of watersheds resembling the watershed of the target location, and (ii) use of a regional frequency analysis (RFA) approach to transfer peak-flow-related information from gauged watersheds in the group to the target location, which then serves as the basis for estimating flood quantile(s) at the target site. The work presented in the thesis is motivated by the need to address various shortcomings of widely used regionalization and RFA approaches.

Regionalization approaches often determine regions by grouping data points in a multidimensional space of attributes depicting the hydrology, climatology, topography, land use/land cover and soils of watersheds. There are no universally established procedures to identify appropriate attributes, and modelers use subjective procedures to choose a set of attributes that is considered common to the entire study area. This practice may not be meaningful, as different sets of attributes could influence the extreme flow generation mechanism in watersheds located in different parts of the study area. Another issue is that practitioners usually give equal importance (weight) to all the attributes in regionalization, though some attributes could be more important than others in influencing peak flows. To address these issues, a two-stage clustering approach is developed in the thesis. It facilitates identification of appropriate attributes and their associated weights for use in regionalization of watersheds in the context of flood frequency analysis.
The effectiveness of the approach is demonstrated through a case study on Indiana watersheds.

Conventional regionalization approaches can be effective for delineating regions when data points (depicting watersheds) in the watershed-related attribute space can be segregated into disjoint groups using straight lines or planes. They prove ineffective when (i) data points are not linearly separable, (ii) the number of attributes and watersheds is large, (iii) there are outliers in the attribute space, and (iv) most watersheds resemble each other in terms of their attributes. In real-world scenarios, most watersheds resemble each other, regions cannot always be segregated using straight lines or planes, and dealing with outliers and high-dimensional data is inevitable in regionalization. To address this, a fuzzy support vector clustering approach is proposed in the thesis, and its effectiveness over the commonly used region-of-influence approach and different cluster-analysis-based regionalization methods is demonstrated through a case study on Indiana watersheds.

For the purpose of regional frequency analysis (RFA), the index-flood approach has been widely used over the past five decades. The conventional index-flood (CIF) approach assumes that the values of the scale and shape parameters of the frequency distribution are identical across all the sites in a homogeneous region. In real-world scenarios, this assumption may not be valid even if a region is statistically homogeneous. Logarithmic index-flood (LIF) and population index-flood (PIF) methodologies were proposed to address the problem, but even those methodologies make unrealistic assumptions. The PIF method assumes that the ratio of scale to location parameters is constant for all the sites in a region.
The LIF method, on the other hand, assumes that an appropriate frequency distribution to fit peak flows can be found in log space, but in reality the distribution of peak flows in log space may not be close to any of the known theoretical distributions. To address this issue, a new mathematical approach to RFA is proposed in the L-moment and LH-moment frameworks. It can overcome shortcomings of the CIF approach and the related LIF and PIF methods, which make various assumptions but cannot ensure their validity in RFA. For use with the proposed approach, transformation mechanisms are proposed for five commonly used three-parameter frequency distributions: generalized logistic (GLO), generalized extreme value (GEV), generalized Pareto (GPA), generalized normal (GNO) and Pearson type III (PE3). They map the random variable being analyzed from the original space to a dimensionless space where the distribution of the random variable does not change, and where deviations of regional estimates of all the distribution’s parameters (location, scale, shape) with respect to their population values as well as at-site estimates are minimal. The proposed approach ensures validity of all the assumptions of the CIF approach in the dimensionless space, which makes it perform better than the CIF approach and the related LIF and PIF methods. Monte Carlo simulation experiments revealed that the proposed approach is effective even when the form of the regional frequency distribution is mis-specified. A case study on watersheds in the conterminous United States indicated that the proposed approach outperforms methods based on the index-flood approach in real-world scenarios.

In recent decades, the fuzzy clustering approach has gained recognition for regionalization of watersheds, as it can account for partial resemblance of several watersheds in the watershed-related attribute space. In working with this approach, formation of regions and quantile estimation require discerning information from the fuzzy-membership matrix. Currently, however, there are no effective procedures available for discerning the information.
Practitioners often defuzzify the matrix to form disjoint clusters (regions) and use them as the basis for quantile estimation. The defuzzification approach (DFA) results in loss of the information on partial resemblance of watersheds. The lost information cannot be utilized in quantile estimation, owing to which the estimates could have significant error. To avert the loss of information, a threshold strategy (TS) was considered in some prior studies, but it results in under-prediction of quantiles. To address this, a mathematical approach is proposed in the thesis that allows discerning information from the fuzzy-membership matrix derived using the fuzzy clustering approach for effective quantile estimation. The effectiveness of the approach in estimating flood quantiles relative to DFA and TS is demonstrated through Monte Carlo simulation experiments and a case study on the mid-Atlantic water resources region, USA.

Another issue with the index-flood approach and its related RFA methodologies is that they assume a linear relationship between each of the statistical raw moments (SMs) of peak flows and watershed-related attributes in a region. Those relationships form the basis for estimating SMs at the target ungauged or sparsely gauged site, which are then used to estimate parameters of the flood frequency distribution and quantiles corresponding to target return periods. In reality, non-linear relationships could exist between SMs and watershed-related attributes. To address this, simple-scaling and multi-scaling methodologies have been proposed in the literature, which assume that a scaling (power law) relationship exists between each of the SMs of peak flows at sites in a region and the drainage areas of watersheds corresponding to those sites. In real-world scenarios, drainage area alone may not completely describe a watershed’s flood response. Therefore, flood quantile estimates based on the scaling relationships can have large errors.
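The area-based scaling idea described above can be illustrated with a minimal sketch. It assumes a relationship of the form moment = c * area^b for a single raw moment and fits it by ordinary least squares in log-log space. The function names, the site values and the choice of fitting procedure are illustrative assumptions, not taken from the thesis.

```python
import math


def fit_power_law(areas, moments):
    """Fit moment = c * area**b by least squares on log-transformed values.

    Returns the pair (c, b). Illustrative helper, not the thesis's procedure.
    """
    xs = [math.log(a) for a in areas]
    ys = [math.log(m) for m in moments]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # scaling exponent (slope in log-log space)
    c = math.exp(y_bar - b * x_bar)    # scaling coefficient (back-transformed intercept)
    return c, b


def predict_moment(c, b, area):
    """Estimate the raw moment at an ungauged site from its drainage area."""
    return c * area ** b


# Hypothetical gauged sites: drainage areas (km^2) and first raw moments
# (mean annual peak flow, m^3/s), generated from an exact power law.
areas = [50.0, 120.0, 300.0, 800.0, 2000.0]
mean_peaks = [2.0 * a ** 0.7 for a in areas]

c, b = fit_power_law(areas, mean_peaks)
estimate = predict_moment(c, b, 500.0)  # target site with a 500 km^2 drainage area
```

In this form each raw moment gets its own coefficient and exponent, and drainage area is the only predictor, which is the limitation the abstract points out.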
To overcome this reliance on drainage area alone, a recursive multi-scaling (RMS) approach is proposed that facilitates construction of a scaling (power law) relationship between each of the SMs of peak flows and a set of region-specific watershed-related attributes of the site, chosen in a recursive manner. The approach is shown to outperform the index-flood-based region-of-influence approach, simple- and multi-scaling approaches, and a multiple linear regression method through a leave-one-out cross-validation experiment on watersheds in and around the state of Indiana, USA.

The conventional approaches to flood frequency analysis (FFA) are based on the assumption that peak flows at the target site represent a sample of independent and identically distributed realizations drawn from a stationary homogeneous stochastic process. This assumption is not valid when flows are affected by changes in climate and/or land use/land cover, and by regulation of rivers through dams, reservoirs and other artificial diversions/storages. In situations where evidence of non-stationarity in peak flows is strong, it is not appropriate to use quantile estimates obtained from the conventional FFA approaches for hydrologic designs and other applications. Downscaling is one of the options to arrive at future projections of flows at target sites in a river basin for use in FFA. Conventional downscaling methods attempt to downscale General Circulation Model (GCM) simulated climate variables to streamflow at target sites. In real-world scenarios, a correlation structure exists between records of streamflow at sites in a study area. An effective downscaling model must be parsimonious, and it should preserve the correlation structure in downscaled flows to a reasonable extent, though exact reproduction of the structure may not be necessary in a climate change (non-stationary) scenario. A few recent studies attempted to address this issue based on the assumption of spatiotemporal covariance stationarity.
However, there is a dearth of meaningful efforts, especially for multisite downscaling of flows. To address this, a multivariate support vector regression (MSVR) based methodology is proposed to arrive at flood return levels (quantile estimates) for target locations in a river basin corresponding to different return periods in a climate change scenario. The approach involves (i) use of MSVR relationships to downscale GCM-simulated large-scale atmospheric variables (LSAVs) to monthly time series of streamflow at multiple locations in a river basin, (ii) disaggregation of the downscaled streamflows corresponding to each site from monthly to daily time scale using a k-nearest neighbor disaggregation methodology, and (iii) fitting a time-varying generalized extreme value (GEV) distribution to annual maximum flows extracted from the daily streamflows and estimating flood return levels for different target locations in the river basin corresponding to different return periods.
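The return-level calculation in step (iii) can be sketched as follows, given GEV parameters that have already been estimated. The sketch uses the parameterization F(z) = exp(-[1 + xi (z - mu) / sigma]^(-1/xi)), and it assumes, purely for illustration, that only the location parameter varies linearly with time, mu(t) = mu0 + mu1 * t. The abstract does not state which parameters vary or how, and the function names and numerical values are hypothetical. Sign conventions for the shape parameter differ between references, so the formula should be checked against whichever convention a fitting tool uses.

```python
import math


def gev_return_level(mu, sigma, xi, T):
    """Return level for return period T (years) under a GEV(mu, sigma, xi).

    Solves F(z) = 1 - 1/T, with the Gumbel limit used when xi is ~0.
    """
    if T <= 1.0:
        raise ValueError("return period must exceed 1 year")
    if sigma <= 0.0:
        raise ValueError("scale parameter must be positive")
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)          # Gumbel case
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def time_varying_return_level(mu0, mu1, sigma, xi, t, T):
    """Return level at time t, assuming a linear trend in location only."""
    return gev_return_level(mu0 + mu1 * t, sigma, xi, T)


# Hypothetical parameters (flows in m^3/s, t in years from a reference year).
level_now = time_varying_return_level(100.0, 2.0, 30.0, 0.1, t=0, T=50)
level_later = time_varying_return_level(100.0, 2.0, 30.0, 0.1, t=10, T=50)
```

Under this location-only trend, the return level for a fixed return period shifts by exactly mu1 * t, so the two values above differ by 20 m^3/s. Trends in the scale or shape parameter would change the spread of the quantiles as well.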

URI

https://etd.iisc.ac.in/handle/2005/3728

Collections

Civil Engineering (CiE) [457]