Algorithms for Estimating Integrals in High Dimensional Spaces
Abstract
Sampling, estimation and integration in high dimensional continuous spaces is required in diverse areas ranging from modeling multi-particle physical systems and optimization to inference from data. When the number of independent parameters increases, analytical methods are not always tractable and numerical methods require exponentially increasing computational effort (NP-hardness). In the first and the major part of the thesis, we propose a general Monte Carlo sampling method to estimate n-volumes and integrals even over non-convex domains. Deterministic sampling methods such as the Quasi-Monte Carlo are very efficient when the integrand can be reduced to a function of a single effective variable. Similarly, the naive Monte Carlo is very effective when the independent variables are sampled uniformly over an n-orthotope (a rectangle for n=2, cuboid for n=3, etc.). In problems where some function defines the boundary of the domain or its membership, and in problems where the independent variables are sampled with an implicit probability distribution, correctly sampling the n-volume of the domain in itself amounts to be NP-hard in general.
Markov Chain Monte Carlo (MCMC) methods are suited for convex domains and scale as O(n^4) in samples required for the estimation of volume. The proposed n-sphere Monte Carlo (NSMC) method not only has lower computational complexity but also preserves the independence of the random samples making it well suited for parallel computing. The proposed method decomposes the estimated volume into volumes of weighted n-spheres, and these weights are trivially estimated by sampling the extents of the domain. While other methods typically scale well only for relatively smooth convex bodies, the performance of the proposed method is only dependent on the variance of the distribution of extents and is independent of the roughness and the convexity of the body. The required number of samples to estimate n-volumes scales linearly with the number of dimensions for a fixed distribution of extents, while the total computing effort scales quadratically. We show a straight-forward adaptation of this method for estimating arbitrary integrals. Even in the case of convex shapes which are not defined by a fixed distribution of extents, our numerical results show that the naive NSMC has significant advantages over MCMC for number of dimensions n < 100. But this approach has challenges with highly eccentric volumes given by tailed distributions, and the large front constants of the linear scaling in such cases can be reduced by an appropriate non-uniform importance sampling of the extents. The challenges in such a non-uniform sampling in high dimensional spaces are described along with the proposed solution. With this geometric importance sampling, we show that the O(n) scaling in samples can be achieved even for highly eccentric spheroids, where the tail of the distribution of extents contributes significant volume. Future work is aimed at demonstrating this favorable scaling in samples for other challenging shapes as well.
In the second and minor part of the thesis, we propose a non-sampling method to compute functions of scalar random variables using their moments. This method, while restricted in applicability to simple functions, can be applied to augment the NSMC approach to integration and also provides semi-analytical expressions for evaluating the moments of functions of random variables.