A Study of the Performance Benefits of Controlling Parallel Asynchronous Iterative Applications

Joseph, P J

dc.contributor.advisor	Sriram, V
dc.contributor.author	Joseph, P J
dc.date.accessioned	2005-03-11T09:21:20Z
dc.date.accessioned	2018-07-31T04:39:01Z
dc.date.available	2005-03-11T09:21:20Z
dc.date.available	2018-07-31T04:39:01Z
dc.date.issued	2005-03-11T09:21:20Z
dc.date.submitted	1997
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/84
dc.identifier.srno	null
dc.description.abstract	High-performance networks of workstations are becoming increasingly popular as a parallel computing platform because of their lower cost. Both message-passing and software distributed shared memory (DSM) programming paradigms have been developed and employed on such distributed hardware platforms. An important performance bottleneck in these systems is the effective data transmission latency, which is higher than in the high-speed interconnection networks of parallel computers. Iterative algorithms are used in a large class of applications, such as the solution of partial differential equations, optimization problems, and the solution of systems of linear equations. These can be parallelized in a straightforward fashion, with each node computing a part of the data set and synchronizing and exchanging data with the other nodes as required. However, these synchronous versions perform poorly when message transmission delays are high, as is the case in networks of workstations. Asynchronous parallel versions of these algorithms provide an additional degree of freedom for tolerating large data transmission latencies. These algorithms do not synchronize, and they behave correctly in the presence of losses and delays in the propagation of updates. Thus, in shared memory systems, they do not synchronize accesses to shared data, and they work correctly even when updates are delayed or lost. This gives asynchronous algorithms a performance advantage over their synchronous counterparts, since synchronization costs are avoided and computation can be overlapped with communication. However, the message generation rate of asynchronous algorithms is greater than that of their synchronous counterparts. Uncontrolled asynchronous runs can create a large network load, resulting in large queuing delays, which in turn can increase the message generation rate of the asynchronous algorithms. 
This is especially likely in lower-bandwidth networks such as networks of workstations. Such a positive feedback loop leads to unstable network conditions. Recent work has tried to improve the performance of asynchronous algorithms on a distributed shared memory (DSM) system by adaptively buffering shared memory updates depending on the network load and transmitting multiple updates together. This reduces congestion and message transmission overhead, but could still result in slow convergence, since nothing is guaranteed about the update propagation delay. Also, although adaptive throttling of messages will kick in once the network becomes heavily loaded, it cannot prevent the initial flooding. Furthermore, the processor is not freed when computation with the available values does not yield much further convergence. In this thesis we present an alternative method of controlling iterative methods and present performance results for it. We propose a new system-supported blocking read primitive, termed Global Read, that is guaranteed to return a value of acceptable age for the specified location in a DSM system. The main idea is to enforce an upper bound on the age of shared updates seen by a node, in a manner visible to the underlying DSM system. Information about which processes are blocked can be used to adapt the underlying system, especially the network, for better performance. A reading process is throttled until its Global Read is satisfied, thus implementing program-level flow control and also freeing the processor. Global Read can also help in communication-based scheduling of processes. A performance evaluation using a benchmark from Cray, on a network of workstations and on the IBM SP2 parallel computer, showed good performance improvements. 
We also present the results of a systematic study in which we implement parallel code for different iterative techniques for solving the Laplace equation using PVM, and characterize when controlled asynchrony is beneficial. We studied the improvements in computation time and analyzed the sources of this improvement, using various inputs and degrees of parallelism, on both the IBM SP2 and a network of workstations. We find significant performance improvements from controlling asynchrony when the traffic generated by the asynchronous algorithm exceeds what the network can sustain. However, we found that the scalability of the applications is limited by the software overhead for messages. Systems with reduced software overhead will show very good scalable performance for controlled asynchrony.	en
dc.format.extent	3451545 bytes
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Indian Institute of Science	en
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en
dc.subject.classification	Computer and Information Science	en
dc.subject.keyword	Multiprogramming	en
dc.subject.keyword	Parallel Processing (Electronic Computers)	en
dc.subject.keyword	Iterative Methods	en
dc.subject.keyword	Distributed Shared Memory (DSM)	en
dc.subject.keyword	Asynchronous Parallelisation	en
dc.title	A Study of the Performance Benefits of Controlling Parallel Asynchronous Iterative Applications	en
dc.type	Electronic Thesis and Dissertation	en
dc.degree.name	MSc Engg.	en
dc.degree.level	Masters	en
dc.degree.grantor	Indian Institute of Science	en
dc.degree.discipline	Faculty of Engineering	en
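To make the Global Read idea in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the thesis's implementation, which used a DSM system and PVM on real hardware. Threads stand in for workstations and an in-process store stands in for the DSM; the names GlobalReadStore, global_read, max_age and solve_laplace_1d are hypothetical. Each thread runs Jacobi iteration on its slice of a 1D Laplace problem (u'' = 0 with fixed end values 0 and 1) and fetches its neighbours' edge values with a blocking read that waits until the value is at most max_age iterations old, so a fast node is throttled instead of computing on arbitrarily stale data.

```python
import threading


class GlobalReadStore:
    """Toy stand-in for a DSM (hypothetical API, not the thesis's).

    Each location holds (value, version). global_read() blocks the caller
    until the stored version is at least min_version, i.e. until the value
    is young enough, which bounds the age of data a reader can see.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._data = {}

    def write(self, key, value, version):
        with self._cond:
            self._data[key] = (value, version)
            self._cond.notify_all()  # wake readers waiting for fresher data

    def global_read(self, key, min_version):
        with self._cond:
            # The reader is blocked (throttled) here until the value is fresh enough.
            self._cond.wait_for(
                lambda: key in self._data and self._data[key][1] >= min_version
            )
            return self._data[key][0]


def solve_laplace_1d(n=16, nodes=4, iters=3000, max_age=2):
    """Jacobi iteration for u'' = 0 on n interior points, u_0 = 0, u_{n+1} = 1.

    The interior is split evenly across `nodes` worker threads (hypothetical
    setup). A worker at iteration k accepts a neighbour's edge value only if
    it was produced at iteration k - 1 - max_age or later.
    """
    assert n % nodes == 0
    chunk = n // nodes
    store = GlobalReadStore()
    result = [0.0] * n

    # Publish the initial (all-zero) edge values as version 0.
    for p in range(nodes):
        store.write(("left", p), 0.0, 0)
        store.write(("right", p), 0.0, 0)

    def worker(p):
        local = [0.0] * chunk
        for k in range(1, iters + 1):
            need = k - 1 - max_age  # oldest acceptable neighbour version
            # Ghost cells: the physical boundary or a neighbour's edge value.
            left = 0.0 if p == 0 else store.global_read(("right", p - 1), need)
            right = 1.0 if p == nodes - 1 else store.global_read(("left", p + 1), need)
            padded = [left] + local + [right]
            # Jacobi update: each point becomes the mean of its two neighbours.
            local = [0.5 * (padded[i - 1] + padded[i + 1]) for i in range(1, chunk + 1)]
            # Publish this node's new edge values, stamped with iteration k.
            store.write(("left", p), local[0], k)
            store.write(("right", p), local[-1], k)
        result[p * chunk:(p + 1) * chunk] = local

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

For example, solve_laplace_1d(n=16, nodes=4) should approach the exact solution u_i = i/17. With max_age = 0 every node waits for its neighbours' previous-iteration values, which is close to a synchronous run; larger values permit more asynchrony while still bounding staleness. Because the store keeps only the latest version of each location, a read may also return a value newer than required.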

Files in this item

Name: A_Study_Of_The_Performance_Ben ...
Size: 3.291Mb
Format: PDF

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [542]