A Peer To Peer Web Proxy Cache For Enterprise Networks
Abstract
In this thesis, we propose a decentralized peer-to-peer (P2P) Web proxy cache for enterprise networks (ENs). Currently, enterprises use a centralized proxy-based Web cache, where a dedicated proxy server does the caching. A dedicated proxy Web Cache has to be over-provisioned to handle peak loads. It is expensive, a single point of failure, and a bottleneck. In a P2P Web Cache, the clients themselves cooperate in caching the Web objects without any dedicated proxy cache. The resources from the client machines are pooled together to form a Web cache. This eliminates the need for extra hardware and the single point of failure, and improves the average response time, since all the machines serve the request queue. The most important attraction for the P2P scheme is its inherent scalability.
Squirrel was the earliest P2P Web cache. Squirrel is built upon a structured P2P protocol called Pastry. Pastry is based on consistent hashing; a special hashing that performs well in the presence of client membership changes. Consistent hashing based protocols are designed for Internet-wide environments to handle very large membership sizes and high rates of membership change. To minimize the protocol bandwidth, the membership state maintained at each peer is very small. This state consists of the information about the peer’s immediate neighbours, and those of a few other P2P members, to achieve faster look-up.
This scheme has the following advantages: (i) since peers do not maintain information about all the other peers in the system, any peer needing an object has to find the peer responsible for the object through a multi-hop lookup, thereby increasing the latency, and (ii) the number of objIds assigned to a peer depends on the hashing used, and this can be skewed, which affects the load distribution.
The popular applications of the P2P paradigm have been file-sharing systems. These systems are deployed across the Internet. Hence, the existing P2P protocols were designed to operate within the constraints of Internet environments. The P2P proxy Web cache has been a recent application of the P2P paradigm. P2P Web Proxy caches operate across the entire network of an enterprise. An enterprise network(EN) comprises all the computing and communications capabilities of an institution. Institutions typically consist of many departments, with each department having and managing its own local area netwok (LAN). The available bandwidth in LANs is very high. LANs have low latency and low error rates. EN environments have smaller membership size, less frequent membership changes and more available bandwidth. Hence, in such environments, the P2P protocol can afford to store more membership information.
This thesis explores the significant differences between EN and Internet environments. It proposes a new P2P protocol designed to exploit these differences, and a P2P Web proxy caching scheme based on this new protocol. Specifically, it shows that it is possible to maintain complete the consistent membership information on ENs. The thesis then presents a load distribution policy for a P2P system with complete and consistent membership information to achieve (i) load balance and (ii) minimum object migrations subsequent to each node join or node leave event.
The proposed system requires extra storage and bandwidth costs. We have seen that the necessary storage is available in general workstations and the required bandwidth is feasible in modern networks. We then evaluated the improvement in performance achieved by the system over existing consistent hashing based systems. We have shown that without investing in any special hardware, the P2P system can match the performance of dedicated proxy caches. We have further shown that the buddy based P2P scheme has a better load distribution, especially under heavy loads when load balancing becomes critical. We have also shown that for large P2P systems, the buddy based scheme has a lower latency than the consistent hashing based schemes. Further, we have compared the costs of the proposed scheme and the existing consistent hashing based scheme for different loads (i.e., rate of Web object requests), and identified the situations in which the proposed scheme is likely to perform best.
In summary, the thesis shows that (i) the membership dynamics of P2P systems on ENs are different from that of Internet file-sharing systems and (ii) it is feasible in ENs, to maintain complete the consistent view of the P2P membership at all the peers. We have designed a structured P2P protocol for LANs that maintains a complete and consistent view of membership information at all peers. P2P Web caches achieve single hop routing and a better balanced load distribution using this scheme. Complete and consistent view of membership information enabled a single-hop lookup and a flexible load assignment.