Networked information discovery and retrieval on the ERNET
Abstract
Networked Information Discovery and Retrieval (NIDR) is the problem of locating information that is present on a network and making use of the located information. As the volume and variety of information on networks grow, NIDR is becoming increasingly important.
The Education and Research Network (ERNET) is the computer network that connects the academic and research community of India. This thesis addresses the NIDR problem on the ERNET by presenting Narada, an NIDR toolkit-cum-solution for the ERNET.
This thesis:
Surveys existing NIDR tools and discusses their suitability on the ERNET.
Discusses Anarchie, an NIDR tool for resources on anonymous FTP sites, which has been in operation on the IISc campus for over a year.
Introduces the concept of an Aggregate FTP Server, which provides a single window to all the FTP sites in a network domain.
Explores the relevance of providing offline access to the Internet.
Proposes an extension to HTTP that, under certain conditions, eliminates the need for a robot to crawl a website.
Proposes a hybrid NIDR system designed to be easy to set up and administer, since centralized solutions would burden ERNET’s network links and distributed solutions are difficult to set up and administer.
After presenting the implementation details of the various components, the proposed extensions, and experience with the public deployment of Anarchie, the thesis concludes with directions for future research.