University of Maryland
The dramatic improvements in global interconnectivity due to intranets, extranets, and the Internet have led to an explosion in the number and variety of new data-intensive applications. Along with the proliferation of these new applications have come increased problems of scale, demonstrated by frequent delays and service disruptions when accessing networked data sources. Recently, push-based techniques have been proposed as a solution to scalability problems for distributed applications. This paper argues that push indeed has its place, but that it is just one aspect of a much larger design space for distributed information systems. We propose the notion of a Dissemination-Based Information System (DBIS), which integrates a variety of data delivery mechanisms and information broker hierarchies. We discuss the properties of such systems and provide some insight into the architectural imperatives that will influence their design. The DBIS framework can serve as the basis for the development of a toolkit for constructing distributed information systems that better match the technology they employ to the characteristics of the applications they are intended to support.
1 Introduction

1.1 The World-Wide Wait
This work has been partially supported by the NSF under grant IRI-9501353, by Rome Labs Agreement Number F30602-97-2-0241 under ARPA order number F078, by an IBM Cooperative Graduate Fellowship, and by research funding and equipment from Intel Corporation.

The scenario is all too familiar: a major event, such as a national election, is underway and the latest, up-to-the-minute results are being posted on the Web. You want to monitor the results for the important national races and for the races in your state, so you fire up your trusty web browser, point it at the election result web site, and wait, and wait, and wait... What's the problem? It could be any number of technical glitches: a congested network, an overloaded server, or even a crashed server. In a larger sense, however, the problem is one of scalability; the system cannot keep up with the heavy load caused by the (transient) surge in activity that occurs in such situations. We argue that such scalability problems are the result of a mismatch between the data access characteristics of the application and the technology (in this case, HTTP) used to implement the application.

An election result server, such as that of the preceding scenario, is an example of a data dissemination-oriented application. Data dissemination involves the delivery of data from one or more sources to a large set of consumers. Many dissemination-oriented applications have data access characteristics that differ significantly from the traditional notion of client-server applications as embodied in navigational web browsing technology. For example, the election result server has the following characteristics: 1) there is a huge population of users (potentially many millions) who want to access the data; 2) there is a tremendous degree of overlap among the interests of the user population; 3) users who are following the event closely are interested only in new data and changes to the existing data; and 4) the amount of data that must be sent to most users is fairly small.

Given these characteristics, it becomes clear that the request-response (i.e., RPC), unicast (i.e., point-to-point) method of data delivery used by HTTP is the wrong approach for this application. Using request-response, each user sends requests for data to the server. The large audience for a popular event can generate huge spikes in the load at servers, resulting in long delays and server crashes. Compounding the situation is
that users must continually poll the server to obtain the most current data, resulting in...
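The cost of this polling model can be illustrated with a back-of-the-envelope sketch. The client count and poll interval below are illustrative assumptions, not figures from the paper:

```python
# Rough comparison of server load under request-response polling versus
# push-based dissemination. The numbers are hypothetical, chosen only to
# make the scalability mismatch concrete.

def polling_requests_per_sec(num_clients: int, poll_interval_s: float) -> float:
    """Each client polls the server independently every poll_interval_s
    seconds, so aggregate server load grows linearly with the audience."""
    return num_clients / poll_interval_s

def push_messages_per_update(num_clients: int, multicast: bool) -> int:
    """With multicast push, one transmission per update reaches all
    clients; with unicast push, the server still sends one message each."""
    return 1 if multicast else num_clients

# One million users refreshing election results every 30 seconds:
load = polling_requests_per_sec(1_000_000, 30.0)
print(f"polling: {load:,.0f} requests/sec at the server")

# The same audience served by a single multicast push per result update:
msgs = push_messages_per_update(1_000_000, multicast=True)
print(f"multicast push: {msgs} transmission per update")
```

Under polling, server load scales with the number of clients regardless of whether the data has changed; under multicast push, the per-update cost is independent of audience size, which is why data dissemination favors the latter for the access characteristics listed above.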