A foundation for capturing and querying complex multidimensional data
Torben Bach Pedersen (a,*), Christian S. Jensen (a), Curtis E. Dyreson (b)
(a) Department of Computer Science, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark
(b) School of Electrical Engineering and Computer Science, Washington State University, PO Box 642752, Pullman, WA 99164-2752, USA
Abstract
On-line analytical processing (OLAP) systems considerably improve data analysis and are finding widespread use. OLAP systems typically employ multidimensional data models to structure their data. This paper identifies 11 modeling requirements for multidimensional data models. These requirements are derived from an assessment of complex data found in real-world applications. A survey of 14 multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, lack built-in mechanisms for handling change and time, lack support for imprecision, and are generally unable to insert data with varying granularities. This paper defines an extended multidimensional data model and algebraic query language that address all 11 requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The data model and query evaluation techniques discussed in this paper can be implemented using relational database technology. The approach is also capable of exploiting multidimensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: Multidimensional data; Data modelling; Imprecise data; On-line analytical processing
1. Introduction
On-line analytical processing (OLAP) is an area of active commercial and research interest. Continued advances in hardware for on-line mass storage have made possible the warehousing of large amounts of data. OLAP tools focus on
*Corresponding author. E-mail addresses: firstname.lastname@example.org (T.B. Pedersen), email@example.com (C.S. Jensen), firstname.lastname@example.org (C.E. Dyreson).
providing fast answers to ad hoc queries that aggregate the warehouse data. This enables users to quickly analyze the data and make informed decisions. Traditional data models, such as the ER model and the relational model, do not provide good support for OLAP applications. As a result, new data models based on a multidimensional view of data have emerged. Multidimensional models typically categorize data as either measurable business facts (measures), which are numerical in nature, or dimensions, which are mostly textual
0306-4379/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0306-4379(01)00023-0
T.B. Pedersen et al. / Information Systems 26 (2001) 383–423
and characterize the facts. For example, in a retail business, products are sold to customers at certain times in certain amounts at certain prices. A typical fact would be a purchase. Typical measures would be the amount and price of the purchase. Typical dimensions would be the location of the purchase, the type of product being purchased, and the time of the purchase. Most OLAP research to date has concentrated on performance issues. Higher-level issues, such as conceptual modeling, have received less attention. Several researchers have identified this deficiency and have suggested combining the good performance of OLAP systems with the advanced data modeling capabilities of...