An Oracle Whitepaper January 2006
This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced, or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.
Oracle Warehouse Builder 10gR2 Transforming Data into Quality Information
Enterprises have always relied on data to be successful. Customers, products, suppliers, and sales transactions all need to be described and tracked in one way or another. Even before computers became commercially available, data in the form of paper records was vital to both commercial and non-commercial organizations. With the advent of computing technology, the sophistication of data usage by businesses and governments grew exponentially. The technology industry serving these needs has generated many buzzwords that come and go (decision support systems, data warehousing, customer relationship management, business intelligence, and so on), but the fact remains the same: organizations need to make the best use of the data they have to increase their efficiency today and improve their planning for tomorrow.
If data is so fundamental to the business, it is not surprising that a lot of effort goes into acquiring and handling data and making it available to those who need it. In the process, the data is moved around, manipulated, and consolidated. The quality of the data is rarely a high priority, as the pressure is on to deliver the project and “get the data out to the users.” The justifications for not making data quality a priority are often just thoughts such as “our data is good enough” and “we can always clean it later.”
Yet poor data quality has proven to be one of the biggest obstacles to the success of any data integration project. The often-quoted estimate by The Data Warehousing Institute (TDWI) is that data quality problems cost U.S. businesses more than $600 billion a year. That is a very impressive number, but one that is hard to relate to without first understanding the concepts of data quality and the technologies that address data quality issues.
This paper answers the following questions: What is data quality? Why put effort into data quality? Why is this effort most efficient inside the extract, transform, and load (ETL) process? How will Warehouse Builder make this effort successful? You will discover how Warehouse Builder combines its core data integration functionality with advanced data quality functionality.
WHAT IS DATA QUALITY?
Data quality is an all-encompassing term describing both the state of data that is complete, accurate, and relevant, and the set of processes used to achieve such a state. The goal is to have data free of duplicates, misspellings, omissions, and unnecessary variations, and to have the data conform to the defined structure. Simply put, data quality addresses the problem cynically but precisely summed up as “garbage in, garbage out.” A significant part of data quality deals with customer data, namely names and addresses, due to both their key roles in business processes and their highly dynamic nature. Names and addresses are ubiquitous: they tend to exist in almost every source and are often the only identifying data. Most matching applications rely heavily on names and addresses, because a common unique identifier...