Introduction to Data Warehousing
Slides kindly borrowed from the course “Data Warehousing and Machine Learning”, Aalborg University, Denmark. Christian S. Jensen, Torben Bach Pedersen, Christian Thomsen ({csj,tbp,chr}@cs.aau.dk)
Course Structure
• Business intelligence
  - Extract knowledge from large amounts of data collected in a modern enterprise
  - Data warehousing, machine learning
• Purpose
  - Acquire theoretical background in lectures and literature studies
  - Obtain practical experience with (industrial) tools in practical exercises
• Business Intelligence (BI)
  - Data warehousing: construction of a database used solely for data analysis
  - Machine learning: find patterns automatically in databases
Overview
• Why Business Intelligence?
• Data analysis problems
• Data Warehouse (DW) introduction
• DW topics
  - Multidimensional modeling
  - ETL
  - Performance optimization
What is Business Intelligence (BI)?
• From Encyclopedia of Database Systems: “[BI] refers to a set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decisional process, to be made available to the right persons in the most suitable form.”
• BI is different from Artificial Intelligence (AI)
  - AI systems make decisions for the users
  - BI systems help the users make the right decisions, based on available data
• Combination of technologies
  - Data Warehousing (DW)
  - On-Line Analytical Processing (OLAP)
  - Data Mining (DM)
  - …
Why is BI Important?
• Worldwide BI revenue in 2005 = US$ 5.7 billion
  - 10% growth each year
  - A market where players like IBM, Microsoft, Oracle, and SAP compete and invest
• BI is not only for large enterprises
  - Small and medium-sized companies can also benefit from BI
• The financial crisis has increased the focus on BI
  - You cannot afford not to use the “gold” in your data
BI and the Web
• The Web makes BI even more useful
  - Customers do not appear “physically” in a store, so their behavior cannot be observed by traditional methods
  - A website log is used to capture the behavior of each customer, e.g., the sequence of pages a customer has seen and the products viewed
• Idea: understand your customers using data and BI!
  - Utilize website logs to analyze customer behavior in more detail than before (e.g., what was not bought?)
  - Combine web data with traditional customer data
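The web-log idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the log layout `(customer_id, timestamp, page, product_id)`, the purchase records, and the helper names `page_sequences` and `viewed_not_bought` are all hypothetical, not taken from any real web server or tool. It rebuilds each customer's page sequence and then combines the web data with (made-up) purchase data to answer “what was viewed but not bought?”.

```python
from collections import defaultdict

# Hypothetical click-log records: (customer_id, timestamp, page, product_id or None).
# This layout is an assumption for illustration, not a real log format.
LOG = [
    ("c1", 3, "/checkout", None),
    ("c1", 1, "/product", "p10"),
    ("c2", 1, "/home", None),
    ("c1", 2, "/product", "p20"),
    ("c2", 2, "/product", "p10"),
]

# Hypothetical purchase records from the traditional sales system: (customer_id, product_id).
PURCHASES = [("c1", "p10")]


def page_sequences(log):
    """Return, per customer, the pages visited in timestamp order."""
    visits = defaultdict(list)
    for customer, ts, page, _product in log:
        visits[customer].append((ts, page))
    # Sorting the (timestamp, page) pairs restores the order of the visit.
    return {c: [page for _ts, page in sorted(v)] for c, v in visits.items()}


def viewed_not_bought(log, purchases):
    """Return, per customer, products that were viewed but never purchased."""
    viewed = defaultdict(set)
    for customer, _ts, _page, product in log:
        if product is not None:
            viewed[customer].add(product)
    bought = defaultdict(set)
    for customer, product in purchases:
        bought[customer].add(product)
    # Set difference: web data (views) minus traditional data (purchases).
    return {c: sorted(viewed[c] - bought[c]) for c in viewed}


if __name__ == "__main__":
    print(page_sequences(LOG))
    print(viewed_not_bought(LOG, PURCHASES))
```

With this toy data, customer `c1` viewed `p10` and `p20` but only bought `p10`, so `p20` shows up as “viewed but not bought”. A real analysis would also need sessionization and a reliable way to link web visitors to customer records.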
Case Study of an Enterprise
• Example of a chain (e.g., fashion stores or car dealers)
  - Each store maintains its own customer records and sales records
  - Hard to answer questions like “find the total sales of Product X from stores in Aalborg”
  - The same customer may be viewed as different customers by different stores; duplicate customer information is hard to detect
  - Imprecise or missing data in the addresses of some customers
  - Purchase records are kept in the operational system only for a limited time (e.g., 6 months), after which they are deleted or archived
  - The same “product” may have different prices or different discounts in different stores
• Can you see the problems of using those data for business analysis?
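The problems in this case study become concrete once the per-store records are combined. The sketch below is illustrative only: the `Sale` record, the store lists, the names, and the helpers `integrate`, `normalize_name`, `total_sales`, and `distinct_customers` are hypothetical. Answering “total sales of Product X from stores in Aalborg” is easy once all stores' records sit in one collection. Counting customers, however, depends on how duplicates are matched. The name-based key used here is a deliberately naive assumption; it also merges people who merely share a name, which is exactly why duplicate detection is hard.

```python
from dataclasses import dataclass


# Hypothetical sales record; every field name and value below is made up for illustration.
@dataclass
class Sale:
    store: str
    city: str
    customer: str  # customer name exactly as recorded by that store
    product: str
    amount: float


STORE_A = [Sale("A", "Aalborg", "Jens Hansen", "X", 100.0),
           Sale("A", "Aalborg", "Mette Nielsen", "Y", 40.0)]
STORE_B = [Sale("B", "Aalborg", "jens  hansen ", "X", 60.0)]
STORE_C = [Sale("C", "Aarhus", "Jens Hansen", "X", 80.0)]


def integrate(*stores):
    """Concatenate the separate per-store sales lists into one collection."""
    return [sale for store in stores for sale in store]


def normalize_name(name):
    """Naive matching key (assumption): lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def total_sales(sales, product, city):
    """Sum the amounts of one product over all stores located in one city."""
    return sum(s.amount for s in sales if s.product == product and s.city == city)


def distinct_customers(sales):
    """Estimate the set of distinct customers using the naive matching key."""
    return {normalize_name(s.customer) for s in sales}


if __name__ == "__main__":
    all_sales = integrate(STORE_A, STORE_B, STORE_C)
    print(total_sales(all_sales, "X", "Aalborg"))
    print(len({s.customer for s in all_sales}), len(distinct_customers(all_sales)))
```

Here the raw names yield three “different” customers, while the normalized key yields two. Neither count is guaranteed correct without better identifying data such as a customer number or a cleaned address.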
Data Analysis Problems
• The same data found in many different systems
Example: customer data across different stores and...