Data Warehousing: June 2016

Wednesday, June 29, 2016

METADATA in simple terms

Metadata can be defined as the data about data present in the Data Warehouse. It can be directly linked as the Data Dictionary of the Warehouse.

The following are the different types:

1) Operational Metadata

It consists of all the information about the operational data sources.

2) Extraction & Transformation Metadata

It consists of the information about the extraction of data from its sources, such as- extraction frequencies, methods or business rules and along with extraction it also contains details about the transformation.

3) End-User Metadata

It is the navigational map for the data warehouse as it enables the end users to search the warehouse with their own business terms.

Thus, Metadata connects the parts of the warehouse together and makes it easy for end users to search.

Monday, June 27, 2016

Inmon and Kimball Approach for Data Models

Bill Inmon supports the Top-Down approach for data warehouse design whereas, Ralph Kimball supports Bottom-Up design. Now let us discuss what each are in detail, but before that it is important to answer the following questions before starting the design of a warehouse-

Do we need a Top-Down/Bottom-Up approach for our warehouse?
Should the warehouse be enterprise wide or be departmental?
What do we create first- Data Mart or the Data Warehouse?
Should we capture dependent data marts/independent marts?

Whenever a large data warehouse is channeled to small data marts then it’s a Top-Down approach as here the data repository feeds into the local, departmental marts. However, when we have data marts created first and then consolidating them into a repository then it’s called the Bottom-Up approach.

Few Characteristics of both-

Bill Inmon’s Top-Down Approach

Has an enterprise wide view of data.
Is inherently architected.
Has a single, central data storage
Has centralized rules and control
Gives quick results if implemented with iterations
Takes quite a long time to build
Has a high risk of failure

Ralph Kimball’s Bottom-Up

Has a fast and easy implementation of manageable pieces
Has a very less failure risk
Id inherently incremented
Each data mart has its own narrow view of data
Allows redundant data
Has inconsistent data
Has unmanageable interface
But, most severe drawback is Data Fragmentation

Wednesday, June 22, 2016

Building Blocks of a Warehouse

Building Blocks of a Warehouse

A Data Warehouse has the following major components-

Source Data: Sources from where the data actually enters a warehouse and this can be of the following categories- Production data, Internal Data, Archived data and External data.

Data Staging: Once we have acquired all the data it goes through the Stating area where the data is prepared to enter the warehouse. It has 3 major functions involved- Extraction of data, Transformation of Data and Loading of data.

Data Storage: Storage for a warehouse is kept separate from the Operational Systems storage because of the following reasons-

- Warehouse data contains volumes of historical data for analysis.
- Data is present in structures that are suitable for analysis.

Information delivery: As the user’s requirements are different thus the data should be delivered in a way that must support all the user’s requirements. Thus, there are multiple types if information delivery available such as-

- Ad-hoc reports
- Complex queries
- MD analysis
- Statistical Analysis
- EIS Feed
- Data mining

All the above are are effectively depicted in the Diagram below-

Thus these ares can be categorized into the following-

Data Acquisition , Data Storage and Information delivery also represented in the above Diagram -

While defining the architecture of the warehouse, the following factors must be considered-

Consider the number and extent of the data sources.
How many legacy systems are you going to extract the data from?
What are the external sources?
Are you planning to include departmental files, spreadsheets, and private databases?
What about including the archived data?

Monday, June 20, 2016

All about STAR and SNOWFLAKE Schema

STAR and SNOWFLAKE Schema

STAR Schema

Consists of one fact table that is surrounded by dimension tables. It is an easy to understand structure.

How it works to give results?

A query made on this structure can easily combine results from various dimensions linked to the fact table and result the required data required to answer any user question. Relationship of a record in a fact table is with the rows present in the dimension tables.

Example:

SNOWFLAKE Schema

This is the Normalized version of STAR Schema. All the tables present as dimensions here are normalized. The Fact table remains the same as STAR Schema.

It holds an advantage of less storage space and being a normalized structure it is easy to update and maintain. Thus, for a database that changes frequently this structure is preferred.

Example:

Wednesday, June 15, 2016

Initial Steps-Data Warehouse Development

Dimensional Modeling

Now that we are aware of the basic concepts of Data Warehouse, it is important to understand that one of the key steps involved in the warehouse development involves understanding the requirements and presenting them in a view/model that is easy to understand for – users, developer and all the involved parties.

To provide this view the warehouses are modeled multidimensional with the help of Use Cases. Data models should always be easy to understand, read and develop.

Data Warehouse requirements have different abstraction levels for detailed, testable and complete specifications for the entire team. These levels are-

- Business requirements
- User requirements
- Detailed System Requirements
- Requirements Attributes (Functional, Information and Other)

These requirements are then captured in the Use cases that follows an iterative approach.

BUS Matrix

All the operational activities defined for an organization are captured in a BUS Matrix where each row defines a business process.

For instance, in the BUS Matrix defined above, each of the business process are captured as a separate row in the Matrix. Such as-

· Retail Sales
· Retail Inventory
· Retail Deliveries
· Warehouse Inventory
· Warehouse Deliveries
· Purchase Orders

Each process captured above corresponds to one row each in the matrix.

Monday, June 13, 2016

OLAP in detail

OLAP is a graphical tool that provides multidimensional views of data. Raw data when analyzed through OLAP can lead to a very easy way in understanding the data. It provides the users with all the information related to the data. Users can identify patters or trends from in the data easily through this tool and it makes the work easy as user no more has to search through piles of raw data.

Its biggest advantage is that it is beneficial for future predictions of an organization’s sales/costs/profit with respect to the current trends and pattern. Also, technical knowledge is not required to work on OLAP tool, only the knowledge of Relational Database is sufficient.

All the OLAP tools basically present the data to the users in the form of Multidimensional Cubes. Any relational tables can be represented in the form of a cube.

For Example, the below tables shows the records of watch sales in different locations-

This table can be represented in the form of cube where-

Here it’s a 2 Dimensional layout.Each axis of cube is the dimension. Horizontal Dimension holds the product and vertical dimension holds the sales of the product. The above data is arranged as Products by Store but it can also be arranged as Store by products thus-

 Different business questions can be answered by arranging the data differently.

Wednesday, June 8, 2016

Basics of Data Warehousing

Data Warehousing

What is a Data Warehouse?

In simple terms a repository for any organization that contains the data pertaining to all the areas defines a warehouse. Basic use is for analysis rather than for transaction processing. Thus, it separates the analysis workload from all the transactions workload. All the information gathered in a warehouse can be used for the following analysis using different technologies that can help analyze the data effectively.

Be it any king of business, a data warehouse can help in analysis of customers and their buying behavior or patterns. Predictive analysis on the data captured in the warehouse can lead to increased profit, perfect demand and supply ratio and many more benefits.

Overview-
On Line Analytical Processing helps in viewing and analyzing the historical data captured in the data warehouse.
x

Why do we need a Data Warehouse?

Before preparing the data model for a Warehouse, answer the following questions-

1. Why do we need the warehouse, its business objective?

2. What functionality should the warehouse cover?

3. Types of data needed to support the functionality determined for the warehouse in the above step.

4. How enormous the warehouse must be?

Once all the information is gathered the next step becomes clearer and easy but before that it is important to understand the term OLAP.

OLAP?