ETL or Extract, Transform, and Load tools play an important role in Business Intelligence (BI) initiatives. In this post, we’ll take a look at what ETL tools do, particularly in the context of business intelligence projects.
Before companies can perform OLAP analysis, data mining, data visualization or gain business insights from typical BI modules like reports, dashboards, or alerts, they need to gather all enterprise data into a data warehouse (DW). That’s not exactly a trivial task. Enterprise data can actually reside in a wide range of data sources, including operational systems, ERP, CRM, SQL, flat files, and spreadsheets, among others.
The reason for gathering all enterprise data into one place is so that everyone who will be working with the BI tools will be looking at a single version of the truth. Some data sources may not contain the same set of information found in others. For instance, the information you have in your supply chain system’s database will certainly not be the same information found in your accounting system.
Besides, there can be data from different sources that, even when referring to the same entity, may actually be expressed in different formats. For instance, “Female” in one system, may be expressed as “F” or “5” in another.
This is where an ETL tool can come in handy.
As its name implies, the first function of an ETL tool is to extract data. This data can be of varying data formats and can come from heterogeneous data sources like the ones mentioned earlier. The second function of an ETL is to transform all this data, which can involve several tasks such as reformatting, cleansing, and standardizing the data. Once the data has already been converted to a format suitable for the target data warehouse, it is then loaded into the DW. Loading is the third and final function.
ETL systems have traditionally operated in batch process modes, particularly on a monthly, weekly, or daily basis. However, in recent years, senior managers and top executives have started seeking business insights based on real-time information.
Since BI draws its information from the data warehouse, real-time BI can only be served by real-time data warehousing. Naturally, ETL jobs that run on weekly or even daily cycles are not enough. That’s why we are slowly seeing highly advanced ETL systems that can now run jobs on per-minute cycles.
This is how intertwined ETL is to BI. Any business demand that impacts business intelligence will also impact ETL.
It would be interesting to see how these two closely interrelated systems will evolve with the emergence of Big Data. Big Data is believed to be a treasure trove of untapped information that can provide valuable business insights. Hence, theoretically, it should be a perfect fit for BI initiatives. Unfortunately, Big Data is mostly made up of unstructured data, which incidentally isn’t suitable for most existing data warehouses and ETL systems. Will DWs and ETLs eventually be made to support unstructured data as well? Only time will tell.