Biggest BI Mistakes - Mistake #1: Not Using an ETL Platform

by Tom Penichter 3. June 2009 10:55
Introduction
ETL (extract-transform-load) refers to the data integration processes in your information systems. ETL is fast becoming a redundant concept as the practice of enterprise master data management takes hold in the marketplace. That said, the ETL tools themselves are expanding beyond the limits of specific data transformations in support of business intelligence reporting, to meet the broader reaching cross-enterprise needs of data integration and management. In the absence of a better moniker, "ETL tool" remains the predominant way to refer to these data integration and management platforms, such as Informatica, Talend, MS SQL Server's SSIS, Oracle's Warehouse Builder, etc...

ETL typically takes 70-80% of the project hours for any data warehouse project or business intelligence effort. A typical enterprise data system is a complex and heterogenous environment, spanning relational and non-relational database backends of operational transaction systems, text files, mainframe database, websites, and more. Often the data is physically housed in disparate places, maintained by distinct tech teams, and supporting different business units or departments. Tying this data together in a consistent and meaningful way is the greatest challenge of any business intelligence effort.
 
Medium and small sized enterprises often make the mistake of developing ETL processes without the assistance of an ETL platform. While custom coding in Java or Perl (or even SQL!) has it's advantages, such as flexibility, a common driver for custom coding is not knowing about the features and usability of ETL platforms. Also, until recent years ETL tools came at a price point only Fortune 1000 companies could afford. As the technology matures, and technology options with dramatically lower price points enter the market, mid-market firms are beginning to adopt and use ETL platforms as well.

In this posting we put forth the argument of why engaging in that learning curve - especially when open source or package-licensed ETL platforms are readily available to you - is the smart move to make.


Development time reduced
Whether looking at it from the time-to-market perspective, or total cost of ownership (TCO), it is hard to imagine a scenario where you wouldn't benefit from using an ETL platform. Junior developers can learn the core features of many ETL platforms in a matter of hours - regardless of their technology background. This knowledge can then be put to use developing integration solutions for your firm; in minutes a novice ETL platform developer can accomplish tasks which equate to hundreds of lines of code in Java or Perl.
 
Besides initial development benefits, depending on the platform, editing integration solutions developed in ETL platforms can be done with a handful of drag-and-drops, with considerably less risk of breaking the overall code.

Automatic Documentation and Training Uptime
Another benefit of using an ETL platform is that detailed documentation is built into the development process. The graphical view of an ETL project provides a bird's eye view of the processes and data flow, while giving immediate access to a finer level of detail of each individual process within your data flow. Developers do not need to put in additional hours summarizing and documenting their work, and there isn't the concern of documentation continually shooting at a moving target.
This also makes training a heck of a lot easier, as developers have an accurate and consistent summary level of information to engage in, without the need to understand every line of the underlining code base.

Problem Solving and Performance Optimization
Many ETL platforms include performance tracking capabilities that allow you to measure the speed of individual processes and queries within your data flow. This allows you to easily see where the process is getting hung up, what the bottlenecks are, and which components of processes are causing performance to drag.

Metadata management 
Most ETL tools include (Microsoft SSIS is a notable exception) metadata management features which tie in well with modern master data management goals of an enterprise. Through your ETL platform metadata features, you can track changes to and summarize the data sources that exist across your enterprise. This largely automated level of documentation again reduces uptime for new staff, while providing transparency into the inner workings of databases potentially managed by diverse tech teams with little direct contact with each other.
 
Without this level of insight into cross-enterprise data stores, utilization of valuable data is often curtailed to within department analyses, as the level of effort in accessing and understanding a given operation's data can be greater than the value of any given report or ad hoc query.
 
Technology Equalizer
Finally, ETL platforms come along with a host of "connectors" which interface the ETL system to various data sources, including relational databases (SQL Server, Oracle, DBII, MySQL), mainframes, text files, and web data (RSS, email). These convenient data adapters reduce the need for your ETL technicians to be masters of each of the varied data source technologies they will encounter.

All combined, these points are good reason for you to review the current state of your data architecture, and determine if you are making use of an ETL platform to manage your enterprise data.