The traditional approach for handling data warehousing as an analytical task has been Extact, Transform, and Load (ETL). However, in recent years, there has been an explosion in the types and volume of data organizations process that strains the capacity of data warehouse architecture.
Enter Extract, Load and Transform. ELT is a new approach that offers a viable alternative to the traditional ELT process paradigm. This article defines each process, explores the technological differences, how data migration is affected by each process, and how to determine which approach works best for your business environment.
Define ETL process
Extract, Transform, Load (ETL), are three separate processes functioning as a single tool. Data from specified sources is read, and only the data required for output is extracted. Once the extract process has been completed, the transform function takes over. By applying a series of transformation routines, data integrity and quality checks are performed. This process is driven mostly by the data output format. Once the data has been transformed into the desired output, the load function begins. Just as the function name suggests, the load portion of ETL moves the converted data, or changes to the converted data, to a target database or data warehouse.
How are you addressing ETL, compared to your peers?
Define ELT process
Like ETL, ELT isolates the extract and load process from the transformation process without the use of third-party tools. Raw data is extracted from the source to a staging database. Business rules, integrity, and quality checks are completed. The raw data is loaded into the target database or data warehouse where it is transformed into the desired output format.
ETL and ELT: Technological Differences
ETL is the original standard for moving data from source to data warehouse. With the ETL process; the required output is designed first. Doing the extraction this way ensures that only the required data is captured from the source. All routines used to extract transform and load data in the ETL process are designed, developed, and implemented to achieve a specific result. Business rules that define aggregations and the relationships between the various entities in both the source and target are integrated into the routines that control the ETL process.
For example, if you want to merge customer records from three different systems into a single customer table, ETL would extract relevant information from all three source systems and then apply business rules during the transformation step to create a single record to use for insert, update or delete operations against the destination table. Moving data this way requires interdependency between the routines at each stage of the ETL process.
ELT is the new framework leveraged to take advantage of advances in processing and storage. The ETL method extracts data from the source directly to tables in the destination database which can be copies of the source files. In the ELT process, the transformation process is not dependent on the extract and load process. Continuing with the customer example, if you want to merge customer records from three different sources into a single customer table, ELT would extract ALL information from the each source and load it directly into the target database. Once loaded, business rules are applied that transform the data into its target output format. Moving data using the ELT method, allows you to connect to the data source, extract the required data, and close the connections quickly. Additionally, ELT reduces cost, risk, and makes for easier system management.
Determine the best approach for business environment
Since the inception of data integration, BI architects have been challenged with creating and populating a data warehouses with data from multiple sources. The dramatic growth of data sources and the sheer volume of data, push BI Architects to find ways to implement more efficient integration and staging operations. The decision to transform data before (ETL) or after (ELT) it is loaded is critical to the technical, business, and financial well-being of the organization.
For deployments where you are providing data to end users can have the ability to wrangle large amounts data into meaningful reports, ELT is the way to go.
For deployments where want to merge data from several sources into one structured table that provides schedule reports, ETL is the best direction.
While the order in which the transform step takes place is important, your data type drives which process will be most cost-effective and useful to your organization.
What process flows do data warehouses support?