ETL has long been the traditional approach for data warehousing and analytics, while ELT is the new kid on the block. Beyond the differences between ETL and ELT, how can the ETL process be improved with ideas from ELT?
The ETL (Extract, Transform, Load) process was revolutionary in its time for organizations' data processing flows. Back in the 1970s, organizations were starting to collect data from many different sources, such as payroll packages, accounting software, and sales systems. Each of these sources produced different data formats, creating the need for a unified way to load all this data into a centralized repository. ETL allowed organizations to standardize this data and hold it in a data warehouse. A common ETL example would be moving data from an OLTP database into a dimensional data warehouse.
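To make that example concrete, here is a minimal sketch of a classic ETL step in Java using plain JDBC. The connection strings, table names (orders, fact_sales), and column names are all hypothetical, chosen only for illustration; the pattern is what matters: rows are extracted from the OLTP source, transformed in application code, and only then loaded into the warehouse.

```java
import java.sql.*;

public class SimpleEtlJob {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection strings -- replace with real hosts and credentials.
        try (Connection oltp = DriverManager.getConnection("jdbc:postgresql://oltp-host/sales");
             Connection warehouse = DriverManager.getConnection("jdbc:postgresql://dw-host/analytics")) {

            // Extract: read raw order rows from the OLTP source.
            String extract = "SELECT order_id, amount_cents, created_at FROM orders";
            // Load: insert the already-transformed rows into a dimensional fact table.
            String load = "INSERT INTO fact_sales (order_id, amount_usd, order_date) VALUES (?, ?, ?)";

            try (Statement st = oltp.createStatement();
                 ResultSet rs = st.executeQuery(extract);
                 PreparedStatement ps = warehouse.prepareStatement(load)) {
                while (rs.next()) {
                    // Transform: convert cents to dollars and timestamp to date
                    // before the data ever reaches the warehouse.
                    ps.setLong(1, rs.getLong("order_id"));
                    ps.setBigDecimal(2, java.math.BigDecimal.valueOf(rs.getLong("amount_cents"), 2));
                    ps.setDate(3, new Date(rs.getTimestamp("created_at").getTime()));
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }
}
```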
ETL has also gained popularity as various Business Intelligence (BI) tools have been developed to help visualize huge amounts of data. These tools let end users make sense of the data and allow different types of users to query it in pursuit of their business goals. For example, a marketing executive might want to measure sales against social media engagement, while an IT administrator might look at server usage over a busy holiday weekend. Suddenly, patterns and correlations can be found in huge data sets and used to identify new opportunities, reduce costs, and leverage knowledge across an organization.
As the amount of data being generated has increased, and the number of data sources has multiplied, the ETL process has become even more important. For more about ETL, check out our ETL tutorial.
Other developments have impacted the traditional ETL model. For example, processing power has increased, and access to the cloud has become easier and more widespread than ever before. This has led to some interesting developments, one of which is the ELT process, or “Extract, Load, Transform”.
What is ELT? The ELT process is a relatively new approach that offers a streamlined alternative to the traditional ETL process. In this case, after the data is extracted from all data sources, you immediately start moving it into a centralized data repository. As its name suggests, data is extracted (“E”) from the various data sources and loaded (“L”) into the data storage area, such as a data warehouse. From there, it is transformed (“T”) on the fly, as and when required, giving quick access to anyone who needs to query the data. Many organizations today maintain massive data pools in the cloud at low cost, leveraging ELT tools to process the data.
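For contrast with the classic ETL sketch above, here is a minimal ELT sketch under the same hypothetical schema. Note that the application only extracts and loads the raw rows untouched; the transformation is expressed as a SQL view inside the warehouse and evaluated on the fly whenever someone queries it.

```java
import java.sql.*;

public class SimpleEltJob {
    public static void main(String[] args) throws SQLException {
        try (Connection oltp = DriverManager.getConnection("jdbc:postgresql://oltp-host/sales");
             Connection warehouse = DriverManager.getConnection("jdbc:postgresql://dw-host/analytics")) {

            // Extract + Load: copy the raw rows into the warehouse as-is, no transformation.
            String extract = "SELECT order_id, amount_cents, created_at FROM orders";
            String load = "INSERT INTO raw_orders (order_id, amount_cents, created_at) VALUES (?, ?, ?)";
            try (Statement st = oltp.createStatement();
                 ResultSet rs = st.executeQuery(extract);
                 PreparedStatement ps = warehouse.prepareStatement(load)) {
                while (rs.next()) {
                    ps.setLong(1, rs.getLong("order_id"));
                    ps.setLong(2, rs.getLong("amount_cents"));
                    ps.setTimestamp(3, rs.getTimestamp("created_at"));
                    ps.addBatch();
                }
                ps.executeBatch();
            }

            // Transform: defined once inside the warehouse and evaluated at query time,
            // so analysts always see the latest raw data through this view.
            try (Statement st = warehouse.createStatement()) {
                st.execute("CREATE OR REPLACE VIEW fact_sales AS " +
                           "SELECT order_id, amount_cents / 100.0 AS amount_usd, " +
                           "created_at::date AS order_date FROM raw_orders");
            }
        }
    }
}
```

Because the transformation is just a view, it can be changed later without re-running the load, which is one reason ELT suits large, cheap cloud data pools.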
The ELT approach leverages improved infrastructure technologies, especially those built on the cloud. ELT enables a large, expanding data pool and fast processing, together with quicker access to data, lower cost, and better security.
Key differences between the two approaches include:

ETL: Data is transformed before it reaches the warehouse, in a rigid, batch-oriented pipeline; end users must wait for the entire process to finish before they can query the data.

ELT: Raw data is loaded into the warehouse immediately and transformed on the fly when required; it leverages cheap cloud storage and processing power, so data is available for querying much sooner.
One of the biggest drawbacks of traditional ETL processing was the amount of time it took. For example, end users wanting to run BI tools against information in the data warehouse would have to wait until the entire process was complete. It could be quite tedious, as each step had to be completed (and tested) before the next one could start. Various ETL tools have been developed to streamline this process.
One of the biggest effects ELT has had on ETL is the demand for more real-time querying ability. ETL is solid and it works, but the speed offered by ELT has made it look a little outdated.
A traditional ETL process requires batch processing: data is prepared and transformed using a rigid, structured pipeline. With the need for quicker processing, ETL has been combined with stream processing. Using a modern stream processing framework like Apache Kafka, data can be pulled in real time from data sources, manipulated on the fly with the Kafka Streams API, and loaded into a target system such as Amazon Redshift, as sketched below. This gives users all the benefits of ELT in terms of speed, while keeping the solid foundation of traditional ETL.
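Here is a minimal sketch of that streaming pattern using the Kafka Streams API in Java. The topic names (raw-orders, clean-orders) and the toy transformation are hypothetical; note also that the final hop into Redshift is typically handled by a separate sink connector (for example, via Kafka Connect) rather than by the Streams application itself.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamingEtl {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streaming-etl-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Extract: consume raw events as they arrive on the source topic.
        KStream<String, String> raw = builder.stream("raw-orders");
        // Transform on the fly: drop empty records and normalize the payload.
        raw.filter((key, value) -> value != null && !value.isEmpty())
           .mapValues(value -> value.trim().toUpperCase())
           // Load: publish to an output topic; a sink connector would then
           // deliver this topic to a target system such as Amazon Redshift.
           .to("clean-orders");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The transformation here is deliberately trivial; the point is that records flow through extract, transform, and load continuously, instead of waiting for a nightly batch window.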
Traditional ETL has worked for decades. However, as data demands become more exacting, a way of processing data from multiple disparate data sources that is more in tune with modern requirements is necessary. Part of the answer is ELT. However, with advances in stream processing, and even fully automated data warehousing, the opportunity exists to build a next-generation data pipeline.
When it comes to your data processing and BI needs, Panoply is your ideal partner. Panoply is an autonomous data warehouse built for analytics professionals, by analytics professionals.