ETL has long been the traditional approach to data warehousing and analytics; ELT is the new kid on the block. Beyond the differences between ETL and ELT, how can the ETL process be improved with ELT?
The ETL Process Changed the Game
The ETL (Extract, Transform, Load) process was revolutionary in its time when it came to the data processing flow for organizations. Back in the 70’s, organizations were starting to collect data from many different sources, such as payroll packages, accounting software, and sales data. Each of these sources produced different data formats, creating the need for a unified way to load all this data into a centralized repository. ETL allowed organizations to standardize this data and hold it in a data warehouse. A common ETL example would be moving data from an OLTP database into a dimensional data warehouse.
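To make the three steps concrete, here is a minimal sketch in Python. The sources, field names, and target schema are all hypothetical; the point is only the ordering — data is standardized *before* it is loaded into the warehouse.

```python
# Hypothetical ETL sketch: two sources with inconsistent formats are
# standardized into one schema before loading.

def extract():
    # Two made-up sources, each with its own field names and types
    payroll = [{"emp": "Ada", "salary_usd": "90000"}]
    sales = [{"employee_name": "Ada", "q1_sales": 120000}]
    return payroll, sales

def transform(payroll, sales):
    # Standardize field names and types into one unified schema
    rows = []
    for r in payroll:
        rows.append({"name": r["emp"], "metric": "salary",
                     "value": float(r["salary_usd"])})
    for r in sales:
        rows.append({"name": r["employee_name"], "metric": "q1_sales",
                     "value": float(r["q1_sales"])})
    return rows

def load(rows, warehouse):
    # The warehouse receives only clean, standardized rows
    warehouse.extend(rows)

warehouse = []
load(transform(*extract()), warehouse)
```

Note that nothing reaches the warehouse until the transform step has finished — a property that matters later in the ETL vs ELT comparison.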
ETL has also gained popularity as various Business Intelligence (BI) tools have been developed to help visualize huge amounts of data. These tools allow end users to make sense of the data and allow it to be queried by different types of users to achieve their business goals. For example, a marketing executive might want to measure sales against social media engagement, while an IT administrator might look at server usage over a busy holiday weekend. Suddenly, patterns and correlations can be found in huge data sets and used to identify new opportunities, reduce costs, and leverage knowledge across an organization.
As the amount of data being generated has increased, and the number of data sources has multiplied, the ETL process has become even more important. For more about ETL, check out our ETL tutorial.
The ELT Process Changed It Further
Other developments have impacted the traditional ETL model. For example, processing power has increased, and access to the cloud has become easier and more widespread than ever before. This has led to some interesting developments, one of which is the ELT process, or “Extract, Load, Transform”.
What is ELT? The ELT process is a relatively new approach that offers a streamlined alternative to traditional ETL. In this case, after the data is extracted from all data sources, you immediately start moving it into a centralized data repository. As the name suggests, data is extracted (“E”) from the various data sources and loaded (“L”) into the data storage area, such as a data warehouse. From there, it is transformed (“T”) on the fly, as required, giving quick access to anyone who needs to query the data. Many organizations today maintain massive data pools in the cloud at low cost, leveraging ELT tools to process the data.
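The reordering can be sketched the same way. In this hypothetical example, raw records are loaded untouched, and the transformation runs only at query time — the stand-ins for cloud storage and a warehouse view are, of course, simplifications:

```python
# Hypothetical ELT sketch: raw records are loaded as-is, and the
# transformation is deferred to query time (like a view in a cloud
# data warehouse).

raw_store = []  # stands in for cheap cloud storage

def extract_and_load(source_rows):
    # No upfront transformation — everything lands in its raw form
    raw_store.extend(source_rows)

def query_salaries():
    # Transform on the fly, only when the data is actually needed
    return [float(r["salary_usd"]) for r in raw_store if "salary_usd" in r]

extract_and_load([{"emp": "Ada", "salary_usd": "90000"},
                  {"employee_name": "Ada", "q1_sales": 120000}])
salaries = query_salaries()
```

Because nothing is discarded at load time, a new question later (say, querying `q1_sales`) needs only a new query, not a reload.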
The ELT approach leverages improved infrastructure technologies, especially those utilizing the cloud. ELT enables a large, expanding data pool and fast processing, together with less time to query data, less cost, and better security.
ETL vs ELT: Key Differences
Key differences between the two approaches include:

ETL:
- Uses a staging area and separate system, adding extra time to load data relative to ELT
- Transformation must complete before data is available; as data grows, transformation time increases
- High maintenance: you must choose which data to load and transform, and redo the work if data is deleted or you want to enhance the main data repository
- Requires less space at the early stage, and the result is cleaner
- Analysis and processing rely on multiple scripts to create views; deleting a view means deleting data
- Data lake support is not part of the approach
- Fixed tables and fixed timelines, used mainly by IT
- Not always cost-effective for small to medium-sized businesses

ELT:
- All-in-one system; data is loaded only once
- Speed does not depend on data size
- Low maintenance: all data is always available
- Requires in-depth knowledge of tools and expert design of the main large repository
- Enables use of a data lake, with unstructured data supported
- Scalable and available to businesses of all sizes using online SaaS solutions
- Often requires careful planning and design
How ELT Is Improving ETL
One of the biggest drawbacks of traditional ETL processing was the amount of time it took. For example, end users wanting to use Business Intelligence (“BI”) tools on information in the data warehouse had to wait until the entire process was complete. This could be quite tedious, as each step had to be completed (and tested) before the next one could start. Various ETL tools have been developed to streamline this process.
One of the biggest effects that ELT has had on ETL is raising the expectation of real-time querying. ETL is solid and proven, but the speed offered by ELT has made it seem a little outdated.
A traditional ETL process requires batch processing: data is prepared and transformed using a rigid, structured process. With the need for quicker processing, ETL with stream processing has emerged. Using a modern stream-processing framework like Kafka, data can be pulled in real time from data sources, manipulated on the fly using Kafka’s Streams API, and loaded into a target system such as Amazon Redshift. This gives users the speed benefits of ELT while retaining the solid base of traditional ETL.
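A toy simulation of the idea, assuming nothing beyond the Python standard library: a real pipeline would read from a Kafka topic and write to a warehouse such as Amazon Redshift, but here generators stand in for both, showing records transformed one at a time as they arrive rather than in a single batch.

```python
# Toy streaming-ETL sketch. Generators stand in for a Kafka topic and
# a warehouse sink; each record is cleaned as it flows through, instead
# of waiting for a whole batch to accumulate.

def stream_source():
    # Pretend these events arrive one by one from a Kafka topic
    yield {"user": "a", "amount": "10.5"}
    yield {"user": "b", "amount": "3.0"}

def transform(events):
    # On-the-fly cleanup: cast amounts to floats as each event passes
    for e in events:
        yield {"user": e["user"], "amount": float(e["amount"])}

def load(events, sink):
    # Stands in for writing each record to a target like Redshift
    for e in events:
        sink.append(e)

sink = []
load(transform(stream_source()), sink)
```

Because the transform is a generator, each event is available downstream as soon as it is processed — the property that gives streaming ETL its ELT-like responsiveness.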
Getting the Most Out of Your Data
Traditional ETL has worked for decades. However, as data demands become more exacting, a way of processing data from multiple disparate data sources that is more in tune with modern requirements is necessary. Part of the answer is ELT. However, with advances in stream processing, and even fully automated data warehousing, the opportunity exists to build a next-generation data pipeline.
When it comes to your data processing and BI needs, Panoply is your ideal partner. Panoply is an autonomous data warehouse built for analytics professionals, by analytics professionals.