Panoply Blog: Data Management, Warehousing & Data Analysis

11 Powerful ETL Tools For Salesforce Data Integration

Written by Trisha McNary | Aug 13, 2019 10:13:46 PM

Salesforce is an industry-leading customer relationship management (CRM) platform. This BI tool supports integrated sales, automated marketing, and data analytics and management for more than 150,000 businesses of all sizes. Its advanced features include a built-in predictive AI and a user-friendly build platform environment for fast app development.

Salesforce creates reusable connectors that link your on-premises, cloud, or hybrid apps and devices to its MuleSoft Anypoint Platform. But enterprises often have data in Salesforce that they need to pull out for analysis with other specialized BI tools. MuleSoft extracts data from most sources and systems, but it has some limitations. And designers often want to load data directly into Salesforce from the command line.

In all data operations, an ETL (extract, transform, and load) is how data moves from one place to another. There are many ETLs that can work with Salesforce. Some ETL tools move data out, and others load data in. Some are easy to use by non-programmers and beginners, and others are developer-level tools. This blog lists the best ETLs to make your specific data work easier whether you’re extracting data from Salesforce or putting it into your Salesforce data warehouse.

Extract data from Salesforce

1. Panoply (cloud ETL + data warehouse)

Panoply is a fast and easy way to automatically pull data out of Salesforce. Once you connect Salesforce to Panoply, it continuously uploads, sorts, simplifies, and manages your latest Salesforce data. Panoply makes it easier than ever to query your Salesforce data with SQL and connect to tons of popular BI and analytical tools so your and analytics and output always reflect the most relevant metrics.

Whether you're trying to build an analytics stack for your startup or you just want to make it easier to work with Salesforce's data model, Panoply's user-friendly Salesforce data connector makes setup a snap, and  includes a cloud data warehouse so you don’t need to set up a separate destination to store all that precious data from your Salesforce CRM. 

2. dataloader.io

Salesforce's own ETL product dataloader.io is the company’s supported method for extracting data from Salesforce data stores. If you have a Salesforce account, you can sign in to dataloader.io with the same credentials. But dataloader.io isn’t free with Salesforce. It’s a separate paid service. This cloud data pipeline is powered by the Salesforce Mulesoft Anypoint Platform. It goes beyond what Mulesoft can do as an ETL. dataloader.io can save time by grouping your related objects into a single pull. So you don’t have to export multiple datasets individually and then recombine them in Excel or another database system. The dataloader.io ETL also has connectors to pull data from Box, DropBox, FTP, and SFTP repos into Salesforce.

dataloader.io has a clean, user-friendly UI with features like search filters and keyboard shortcuts. You can schedule automatic data extractions from Salesforce as frequently as once an hour, but dataloader.io won’t refresh your Salesforce-connected apps and BI tools in real-time.

3. Airflow Plugin - Salesforce

Airflow Plugin - Salesforce is an Apache Airflow-based plugin developed by Astronomer, Inc. This open-source ETL tool extracts data from Salesforce to Amazon S3 buckets and Redshift tables on the cloud. Apache Airflow is a powerful ETL scheduler, organizer, and manager, but it doesn’t process or stream data. For that, you need a plugin like Airflow Plugin - Salesforce to act as a data pipeline. This plugin’s Salesforce Hook authenticates your requests to Salesforce. Then you can create new connections to pull and save Salesforce data. To extract defined datasets, the plugin’s SalesforceToS3Operator operator queries the Salesforce bulk API with Salesforce Object Query Language (SOQL), a version of SQL that talks to the Salesforce datastore. Another operator reconciles schema and copies Salesforce data attributes and types to Redshift tables.

Airflow Plugin - Salesforce is a designer-level tool that runs in the development environment. It’s not for non-programmers and probably a struggle for beginners.

4. Skyvia

Skyvia, is a cloud data integration tool that copies data from Salesforce. This easy-to-use ETL doesn’t need any coding work to replicate Salesforce data to about 50 major cloud apps and on-premises data systems. The Skyvia data pipeline connects to many BI tools, three G Suite apps, and database systems like MySQL, PostgreSQL, SQL Server, Amazon RDS, and Google BigQuery. With Skyvia, you can either extract data manually or schedule automatic pulls. Its features include run history, mapping, lookups, and transformation expressions.

Skyvia’s user-friendly GUI makes it a good Salesforce data extraction choice for non-programmers or developers who want to save time and effort. But it doesn’t run in a CLI development environment.

5. Stitch

The Stitch Salesforce integration is an ETL that copies data from Salesforce to eight popular cloud data warehouses including Amazon S3, Microsoft Azure SQL, and Panoply. Some examples of the customer data you can access are accounts, leads, opportunities, and contacts. The Stitch Salesforce integration supports table and column selection, scheduling, and logging and reporting. This integration is designed for developers, and you need a paid Stitch account. But Stitch also offers a free marketing cloud integration tool for email marketing automation and management.

Stitch handles bulk and incremental data updates. Its replication engine uses multiple strategies to deliver data to users. This BI tool integrates your Salesforce data with a massive suite of data analysis tools. The Stitch data pipeline collects, transforms, and loads Salesforce data into its own system, where it automatically produces business insights on your raw data. Salesforce is just one of many datastores that work with Stitch.

6. Blendo

Blendo is a self-service ETL tool that quickly extracts Salesforce cloud data and transmits it to your data warehouse. Blendo’s Salesforce data connector automatically syncs your source data as soon as you sign in. After you set up the incoming end of the data pipeline, you can load it into these cloud storage destinations: Panoply, Redshift, Google BigQuery, MS SQL Server, PostgreSQL, and Snowflake. This data pipeline also connects to seven popular BI tools including Metabase, Looker, and Power BI, so it’s easy to get business insights from the Salesforce customer data you pull into Blendo. There’s a separate Blendo integration tool for Salesforce Pardot.

The Blendo ETL is a fast and safe way to pull and analyze your Salesforce data and then securely store it in a cloud data warehouse. But it might not be the right choice if you need continuous updates to your output. Blendo’s data extraction isn’t automatic. You need to set a schedule for data load. Then your data analytics tools, output, and warehouse get updated on that schedule, not in real-time. And of course, this ETL won’t work if your BI tool or data warehouse aren’t supported by Blendo.

7. Fivetran

The Fivetran data pipeline extracts Salesforce source data and connects it to cloud and on-premises warehouses that include Panoply, Amazon Redshift, MySQL, Mailchimp, and many more. Fivetran has connectors to Salesforce and Salesforce products: Commerce Cloud, Pardot,  and Marketing Cloud. If you want to try it out, but you don’t have a data warehouse, Fivetran provides one during the trial period and then helps you set up something permanent if you become a subscriber. This user-friendly ETL tool uploads and preps all your current and historical data, but you can select data to exclude from replication. To minimize load on your systems, Fivetran reads logs of your changes instead of querying your live data. Then it streams the updates in near real-time to your output and data warehouse.

Both developers and non-programmers can set up a Fivetran pipeline with just 10 clicks. There’s no coding or development work needed after that because Fivetran continuously tests and maintains your automated ETL. And its built-in schemas designed for analytics save your tech team even more time.

Load data into Salesforce

1. dataloader.io

Saleforce’s own ETL product dataloader.io is the company’s supported method for loading data into the Salesforce data warehouse. If you have a Salesforce account, you can sign in to dataloader.io with the same credentials. But dataloader.io isn’t free with Salesforce. It’s a separate paid service. This cloud data pipeline is powered by the Salesforce Mulesoft Anypoint Platform. It goes beyond what Mulesoft can do as an ETL. dataloader.io quickly processes and loads unlimited amounts of data into Salesforce. Its data mapping feature saves time by automatically prepping your data to fit into Salesforce fields. The dataloader.io ETL also has connectors to pull data from Box, DropBox, FTP, and SFTP repos.

dataloader.io has a clean, user-friendly UI with features like search filters and keyboard shortcuts. You can schedule automatic data loads to happen as frequently as once an hour, but dataloader.io won’t refresh your Salesforce data warehouse in real-time.

2. Data Loader

Data Loader, also called the Apex Data Loader, is a Salesforce Developers ETL tool that runs in the Force.com platform, now called the Lightning Platform. On the platform, you can create custom apps that populate with your Salesforce data and other data sources through the Data Loader pipeline. The Data Loader pulls data from database objects and loads it into Salesforce objects. Its basic features include a graphical user interface (GUI), wizard, logging, drag-and-drop field mapping, and more. Data Loader can upload large files with up to 5 million records. You’ll need another ETL that works with Salesforce if you’ve got more than 5 million. You can also use Data Loader from the command line interface (CLI), but only on Windows. By running Data Loader in the CLI, you can write SOQL queries and SQL code and set up automatic data processing.

To run Data Loader, you need the Salesforce SOAP API. The API is only included with the Enterprise, Unlimited, and Developer editions of Salesforce. With Data Loader’s open-source version on GitHub, you can run it on more operating systems than just Windows.

3. Skyvia

Open-source Skyvia, is a cloud data integration tool that loads data into Salesforce. This easy-to-use ETL doesn’t need any coding work to move data from about 50 major cloud apps and on-premises data sources into Salesforce storage. The Skyvia data pipeline connects to many BI tools, three G Suite apps, and database systems like MySQL, PostgreSQL, SQL Server, Amazon RDS, and Google BigQuery. With Skyvia, you can either load data manually or schedule automatic uploads. Features include run history, mapping, lookups, and transformation expressions that help you integrate mismatched data types into Salesforce.

Skyvia’s GUI makes it a good Salesforce data-load choice for non-programmers or developers who want to save time and effort. But it doesn’t run in a CLI development environment.

4. Jitterbit Data Loader for Salesforce

Jitterbit Data Loader for Salesforce is an open-source ETL tool from Salesforce AppExchange member Jitterbit. This data pipeline platform is fully integrated on the Harmony Cloud. It’s feature-heavy GUI doesn’t need coding to load data into Salesforce from on-premises, FTPs, and a limited number of database systems that include MySQL, Oracle, and SQL Server. You can schedule automated uploads to Salesforce daily, weekly, or monthly. And you can map your incoming source files to Salesforce objects for faster integration. More features include logging, built-in functions, and unlimited, bulk, and parallel processing.

Jitterbit isn’t a development tool. It’s user-friendly GUI is a multifunctional ETL designed for Salesforce admins. The Jitterbit platform includes documentation and help with its complex features.