6 best open source PHP ETL tools
PHP is a powerful tool for scripting code for your data operations. And it works when you build anything from a simple blog to a full-service website. PHP is a good option if you want to deliver results, but not your code, as the final output on the client side.
Along with PHP, you need an ETL (Extract, Transform and Load) to pull data from your source databases, transform it and move it into your data warehouse. PHP ETL libraries are convenient for programmers because they do ETL in the command line. Check out some of the open source PHP ETLs on GitHub. Most but not all are libraries, not standalone downloads.
This posts talks about some of the most documented and supported PHP ETLs. We include a few others from GitHub contributors that you might want to try if you’re curious and you can work with only basic instructions. If your data projects need a heavyweight PHP ETL tool with more support, see the info on a few paid ones at the end of the blog.
Free and open source PHP ETLs
1. PHP ETL
PHP ETL from GitHub contributor Leonardo Marquine is a well-documented PHP library that performs ETL on the command line. It can pull data from your source files, perform transformations and store the output in tabular databases. For both source data and the data warehouse, PHP ETL supports MySQL, PostgreSQL, SQL Server and SQLite. Data types it extracts are CSV, JSON, and XML. And PHP ETL also has extractors for iterable items, SQL queries, and database tables. Its transformations are limited to encoding and decoding JSON, renaming and trimming columns, deleting duplicate rows and converting text case.
PHP ETL is a good choice when command-line work is convenient for you, and you don’t need the more robust functionality of some other open source ETLs. Instructions for using it are available on GitHub. But PHP ETL probably isn’t a good choice for beginners because of the work you’ll need to do for set up and use.
YaEtl is a PHP ETL library with functions that go beyond the basics of PHP ETL. YaEtl uses directed graphs like Airflow made up of executable Nodes (NodalFlow) to transform, join and branch dataflow from multiple sources. It supports a variety of formats for input and output. You can also extract and load data without transforming it. YaEtl’s Node design saves time because it can chain ETLs and create and organize complex tasks that are reusable and automatic.
The Move to Islandora Kit (MIK) is a PHP ETL that was designed to prepare data for upload to Islandora. There’s strong documentation and support for this open source ETL, including a wiki and a site where you can log issues. But MIK’s use is limited to its design purpose.
The MIK ETL converts your source data into XML packages that can be imported into Islandora, but it doesn’t do the upload. With this intentional limitation, quality checks can be run on data before the output is committed to Islandora. Certain people or teams can work on the data extraction and transformation, and others can check the output before it gets uploaded.
MIK was created by Simon Fraser University Library developers for a specific data migration. But some other universities and organizations have made use of it. MIK’s installation directions tell you how to either clone its GitHub repo or get the download.
PHP ETLs from GitHub contributors
If you’re curious to see what some other open source ETLs can do, and you’re comfortable with figuring things out on your own, you might try some of these PHP ETLs with light documentation. Even more ETLs are in progress on GitHub, so check back later to see what’s new.
Bentool is a basic PHP ETL from GitHub contributor Beno!t POLASZEK with enough documentation to get it up and running. It uses CSV, JSON and TXT files for source data, and it transforms and loads iterable data.
php-etl is a PHP library from GitHub contributor Florian Klein that runs ETL commands. It works with PHP 5.5+ only, and there’s not much documentation available on this open source ETL other than how to install it.
bitexpert/etcetera from GitHub contributor Stephan Hochdörfer is a lightweight PHP ETL that reads, extracts, writes and processes data. An instruction demo is available, and the Readme file says more information will be available soon.
Paid PHP ETLs
Panoply is a great all-in-one data platform that covers data warehouse, automation and ETL for many PHP-based applications like Wordpress and other MySQL-based apps. It’s the only cloud service that combines an ETL with a data warehouse. It has a PHP SDK that sends events directly from PHP to your Panoply database in real time. Panoply builds and manages cloud data warehouses for you. It saves time and effort because you don’t have to wait for transformations to complete before you load your data, and you won’t need to set up a separate destination to store the data you pull with Panoply’s ETL.
8. ETLS from InfoSpectrum
If you’re working in the Customer Relationship Management (CRM) environment, you might use PHP script to generate data without exposing your code on the client side. ETLS from InfoSpectrum is a paid ETL tool for SugarCRM and SuiteCRM that supports PHP for data transformations. Excel, Access, SQL Server, CSV and ODBC work as source data.
In addition to its paid tool, ETLS has some free ones. logiZip is a utility tool that deploys logic hooks and jobs and provides some automation. PShellExec Secures and Executes PowerShell scripts is a security tool. And CrystalPower runs Crystal Reports® is a reporting tool.