Any data operation needs an ETL (extract, transform, load) process to move data from its source to your output or data warehouse. When you build apps with Node.js, you need an ETL tool that works with it. For other data work, you might still want the flexibility and advantages that an ETL built on Node.js offers. In this post, we'll cover some of the top open source Node.js ETL tools and what they do best.
Node.js and most open source Node.js ETL tools might be a challenge for non-programmers and beginners, but the user-friendly features of paid ETL tools can make things easier. Even pro-level developers can save time and effort with the powerful features that subscription data management tools deliver. That’s even more true if you have critical big data jobs that demand enterprise-level support. We're going to start with open source ETL options, but at the end of the post, we'll list some paid ETLs that offer power, speed, ease of use and specialized tools for your business intelligence tasks.
Open source ETLs
Empujar from TaskRabbit is a Node.js-based ETL tool that pushes data and does backup and other data operations. It takes advantage of Node.js’s asynchronous nature to run data operations in series or parallel. Empujar uses a book, chapter, page format. Top-level objects are called books. They contain chapters, which are run in order, with pages that can be run in parallel up to a limit that you set.
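To make the book, chapter, page idea concrete, here is a minimal sketch in plain Node.js. It does not use Empujar's actual API: the names `runBook`, `runWithLimit`, and `exampleBook` are hypothetical and only illustrate running chapters in series while pages run in parallel up to a limit.

```javascript
// Hypothetical sketch of the book/chapter/page idea; not Empujar's API.

// Run async tasks with at most `limit` of them in flight at once.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Chapters run one after another; pages inside a chapter run in parallel.
async function runBook(book, pageLimit) {
  const log = [];
  for (const chapter of book.chapters) {
    const pages = await runWithLimit(chapter.pages, pageLimit);
    log.push({ chapter: chapter.name, pages });
  }
  return log;
}

// Assumed example data: each page is an async function that does some work.
const exampleBook = {
  chapters: [
    {
      name: 'extract',
      pages: [async () => 'users', async () => 'orders', async () => 'payments'],
    },
    { name: 'load', pages: [async () => 'warehouse'] },
  ],
};

runBook(exampleBook, 2).then((log) => console.log(JSON.stringify(log)));
```

With a limit of 2, the three "extract" pages never have more than two running at the same time, and the "load" chapter starts only after every "extract" page has finished.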
Out of the box, this open source ETL tool connects to MySQL, Amazon Redshift, Elasticsearch, FTP, and S3. Empujar’s GitHub documentation explains how to add these built-in connections, as well as how to create your own.
Nextract is a well-documented ETL tool that supports several databases: Postgres, MSSQL, MySQL, MariaDB, SQLite3, and Oracle. It can extract and output CSV and JSON data, and it also extracts data from database queries and outputs it to tables. With its built-in plugins, you can perform even more ETL operations like sorting, filtering, and basic math. One current limitation is that Nextract runs on the resources of a single machine, so it doesn’t work well with big data.
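As a rough illustration of the kind of work described here (CSV in, JSON out, with filtering, basic math, and sorting along the way), the sketch below uses only plain JavaScript. It is not Nextract's plugin API: `parseCsv`, `transformOrders`, and the sample columns are assumptions made up for this example, and the parser handles only simple CSV without quoted fields.

```javascript
// Hypothetical sketch; none of these functions come from Nextract.

// Parse simple CSV text (no quoted fields) into an array of row objects.
function parseCsv(text) {
  const [headerLine, ...rows] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    return Object.fromEntries(headers.map((header, i) => [header, values[i]]));
  });
}

// Extract rows, then filter, compute a total, and sort by that total.
function transformOrders(csvText) {
  return parseCsv(csvText)
    .map((row) => ({
      item: row.item,
      price: Number(row.price),
      qty: Number(row.qty),
    }))
    .filter((row) => row.qty > 0) // drop rows with nothing ordered
    .map((row) => ({ ...row, total: row.price * row.qty })) // basic math
    .sort((a, b) => b.total - a.total); // largest total first
}

// Assumed sample input.
const sampleCsv = 'item,price,qty\npen,2,10\nbook,15,0\nlamp,30,1\n';

console.log(JSON.stringify(transformOrders(sampleCsv), null, 2));
```

Everything here runs in memory in one process, which mirrors the single-machine limitation mentioned above: this style is convenient for modest files but not for big data.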
Extraload is an open source ETL package with API reference pages that explain how to do tasks like writing scripts and creating drivers. Its GitHub documentation also guides you through installing the drivers for CSV, MySQL, XML and XPath.
Datapumps from agmen.hu is a basic ETL tool for Node.js that uses "pumps" to read input and write output. A simple example is exporting data from MongoDB to Excel. For complex ETL work, you can create a group of pumps. Some features of this open source ETL tool are data transformation, encapsulation, error handling and debugging.
Datapumps needs some additional setup because it doesn’t do all the ETL work on its own. Its components only pass data in a controlled flow, and the tool relies on 10 different mixins to import, export and transfer your data. Each time you add a new mixin, you’ll have to fork Datapumps and create a pull request in GitHub.
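The pump idea, reading records from an input, optionally transforming them, and writing them to an output, can be sketched with Node.js's built-in `stream` module. This is not the Datapumps API: `runPump` and `runPumpGroup` are hypothetical helpers, and in-memory arrays stand in for real sources such as MongoDB or Excel.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper, not part of Datapumps: moves every record from an
// in-memory input to an in-memory output through one transform function.
async function runPump(inputRecords, transformRecord) {
  const output = [];
  await pipeline(
    Readable.from(inputRecords),
    new Transform({
      objectMode: true,
      transform(record, _encoding, done) {
        try {
          // Returning null or undefined drops the record.
          done(null, transformRecord(record));
        } catch (error) {
          done(error); // A failed record rejects the whole pipeline.
        }
      },
    }),
    new Writable({
      objectMode: true,
      write(record, _encoding, done) {
        output.push(record);
        done();
      },
    })
  );
  return output;
}

// A "group" of pumps: each pump's output becomes the next pump's input.
async function runPumpGroup(inputRecords, transforms) {
  let records = inputRecords;
  for (const transformRecord of transforms) {
    records = await runPump(records, transformRecord);
  }
  return records;
}

// Assumed example: trim names, then drop empty ones and add a length field.
runPumpGroup(
  [{ name: ' Ada ' }, { name: '' }, { name: 'Linus' }],
  [
    (record) => ({ name: record.name.trim() }),
    (record) => (record.name ? { ...record, length: record.name.length } : null),
  ]
).then((rows) => console.log(rows));
```

Streams are used here because they give controlled, backpressure-aware flow between a reader and a writer, which is close in spirit to what the text describes. A real setup would replace the array input and output with database or file streams.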
proc-that’s GitHub page shows you how to import its ETL tool and then add its built-in transformers and loaders. If you want to implement your own extractors, transformers, and loaders, the creators of proc-that invite you to contribute them to their list in the proc-that GitHub repo. As of this writing, proc-that shows a failing build-status badge, so you might want to check that before you get started.
A not-so-documented open source Node.js ETL tool
If you’re curious to see what some other open source ETL tools can do, and you’re comfortable with figuring things out on your own, you might try this Node.js-based ETL, which has little documentation. Even more ETLs are in progress on GitHub, so check back later to see what’s new.
Paid ETLs for Node.js
Event is a subscription ETL manager that builds, deploys and scales RESTful Node.js microservices: small independent processes that together form a complex app. Its secure platform creates database-backed RESTful services, including IoT, that are built to work on the web. With user-friendly Event, you can deploy a public-facing microservice with just one click. It embeds an ETL process into your microservice that collects structured and unstructured data, including API data from multiple sources, in real time.
Panoply is easy for non-programmers, but it also delivers the unbeatable speed and support that professional engineers need for big and small data ops. This automated ETL data platform pulls data from any source, simplifies it, and stores it all in one place. Panoply continuously streams data in real time to your output. It’s the only service that combines a fully integrated ETL with a cloud data warehouse that it builds and manages for you.
The user-friendly Panoply BI platform has one-click connectors to many data sources that work with Node.js scripts, including MongoDB (recommended by Node.js), MySQL and PostgreSQL. Panoply uploads your raw data in minutes and automatically sorts and models it during the upload, so you can start querying all your data right away. You can try Panoply out for free or get a personalized demo.