A data engineer conceives, builds and maintains the data infrastructure that holds your enterprise's advanced analytics capabilities together.
These capabilities let your enterprise turn multiple, disconnected streams of data into rational, data-driven decisions and customer engagement.
Given the importance of this undertaking, a great data engineer is a critical member of the team that brings advanced analytics to your enterprise's decision-making. She or he will be responsible for designing, building, and continually improving your company's analytics infrastructure.
On the flip side, data engineers do not come cheap: the average annual salary of a data engineer in the US is $124,000. A poor data engineer will not only slow the development of your company's data-driven decision-making capacity, but will also drain your enterprise's resources.
As Forbes points out, “in Silicon Valley, startups that succeed run lean.” The article goes on to note that “even larger, established companies can't afford the high price of poor performers”.
Knowing what makes a great data engineer is a critical first step towards identifying and onboarding the right data engineers to make your enterprise succeed. So what are the traits of outstanding data engineers?
While most successful data engineers have computer science or IT backgrounds, many great data engineers come from other engineering disciplines, most often (but not only) computer engineering.
. . . more of a generalist backend programmer . . . with the ability to integrate with diverse APIs, and understand multiple languages well enough to work in them (though, of all the languages, Python is probably the most common and important). Data engineers who are well versed in Matlab or R, or one of the other main languages your data scientists use, are doubly valuable. If you are dealing with billions of records or more, you need someone who is familiar with distributed storage and processing tools like Hadoop or Spark.
Data engineers are problem solvers focused on building and maintaining the infrastructure and architecture for data generation. As Karlijn Willems of the DataCamp community writes,
“Data engineers will need to recommend and sometimes implement ways to improve data reliability, efficiency, and quality.” To do so, they will need to blend the practical, creative problem solving of an engineer together with “a variety of languages and tools to marry systems together or try to hunt down opportunities to acquire new data from other systems so that the system-specific codes . . . can become information in further processing by data scientists.”
“Very closely related to these two is the fact that data engineers will need to ensure that the architecture is in place that supports the requirements of the data scientists and the stakeholders, the business.”
Essentially, a great data engineer is a skilled problem-solver who loves to build things that are useful for others. A great data engineer must also combine specialist knowledge of the tools and languages used for data wrangling with broader, generalist knowledge of a range of fields.
Team-oriented and collaborative
As companies recognize the need to balance analysis with data management, they are increasingly assembling data science teams instead of searching for "unicorn" data scientists who can do it all. For data engineers, this means that to excel, they must be able to collaborate effectively within IT and on cross-enterprise teams.
This requires not only bringing up-to-date data engineering expertise to the table, but also aligning that expertise with broader enterprise needs, so that everyone can advance enterprise objectives (for example, by applying specialist knowledge of APIs so that all teams can meet enterprise-wide KPIs).
This requires curiosity and a willingness to truly understand the individual, team, and strategic enterprise objectives that one's work advances. A great data engineer must also be strategic and collaborative in outlook.
To provide effectively for the needs of all stakeholders, both inside and outside the enterprise, he or she must have the acumen to understand and prioritize those needs. This business sense is just as essential as engineering and computer science skills.
And did I mention the endless store of patience when dealing with non-technical personnel?
Given the constant problem-solving that data engineers face on a daily basis, curiosity about how things work and how to make them better is essential.
With the world changing at such a fast pace, an outstanding data engineer embraces a desire, even a passion, for continuous learning. Lifelong learning enables him or her to stay current with the cutting-edge technologies relevant to data engineering. Attending industry events and staying in the know is crucial.
Some of this learning should focus on the automation that is revolutionizing the data engineer's work, removing much of the overhead of preparing and modeling data and managing cloud infrastructure.
This investment complements the other kinds of learning that build a strategic, team-oriented outlook. Tools that automate ETL, for example, free up data engineers to focus on creative ways to improve an enterprise's data infrastructure, instead of spending an estimated 60% of their time wrangling data.
A great data engineer, working in concert with others across an enterprise, can turn this saved time into clear competitive advantages.
To learn more about how Panoply utilizes machine learning and natural language processing (NLP) to learn, model and automate the standard data management activities performed by data engineers, sign up to our blog.