Across industries, rapidly rising demand for data scientists and engineers far outstrips supply. In fact, the shortage of skilled IT professionals is one of the main factors slowing companies’ digital transformation. And digital transformation is on everyone’s minds this year. As a QuickBase survey shows, a majority (68 percent) of senior management rated it as a top organizational priority.
Glassdoor’s “Best Jobs in America 2017” survey backs this up, placing data scientists and data engineers in the top two spots (with data scientist claiming the top spot for the second year in a row).
In this post, I want to spread the joy, sharing with you some details on trends in jobs for data scientists and engineers, what qualifications are needed to take advantage of these trends, expected pay grades, and potential job prospects.
If we drill down to look at Glassdoor’s figures, data scientists (“the sexiest job of the 21st century”) receive an average salary in the US of $111,000. At the time of writing this, Glassdoor lists 4,184 job openings with the title of data scientist.
What do data scientists do? Essentially, they extract value out of data. As described by Toptal, they “proactively fetch information from various sources and analyze it for better understanding about how the business performs, and build AI tools that automate certain processes within the company… [A data scientist] may be X% scientist, Y% software engineer, and Z% hacker.”
Data scientists typically have:
- Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
- Experience with common data science toolkits, such as R, Weka, NumPy, MATLAB, etc.
- Experience with data visualization tools, such as D3.js, ggplot, etc.
- Proficiency in using query languages such as SQL, Hive, Pig
- Experience with NoSQL databases, such as MongoDB, Cassandra, HBase
- Good applied statistics skills, such as distributions, statistical testing, regression, etc.
- Good scripting and programming skills
Data engineers are also in incredibly high demand and can expect a median base salary of $106,000, which may have its own sex appeal. Again, at the time of writing, Glassdoor lists nearly 2,600 positions with the title “data engineer.”
Data engineers essentially work to support data scientists and analysts, providing infrastructure and tools that can be used to deliver end-to-end solutions to business problems that can be developed rapidly and maintained easily. Data engineers build scalable, high performance infrastructure for delivering clear business insights from raw data sources; implement complex analytical projects with a focus on collecting, managing, analyzing, and visualizing data; and develop batch & real-time analytical solutions.
Regarding their experience and skill sets, Toptal writes that they have:
- Working experience with Python and libraries like Pandas. Prior experience with Luigi can be a plus, as can working experience with Scala and Spark.
- Familiarity with Google Cloud Platform (e.g. GCS and BigQuery).
- Working experience with Ruby and Rails is a plus.
- Familiarity with the basic principles of distributed computing and data modeling.
- Extensive experience with object-oriented design and coding and testing patterns, including experience with engineering software platforms and data infrastructures. Familiarity with functional programming concepts is a plus.
- Outstanding data engineers are multidisciplinary, excellent problem solvers who are team-oriented and curious.
Data engineers need to be willing to roll up their sleeves and get down into the trenches to wrangle data. Your data munging skills should let you deal with what Dave Holtz described in a Udacity post as “data imperfections includ[ing] missing values, inconsistent string formatting (e.g., ‘New York’ versus ‘new york’ versus ‘ny’), and date formatting (‘2014-01-01’ vs. ‘01/01/2014’, unix time vs. timestamps, etc.).”
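To make those imperfections concrete, here is a minimal Python sketch of the kind of cleanup involved. It is only an illustration under stated assumptions: the `CITY_ALIASES` table, the two date layouts, and the helper names `clean_city` and `clean_date` are all hypothetical, and the sketch assumes that any all-digit value is Unix time in seconds.

```python
from datetime import datetime, timezone

# Hypothetical lookup table: lower-cased spellings mapped to one canonical name.
CITY_ALIASES = {"new york": "New York", "ny": "New York"}

# Date layouts we assume may appear in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def clean_city(raw):
    """Return a canonical city name, or None for a missing value."""
    if raw is None or not raw.strip():
        return None
    trimmed = raw.strip()
    # Fall back to the trimmed input when the spelling is not in the table.
    return CITY_ALIASES.get(trimmed.lower(), trimmed)


def clean_date(raw):
    """Return an ISO 8601 date string (YYYY-MM-DD), or None if unparseable."""
    if raw is None or not str(raw).strip():
        return None
    text = str(raw).strip()
    if text.isdigit():
        # Assumption: an all-digit value is Unix time in seconds (UTC).
        return datetime.fromtimestamp(int(text), tz=timezone.utc).date().isoformat()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next known layout.
    return None
```

With these helpers, ‘New York’, ‘new york’, and ‘ny’ all map to the same city, and ‘2014-01-01’ and ‘01/01/2014’ both come out as the same ISO date. Real pipelines usually need a much larger alias table and explicit rules for ambiguous dates.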
When it comes to data infrastructure, there's no better choice than Panoply. Panoply's streamlined ETL and data warehousing save thousands of lines of code and countless hours of debugging and research. Learn more about what Panoply can do for your business with a personalized demo!