Across industries, rapidly rising demand for data scientists and engineers far outstrips supply. In fact, the shortage of skilled IT professionals is one of the main factors slowing companies’ digital transformation. And digital transformation is on everyone’s minds this year. As a QuickBase survey shows, a majority (68 percent) of senior management rated it as a top organizational priority.
A Glassdoor “Best Jobs in America 2017” survey backs this up, placing data scientists and data engineers in the top two spots (with data scientist claiming the top spot for the second year in a row).
In this post, I want to spread the joy, sharing with you some details on trends in jobs for data scientists and engineers, what qualifications are needed to take advantage of these trends, expected pay grades, and potential job prospects.
If we drill down to look at Glassdoor’s figures, data scientists (“the sexiest job of the 21st century”) receive an average salary in the US of $111,000. At the time of writing this, Glassdoor lists 4,184 job openings with the title of data scientist.
What do data scientists do? Essentially, they extract value out of data. As described by Toptal, they “proactively fetch information from various sources and analyze it for better understanding about how the business performs, and build AI tools that automate certain processes within the company… [A data scientist] may be X% scientist, Y% software engineer, and Z% hacker.”
Data scientists typically have:
- Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
- Experience with common data science toolkits, such as R, Weka, NumPy, MATLAB, etc.
- Experience with data visualization tools, such as D3.js, ggplot2, etc.
- Proficiency in using query languages such as SQL, Hive, Pig
- Experience with NoSQL databases, such as MongoDB, Cassandra, HBase
- Good applied statistics skills, such as distributions, statistical testing, regression, etc.
- Good scripting and programming skills
Data engineers are also in incredibly high demand and can expect a median base salary of $106,000, which may have its own sex appeal. Again, at the time of writing, Glassdoor lists nearly 2,600 positions with the title “data engineer.”
Data engineers essentially work to support data scientists and analysts, providing infrastructure and tools for delivering end-to-end solutions to business problems, solutions that can be developed rapidly and maintained easily. They build scalable, high-performance infrastructure for delivering clear business insights from raw data sources; implement complex analytical projects with a focus on collecting, managing, analyzing, and visualizing data; and develop batch and real-time analytical solutions.
Regarding their experience and skill sets, Toptal writes that they have:
- Working experience with Python and libraries like Pandas. Prior experience with Luigi can be a plus, as can working experience with Scala and Spark.
- Familiarity with Google Cloud Platform (e.g. GCS and BigQuery).
- Working experience with Ruby and Rails is a plus.
- Familiarity with the basic principles of distributed computing and data modeling.
- Extensive experience with object-oriented design and coding and testing patterns, including experience with engineering software platforms and data infrastructures. Familiarity with functional programming concepts is a plus.
- Outstanding data engineers are multidisciplinary, excellent problem solvers who are team-oriented and curious.
Data engineers need to be willing to roll up their sleeves and get down into the trenches to wrangle data. Your data munging skills should let you deal with what Dave Holtz described in a Udacity post as “data imperfections includ[ing] missing values, inconsistent string formatting (e.g., ‘New York’ versus ‘new york’ versus ‘ny’), and date formatting (‘2014-01-01’ vs. ‘01/01/2014’, unix time vs. timestamps, etc.).”
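To make those imperfections concrete, here is a minimal Python sketch of what such cleanup might look like, using only the standard library. The alias table, the list of date formats, the sample rows, and the function names are all my own illustrative assumptions, not anything from the Udacity post. It also assumes that any all-digit date value is Unix time in seconds, which would be wrong for compact dates like "20140101".

```python
from datetime import datetime, timezone

# Hypothetical alias table: maps lowercase variants to one canonical city name.
CITY_ALIASES = {"new york": "New York", "ny": "New York"}

# Date layouts tried in order; extend this tuple for other sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def clean_city(raw):
    """Return a canonical city name, or None when the value is missing."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip()
    # Unknown cities are kept as stripped text rather than guessed at.
    return CITY_ALIASES.get(text.lower(), text)


def clean_date(raw):
    """Return an ISO 8601 date string, or None when missing or unparseable."""
    if raw is None or not str(raw).strip():
        return None
    text = str(raw).strip()
    if text.isdigit():
        # Assumption: all-digit input is Unix time in seconds (UTC).
        return datetime.fromtimestamp(int(text), tz=timezone.utc).date().isoformat()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


# Made-up sample rows covering each kind of imperfection.
rows = [
    {"city": "New York", "date": "2014-01-01"},
    {"city": "new york", "date": "01/01/2014"},
    {"city": "ny", "date": "1388534400"},
    {"city": None, "date": ""},
]

cleaned = [
    {"city": clean_city(r["city"]), "date": clean_date(r["date"])}
    for r in rows
]

for row in cleaned:
    print(row)
```

The first three rows collapse to the same city and date, while the last row keeps its missing values as None instead of inventing data. In real projects the alias table is usually the hard part, since it has to be built from the data itself.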
It is worth noting that advances in solutions that use machine learning and natural language processing (NLP) to learn, model, and automate standard data management activities are shifting the amount of down-and-dirty work required of data engineers.
For example, Panoply.io’s unique self-optimizing data warehouse architecture automates and simplifies data analytics, eliminating the overhead of preparing and modeling data and managing cloud infrastructure. Panoply uses machine learning and NLP to learn, model, and automate the standard data management activities performed by data engineers, server developers, and data scientists, saving thousands of lines of code and countless hours of debugging and research.
Data Girl - Women in data science and engineering
There is a well-reported underrepresentation of women in data science-related roles. But there are extremely good reasons to believe this is changing, and one of the factors driving the change is the incredible demand for qualified talent.
There are a number of signs of interest in leveling the playing field so that the burgeoning data science and engineering fields can take advantage of all of the human capital available. One example is the reception to the first conference on women in data science (WiDS), held at Stanford University in 2016; the second, now international, conference in February 2017 took place at more than 80 locations worldwide. Some examples of women who have achieved extraordinary success in data science can be found here.
Be part of the digital transformation
So come be part of the digital transformation that is sweeping our world! I have found that working in the data science field provides me with an exciting, challenging, satisfying, and financially rewarding career. Come join me!
Want to learn more about how data warehousing concepts can help you achieve digital transformation? Click to learn more about data warehouse concepts.