If you want to be a data scientist, you probably have asked:
Which data science languages/tools should I learn?
What are the top skills the employers want?
What is the minimum requirement for education?
To answer these questions the way a data scientist would, we applied Natural Language Processing (NLP) techniques to job descriptions from Indeed. We included 2,681 data scientist job postings from 8 North American cities, collected on January 25th, 2020.
Soon we will also post the step-by-step analysis in Python behind this article. Stay tuned.
Update: We have posted the technical part of this article. Check out How to use NLP in Python: a Practical Step-by-Step Example — To find out the In-Demand Skills for Data Scientists with NLTK.
Let’s see the results first!
Top Tools In-Demand
Among the job postings included in the analysis, 86% listed specific tools wanted for the jobs. Data scientists can’t function without these tools/languages.
The top 10 tools required by the employers are:
- Amazon Web Services
Python is the indisputable winner. 62% of the job postings ask for knowledge of Python, and some others demand Python-related tools or packages such as PyTorch, Pandas, and NumPy.
Because Python is such a powerful yet simple language, it can do a lot more than machine learning modeling. For example, data scientists can also use it to automate tasks, use cloud services, and conduct web development.
Which tool should we learn first to become a data scientist?
SQL also appeared in 40% of the job postings. It is still a good idea to know this database querying language. Classics never go out of style.
People often consider R a competitor to Python. But, according to this analysis, it is not close: only 39% of the data scientist job descriptions mentioned R, compared with 62% for Python.
Other big data-related tools such as Spark, Cloud, AWS, TensorFlow, and Hadoop are in demand as well. These are essential when the employer works with larger amounts of data.
Besides Python and R, Java is valuable to some employers as well. It matters more for data engineers, but it is still useful for data scientists to know.
While SAS is still popular among big corporations, it is required less often for data scientists. The software is better suited to reporting and less flexible for data analysis.
The top 50 list of tools mentioned in data scientists’ job postings is below.
The tool categories are not mutually exclusive, i.e., one posting could mention multiple tools, so the percentages add up to more than 100%. “Nothing specified” means the job description didn’t ask for any tools.
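To illustrate how such non-exclusive counts can be produced, here is a minimal sketch in plain Python. It is not the code behind this article (the technical write-up uses NLTK); the keyword set, the `tools_in_posting` and `tool_shares` helpers, and the sample postings are all hypothetical and only show the counting logic.

```python
import re
from collections import Counter

# Hypothetical keyword set, for illustration only (not the article's actual list).
TOOLS = {"python", "sql", "r", "spark", "aws"}


def tools_in_posting(text, tools=TOOLS):
    """Return the set of tool keywords that appear as whole tokens in one posting."""
    tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    return tokens & tools


def tool_shares(postings, tools=TOOLS):
    """Share of postings mentioning each tool.

    A posting counts once toward every tool it mentions, so the shares
    can sum to more than 1. Postings with no match are counted under
    "nothing specified".
    """
    counts = Counter()
    for text in postings:
        found = tools_in_posting(text, tools)
        if found:
            counts.update(found)
        else:
            counts["nothing specified"] += 1
    return {name: count / len(postings) for name, count in counts.items()}


# Made-up postings, not real data from Indeed.
postings = [
    "We need strong Python and SQL skills.",
    "Experience with R or Python; Spark is a plus.",
    "Great communicator with a passion for data.",
    "SQL reporting on AWS.",
]
shares = tool_shares(postings)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.0%}")
```

Matching on whole tokens matters for short names such as R: a plain substring search would find "r" in almost every posting.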
Top Skills In-Demand
Besides the tools, data scientists must have specific skills or techniques to succeed.
The top 10 skills required by the employers are:
- Machine Learning
- Deep Learning
- Natural Language Processing
For many, machine learning is all that data scientists do. That impression has some basis: 64% of the job postings mentioned machine learning.
Data scientists need to learn different types of learning, such as supervised, unsupervised, and reinforcement learning, as well as different models, such as decision trees, artificial neural networks, and support vector machines.
Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence.
Based on Wikipedia’s definition above, machine learning is a hybrid field between computer science and statistics.
Statistics knowledge sets solid foundations for data scientists. 59% of the job postings required it. To become data scientists, we should learn data collection, experimental design, probability distributions, and other statistical concepts.
Research is also considered a vital skill for data scientists. 50% of the jobs asked for it. Data science is a rapidly developing field, so data scientists must be creative and adaptive in order to explore and apply new concepts.
Subsets of machine learning or statistical skills, including prediction, recommendation, optimization, deep learning, natural language processing, and regression, are in demand as well.
Different employers require different techniques. For instance, a credit card company might want us to predict the customer’s credit profile; a chatbot company would like us to analyze natural languages.
Data visualization is also gaining popularity. A picture is worth a thousand words; the right graph or image can summarize big data. Data scientists need to present their analysis results well.
The top 50 list of skills mentioned in data scientists’ job postings is below.
As with tools, the skill categories are not mutually exclusive, i.e., one posting could mention multiple skills.
Minimum Education Required
Moreover, data scientists are often required to have a higher education level.
- 49% of the job postings asked for a bachelor’s degree as the minimum education level.
- 27% mentioned master’s degree or above.
- 9% asked for a Ph.D.
- Only a handful of postings asked for a postdoc or an MBA.
- 15% of the postings did not specify a degree requirement and could be more open about it.
So the good news is that a bachelor’s degree in a related field is often enough. That said, one would be more competitive with an advanced degree such as a master’s or a Ph.D.
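As a rough illustration of how a minimum education level could be read from a posting, the sketch below checks degree keywords from the lowest level to the highest and returns the first match. This is an assumption about one possible approach, not the method used for the numbers above; the `DEGREE_LEVELS` patterns and the `minimum_education` helper are hypothetical and far from exhaustive.

```python
import re

# Hypothetical keyword patterns, ordered from the lowest degree level to the highest.
DEGREE_LEVELS = [
    ("bachelor", re.compile(r"\bbachelor'?s?\b|\bbsc\b")),
    ("master", re.compile(r"\bmaster'?s\b|\bmsc\b")),
    ("phd", re.compile(r"\bph\.?d\b")),
]


def minimum_education(text):
    """Return the lowest degree level mentioned in a posting, or "not specified"."""
    # Normalize case and curly apostrophes so that "Master’s" matches "master's".
    lowered = text.lower().replace("’", "'")
    for level, pattern in DEGREE_LEVELS:
        if pattern.search(lowered):
            return level
    return "not specified"


# Made-up examples, not real postings.
examples = [
    "Bachelor's degree required; Master's or PhD preferred.",
    "Master’s degree in statistics or a Ph.D.",
    "PhD in computer science.",
    "Strong communication skills.",
]
levels = [minimum_education(text) for text in examples]
print(levels)
```

Taking the lowest level mentioned treats "Bachelor's required, Master's preferred" as a bachelor's minimum, which matches the idea of a minimum requirement but ignores wording such as "preferred" or "or equivalent experience".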
Becoming a data scientist is an exciting though challenging journey.
But as we all know, data science will continue to be in demand for the foreseeable future.
Whether you already have these skills or not, it’s never too late to start!