How to Learn Data Science Online: ALL You Need to Know
 Python, SQL, Machine Learning, Portfolios plus other Online resources

Lianne & Justin


This is a complete roadmap/curriculum for getting into data science with online resources.

Whether you want to learn for free or more efficiently, this guide will walk you through the step-by-step process that’ll put you on the right path. We’ll talk about skills, online courses, books, and other resources.

You’ll discover:

  • the basics of data science (Python, SQL, Machine Learning/Statistics) and how to learn them.
  • why and how to build a data science portfolio.
  • other tips/resources to dive into the world of data science.

Start your data science journey today!


Who are we, and why follow this roadmap?

Justin and Lianne have both worked in the data science field for 5+ years. We’ve worked in various areas such as marketing, fraud, anti-money laundering, and big data technologies. Justin has a bachelor’s degree in computer engineering and, later, a master’s degree in statistics, while Lianne has both a bachelor’s and a master’s degree in statistics.

Data science is a hybrid of computer science and statistics. We both started with one side missing, so we understand the challenge of learning the other field to get into data science. With our work experience, we also understand what the industry requires.

We write this roadmap (and the Just into Data blog) to help more people get into this fun and promising field.

The recommendations are all based on our years of experience plus hours of research.

Hope you find something helpful.



Step #0: Get a Feel for Data Science

Why Data Science? Should I get into this field?

Take a moment to imagine:

  • you can explore the world through data analytics to discover patterns and truths.
  • you can impact the world by using advanced technologies such as machine learning algorithms.
  • you have a well-paid job that you are passionate about.

If all these sound like what you want, then data science is for you!

Check out these YouTube videos to get an even better idea about data science:

Keep in mind that data science can be useful for many different industries, such as:

  • robotics
  • marketing
  • finance
  • healthcare

And it can be used for many different applications, such as:

  • analytics
  • prediction
  • classification
  • recommendation
  • Natural Language Processing (NLP)

There are also various data science career paths. You can be a reporting analyst, a data analyst, or a data scientist. They all require different strengths/skill sets.

After all, it never hurts to have a new skill set.

Once you have made up your mind, let’s begin the hard (but fun) work.

Step #1: Learning the Basics of Data Science

Data science is a hybrid of several fields, including computer science, statistics, information science, and mathematics. So you need both programming/coding skills and theoretical knowledge.

What are the basics of data science we need to know?

In the end, all our hard work is to launch a career in data science. You might wonder what employers want:

Which data science languages/tools?

What are the top skills?

What is the minimum requirement for education?

That’s why we summarized answers to the above questions based on Indeed job postings. Check out the post What are the In-Demand Skills for Data Scientists in 2020 to find answers.

To make it simple to start, we’ll be focusing on the technical skills below:

  • Python
  • SQL
  • Machine Learning (including basic statistics)

Tip: there are also other tools/skills, such as production systems and soft skills. But you’ll discover them along the way, and they depend on the industry. There’s no need to stress about everything at the beginning.

Disclosure: All the recommended courses and books below are picked and tested independently by us.
Some of the courses and books have affiliate links for the platforms or Amazon, which means if you buy through them, it won’t cost you a penny more, but we’ll get a small commission. This helps to keep the lights on for us. Thanks!

Data Science Online Courses

Which courses should we pick from the many different ones?

Below we summarize the features of the most popular platforms.

We will provide more detailed recommendations for each topic of Python/SQL/Machine Learning. Please keep reading to find out.

DataCamp
  • caters to people with little experience
  • focuses on the basics and is easy to follow
  • has an interactive Python/SQL/R environment
  • has a data science projects section
edX
  • some courses can be audited for free for a limited time
  • paid certificates are available
  • mostly taught by professors from colleges/universities such as MIT and Harvard
Coursera
  • some courses can be audited for free
  • paid certificates are available
  • mostly taught by professors from a wide range of top colleges/universities
Udemy
  • has the largest selection of courses, since anyone can become an instructor on the platform
  • has a good review system
Udacity
  • built and recognized by top tech companies such as Google, AWS, and IBM
  • more in-depth: Nanodegrees take 4-5 months to complete (5-10 hrs/week)
  • has real-life projects that are reviewed
  • offers technical mentor support
  • offers personal career services

Python

Python is the most in-demand data science language, and it is free to use. It is a powerful language that can handle most data science tasks, and much more!

R is another popular data science language. If you are struggling to choose between R and Python, read What are the In-Demand Skills for Data Scientists in 2020.

Based on our experience, we compiled a list of things that are essential for data science:

  • numeric types
  • boolean types
  • lists
  • sets
  • dictionaries
  • functions
  • dates
  • files
  • string manipulation
  • conditionals – if/else
  • loops
  • packages – installing, importing, etc
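To make the list above more concrete, here is a minimal sketch that touches several of these concepts in one small script. The records, field names, and helper functions are made up for illustration; it uses only the Python standard library.

```python
from collections import Counter   # packages: importing from the standard library
from datetime import date

# A made-up list of records (dictionaries) standing in for a small dataset.
signups = [
    {"name": " alice ", "plan": "pro", "joined": "2020-01-15", "spend": 120.0},
    {"name": "BOB", "plan": "free", "joined": "2020-03-02", "spend": 0.0},
    {"name": "Carol", "plan": "pro", "joined": "2020-03-20", "spend": 80.5},
]


def clean_name(raw):
    """String manipulation: trim whitespace and normalize capitalization."""
    return raw.strip().title()


def days_since(joined, today):
    """Dates: parse an ISO date string and count the days up to `today`."""
    return (today - date.fromisoformat(joined)).days


today = date(2020, 4, 1)  # fixed date so the results are reproducible
plans = set()             # sets keep only unique values
spend_by_plan = {}        # dictionaries map keys to values

for row in signups:       # loops
    plans.add(row["plan"])
    if row["spend"] > 0:  # conditionals
        spend_by_plan[row["plan"]] = spend_by_plan.get(row["plan"], 0.0) + row["spend"]

names = [clean_name(row["name"]) for row in signups]
tenure = {clean_name(r["name"]): days_since(r["joined"], today) for r in signups}
plan_counts = Counter(row["plan"] for row in signups)

print(names)
print(sorted(plans))
print(spend_by_plan)
print(tenure)
print(plan_counts)
```

Files and installing packages are not shown here; they are easier to pick up once Python is set up on your own machine.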

That sounds like a lot of concepts. But don’t worry: we reviewed online courses that can help. All the courses below cover most of the concepts we just mentioned, and you can easily fill in the gaps during or after the courses.

Update: Just into Data is now offering a FREE Python crash course: breaking into data science

The course is beginner-friendly and covers the basics you need to start data science. Sign up/Learn More by clicking the link below!

Check out the reviews below:

  • content quality/coverage is based on the above list of essential knowledge and our judgment.
  • affordability is based on money. The more $ signs, the more expensive the courses.
  • recommended courses include the course(s) that we consider to be the best that covered the basics of Python.
    We reviewed different options on each platform one by one.
Platform | Content Quality/Coverage | Affordability | Recommended Courses
DataCamp | 4/5 | $ | Courses are offered in modules, so you need all of the following: Introduction to Python; Intermediate Python; Writing Efficient Python Code; Working with Dates and Times in Python
edX | 4/5 | audit for some time for free, or earn the certificate for $ | Python Basics for Data Science
Coursera | 3.5/5 | audit for free, or earn the certificate for $$ | Python for Everybody Specialization
Udemy | 4.5/5 | $$$ | Complete Python Bootcamp: Go from zero to hero in Python 3
Udacity | 5/5 | $$$$$ | Programming for Data Science with Python

Tips:
Most of the courses above don’t cover the installation procedure, which could be confusing. Check out our post for step-by-step instructions: How to Install/Setup Python and Prep for Data Science NOW.
It is essential to practice while learning to code. We can’t master every package/function in Python at once. Just get the basics first and learn others while practicing!

Before moving on to machine learning, it is also necessary to get familiar with two must-know packages:

  • NumPy: a low-level package that allows for efficient numerical operations. Most other Python machine learning packages build on it.
    You can read the tutorial on their website, which is also a good reference to keep.
  • Pandas: a package built upon NumPy, which is used for data cleaning, munging, and exploration.
    It’s easy to pick up but can get confusing since there are different ways of doing things. Read this material as a starting point. You may also try searching for some online courses to get more in-depth. We will also recommend some books below.
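As a rough illustration of how the two packages divide the work, the sketch below does array arithmetic with NumPy and then a small clean/munge/explore pass with Pandas. The data and column names are invented for the example, and it assumes both packages are already installed.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on whole arrays, no explicit loop needed.
prices = np.array([10.0, 12.5, 8.0, 20.0])
quantities = np.array([3, 1, 5, 2])
revenue = prices * quantities          # element-wise multiplication
print(revenue.sum(), revenue.mean())

# Pandas: a labeled table (DataFrame) built on top of NumPy arrays.
orders = pd.DataFrame({
    "city": ["Toronto", "toronto ", "Ottawa", None],
    "price": prices,
    "quantity": quantities,
})

# Cleaning: drop rows with a missing city, then normalize the text.
orders = orders.dropna(subset=["city"]).copy()
orders["city"] = orders["city"].str.strip().str.title()

# Munging: derive a new column from existing ones.
orders["revenue"] = orders["price"] * orders["quantity"]

# Exploration: summarize revenue per city.
revenue_by_city = orders.groupby("city")["revenue"].sum()
print(revenue_by_city)
```

The same result could be reached in other ways (for example with `pivot_table`), which is part of why Pandas can feel confusing at first.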

SQL

SQL is the classic and still the dominant language to extract data from databases. Most companies have data in databases, which makes SQL an essential skill to have for data science.

Based on our experience, we compiled a list of things that are essential for data science:

  • select
  • filter
  • join
  • aggregates
  • group by
  • subqueries
  • expressions
  • create tables
  • indexes/keys
  • window functions
  • database table diagrams
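To see several of these pieces working together without installing a database server, here is a small sketch that runs SQL through Python’s built-in sqlite3 module. The tables, rows, and the 13% tax rate are made up for illustration, and other databases may differ slightly in syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    -- create tables, with a primary key and a foreign key
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    -- an index to speed up lookups by customer
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    INSERT INTO customers VALUES (1, 'Ann', 'Toronto'), (2, 'Ben', 'Ottawa'), (3, 'Cam', 'Toronto');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0), (4, 3, 100.0);
""")

# select + join + filter + aggregates + group by
totals = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.city = 'Toronto'
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
print(totals)

# subquery + expression: orders larger than the average order amount
# (the 1.13 multiplier is a hypothetical tax rate)
big_orders = conn.execute("""
    SELECT id, amount, ROUND(amount * 1.13, 2) AS amount_with_tax
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()
print(big_orders)

conn.close()
```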

Check out the reviews below:

  • content quality/coverage is based on the above list of essential knowledge and our judgment.
  • affordability is based on money. The more $ signs, the more expensive the courses.
  • recommended courses include the course(s) that we consider to be the best that covered the basics of SQL.
    We reviewed different options on each platform one by one.
Platform | Content Quality/Coverage | Affordability | Recommended Courses
DataCamp | 5/5 | $ | Courses are offered in modules, so you need all of the following: Introduction to SQL; Joining Data in SQL; Intermediate SQL; Exploratory Data Analysis in SQL; PostgreSQL Summary Stats and Window Functions; Introduction to Relational Databases in SQL
edX | 4/5 | audit for some time for free, or earn the certificate for $ | SQL for Data Science
Coursera | 4/5 | audit for free, or earn the certificate for $$ | SQL for Data Science
Udemy | 4/5 | $$$ | The Complete SQL Bootcamp
Udacity | 3.5/5 | free | SQL for Data Analysis

Machine Learning/Statistics

Machine learning is the part of data science that people talk about the most. We leave it to the end because applying its algorithms requires programming skills. Machine learning courses usually also cover the basic concepts of statistics.

Check out the reviews below:

  • content quality/coverage is based on our judgment.
    There are different topics in machine learning. We need to dig more in-depth on the specific field after learning the basics.
  • affordability is based on money. The more $ signs, the more expensive the courses.
  • recommended courses include the course(s) that we consider to be the best that covered the basics of Machine Learning.
    We reviewed different options on each platform one by one.
Platform | Content Quality/Coverage | Affordability | Recommended Courses
DataCamp | 4/5 | $ | Courses are offered in modules, so you need all of the following: Machine Learning with Tree-Based Models in Python; Time Series Analysis in Python; Linear Classifiers in Python; Cluster Analysis in Python; Extreme Gradient Boosting with XGBoost; Introduction to Deep Learning with Keras
edX | 3/5 | audit for some time for free, or earn the certificate for $ | Analyzing Data with Python
Coursera | 4/5 | audit for free, or earn the certificate for $$ | Both are more theoretical and less applied: Applied Data Science with Python Specialization, or Machine Learning by Andrew Ng (not in Python)
Udemy | 3.5/5 | $$$ | Machine Learning, Data Science and Deep Learning with Python
Udacity | 5/5 | $$$$$ | Both degrees are very comprehensive and practical: Data Analyst Nanodegree; Data Scientist Nanodegree

Data Science Free Online Tutorials

There are also some free online written tutorials. They are good as references since they are text-based and easily searchable. But they are often unstructured and not as easy to follow as courses/books.

Python

We recommend online courses for people with no programming experience. But if you have programmed in a language other than Python, you’ll be fine learning on your own.

Check out our post for step-by-step instructions to set up the environment: How to Install/Setup Python and Prep for Data Science NOW.

And take a look at the W3Schools Python Tutorial or the official Python documentation.

SQL

As with Python, there is also a W3Schools SQL Tutorial. Take a look to learn the basics.

PostgreSQL and MySQL are free open source databases. And they are also common in real production environments. You can try installing them and practice with sample databases:

It is also important to learn the more advanced window functions. Check the documentation of the database server you use, since support and syntax can vary.
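To give a feel for what a window function does before you read a server’s documentation, here is a minimal sketch using Python’s built-in sqlite3 module. The sales table is made up, and the example assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE sales (rep TEXT, month TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('Ann', '2020-01', 100), ('Ann', '2020-02', 150), ('Ann', '2020-03', 120),
        ('Ben', '2020-01', 200), ('Ben', '2020-02', 180);
""")

rows = conn.execute("""
    SELECT
        rep,
        month,
        amount,
        -- running total per rep, ordered by month
        SUM(amount) OVER (PARTITION BY rep ORDER BY month) AS running_total,
        -- rank of each month within the rep's own sales
        RANK() OVER (PARTITION BY rep ORDER BY amount DESC) AS amount_rank
    FROM sales
    ORDER BY rep, month
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Unlike GROUP BY, the window functions keep every original row and add the per-group calculation as extra columns.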

Machine Learning/Statistics

We couldn’t find any free online tutorials that cover all the necessary materials. So we recommend online courses or books.

If you are on a strict budget, try the free auditable online courses above.

Data Science Books

Reading is a traditional way of learning.

We recommend reading books that either cover the basics or focus only on the concepts. Books with more in-depth programming content can become outdated.

The books may cover multiple skills, so we put them together instead of dividing into Python/SQL/Machine learning sections.

Data Science from Scratch is the best book we found to start data science.

There are minimal prerequisites required to read this book. It covers a broad range of topics such as:

  • data science, machine learning introduction
  • Python and SQL basics
  • linear algebra, statistics and probability basics
  • data handling
  • models ranging from simple linear regression to more advanced neural networks

Tip: this book does not cover topics in-depth. But it’s an excellent way to start. Once you know the pieces of data science, you’ll find a clearer path.

Introduction to Machine Learning with Python: A Guide for Data Scientists is a good one to start if you want to focus on the machine learning aspect of data science.

This book covers:

  • machine learning introduction
  • Python basics
  • machine learning algorithms, models

Also, check out these beginner-friendly books:

If you already know some basics, try the books below to advance the skills:

Tip: It is impossible to master all the concepts from these courses and books at once. It is easy to feel confused and frustrated, but don’t give up; keep practicing.
That’s why the next step is very critical. You’ll get there after a few cycles of learning and practicing!

Summary

That is still a lot of options to learn the basics of data science! Sometimes too many options are not ideal.

The more options you have, the more likely you are going to choose none of the options.

So, to keep this very simple in general for you:

Step #2: Building a Data Science Portfolio

Building a data science portfolio is important for two main reasons:

  • There is no real gain if we don’t get our hands dirty and practice. So after learning the necessary knowledge, it’s good to apply it in real-life situations.
  • This is the best way of showing your potential employers what you are both passionate about and capable of doing.

Follow the general procedures below:

Find an Interesting Problem/Topic

This initial step is critical since people care more about what data science can do than about the theories and algorithms. They want data science projects to be interesting and useful. Any data science tools/models that we use should serve the problem we are trying to solve.

Remember that data science can be applied to many different fields, so:

  • if you have any dream industry, focusing on related problems could help you land the dream job.
  • if not, try starting from some problems in daily life that interest you.

To give some ideas,

  • we are into watching sports, so we think using data science to make some extra cash is cool.
  • we are interested in learning more about our favorite fitness YouTube channel, so we dove into its data.
  • we want to learn more about coronavirus, the disease which has greatly impacted our lives. So we dug into its data as well.

We’re sure you can find an interesting topic that data science can help with!

Look up Open Datasets and Articles related to the Topics

After having an idea of the problem, we can start looking for datasets for analysis.

Some data are easily accessible through public data sources such as:

Others might require you to scrape the data. But don’t worry, Python can help you do that.

For example, to get the Indeed job posting data, we scraped data from the Indeed website with Python. Take a look at How to use NLP in Python: a Practical Step-by-Step Example.
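As a rough idea of what the parsing side of scraping looks like, the sketch below pulls job titles and skills out of a small HTML snippet with Python’s built-in html.parser. The HTML, tag names, and class names are invented for the example and are not the structure of any real website; this is also not the code used in the article mentioned above. A real scraper would first download the page.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for a downloaded page; real pages differ.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Data Scientist</h2><span class="skill">Python</span><span class="skill">SQL</span></div>
<div class="job"><h2 class="title">Data Analyst</h2><span class="skill">SQL</span></div>
"""


class JobParser(HTMLParser):
    """Collect the text inside <h2 class="title"> and <span class="skill">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.skills = []
        self._target = None  # list that the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "h2" and css_class == "title":
            self._target = self.titles
        elif tag == "span" and css_class == "skill":
            self._target = self.skills

    def handle_data(self, data):
        # Only keep text that appears inside one of the tags we care about.
        if self._target is not None and data.strip():
            self._target.append(data.strip())

    def handle_endtag(self, tag):
        self._target = None


parser = JobParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)
print(parser.skills)
```

Once the text is extracted into plain Python lists, it can be counted, cleaned, or loaded into Pandas like any other dataset.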

Tip: many articles suggest entering Kaggle competitions, but we found that a lot of the datasets or solutions are not realistic.

Apply the Knowledge to the datasets

Once we have the dataset, it’s time to dive into it. We need to:

  • explore the data
  • clean the data
  • research a good solution to solve the problem
  • apply the algorithms/models to the data
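The steps above can be sketched on a tiny example. The data below is made up, and the sketch deliberately uses only the standard library with a hand-written least-squares fit so that it runs anywhere; on a real project you would more likely reach for Pandas and a machine learning package.

```python
import csv
import io
from statistics import mean

# Made-up data: advertising spend vs. sales, with one missing and one invalid value.
RAW_CSV = """ad_spend,sales
1,3
2,5
3,
4,9
five,11
5,11
"""

# Explore: load the rows and see what we have.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
print(len(rows), "raw rows")


def to_float(text):
    """Return the value as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


# Clean: keep only rows where both columns parse as numbers.
parsed = [(to_float(r["ad_spend"]), to_float(r["sales"])) for r in rows]
clean = [(x, y) for x, y in parsed if x is not None and y is not None]
print(len(clean), "clean rows")

# Model: fit y = intercept + slope * x by ordinary least squares.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# Apply: predict sales for a new spend level.
predicted = intercept + slope * 6
print(slope, intercept, predicted)
```

Simple linear regression is only one possible choice here; the research step is about deciding which model actually suits the problem.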

If you prefer to start with some guidance, search for articles with clear step-by-step instructions and practice. For example, you may check out articles on our blog where we explain the full procedures and provide the Python code.

Tip: you may try implementing different models to explore the same dataset. It is a good way to showcase your skills.

Write to Show off the Work

In the end, we need to let the world know our work:

  • post your code on GitHub. Check out the Hello World GitHub Guide to learn the basics.
    Don’t forget to create a README file and add comments. Your code might even help somebody else working on the same topic.
  • write articles to summarize the work.
  • post the articles on platforms such as:
    LinkedIn
    Medium
    Twitter
    Facebook groups
    a personal website
    We love Medium since you can even get paid based on views. The largest data science publication on Medium is Towards Data Science.
  • or pitch the articles to other publications such as KDnuggets or Dataquest.

Tip: the results of the project sometimes don’t have to be perfect. But make sure you explain how you approach the problem.

Step #3: Connecting with the World of Data Science

Now that we have a portfolio to demonstrate our skills, it is important to connect with the real world of data science.

As mentioned in the previous section, there are different social networks you may join, such as LinkedIn, Twitter, and Facebook groups.

You can also try to meet up with real people by attending events from Meetup and Eventbrite.

Stack Overflow is also a technology community to get answers to your questions while helping others.


Final Words

It is not an easy journey to get into the data science career path. But it is a rewarding and impactful one!

So please don’t give up and keep trying.

Good luck!


Please leave a comment if you have any questions. We’ll try our best to answer them.

Before you leave, don’t forget to sign up for the Just into Data newsletter below, or connect with us on Twitter or Facebook, so you won’t miss any new data science articles from us!
