How to Install/Set Up Python and Prep for Data Science NOW
Step-by-step process to get ready for Data Science, Machine Learning, and Deep Learning

Lianne & Justin


Want to kick-start data science today?

Want to build a machine learning or deep learning model?

In this guide, we’ll walk you through the step-by-step process to set up the Python environment.

You’ll learn:

  • How to install Python with Anaconda distribution.
  • How to use Python through tools such as Jupyter Notebook.
  • Which Python packages help with the end-to-end process (from analyzing data to deploying models).

Note: This guide uses Windows, but the procedures on Mac and Linux are similar, and we point to their instructions along the way.
To keep things simple, we also leave out cloud computing.

Let’s get started! All you need is your computer.


Step #1: Install the Platform: Anaconda | Python

What is Anaconda?

It is a popular Python data science platform that is:

  • open-source.
  • packaged with data science and machine learning libraries, packages, and tools.
  • suitable for Windows, Mac, and Linux.
  • used by more than 19 million people (at the time of writing).

So, to keep getting started with data science simple, use Anaconda.

Go to the Anaconda download page and click “Download” for the Python 3.x version.

Note: Python 2 reached its end of life on January 1, 2020, and is no longer supported.

Once the download finishes, open the installer file and follow the instructions to install Anaconda. The whole process takes less than 5 minutes.

[Screenshot: Anaconda installation setup]

Now you have Python and some packages on your computer.

Next, let’s choose the interface to manage your platform:

  • command line (conda) or
  • desktop GUI (Anaconda Navigator)

The GUI is more visual, while the command line is more convenient. We recommend the command line.

For Windows users, from the “Type here to search” box, search for “Anaconda”.

To access the command line, open “Anaconda Prompt”; to access the GUI, open “Anaconda Navigator”.

[Screenshot: Anaconda Prompt or Anaconda Navigator]

Note for Mac and Linux users: instructions can be found here.
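To confirm the installation, you can start Python from Anaconda Prompt (type “python”) or, once Jupyter is set up in Step #5, use a notebook cell, and run a quick check. The sketch below is our own illustration, not an Anaconda tool: `python_info` is a hypothetical helper that only reports what the standard library's `sys` and `platform` modules know about the running interpreter.

```python
import platform
import sys


def python_info():
    """Collect basic facts about the Python interpreter running this code.

    `python_info` is a hypothetical helper written for this guide.
    """
    return {
        "version": platform.python_version(),      # a string such as "3.x.y"
        "is_python3": sys.version_info.major == 3,  # Python 2 is end-of-life
        "executable": sys.executable,               # path of the running python
    }


if __name__ == "__main__":
    for key, value in python_info().items():
        print(f"{key}: {value}")
```

If `executable` points into your Anaconda installation folder, the prompt is using Anaconda's Python rather than another copy installed on your computer.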

Step #2: Create an Anaconda Environment – Optional

Python is a powerful language that can be used in many ways.

If you are planning on using Anaconda for purposes other than data science as well, it’s best practice to set up a separate environment.
Otherwise, please skip to the next step.

Creating different environments is an effective way to organize packages for different projects.

By following either the command line or GUI instructions, you can create an environment called “ds” (Data Science).

To activate the environment in Windows using the command line, enter “conda activate ds”.

You’ll see the “base” change to the “ds” environment.

[Screenshot: activating the environment in Anaconda Prompt]
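Besides the prompt prefix, you can check from Python which environment is active. When you run “conda activate”, conda sets the `CONDA_DEFAULT_ENV` environment variable to the environment's name, and `sys.prefix` points at the folder of the environment whose Python is running. The sketch below reads that variable; `active_conda_env` and its `environ` parameter are hypothetical names for illustration, and the function returns `None` outside an activated conda shell.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None.

    Reads CONDA_DEFAULT_ENV, which `conda activate` sets. The `environ`
    parameter (hypothetical) lets you pass a mapping instead of the real
    process environment.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    # Expected to print "ds" once the ds environment is active
    print("Active environment:", active_conda_env())
    # Folder of the environment this Python belongs to
    print("Environment folder:", sys.prefix)
```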

To activate the environment in Windows using Navigator, select the environment in the interface as below.

[Screenshot: selecting the environment in Anaconda Navigator]

Note for Mac and Linux users: instructions can be found here.

Step #3: List Must-Have Packages for Data Science

To make our future lives easier, let’s also install some popular packages for Data Science.

What are the main packages?

Fundamental Packages

  • SciPy: an ecosystem of software for mathematics, science, and engineering.
  • NumPy: the fundamental package for scientific computing.
  • Pandas: a tool for data analysis and manipulation.
  • Flask: a web application framework that helps deploy models.

Machine Learning Packages

  • Scikit-Learn: tools for predictive data analysis, including classification, regression, clustering, model selection, etc.
  • NLTK: a platform to work with natural language data.

Deep Learning Packages

  • PyTorch: a framework for machine learning with functions for deep learning.
  • TensorFlow: a platform for machine learning, also with functions designed for deep learning.

Visualization Packages

  • Matplotlib: a 2D plotting library that produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • Plotly: graphing libraries for interactive data visualization.
  • Seaborn: a high-level interface for drawing attractive and informative statistical graphics based on Matplotlib.

These packages should be a good starting point. You can install other packages as you go.

Let’s move on to the next step to install them.
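Once the packages are installed, you can check which of them Python can find. Several import names differ from the package names above: Scikit-Learn is imported as `sklearn` and PyTorch as `torch`. The sketch below uses the standard library's `importlib.util.find_spec`, which locates a package without importing it; the `PACKAGES` mapping and the `check_packages` helper are our own illustrative names.

```python
import importlib.util

# Package name as listed in this guide -> name used in `import` statements
PACKAGES = {
    "SciPy": "scipy",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Flask": "flask",
    "Scikit-Learn": "sklearn",
    "NLTK": "nltk",
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "Seaborn": "seaborn",
}


def check_packages(packages=PACKAGES):
    """Map each package name to True if Python can locate it, else False."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:<13} {'installed' if found else 'missing'}")
```

Run it from the “ds” environment (or a notebook using it); a package reported as missing there may still be installed in another environment.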

Step #4: Install New Packages

As you may recall, there are two interfaces for managing the Anaconda platform.

We can use either of them to manage packages as well.

Don’t forget to install them under the “ds” environment if you set it up.

You can install all the packages under the “ds” environment with the “conda install” command, listing the package names after it.

During the process, conda will ask for permission; enter “y” to allow the installation.

Take a sip of your coffee. The process takes a few minutes.

Please follow the instructions to install the packages.
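A conda install command for the “ds” environment generally takes the form “conda install -n ds” followed by package names, where “-n” names the target environment and “-y” answers the confirmation prompt for you. The Python sketch below only assembles and prints such a command; it does not run it. The `conda_install_command` helper is hypothetical and the package list is our assumption. Conda package names can differ from import names (scikit-learn vs. `sklearn`), and PyTorch and TensorFlow are left out because their recommended install commands change over time, so check their official install pages for those two.

```python
# Conda package names for most packages in Step #3 (an assumption; PyTorch
# and TensorFlow are omitted because their recommended commands vary).
DS_PACKAGES = [
    "scipy", "numpy", "pandas", "flask",
    "scikit-learn", "nltk",
    "matplotlib", "plotly", "seaborn",
]


def conda_install_command(env_name, packages, assume_yes=False):
    """Build a `conda install` command as a list of arguments.

    `conda_install_command` is a hypothetical helper; it only builds the
    command and never executes it.
    """
    command = ["conda", "install", "-n", env_name]
    if assume_yes:
        command.append("-y")  # answer "yes" to the confirmation prompt
    command.extend(packages)
    return command


if __name__ == "__main__":
    # Print one line that can be pasted into Anaconda Prompt
    print(" ".join(conda_install_command("ds", DS_PACKAGES)))
```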

Step #5: Install an Application for Python: Jupyter Notebook

In this step, we choose the application to use for Python.

Let’s install Jupyter Notebook, which lets you run code interactively and see the results right below it. You may also choose JupyterLab, a newer interface built on Jupyter Notebook.

If you are using the command line, install it first by entering “conda install jupyter”.

Then enter “jupyter notebook” to launch it. The application will open in a new browser window or tab.

[Screenshot: launching Jupyter Notebook from Anaconda Prompt]

If you are using the GUI, go to the Home page, find Jupyter Notebook, and select “Install” and then “Launch”. A new window will pop up with the application.

[Screenshot: launching Jupyter Notebook from Anaconda Navigator]
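If the command line reports that “jupyter” is not recognized, the shell usually cannot find the Jupyter executable on its PATH, for example because Jupyter was installed into the “ds” environment but that environment is not active. One way to check from Python (started in the same prompt) is the standard library's `shutil.which`, which returns a command's full path or `None`; `find_commands` below is a hypothetical wrapper around it.

```python
import shutil


def find_commands(names):
    """Map each command name to its full path on PATH, or None if missing.

    `find_commands` is a hypothetical helper; `shutil.which` does the lookup.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_commands(["python", "jupyter"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If “jupyter” is not found, activate the environment where you installed it, or install Jupyter there first.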


Step #6: Test the Environment

Now we have set up the Python environment for data science.

Let’s test it out!

Go to the Jupyter Notebook that we launched in the previous step.

First, let’s create a new folder to save the code – click “New” and choose “Folder”.

[Screenshot: creating a new folder in Jupyter Notebook]

You’ll see a folder named “Untitled Folder” among the list of existing folders. To change its name, tick the checkbox next to it and then click “Rename”.

Then, we can create a new notebook under this folder:

  • click the folder name to open the folder.
  • click “New” and then “Python 3”.

A new window will pop up as below.

We can rename the notebook by clicking “File” and then “Rename”.

[Screenshot: a new Jupyter notebook]

Next, copy an example from the Matplotlib documentation into the empty cell within the notebook.
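Any short example works. As a minimal stand-in of our own, assuming NumPy and Matplotlib are installed (both ship with Anaconda), the cell below plots one period of a sine wave:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data: 100 evenly spaced points over one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the line
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A first plot with Matplotlib")
ax.legend()

# In Jupyter the figure is displayed below the cell
plt.show()
```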

Then press “Shift” + “Enter” to run the code.

Congratulations!

You plotted the first graph with Python.

[Screenshot: the example chart in Jupyter Notebook]

Further Reading: conda command Cheat Sheet


Thank you for reading.

What will be your first data science project?

Let us know by leaving a comment below! Or take a look at applications on our blog to find ideas.
