Decision trees are a very popular machine learning model. Their appeal comes from easy-to-understand visualizations and fast deployment into production.
In this tutorial, you’ll discover a 3-step procedure for visualizing a decision tree in Python (for Windows/Mac/Linux).
Just follow along and plot your first decision tree!
Update (April 2020):
The scikit-learn (sklearn) library added a new function, plot_tree, that allows us to plot the decision tree without GraphViz.
It draws the tree with the matplotlib library, so that is the approach this tutorial uses.
Step #1: Download and Install Anaconda
Choose the Anaconda installer that matches your operating system and download it. Anaconda is a widely used Python distribution, and large corporations usually allow employees to download and install it.
Related article: How to Install/Setup Python and Prep for Data Science NOW
Check out step-by-step instructions on installing Python with Anaconda.
Step #2: Import Packages and Read the Data
First, let’s import some functions from scikit-learn, a Python machine learning library.
scikit-learn needs to be version 0.21 or newer, since that is the release that introduced plot_tree. A fresh Anaconda installation should already include a recent enough version.
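The imports and the version requirement above can be sketched as follows. This is a minimal example rather than the exact code from the original post: the particular set of imports and the regex-based version check are assumptions made for illustration.

```python
import re

import matplotlib.pyplot as plt
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree

# plot_tree was added in scikit-learn 0.21, so confirm the installed version.
# The regex-based parsing below is an illustrative choice, not an official API.
match = re.match(r"(\d+)\.(\d+)", sklearn.__version__)
major, minor = int(match.group(1)), int(match.group(2))

if (major, minor) < (0, 21):
    raise RuntimeError("plot_tree requires scikit-learn 0.21 or newer")

print("scikit-learn version:", sklearn.__version__)
```

If the check fails, updating scikit-learn through your package manager should resolve it.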
Next, let’s read in the data. Breast cancer data is used here as an example.
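Here is one way to read in the data, assuming "breast cancer data" refers to the Wisconsin breast cancer dataset bundled with scikit-learn. The variable names data, X and y are chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled dataset (assumed here to be the data the tutorial uses).
data = load_breast_cancer()

X = data.data    # feature matrix: one row per sample, one column per feature
y = data.target  # class labels encoded as 0 and 1

print(X.shape)             # (569, 30)
print(data.target_names)   # ['malignant' 'benign']
print(data.feature_names[:5])
```

The dataset has 569 samples with 30 numeric features each, and the two target classes are malignant and benign.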
Step #3: Create the Decision Tree and Visualize it!
In your Python environment of choice, fit the decision tree and plot it with plot_tree. I prefer JupyterLab for its interactive features.
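A minimal sketch of this step is shown below. It is self-contained, so it reloads the data. The max_depth=3 limit, the random_state value, the figure size and the filled=True styling are all assumptions made to keep the plot readable and reproducible; they are not settings confirmed by the original post.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the example data (assumed: scikit-learn's bundled breast cancer dataset).
data = load_breast_cancer()

# Fit a shallow tree; max_depth=3 and random_state=0 are illustrative choices.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(data.data, data.target)

# Draw the fitted tree on a matplotlib axis. No GraphViz is required.
fig, ax = plt.subplots(figsize=(16, 8))
annotations = plot_tree(
    clf,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    filled=True,  # color each node by its majority class
    ax=ax,
)
plt.show()
```

Each box in the plot shows the split rule, the impurity, the number of samples and the predicted class for that node. To keep a copy of the figure, you could call fig.savefig with a file name of your choice before plt.show().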
Congratulations on your first decision tree plot! Hope you found this guide helpful.
Leave a comment if you have any questions.