Creating a Simple Machine Learning Model with scikit-learn in Python

Introduction

Welcome to PythonSage! This post introduces the basic concepts of machine learning and shows you how to build a simple classifier with scikit-learn, a popular Python library for machine learning. By the end of this tutorial, you'll have a foundational understanding of machine learning and a working model you can experiment with.


What You Will Learn

  • Basic machine learning concepts
  • Setting up your environment
  • Loading and preparing data
  • Building and evaluating a simple machine learning model


Setting Up Your Environment

First, you'll need to install scikit-learn, along with pandas, which we'll use to inspect the data. You can install both using pip:

pip install scikit-learn pandas
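To confirm the installation worked, a quick check like the one below can help. It is only a sketch: it imports both libraries and prints whatever versions happen to be installed on your machine.

```python
# Confirm that both libraries import cleanly and report their versions.
import sklearn
import pandas as pd

versions = {"scikit-learn": sklearn.__version__, "pandas": pd.__version__}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If either import fails with a ModuleNotFoundError, the install most likely went into a different Python environment than the one running the script.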


Basic Machine Learning Concepts

Machine learning involves teaching computers to make predictions or decisions based on data. Here are some key concepts:

  • Features: The input variables used to make predictions.
  • Target: The output variable or the value we want to predict.
  • Training Data: The dataset used to train the model.
  • Model: The result of running a learning algorithm on the training data; it maps features to a predicted target.
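To make these four terms concrete, here is a minimal sketch on a made-up dataset. The fruit measurements and labels are hypothetical values invented for illustration, not data from this tutorial; the classifier is the same kind we use later in the post.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, invented for illustration only.
# Features: [weight in grams, surface (0 = smooth, 1 = bumpy)]
features = [[140, 0], [130, 0], [150, 1], [170, 1]]
# Target: the label we want to predict for each row of features
target = ["apple", "apple", "orange", "orange"]

# Model: fitted on the training data (features + target)
model = DecisionTreeClassifier(random_state=0)
model.fit(features, target)

# Predict the label for a new, unseen fruit
prediction = model.predict([[160, 1]])
print(prediction[0])  # orange
```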


Loading and Preparing Data

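For this example, we'll use the famous Iris dataset, which is included in scikit-learn. It contains 150 iris flowers, each described by four measurements (sepal length, sepal width, petal length, and petal width, all in centimetres) and labelled with one of three species: setosa, versicolor, or virginica.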

from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris = load_iris()

# Create a DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Display the first few rows
print(df.head())
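The target column holds integer codes (0, 1, 2) rather than species names. As a small optional sketch, the codes can be translated with iris.target_names to check how many flowers belong to each species; the species and counts variables are names introduced here for illustration.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Translate the integer codes 0, 1, 2 into species names for readability
species = pd.Series(iris.target_names[iris.target], name="species")
counts = species.value_counts()

print(df.shape)  # (150, 5): four feature columns plus the target
print(counts)    # 50 flowers per species
```

The classes are perfectly balanced, which is one reason plain accuracy is a reasonable metric for this dataset.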


Building a Simple Machine Learning Model

We'll build a simple classifier to predict the species of an iris flower based on its measurements. We'll use a Decision Tree classifier for this example.

Splitting the Data

First, we'll split the data into training and testing sets. The training set is used to train the model, and the testing set is held back so we can evaluate the model on data it has never seen.

from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X = df.drop(columns=['target'])
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
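It can be worth checking the sizes of the resulting sets. The sketch below also passes stratify=y, which is a variation and not part of the split used in this post: it asks train_test_split to keep the species proportions the same in both sets. For brevity it loads the arrays directly with return_X_y=True instead of going through the DataFrame.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y is an optional variation, not used in the main tutorial code
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```

Without stratification the test set can end up with slightly uneven class counts, which matters more on small or imbalanced datasets than it does here.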


Training the Model

Next, we'll train the Decision Tree classifier using the training data.

from sklearn.tree import DecisionTreeClassifier

# Create and train the model
# random_state fixes the tree's internal tie-breaking so results are reproducible
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
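A decision tree is easy to inspect once it is trained. The following sketch prints the learned rules as text using export_text. Two choices here are assumptions made for readability, not part of the tutorial's model: the tree is limited to max_depth=2 and is fitted on the full dataset.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 keeps the printed tree short; the tutorial's model has no depth limit
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# Render the learned if/else rules as indented text
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each line of the output is a threshold test on one measurement, and each leaf ends with the class the tree predicts for flowers that reach it.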


Evaluating the Model

Finally, we'll evaluate the model's performance using the testing data.

from sklearn.metrics import accuracy_score

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
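Accuracy is a single number, so it hides which species get confused with which. As an optional follow-up, a confusion matrix and a per-class report give a more detailed picture. This is a self-contained sketch that repeats the split and training steps with the same settings as above, using the raw arrays rather than the DataFrame.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual species, columns are predicted species
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score for each species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Off-diagonal entries in the matrix are the misclassified flowers; a perfect model would have zeros everywhere except the diagonal.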


Explanation of the Code

Loading and Preparing Data

  • Loading the Iris Dataset: We use load_iris() to load the Iris dataset, which is a built-in dataset in scikit-learn.
  • Creating a DataFrame: We create a DataFrame from the data and add the target column.


Splitting the Data

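  • Splitting the Data: We use train_test_split to split the data into training and testing sets. Setting test_size=0.2 keeps 80% of the data for training and 20% for testing, and random_state=42 makes the shuffle reproducible, so you get the same split every time you run the code.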


Training the Model

  • Creating the Model: We create an instance of DecisionTreeClassifier.
  • Training the Model: We train the model using the fit method, passing in the training data.
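Once fit has been called, the model can classify flowers it has never seen. The sketch below shows that pattern; the new_flower measurements are example values chosen for illustration, and the model is fitted on the full dataset here only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Example measurements in cm: sepal length, sepal width, petal length, petal width
new_flower = [[5.1, 3.5, 1.4, 0.2]]

predicted_code = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_code]
print(predicted_name)  # setosa
```

Note that predict expects a 2D input (a list of rows), even when you only have a single flower.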


Evaluating the Model

  • Making Predictions: We use the predict method to make predictions on the testing data.
  • Calculating Accuracy: We use accuracy_score to calculate the accuracy of the model, which is the fraction of test samples whose predicted species matches the actual species.
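To see what accuracy_score does under the hood, it can be reproduced by hand. The label arrays below are made-up values for illustration, not output from the model in this post.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration: 4 of the 5 predictions are correct
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

# Accuracy is the fraction of positions where prediction equals truth
manual = (y_true == y_pred).mean()
library = accuracy_score(y_true, y_pred)

print(manual, library)  # 0.8 0.8
```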


Conclusion

Congratulations! You've built your first machine learning model using scikit-learn. This basic example is a great starting point for exploring more advanced machine learning techniques and algorithms. Keep experimenting and exploring the vast possibilities of machine learning!

External Links to Learn More:

scikit-learn Documentation

Pandas Documentation

Python Official Website

For more Python tutorials and guides, visit PythonSage.

Abdullah Cheema

I’m Abdullah, a software engineer from Pakistan now in Saudi Arabia, eager to share my Python programming journey from basics to advanced techniques.
