Want to master the Perceptron algorithm?
In this tutorial, you'll learn step by step how to implement the Perceptron algorithm in Python using both NumPy and PyTorch.
Perfect for beginners in machine learning and students who want to understand the core logic before diving deep into neural networks.
What you'll learn in this video:
What is the Perceptron algorithm?
How the Perceptron works (linearly separable data)
Code the Perceptron from scratch using NumPy
Build the same model using PyTorch
Visualize decision boundaries
Test and improve your model performance
Whether you're studying for ML interviews or starting deep learning, this tutorial builds your foundation!
Subscribe to NanoTechBoost for more AI, machine learning, and coding tutorials!
#Perceptron #MachineLearning #PythonTutorial #PyTorch #NumPy #DeepLearning #NanoTechBoost #AIProgramming #LearnMachineLearning #MLTutorial #codingforbeginners
Category: Learning

Transcript
00:00 Hello everyone, assalamu alaikum, and welcome back. In this video I am going to show you how we can
00:07 implement a perceptron in Python using NumPy and PyTorch. I will be using Jupyter notebooks,
00:14 because for simpler code examples it's actually quite nice to use Jupyter notebooks:
00:21 I can execute one thing at a time, which makes things easier to explain.
00:26 However, later on we will have to move on to Python script files, because when the deep
00:34 learning models become larger and larger, managing them in Jupyter notebooks can become a bit
00:40 tedious and even dangerous. It's easy to lose the overview, and debugging is a little
00:47 harder if you have separate cells and so on, and there is the danger that you might execute
00:54 things out of order. You also want to import certain pieces from different files;
01:01 you don't want to have everything in one notebook, because then it becomes really confusing and
01:06 unmanageable. But we will get to those parts later on. So, back to the topic of perceptron implementation in
01:15 NumPy and PyTorch. Let me now walk through my NumPy notebook.
01:19 The PyTorch notebook is actually very similar, which is one of the cool things about PyTorch,
01:28 because it is very close to NumPy except for some extra features that we will be using later
01:35 on, and I will have a lecture on PyTorch where I will explain these differences. For right now
01:42 it doesn't make a big difference whether we use the NumPy or the PyTorch notebook. I will also show you a
01:48 step-by-step comparison after I explain the NumPy notebook, so you will see the actual
01:54 differences between the two. But let's do one thing at a time, right? All right, let's add some headings
02:04 and then we will get on with the code.
02:34 So, I'm importing some libraries. For those who have not used notebooks:
02:46 this command here is for showing plots inside the notebook.
02:50 It's technically not necessary anymore, but sometimes, on some computers, plots will not
02:56 be shown in a notebook if you don't include this line,
02:59 and it doesn't hurt to include it.
03:02 So I always do this.
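For reference, a minimal sketch of the setup cell being described (the exact imports aren't shown in the transcript, so the specifics are assumptions):

```python
# Typical setup cell for this notebook (assumed; not reproduced verbatim in the video)
%matplotlib inline   # Jupyter magic: render plots inside the notebook

import numpy as np
import matplotlib.pyplot as plt
```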
04:25 Okay, so I've typed some code.
04:48 Now, let's just try to understand this and then we will move on.
04:52 Here I'm just loading the dataset.
04:56 There is nothing really interesting happening here, but I will go through it step by step.
05:01 The dataset is some toy data that I generated.
05:05 Let me show you what it looks like.
05:10 So here I have two feature columns.
05:12 I didn't include any column header, but this is the first feature,
05:16 this is the second feature value, and this is the class label here.
05:20 So there are zeros and ones, and you can see the dataset is not shuffled.
05:25 It's actually helpful for learning if the dataset is shuffled;
05:28 it will make the learning a little bit faster for the perceptron.
05:32 So here I'm loading the data into NumPy.
05:34 I could also use pandas, but I thought it might be overkill because it's a relatively simple dataset.
05:42 Then I'm assigning the features to X, which is a matrix, and the class labels to y.
05:48 I can show you how they look.
05:52 So typing this: this is X, which is a matrix, and this is y, which is a class-label array.
05:59 Here it's shuffled because I already executed this whole bunch of code.
06:08 So you can already guess what's going on here.
06:11 Here I'm loading the data and then just printing some summary information.
06:16 It's always a good idea, I think, to do that to get a feel for the data.
06:23 So we have 50 labels from class 0 and 50 labels from class 1.
06:28 We have 100 data points in total, 2 feature columns, and also 100 labels.
06:35 So, for example, here we can see that these numbers match,
06:44 which is what we expect.
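A rough sketch of the loading-and-summary cell described here; the file name, delimiter, and print statements are assumptions, not the video's exact code:

```python
import numpy as np

# Load the toy dataset: two feature columns and one class-label column
data = np.genfromtxt('perceptron_toydata.txt', delimiter='\t')  # hypothetical file name
X, y = data[:, :2], data[:, 2]
y = y.astype(int)

print('Class label counts:', np.bincount(y))   # expect 50 and 50
print('X.shape:', X.shape)                     # expect (100, 2)
print('y.shape:', y.shape)                     # expect (100,)
```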
06:47 Then here I'm shuffling the dataset so that the examples are not all in order.
06:52 They get shuffled.
06:53 The way I'm doing that: I have to shuffle X and y together, right?
06:58 Otherwise everything will be mixed up
07:01 and the features won't correspond to the class labels anymore.
07:05 So what I do is create a shuffle index.
07:08 I can just show you what this shuffle index is.
07:11 It is just the numbers from 0 to 99,
07:17 the 100 indices, and then I actually shuffle these indices.
07:21 Here I'm creating a random number generator
07:26 and then shuffling these indices.
07:30 You can look at them after shuffling.
07:32 After I execute that, you will see that they are now in random order.
07:39 And then I'm using that to select the data points from X to Y.
07:44 From X and y.
07:45 So X and y are shuffled based on this shuffle index.
07:51 That's how we shuffle.
07:53 Then I will use the first 70 data points for training,
07:57 and the last 30.
07:58 So out of these 100 data points, those from index 70 to 100,
08:02 the last 30 data points, will be our test set.
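A sketch of the shuffle-and-split step as described; the random seed value is an assumption:

```python
# Shuffle features and labels together via a shared index array
shuffle_idx = np.arange(y.shape[0])          # indices 0..99
shuffle_rng = np.random.RandomState(123)     # reproducible random number generator (seed assumed)
shuffle_rng.shuffle(shuffle_idx)             # shuffle the indices in place

X, y = X[shuffle_idx], y[shuffle_idx]

# First 70 examples for training, last 30 for testing
X_train, X_test = X[:70], X[70:]
y_train, y_test = y[:70], y[70:]
```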
08:06 Later on, we will be using more convenient tools to load data in PyTorch;
08:10 there are some data-loading utilities for that.
08:14 Here I'm just doing it step by step
08:17 so you get a feeling for what's basically going on.
08:20 Next, I'm normalizing the data.
08:22 This is sometimes also called standardization.
08:26 I'm standardizing the data such that after standardization
08:29 it will have mean 0 and unit variance.
08:32 So I'm subtracting the mean and dividing by the standard deviation.
08:37 Here I'm computing the mean and standard deviation of my sample,
08:41 and then I'm subtracting the mean and dividing by the standard deviation.
08:44 Then both feature columns will have mean 0 and standard deviation 1,
08:49 so unit variance.
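A sketch of the standardization step; reusing the training-set statistics for the test set is an assumption based on common practice, since the transcript doesn't spell it out:

```python
# Standardize: subtract the mean and divide by the standard deviation, per feature column
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma   # assumed: test set scaled with the training statistics
```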
08:51 You can actually check that.
08:54 Okay, the mean is very close to 0;
08:56 there are about 17 zeros after the decimal point
08:59 before the first non-zero digit,
09:04 so it's a very small number,
09:05 essentially identical to 0.
09:09 And the standard deviation should be around 1.
09:12 Yes, it is around 1,
09:15 so the data is standardized.
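A quick sanity check along the lines described (the exact print statements are assumptions):

```python
# Means come out around 1e-17 (floating-point error, essentially zero);
# standard deviations come out essentially 1.
print(X_train.mean(axis=0))
print(X_train.std(axis=0))
```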
09:17 Well, why am I doing that?
09:20 It speeds up training a little bit;
09:23 it stabilizes the training for the perceptron.
09:27 It's not strictly necessary here,
09:29 but it is good practice for other types of optimization algorithms,
09:34 which we'll see later when we talk about stochastic gradient descent.
09:38 So standardization is something that is usually recommended.
09:42 The only type of machine learning model where it is really not necessary is tree-based models,
09:48 but all other machine learning and deep learning models that I know of can really benefit from it.
09:56 Stochastic gradient descent in particular will just learn faster.
10:01 Okay, now let's take a look at our data.
10:05 Let me write some code for that.
10:07 So here, this is what our training set looks like.
10:20 You can see it's roughly centered at 0.
10:24 We have two classes: class 0, these circles here, and the squares for class 1.
10:30 So feature 1 and feature 2, that's our training set.
10:33 There should be 70 examples here, and the remaining 30 examples are in our test set plotted down below.
10:45 What we want to do is train our model on the training set and then evaluate it on the test set.
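A sketch of the training-set scatter plot described above; the marker shapes and labels are assumptions:

```python
import matplotlib.pyplot as plt

# Training set: circles for class 0, squares for class 1
plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], marker='o', label='class 0')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], marker='s', label='class 1')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()
```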
10:51 Okay, now let's implement our perceptron model.
10:58 So yes, the perceptron code.
11:01 Let me type this in and then I'll get back to you.
11:06 Now, you can see it's relatively short.
11:09 I'm implementing it using a forward and a backward method,
11:14 and the reason I'm doing this is that it is also how things are done in PyTorch;
11:22 it will make things more familiar later on if I start using this pattern now.
11:26 But let's start at the top.
11:28 I'm implementing it as a class here,
11:31 and you should, I think, be familiar with Python classes.
11:35 What I'm doing here is writing the constructor.
11:39 This is a special class method, the constructor.
11:41 I'm giving it the number of features, because that's what I need to know to set up the number of weights.
11:47 I'm using the implementation where the weights and the bias are kept separate, because that is more convenient:
11:53 I don't have to modify the feature vector.
11:56 What I'm doing here is initializing the weight vector and the bias unit.
12:03 The bias unit is just a single value,
12:05 just one number.
12:06 And the weights, the weight vector, depend on the number of features,
12:10 so I make this a vector
12:13 whose length is equal to m, the number of features.
12:19 So here I'm just setting up my weights and bias and setting them to zero.
12:24 Later on, for certain algorithms such as stochastic gradient descent,
12:31 it's better to initialize them to small random numbers; here, for the perceptron,
12:36 that's not necessary,
12:39 but for neural networks it will be necessary later on.
12:42 We will see that.
12:44 Now, in the forward method, I'm computing the net input, here in 'linear',
12:49 and then I'm computing the predictions;
12:53 that's my threshold.
12:55 So this is the net input.
12:57 I'm calling it 'linear' because later on we will also see linear layers in PyTorch.
13:02 They are called linear,
13:03 and they basically compute the net input.
13:06 You can see this is the dot product between the input vector and the weights,
13:11 and then I'm adding the bias here.
13:13 All right.
13:14 Then here we have our threshold function.
13:17 This threshold function just uses NumPy.
13:20 The way it works is: if linear,
13:25 that is, if the net input, is greater than zero, then output one, otherwise output zero.
13:31 So that's our forward method.
13:34 And here is our backward method.
13:37 Why am I calling it that?
13:40 It is for computing the errors.
13:43 Usually, when we have deeper neural networks, we use something called backpropagation,
13:48 where we look at the outputs
13:50 and then, based on the outputs, we adjust the weights.
13:53 So we run the forward method to produce the predictions,
13:57 then we compute the errors and then update.
14:00 It will become clearer when we have a deeper network where there is really
14:06 backpropagation going on.
14:07 So these are our two methods:
14:10 backward computes the errors, which are the difference between the true class labels and
14:14 the predictions,
14:15 and forward is used to get the predictions in the first place.
14:18 So we implemented here the prediction step that we discussed in the
14:23 slides as well:
14:26 the prediction is step A,
14:28 and then step B is the backward computation, which gives us the errors.
14:34 Now we have to put everything together,
14:37 so I implemented this train method here.
14:41 This train method is basically the whole procedure here on the slide, as you can see.
14:47 So, for each epoch in the number of epochs, that is, for every training epoch,
14:56 and then for every training example, we perform the forward pass, the backward pass, and the update.
15:03 Since backward already calls forward internally, we just call backward here.
15:09 There's also some reshaping going on here,
15:12 and that is because we are making the vector dimensions match;
15:14 otherwise you will get some errors.
15:16 So here, this will be one row and m columns;
15:20 I think this is called a row vector, because it is just one row and multiple
15:25 columns, so it looks like a row.
15:27 And here we have this row vector, and this has to have the same dimensions.
15:34 I'm just making the dimensions the same so everything can be computed nicely;
15:41 otherwise you will find there is a dimension mismatch.
15:44 So there's just some reshaping going on here,
15:46 and then here we perform the update.
15:48 Again, I'm doing a reshape afterwards so that we get the original dimensions back,
15:53 because the weights here, see, we are matching the original dimensions.
15:58 We are just reshaping so we can add the update to the weights;
16:01 otherwise there will also be a dimension mismatch, if this is just a single number,
16:06 or if it is a 1-by-m vector instead of an m-by-1 vector.
16:12 And then we also update the bias;
16:14 the bias is just updated by the errors.
16:19 The next thing is evaluating the model.
16:21 To evaluate the performance here, I just do the forward pass and then compute the
16:26 accuracy.
16:27 The accuracy is computed by checking how many of the predictions match the true labels and
16:33 then dividing by the dataset size,
16:35 so it gives me a number between 0 and 1.
16:38 So this is my perceptron algorithm,
16:41 sorry, the perceptron class that we just implemented.
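A sketch of the perceptron class as described in this walkthrough; the exact code from the video is not in the transcript, so shapes and variable names follow the narration and are otherwise assumptions:

```python
import numpy as np

class Perceptron:
    def __init__(self, num_features):
        self.num_features = num_features
        # weights as an m-by-1 vector, bias as a single number, both initialized to zero
        self.weights = np.zeros((num_features, 1), dtype=float)
        self.bias = np.zeros(1, dtype=float)

    def forward(self, x):
        # net input ("linear"): dot product of inputs and weights, plus the bias
        linear = np.dot(x, self.weights) + self.bias
        # threshold: output 1 if the net input is greater than zero, else 0
        predictions = np.where(linear > 0., 1, 0)
        return predictions

    def backward(self, x, y):
        # errors = true class labels minus predictions (forward is called internally)
        predictions = self.forward(x)
        errors = y - predictions
        return errors

    def train(self, x, y, epochs):
        for e in range(epochs):
            for i in range(y.shape[0]):
                # reshape one example to a 1-by-m row vector so the dimensions match
                errors = self.backward(x[i].reshape(1, self.num_features), y[i]).reshape(-1)
                # perceptron update: add error * input to the weights, error to the bias
                self.weights += (errors * x[i]).reshape(self.num_features, 1)
                self.bias += errors

    def evaluate(self, x, y):
        # accuracy = fraction of predictions that match the true labels
        predictions = self.forward(x).reshape(-1)
        return np.sum(predictions == y) / y.shape[0]
```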
16:45 And now I'm going to train it.
16:51 So yeah, initializing it,
17:09 then training it for five epochs, and then I will print the model
17:26 parameters afterwards. You can see that it's pretty fast, and we get the weights, the weight
17:32 vector, and then the bias here. Now we can evaluate it, i.e. compute the accuracy. Let's do that.
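A sketch of the training and evaluation cells; the variable name ppn and the print formatting are assumptions:

```python
ppn = Perceptron(num_features=2)
ppn.train(X_train, y_train, epochs=5)

print('Weights:', ppn.weights)
print('Bias:', ppn.bias)

train_acc = ppn.evaluate(X_train, y_train)
test_acc = ppn.evaluate(X_test, y_test)
print('Train accuracy: %.2f%%' % (train_acc * 100))
print('Test accuracy: %.2f%%' % (test_acc * 100))
```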
17:53 So the test set accuracy is 93%; it's not quite 100%. On the training set it should actually be
18:01 100%, right, because this dataset is linearly separable, and the perceptron should converge if the data are linearly
18:07 separable, so there everything is classified correctly. The test set is not as good, as you can
18:15 see, because the perceptron may overfit. So let's take a look at the decision boundaries. Okay, here is some
18:22 complicated code to compute the decision boundaries. It's actually not that complicated;
18:30 what I did is just rearrange things. What we have, if you think about it, is that the decision
18:37 boundary is where the net input equals zero, so everything hinges on zero. Writing out the computation,
18:42 it's x0 times w0, plus x1 times w1, plus the bias, and this hinges on zero. What
18:55 we are doing here is taking one fixed number; let's say for feature zero we take the
19:02 value minus two, so we go to the left-hand side here, and then we want to find, so this is
19:10 for x0, and x0 is on the x-axis while x1 is on the y-axis, we take minus two here and we want to find the
19:22 corresponding x1 value. So this is x0 at minus two, and what we ask is: what is the corresponding x1
19:30 value? To find it, we have to rearrange, solving for x1, right? So we move
19:37 this term and this one to the right-hand side, and then we have the x1 value. I'm calling
19:45 it min, and the reason is that it's on the left-hand side of the plot. Then I do the same thing for the right-hand
19:51 side: I again set x0 to some value,
19:59 setting it to 2 this time, and then I find the corresponding y-axis value, which is the x1 max
20:05 here. So I'm doing the same thing, just rearranging, now using a max value, and then I
20:12 connect these points with a line, and that's how I get this. I've done this on the left-hand side for the
20:18 training set and on the right-hand side for the test set, so one plot is the training set and one is the test set.
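A sketch of the decision-boundary computation just described, solving w0*x0 + w1*x1 + b = 0 for x1 at x0 = -2 and x0 = 2; the plotting details are assumptions:

```python
import matplotlib.pyplot as plt

# The boundary is where the net input is zero: w0*x0 + w1*x1 + b = 0,
# so x1 = (-w0*x0 - b) / w1 for any fixed x0 value.
w0, w1 = ppn.weights.flatten()
b = ppn.bias[0]

x0_min, x0_max = -2.0, 2.0
x1_at_min = (-w0 * x0_min - b) / w1   # boundary's x1 value at the left edge
x1_at_max = (-w0 * x0_max - b) / w1   # boundary's x1 value at the right edge

# Connect the two points with a line, drawn over the training and test sets
fig, (ax_train, ax_test) = plt.subplots(1, 2, sharex=True, figsize=(9, 4))
ax_train.plot([x0_min, x0_max], [x1_at_min, x1_at_max])
ax_train.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
ax_train.set_title('training set')
ax_test.plot([x0_min, x0_max], [x1_at_min, x1_at_max])
ax_test.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
ax_test.set_title('test set')
plt.show()
```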
20:23 Now, about the decision boundary: it doesn't actually change, because it's the same for the training and
20:30 test set; only the dataset being plotted is different. The decision boundary only depends on w, right? We are
20:36 providing these fixed x0 values ourselves, and the boundary itself only depends
20:41 on the model parameters, so the decision boundary does not change. This plot is for the training set,
20:47 and this one, on the right-hand side, is for the test set. You can see that on the training set it
20:53 perfectly classifies these examples, while on the right-hand side, which is the test set,
20:59 you can see it is maybe fitting some of the data too closely. I mean, there's no other way, actually,
21:05 but it happens that in this case it doesn't perform well. There would be a different
21:13 way: if you fit the boundary more like this, something a bit straighter, then you might get these
21:22 right. But it just happens that these data points are not in the training set, so the model doesn't
21:27 know that it should shift the boundary more to the right here. In this way, the model does
21:33 a good job on the training set, but on the test set it's not so good. That is what
21:39 the term overfitting means: the model fits the training data a little too closely and
21:45 doesn't generalize so well to the test set. So this is how the NumPy code works.
21:51 Now for the PyTorch code. I will not write out the complete code, because,
21:58 to be honest, most things are the same; I will only talk about the differences.
22:06 I don't need to talk about this much, I think, because the code is essentially the same;
22:12 everything is the same except for the class. Okay, so there are some differences; let's talk about
22:19 only those. I've prepared a slide for the differences that I feel we need to discuss,
22:27 and also note that we will talk about this in more detail when I cover PyTorch
22:34 in the coming videos. Here I've highlighted the differences: on the left-hand side is the
22:42 NumPy implementation, and on the right-hand side is the PyTorch implementation. The left-hand
22:47 side is the same code that we just wrote, and you can see that there are not that many
22:55 differences. For the way the weights and biases are implemented: here we are using numpy zeros, there
23:00 we are using torch zeros. Here we have to be a bit more specific: instead of saying
23:08 NumPy float we say torch.float32, a 32-bit float. I also have this device argument, because of the way I implemented
23:17 things, so it would also run on the GPU if a GPU is available; if no GPU is available it will use the CPU.
23:25 So there is this device argument, which is provided optionally; it's not strictly necessary. What is a
23:32 little bit different here is this part. There are multiple ways you can write
23:38 it; you can also use a plus. To be honest, I just happen to use torch.add, but I could have
23:45 also used a plus. And mm is for matrix multiplication: in NumPy we write dot,
23:54 and in PyTorch we write mm for matrix multiplication, but NumPy's dot
24:05 function also performs this matrix multiplication, so it is really the same operation; it just
24:11 looks a little bit different. The where function in PyTorch is a bit more, I would say, involved, not that much
24:18 more involved, but it needs placeholders: instead of a plain one and zero it needs
24:26 tensors here, so I'm creating these as placeholders and providing them, but it's the same concept.
24:32 Then, what's a little different is the last part: instead of numpy.sum it's torch.
24:38 sum, and here I'm converting to float, because otherwise it would be an integer, and an integer
24:43 divided by some value would give an integer. What we want is a float, because it's a fraction
24:49 between zero and one. If you don't do that, you will get back an integer, and that's not correct,
25:00 because the value of the accuracy is between zero and one, right? Which is why I am casting this to float.
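A sketch of how the PyTorch version might look based on the differences listed above; the variable names and the device handling are assumptions, not the video's exact code:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

num_features = 2
weights = torch.zeros(num_features, 1, dtype=torch.float32, device=device)
bias = torch.zeros(1, dtype=torch.float32, device=device)

ones = torch.ones(1, device=device)    # placeholder tensors for torch.where
zeros = torch.zeros(1, device=device)

def forward(x):
    # x is assumed to be a float tensor of shape (n, num_features)
    linear = torch.add(torch.mm(x, weights), bias)   # net input
    return torch.where(linear > 0., ones, zeros)     # threshold

def accuracy(x, y):
    # y is assumed to be a tensor of 0/1 labels; cast the match count to float
    predictions = forward(x).reshape(-1)
    return torch.sum(predictions == y).float() / y.shape[0]
```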
25:07 But again, the PyTorch code will be covered in more detail later. So that's what I wanted to say
25:12 about the code. If you have any questions, feel free to drop a comment. I hope you understood how
to code this perceptron algorithm using both NumPy and PyTorch.