Want to master the Perceptron algorithm?
In this tutorial, you'll learn step by step how to implement the Perceptron algorithm in Python using both NumPy and PyTorch.
Perfect for beginners in machine learning and students who want to understand the core logic before diving deep into neural networks.
What you'll learn in this video:
What is the Perceptron algorithm?
How the Perceptron works (linearly separable data)
Code the Perceptron from scratch using NumPy
Build the same model using PyTorch
Visualize decision boundaries
Test and improve your model performance
Whether you're studying for ML interviews or starting deep learning, this tutorial builds your foundation!
Subscribe to NanoTechBoost for more AI, machine learning, and coding tutorials!
#Perceptron #MachineLearning #PythonTutorial #PyTorch #NumPy #DeepLearning #NanoTechBoost #AIProgramming #LearnMachineLearning #MLTutorial #codingforbeginners
Category: Learning

Transcript
00:00 Hello everyone, assalamu alaikum, and welcome back. In this video I am going to show you how we can
00:07 implement a perceptron in Python using NumPy and PyTorch. I will be using Jupyter notebooks,
00:14 because for simpler code examples it's actually quite nice to use Jupyter notebooks:
00:21 I can execute one thing at a time, which makes things easier to explain.
00:26 However, later on we will have to move on to Python script files, because when the deep
00:34 learning models become larger and larger, managing them in Jupyter notebooks can become a bit
00:40 tedious and even dangerous. It's easy to lose the overview, and debugging is a little
00:47 harder if you have separate cells and so on, and there is the danger that you might execute
00:54 things out of order. You also want to import certain pieces from different files;
01:01 you don't want to have everything in one notebook, because then it becomes really confusing and
01:06 unmanageable. But we will get to those parts later on. So, back to the topic of perceptron implementation in
01:15 NumPy and PyTorch. Let me now walk through my NumPy notebook.
01:19 The PyTorch notebook is actually very similar, which is one of the cool things about PyTorch,
01:28 because it is very close to NumPy except for some extra features that we will be using later
01:35 on, and I will have a lecture on PyTorch where I will explain these differences. For right now
01:42 it doesn't make a big difference whether we use the NumPy or the PyTorch notebook. I will also show you a
01:48 step-by-step comparison after I explain the NumPy notebook, so you will see the actual
01:54 differences between the two. But let's do one thing at a time, right? All right, let's add some headings
02:04 and then we will get on with the code.
02:34 So, I'm importing some libraries. For those who have not used notebooks:
02:46 this command here is for showing plots inside the notebook.
02:50 It's technically not necessary anymore, but sometimes, on some computers, plots will not
02:56 be shown in a notebook if you don't include this line,
02:59 and it doesn't hurt to include it.
03:02 So I always do this.
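For reference, a minimal sketch of the setup cell being described (the exact imports aren't shown in the transcript, so the specifics are assumptions):

```python
# Typical setup cell for this notebook (assumed; not reproduced verbatim in the video)
%matplotlib inline   # Jupyter magic: render plots inside the notebook

import numpy as np
import matplotlib.pyplot as plt
```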
04:25 Okay, so I've typed some code.
04:48 Now, let's just try to understand this and then we will move on.
04:52 Here I'm just loading the dataset.
04:56 There is nothing really interesting happening here, but I will go through it step by step.
05:01 The dataset is some toy data that I generated.
05:05 Let me show you what it looks like.
05:10 So here I have two feature columns.
05:12 I didn't include any column header, but this is the first feature,
05:16 this is the second feature value, and this is the class label here.
05:20 So there are zeros and ones, and you can see the dataset is not shuffled.
05:25 It's actually helpful for learning if the dataset is shuffled;
05:28 it will make the learning a little bit faster for the perceptron.
05:32 So here I'm loading the data into NumPy.
05:34 I could also use pandas, but I thought it might be overkill because it's a relatively simple dataset.
05:42 Then I'm assigning the features to X, which is a matrix, and the class labels to y.
05:48 I can show you how they look.
05:52 So typing this: this is X, which is a matrix, and this is y, which is a class-label array.
05:59 Here it's shuffled because I already executed this whole bunch of code.
06:08 So you can already guess what's going on here.
06:11 Here I'm loading the data and then just printing some summary information.
06:16 It's always a good idea, I think, to do that to get a feel for the data.
06:23 So we have 50 labels from class 0 and 50 labels from class 1.
06:28 We have 100 data points in total, 2 feature columns, and also 100 labels.
06:35 So, for example, here we can see that these numbers match,
06:44 which is what we expect.
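A rough sketch of the loading-and-summary cell described here; the file name, delimiter, and print statements are assumptions, not the video's exact code:

```python
import numpy as np

# Load the toy dataset: two feature columns and one class-label column
data = np.genfromtxt('perceptron_toydata.txt', delimiter='\t')  # hypothetical file name
X, y = data[:, :2], data[:, 2]
y = y.astype(int)

print('Class label counts:', np.bincount(y))   # expect 50 and 50
print('X.shape:', X.shape)                     # expect (100, 2)
print('y.shape:', y.shape)                     # expect (100,)
```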
06:47 Then here I'm shuffling the dataset so that the examples are not all in order.
06:52 They get shuffled.
06:53 The way I'm doing that: I have to shuffle X and y together, right?
06:58 Otherwise everything will be mixed up
07:01 and the features won't correspond to the class labels anymore.
07:05 So what I do is create a shuffle index.
07:08 I can just show you what this shuffle index is.
07:11 It is just the numbers from 0 to 99,
07:17 the 100 indices, and then I actually shuffle these indices.
07:21 Here I'm creating a random number generator
07:26 and then shuffling these indices.
07:30 You can look at them after shuffling.
07:32 After I execute that, you will see that they are now in random order.
07:39 And then I'm using that to select the data points from X to Y.
07:44 From X and y.
07:45 So X and y are shuffled based on this shuffle index.
07:51 That's how we shuffle.
07:53 Then I will use the first 70 data points for training,
07:57 and the last 30.
07:58 So out of these 100 data points, those from index 70 to 100,
08:02 the last 30 data points, will be our test set.
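A sketch of the shuffle-and-split step as described; the random seed value is an assumption:

```python
# Shuffle features and labels together via a shared index array
shuffle_idx = np.arange(y.shape[0])          # indices 0..99
shuffle_rng = np.random.RandomState(123)     # reproducible random number generator (seed assumed)
shuffle_rng.shuffle(shuffle_idx)             # shuffle the indices in place

X, y = X[shuffle_idx], y[shuffle_idx]

# First 70 examples for training, last 30 for testing
X_train, X_test = X[:70], X[70:]
y_train, y_test = y[:70], y[70:]
```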
08:06 Later on, we will be using more convenient tools to load data in PyTorch;
08:10 there are some data-loading utilities for that.
08:14 Here I'm just doing it step by step
08:17 so you get a feeling for what's basically going on.
08:20 Next, I'm normalizing the data.
08:22 This is sometimes also called standardization.
08:26 I'm standardizing the data such that after standardization
08:29 it will have mean 0 and unit variance.
08:32 So I'm subtracting the mean and dividing by the standard deviation.
08:37 Here I'm computing the mean and standard deviation of my sample,
08:41 and then I'm subtracting the mean and dividing by the standard deviation.
08:44 Then both feature columns will have mean 0 and standard deviation 1,
08:49 so unit variance.
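A sketch of the standardization step; reusing the training-set statistics for the test set is an assumption based on common practice, since the transcript doesn't spell it out:

```python
# Standardize: subtract the mean and divide by the standard deviation, per feature column
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma   # assumed: test set scaled with the training statistics
```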
08:51 You can actually check that.
08:54 Okay, the mean is very close to 0;
08:56 there are about 17 zeros after the decimal point
08:59 before the first non-zero digit,
09:04 so it's a very small number,
09:05 essentially identical to 0.
09:09 And the standard deviation should be around 1.
09:12 Yes, it is around 1,
09:15 so the data is standardized.
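A quick sanity check along the lines described (the exact print statements are assumptions):

```python
# Means come out around 1e-17 (floating-point error, essentially zero);
# standard deviations come out essentially 1.
print(X_train.mean(axis=0))
print(X_train.std(axis=0))
```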
09:17 Well, why am I doing that?
09:20 It speeds up training a little bit;
09:23 it stabilizes the training for the perceptron.
09:27 It's not strictly necessary here,
09:29 but it is good practice for other types of optimization algorithms,
09:34 which we'll see later when we talk about stochastic gradient descent.
09:38 So standardization is something that is usually recommended.
09:42 The only type of machine learning model where it is really not necessary is tree-based models,
09:48 but all other machine learning and deep learning models that I know of can really benefit from it.
09:56 Stochastic gradient descent in particular will just learn faster.
10:01 Okay, now let's take a look at our data.
10:05 Let me write some code for that.
10:07 So here, this is what our training set looks like.
10:20 You can see it's roughly centered at 0.
10:24 We have two classes: class 0, these circles here, and the squares for class 1.
10:30 So feature 1 and feature 2, that's our training set.
10:33 There should be 70 examples here, and the remaining 30 examples are in our test set plotted down below.
10:45 What we want to do is train our model on the training set and then evaluate it on the test set.
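A sketch of the training-set scatter plot described above; the marker shapes and labels are assumptions:

```python
import matplotlib.pyplot as plt

# Training set: circles for class 0, squares for class 1
plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], marker='o', label='class 0')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], marker='s', label='class 1')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()
```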
10:51 Okay, now let's implement our perceptron model.
10:58 So yes, the perceptron code.
11:01 Let me type this in and then I'll get back to you.
11:06 Now, you can see it's relatively short.
11:09 I'm implementing it using a forward and a backward method,
11:14 and the reason I'm doing this is that it is also how things are done in PyTorch;
11:22 it will make things more familiar later on if I start using this pattern now.
11:26 But let's start at the top.
11:28 I'm implementing it as a class here,
11:31 and you should, I think, be familiar with Python classes.
11:35 What I'm doing here is writing the constructor.
11:39 This is a special class method, the constructor.
11:41 I'm giving it the number of features, because that's what I need to know to set up the number of weights.
11:47 I'm using the implementation where the weights and the bias are kept separate, because that is more convenient:
11:53 I don't have to modify the feature vector.
11:56 What I'm doing here is initializing the weight vector and the bias unit.
12:03 The bias unit is just a single value,
12:05 just one number.
12:06 And the weights, the weight vector, depend on the number of features,
12:10 so I make this a vector
12:13 whose length is equal to m, the number of features.
12:19 So here I'm just setting up my weights and bias and setting them to zero.
12:24 Later on, for certain algorithms such as stochastic gradient descent,
12:31 it's better to initialize them to small random numbers; here, for the perceptron,
12:36 that's not necessary,
12:39 but for neural networks it will be necessary later on.
12:42 We will see that.
12:44 Now, in the forward method, I'm computing the net input, here in 'linear',
12:49 and then I'm computing the predictions;
12:53 that's my threshold.
12:55 So this is the net input.
12:57 I'm calling it 'linear' because later on we will also see linear layers in PyTorch.
13:02 They are called linear,
13:03 and they basically compute the net input.
13:06 You can see this is the dot product between the input vector and the weights,
13:11 and then I'm adding the bias here.
13:13 All right.
13:14 Then here we have our threshold function.
13:17 This threshold function just uses NumPy.
13:20 The way it works is: if linear,
13:25 that is, if the net input, is greater than zero, then output one, otherwise output zero.
13:31 So that's our forward method.
13:34 And here is our backward method.
13:37 Why am I calling it that?
13:40 It is for computing the errors.
13:43 Usually, when we have deeper neural networks, we use something called backpropagation,
13:48 where we look at the outputs
13:50 and then, based on the outputs, we adjust the weights.
13:53 So we run the forward method to produce the predictions,
13:57 then we compute the errors and then update.
14:00 It will become clearer when we have a deeper network where there is really
14:06 backpropagation going on.
14:07 So these are our two methods:
14:10 backward computes the errors, which are the difference between the true class labels and
14:14 the predictions,
14:15 and forward is used to get the predictions in the first place.
14:18 So we implemented here the prediction step that we discussed in the
14:23 slides as well:
14:26 the prediction is step A,
14:28 and then step B is the backward computation, which gives us the errors.
14:34 Now we have to put everything together,
14:37 so I implemented this train method here.
14:41 This train method is basically the whole procedure here on the slide, as you can see.
14:47 So, for each epoch in the number of epochs, that is, for every training epoch,
14:56 and then for every training example, we perform the forward pass, the backward pass, and the update.
15:03 Since backward already calls forward internally, we just call backward here.
15:09 There's also some reshaping going on here,
15:12 and that is because we are making the vector dimensions match;
15:14 otherwise you will get some errors.
15:16 So here, this will be one row and m columns;
15:20 I think this is called a row vector, because it is just one row and multiple
15:25 columns, so it looks like a row.
15:27 And here we have this row vector, and this has to have the same dimensions.
15:34 I'm just making the dimensions the same so everything can be computed nicely;
15:41 otherwise you will find there is a dimension mismatch.
15:44 So there's just some reshaping going on here,
15:46 and then here we perform the update.
15:48 Again, I'm doing a reshape afterwards so that we get the original dimensions back,
15:53 because the weights here, see, we are matching the original dimensions.
15:58 We are just reshaping so we can add the update to the weights;
16:01 otherwise there will also be a dimension mismatch, if this is just a single number,
16:06 or if it is a 1-by-m vector instead of an m-by-1 vector.
16:12 And then we also update the bias;
16:14 the bias is just updated by the errors.
16:19 The next thing is evaluating the model.
16:21 To evaluate the performance here, I just do the forward pass and then compute the
16:26 accuracy.
16:27 The accuracy is computed by checking how many of the predictions match the true labels and
16:33 then dividing by the dataset size,
16:35 so it gives me a number between 0 and 1.
16:38 So this is my perceptron algorithm,
16:41 sorry, the perceptron class that we just implemented.
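A sketch of the perceptron class as described in this walkthrough; the exact code from the video is not in the transcript, so shapes and variable names follow the narration and are otherwise assumptions:

```python
import numpy as np

class Perceptron:
    def __init__(self, num_features):
        self.num_features = num_features
        # weights as an m-by-1 vector, bias as a single number, both initialized to zero
        self.weights = np.zeros((num_features, 1), dtype=float)
        self.bias = np.zeros(1, dtype=float)

    def forward(self, x):
        # net input ("linear"): dot product of inputs and weights, plus the bias
        linear = np.dot(x, self.weights) + self.bias
        # threshold: output 1 if the net input is greater than zero, else 0
        predictions = np.where(linear > 0., 1, 0)
        return predictions

    def backward(self, x, y):
        # errors = true class labels minus predictions (forward is called internally)
        predictions = self.forward(x)
        errors = y - predictions
        return errors

    def train(self, x, y, epochs):
        for e in range(epochs):
            for i in range(y.shape[0]):
                # reshape one example to a 1-by-m row vector so the dimensions match
                errors = self.backward(x[i].reshape(1, self.num_features), y[i]).reshape(-1)
                # perceptron update: add error * input to the weights, error to the bias
                self.weights += (errors * x[i]).reshape(self.num_features, 1)
                self.bias += errors

    def evaluate(self, x, y):
        # accuracy = fraction of predictions that match the true labels
        predictions = self.forward(x).reshape(-1)
        return np.sum(predictions == y) / y.shape[0]
```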
16:45 And now I'm going to train it.
16:51 So yeah, initializing it,
17:09 then training it for five epochs, and then I will print the model
17:26 parameters afterwards. You can see that it's pretty fast, and we get the weights, the weight
17:32 vector, and then the bias here. Now we can evaluate it, i.e. compute the accuracy. Let's do that.
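A sketch of the training and evaluation cells; the variable name ppn and the print formatting are assumptions:

```python
ppn = Perceptron(num_features=2)
ppn.train(X_train, y_train, epochs=5)

print('Weights:', ppn.weights)
print('Bias:', ppn.bias)

train_acc = ppn.evaluate(X_train, y_train)
test_acc = ppn.evaluate(X_test, y_test)
print('Train accuracy: %.2f%%' % (train_acc * 100))
print('Test accuracy: %.2f%%' % (test_acc * 100))
```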
17:53 So the test set accuracy is 93%; it's not quite 100%. On the training set it should actually be
18:01 100%, right, because this dataset is linearly separable, and the perceptron should converge if the data are linearly
18:07 separable, so there everything is classified correctly. The test set is not as good, as you can
18:15 see, because the perceptron may overfit. So let's take a look at the decision boundaries. Okay, here is some
18:22 complicated code to compute the decision boundaries. It's actually not that complicated;
18:30 what I did is just rearrange things. What we have, if you think about it, is that the decision
18:37 boundary is where the net input equals zero, so everything hinges on zero. Writing out the computation,
18:42 it's x0 times w0, plus x1 times w1, plus the bias, and this hinges on zero. What
18:55 we are doing here is taking one fixed number; let's say for feature zero we take the
19:02 value minus two, so we go to the left-hand side here, and then we want to find, so this is
19:10 for x0, and x0 is on the x-axis while x1 is on the y-axis, we take minus two here and we want to find the
19:22 corresponding x1 value. So this is x0 at minus two, and what we ask is: what is the corresponding x1
19:30 value? To find it, we have to rearrange, solving for x1, right? So we move
19:37 this term and this one to the right-hand side, and then we have the x1 value. I'm calling
19:45 it min, and the reason is that it's on the left-hand side of the plot. Then I do the same thing for the right-hand
19:51 side: I again set x0 to some value,
19:59 setting it to 2 this time, and then I find the corresponding y-axis value, which is the x1 max
20:05 here. So I'm doing the same thing, just rearranging, now using a max value, and then I
20:12 connect these points with a line, and that's how I get this. I've done this on the left-hand side for the
20:18 training set and on the right-hand side for the test set, so one plot is the training set and one is the test set.
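A sketch of the decision-boundary computation just described, solving w0*x0 + w1*x1 + b = 0 for x1 at x0 = -2 and x0 = 2; the plotting details are assumptions:

```python
import matplotlib.pyplot as plt

# The boundary is where the net input is zero: w0*x0 + w1*x1 + b = 0,
# so x1 = (-w0*x0 - b) / w1 for any fixed x0 value.
w0, w1 = ppn.weights.flatten()
b = ppn.bias[0]

x0_min, x0_max = -2.0, 2.0
x1_at_min = (-w0 * x0_min - b) / w1   # boundary's x1 value at the left edge
x1_at_max = (-w0 * x0_max - b) / w1   # boundary's x1 value at the right edge

# Connect the two points with a line, drawn over the training and test sets
fig, (ax_train, ax_test) = plt.subplots(1, 2, sharex=True, figsize=(9, 4))
ax_train.plot([x0_min, x0_max], [x1_at_min, x1_at_max])
ax_train.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
ax_train.set_title('training set')
ax_test.plot([x0_min, x0_max], [x1_at_min, x1_at_max])
ax_test.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
ax_test.set_title('test set')
plt.show()
```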
20:23 Now, about the decision boundary: it doesn't actually change, because it's the same for the training and
20:30 test set; only the dataset being plotted is different. The decision boundary only depends on w, right? We are
20:36 providing these fixed x0 values ourselves, and the boundary itself only depends
20:41 on the model parameters, so the decision boundary does not change. This plot is for the training set,
20:47 and this one, on the right-hand side, is for the test set. You can see that on the training set it
20:53 perfectly classifies these examples, while on the right-hand side, which is the test set,
20:59 you can see it is maybe fitting some of the data too closely. I mean, there's no other way, actually,
21:05 but it happens that in this case it doesn't perform well. There would be a different
21:13 way: if you fit the boundary more like this, something a bit straighter, then you might get these
21:22 right. But it just happens that these data points are not in the training set, so the model doesn't
21:27 know that it should shift the boundary more to the right here. In this way, the model does
21:33 a good job on the training set, but on the test set it's not so good. That is what
21:39 the term overfitting means: the model fits the training data a little too closely and
21:45 doesn't generalize so well to the test set. So this is how the NumPy code works.
21:51 Now for the PyTorch code. I will not write out the complete code, because,
21:58 to be honest, most things are the same; I will only talk about the differences.
22:06 I don't need to talk about this much, I think, because the code is essentially the same;
22:12 everything is the same except for the class. Okay, so there are some differences; let's talk about
22:19 only those. I've prepared a slide for the differences that I feel we need to discuss,
22:27 and also note that we will talk about this in more detail when I cover PyTorch
22:34 in the coming videos. Here I've highlighted the differences: on the left-hand side is the
22:42 NumPy implementation, and on the right-hand side is the PyTorch implementation. The left-hand
22:47 side is the same code that we just wrote, and you can see that there are not that many
22:55 differences. For the way the weights and biases are implemented: here we are using numpy zeros, there
23:00 we are using torch zeros. Here we have to be a bit more specific: instead of saying
23:08 NumPy float we say torch.float32, a 32-bit float. I also have this device argument, because of the way I implemented
23:17 things, so it would also run on the GPU if a GPU is available; if no GPU is available it will use the CPU.
23:25 So there is this device argument, which is provided optionally; it's not strictly necessary. What is a
23:32 little bit different here is this part. There are multiple ways you can write
23:38 it; you can also use a plus. To be honest, I just happen to use torch.add, but I could have
23:45 also used a plus. And mm is for matrix multiplication: in NumPy we write dot,
23:54 and in PyTorch we write mm for matrix multiplication, but NumPy's dot
24:05 function also performs this matrix multiplication, so it is really the same operation; it just
24:11 looks a little bit different. The where function in PyTorch is a bit more, I would say, involved, not that much
24:18 more involved, but it needs placeholders: instead of a plain one and zero it needs
24:26 tensors here, so I'm creating these as placeholders and providing them, but it's the same concept.
24:32 Then, what's a little different is the last part: instead of numpy.sum it's torch.
24:38 sum, and here I'm converting to float, because otherwise it would be an integer, and an integer
24:43 divided by some value would give an integer. What we want is a float, because it's a fraction
24:49 between zero and one. If you don't do that, you will get back an integer, and that's not correct,
25:00 because the value of the accuracy is between zero and one, right? Which is why I am casting this to float.
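A sketch of how the PyTorch version might look based on the differences listed above; the variable names and the device handling are assumptions, not the video's exact code:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

num_features = 2
weights = torch.zeros(num_features, 1, dtype=torch.float32, device=device)
bias = torch.zeros(1, dtype=torch.float32, device=device)

ones = torch.ones(1, device=device)    # placeholder tensors for torch.where
zeros = torch.zeros(1, device=device)

def forward(x):
    # x is assumed to be a float tensor of shape (n, num_features)
    linear = torch.add(torch.mm(x, weights), bias)   # net input
    return torch.where(linear > 0., ones, zeros)     # threshold

def accuracy(x, y):
    # y is assumed to be a tensor of 0/1 labels; cast the match count to float
    predictions = forward(x).reshape(-1)
    return torch.sum(predictions == y).float() / y.shape[0]
```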
25:07 But again, the PyTorch code will be covered in more detail later. So that's what I wanted to say
25:12 about the code. If you have any questions, feel free to drop a comment. I hope you understood how
to code this perceptron algorithm using both NumPy and PyTorch.