Logistic Regression & K-NN Algorithm

Transcript
00:00 Hello everyone, welcome to Chapter 4, where we will study the different algorithms used for classification, along with an introduction to decision trees.
00:21 First, we will cover the logistic regression and KNN algorithms in this session. Coming to logistic regression: it is a supervised learning algorithm used for predicting a categorical dependent variable from a given set of independent variables. Because logistic regression predicts a categorical dependent variable, the outcome is always a categorical or discrete value: yes or no, 0 or 1, true or false, etcetera.
01:02 Now, instead of giving an exact value of 0 or 1, it gives a probabilistic value that ranges between 0 and 1. And instead of fitting a straight regression line, logistic regression uses an S-shaped logistic function whose output is bounded by the two extreme values, 0 and 1.
01:30 The curve of the logistic function indicates the likelihood of something: for example, whether a cell is cancerous or not. Likelihood is mainly what logistic regression quantifies. It provides probabilities and can classify new data using both continuous and discrete data sets.
01:54 As you can see, and as I have said, it produces an S-shaped curve, also called the S-curve. A threshold value is specified on it, and the output ranges between the minimum 0 and the maximum 1. Logistic regression is used to classify observations using different types of data, and it lets us easily determine the most effective variables for the classification.
02:35 Now we come to the logistic function, which is also called the sigmoid function. The sigmoid function is a mathematical function used to map predicted values to probabilities: it takes the raw output obtained from the independent variables and maps any real value into another value within the range of 0 to 1.
03:05 The value of logistic regression must lie between 0 and 1; you should always see to it that the value stays within this range, which it cannot go beyond. That is why the curve is S-shaped, and that S-shaped curve is why we call it the sigmoid function.
03:35 Now, in logistic regression we use the concept of a threshold value, which lies between 0 and 1. Why is this threshold given? Because the outputs are constrained to lie between 0 and 1, we need a cut-off value between those two extremes: when a predicted probability falls on one side of the threshold we assign one class, and on the other side we assign the other class. That is why we give a threshold value in the logistic function.
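To make the last two ideas concrete, here is a minimal sketch (not from the lecture; the numbers are made up) of the sigmoid and a 0.5 threshold in Python:

    import numpy as np

    def sigmoid(z):
        # Map any real value into the (0, 1) range.
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical raw (linear) model outputs for a few samples.
    z = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])
    probs = sigmoid(z)  # approx. [0.047, 0.378, 0.5, 0.769, 0.982]

    # Apply a threshold (0.5 is a common default) to turn
    # probabilities into discrete class labels.
    labels = (probs >= 0.5).astype(int)  # [0, 0, 1, 1, 1]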
04:16 There are also certain assumptions, which we have already learnt about for classification algorithms. The assumptions for logistic regression are: first, the dependent variable must be categorical in nature, like I have said: yes or no, true or false, either 0 or 1. That is how the categorical variable should be. Second, the independent variables should not have multicollinearity. What is multicollinearity? It is when the variables are highly dependent on each other, and that should not be the case for the independent variables. These are the assumptions made for logistic regression.
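The lecture does not show how to check the multicollinearity assumption, but one quick sketch, assuming a hypothetical feature matrix, is to inspect pairwise correlations:

    import numpy as np

    # Hypothetical data: three independent variables, 100 samples.
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    x2 = rng.normal(size=100)
    x3 = 0.9 * x1 + 0.1 * rng.normal(size=100)  # x3 nearly duplicates x1

    X = np.column_stack([x1, x2, x3])
    print(np.corrcoef(X, rowvar=False).round(2))
    # A pairwise correlation near +/-1 (here x1 vs x3) signals
    # multicollinearity, violating the assumption above.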
05:01 Now, the logistic regression equation: how does it work? It can be obtained from the linear regression equation, the equation of a straight line: y = b0 + b1x1 + b2x2 + ... + bnxn. In logistic regression, however, y can only be between 0 and 1. So we transform the left-hand side by dividing y by 1 - y: this ratio is 0 when y = 0 and tends to infinity as y approaches 1, so it respects the requirement that the output stay between 0 and 1.
06:10 But we need a range from minus infinity to plus infinity, so we apply a logarithm to the above form: log(y / (1 - y)) = b0 + b1x1 + b2x2 + ... + bnxn. This is the general equation form you need to understand for how the linear regression equation gets converted to the logistic one.
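Written out in standard notation, the same derivation is as follows (solving the last line back for y recovers the sigmoid; that final step is implicit in the lecture):

    y = b_0 + b_1 x_1 + \dots + b_n x_n                                  % linear regression
    \frac{y}{1-y} \in (0, \infty) \quad \text{for } 0 < y < 1            % odds
    \log\frac{y}{1-y} = b_0 + b_1 x_1 + \dots + b_n x_n                  % logit form
    \Rightarrow\; y = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_n x_n)}}  % sigmoid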
06:42 Now, the types of logistic regression. The first is binomial: in binomial logistic regression, the dependent variable can take only two possible values, such as 0 or 1, pass or fail, true or false. All of this comes under binomial regression.
07:05 The next is multinomial: in multinomial logistic regression, there can be three or more possible unordered types of dependent variable. Three or more, and remember the word unordered: for example cats, dogs, sheep. Is there any relation between cats, dogs, and sheep that would place them in an ordered category? No, they come under the unordered category.
07:32 Next is ordinal logistic regression, where there can be three or more possible values of an ordered type, such as low, medium, high. We know how these are specified: there is an order, low, medium, and high. So those are the three types of logistic regression.
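As a sketch of the first two types (ordinal regression needs specialised libraries, but scikit-learn's LogisticRegression handles the binomial and multinomial cases directly; the iris data here is just an illustration):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Iris has three unordered classes, so this is a multinomial problem;
    # with only two classes the same estimator is binomial.
    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict(X[:5]))
    print(clf.predict_proba(X[:2]).round(3))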
08:05 Now, how is it implemented using Python? We will go into Python in later classes, but conceptually the steps are these. The first is the data pre-processing step: basically, we have a file, and this step covers how it is converted and which functions are used to process it. Then we fit the logistic regression to the training set: just like the best-fit line in linear regression, we use the same concept here, finding the logistic regression that best fits the training set we have prepared. After that, we predict the test results: once training is done, we run the model, and whether or not it is the desired output, we will definitely get a result. Once the test results are obtained, we check their accuracy: are they really matching our expected output? For that, we create a confusion matrix, which we have studied in our previous classes. Finally, once accurate results are obtained, we visualize the test results, so that the output comes in the form of a graph that is easier for the viewer to understand. So, this is how the implementation of logistic regression is done.
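A minimal scikit-learn sketch of those five steps, assuming a generic built-in dataset (the dataset and parameter choices are illustrative, not from the lecture):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, accuracy_score

    # 1. Data pre-processing: split the data and scale the features.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # 2. Fit logistic regression to the training set.
    clf = LogisticRegression().fit(X_train, y_train)

    # 3. Predict the test results.
    y_pred = clf.predict(X_test)

    # 4. Check accuracy with a confusion matrix.
    print(confusion_matrix(y_test, y_pred))
    print("accuracy:", accuracy_score(y_test, y_pred))

    # 5. Visualizing the results (e.g. with matplotlib) would follow here.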
09:45 Now, coming to the advantages and disadvantages. The advantages: it is easy to implement, easy to interpret, and very efficient to train. It makes no assumptions about the distribution of the classes in feature space. It extends easily to multiple classes, using multinomial regression, and a natural probabilistic view of the class predictions is possible.
11:24 It also provides a measure of how appropriate a predictor is, through the coefficient sizes, as well as its direction of association, whether positive or negative. It is fast at classifying unknown records: when we have no idea about the records we are receiving, it helps us classify them quickly. It gives good accuracy for many simple data sets, and it performs well when the data set is linearly separable. Its model coefficients can be interpreted as indicators of feature importance.
12:43 Logistic regression is also less inclined to overfitting, although it does have an overfitting problem on high-dimensional data sets. For those scenarios we use regularization techniques such as L1 and L2 to avoid overfitting. These properties are what make logistic regression easy to use and analyse.
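A hedged sketch of turning on L1 or L2 regularization in scikit-learn (C is the inverse of the regularization strength; the values here are illustrative):

    from sklearn.linear_model import LogisticRegression

    # L2 (ridge-style) regularization is the scikit-learn default.
    l2_clf = LogisticRegression(penalty="l2", C=1.0)

    # L1 (lasso-style) regularization needs a compatible solver and
    # tends to drive some coefficients exactly to zero.
    l1_clf = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")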
13:12 Now, the disadvantages. If the number of observations is smaller than the number of features, logistic regression should not be used, as it may lead to overfitting. It constructs linear boundaries: we have a limited, linear boundary concept here and cannot cross it. The major limitation is the assumption of linearity between the dependent variable and the independent variables. It can also only be used to predict discrete functions; hence the dependent variable of logistic regression is bound to a discrete number set, which is a disadvantage too.
14:09 Non-linear problems cannot be solved by logistic regression. It also requires little or no multicollinearity between the independent variables; the assumption we made earlier is thus one of the disadvantages as well. And it is tough to capture complex relationships using logistic regression: when the equation or the data sets given to us become much more complex, we will not be able to solve the problem with it, because there are much more powerful algorithms for that, such as neural networks and other outstandingly performing algorithms.
15:10 Finally, while in linear regression the independent and dependent variables are related linearly, in logistic regression the independent variables must be linearly related to log(p / (1 - p)), which is an equation of much greater complexity. So those are the disadvantages of logistic regression, where we consider the S-shaped form, give a threshold value, and keep the range between 0 and 1. This is just the basic concept needed to understand logistic regression.
15:46 Now, coming to the k-nearest neighbors algorithm, which is the next algorithm under the supervised learning techniques. It assumes similarity between the new data we receive and the available cases, and it classifies the new data into the category that is most similar to the data that was available before.
16:15 It stores all the available data and classifies new data based on similarity: we are assessing how similar the given data is to the data that is already available. It can be used for regression as well as classification, but it is mostly used for classification problems, and it is a non-parametric algorithm, meaning it does not make any assumptions about the underlying data. It is also called a lazy learner; we have learned about lazy learners and active learners. It is lazy because it does not learn from the training set immediately: it waits for the new data, and only then does it work on the training data set. So, this is the basic introduction to k-nearest neighbors.
17:04 Now, we have a figure here. There is a new data point, and we already have two categories of data that have previously been classified. When a new data point comes in, we apply the k-nearest neighbors algorithm to it and try to analyze which category it belongs to: category A or category B. The figure shows before KNN and after KNN; after applying the KNN algorithm, the new data point belongs to category A.
17:39 So, the steps are: first, we select the number k of neighbors. From the data set, we calculate the Euclidean distance from the new point to the existing data points, and we take the nearest neighbors according to those calculated Euclidean distances; that is, we analyze which points are nearest to the new one. Among these k neighbors, we count the number of data points in each category, and we assign the new data point to the category where the number of neighbors is maximum. So, this is how the model works.
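A minimal from-scratch sketch of those steps (the toy coordinates are invented for illustration; any k-NN library would do the same):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=5):
        # Step 2: Euclidean distance from x_new to every training point.
        dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
        # Step 3: indices of the k nearest neighbors.
        nearest = np.argsort(dists)[:k]
        # Steps 4-5: count categories and take the majority vote.
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Toy data: category A clustered near (0, 0), category B near (5, 5).
    X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
    y = np.array(["A", "A", "A", "B", "B", "B"])
    print(knn_predict(X, y, np.array([1, 1]), k=5))  # -> "A" (3 A's vs 2 B's)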
18:42 Now, an example. We have a new data point, and we have category A and category B as the existing data sets. We choose the number of neighbors: for example, here we choose 5 neighbors. Setting the new data point itself aside, we choose a total of 5 points from both category A and category B by calculating the Euclidean distance between the data points; there is a formula for this which we already know. Once the Euclidean distances have been calculated, we get the nearest neighbors: 3 nearest neighbors from category A and 2 from category B. This is how the figure looks: from A we have got 3 points, from B we have got 2, and the new data point sits between these two groups. Once the algorithm has been applied, what do we get? We understand that the new data point belongs to category A.
21:18 So, there is no particular way to determine the best value for k; the trials keep going until we find the best among them. Usually we take the value to be 5; a minimum of about 5 should be taken, because with a very low value such as k = 1 or k = 2, taking only 1 or 2 points, there will be noise issues and we will not get accurate results. So a minimum of about 5 values is considered, and from those we calculate the nearest points, that is, the category to which the new data point is assigned.
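The "keep trialling k" idea can be sketched as a simple loop over candidate values, scoring each on held-out data (scikit-learn and the iris data are assumptions here, not from the lecture):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Trial several k values and keep the one with the best validation
    # accuracy; very small k tends to be noisy, as noted above.
    for k in (1, 3, 5, 7, 9):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        print(k, round(knn.score(X_val, y_val), 3))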
21:56 Now, the implementation in Python is also the same as before: the data pre-processing step is used, we fit the KNN algorithm to the training set, we predict the test results, we test the accuracy of the results, and we visualize the test results.
22:11 The advantages: it is simple, it is robust to noisy training data, and it is more effective if the training data set is large.
22:22 Comparing the disadvantages: we always need to determine the value of k, which may be complex at times, and the computation cost is high because we have to calculate the distance between all the data points, so it is also more time-consuming.
22:44 Now, the difference between linear regression and logistic regression: linear regression is used to predict continuous values, while logistic regression is used to predict categorical values; one is used for solving regression problems, the other for classification problems. In linear regression we find the best-fit line; in the case of logistic regression we use the S-curve. And finally, the least squares estimation method is used for accuracy in linear regression, while in logistic regression we use maximum likelihood estimation to get the best results.
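In symbols, those two estimation criteria are the standard textbook forms (restated here for reference, not taken verbatim from the lecture):

    \text{Linear regression (least squares):}\quad
      \min_{b}\; \sum_{i=1}^{m} \bigl(y_i - (b_0 + b_1 x_{i1} + \dots + b_n x_{in})\bigr)^2

    \text{Logistic regression (maximum likelihood):}\quad
      \max_{b}\; \sum_{i=1}^{m} \bigl[y_i \log p_i + (1 - y_i)\log(1 - p_i)\bigr],
      \qquad p_i = \frac{1}{1 + e^{-(b_0 + b_1 x_{i1} + \dots + b_n x_{in})}}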
23:20 With this, we have completed this session. Thank you so much.