Transcript
00:00Hello everyone. This is the last lesson of Chapter 4, where we will study
00:17overfitting and the evaluation techniques that help us judge how well a
00:23function fits the data for our problem, okay.
00:26So let us move on to the overfitting concept here.
00:31Overfitting in supervised learning: here we use a function which is chosen to
00:36fit the training set from among a set of hypotheses.
00:40We have a set of hypotheses, and we choose the function that fits our training
00:47set best.
00:48The function guessed here belongs to a small subset of all the possible
00:54functions, okay.
01:01Out of all the possible functions we have a small subset, and from that subset
01:07we determine the required function.
01:11Then, even with an incomplete set of training samples, it is possible to
01:18reduce the subset of functions that are consistent with the training set, and
01:26that subset can still make useful guesses about the value of the function for
01:32inputs not in the training set, okay. So the larger the training set, the more
01:39likely it is that even a randomly selected consistent function will give an appropriate output for a pattern that we have not seen.
01:46However, if the training set is not sufficiently large compared with the size
01:53of the hypothesis space, we know that the function is guessed from the
01:57hypothesis space, and so there will still be too many consistent functions to
02:04choose from when we make our guesses. When we then try to generalize, the
02:11generalization performance will be poor, because we do not have enough
02:17training data.
02:22So this is how the overfitting issue arises.
02:26Again, when there are too many hypotheses that are consistent with the
02:32training set, we say that we are overfitting the training data.
02:35So the training set should be built in such a way that we have enough
02:42consistent data along with some inconsistent data.
02:44If it is too consistent, we say that the model is overfitting the data.
02:48If there is too little consistency, we say that the training data set will
02:52perform poorly.
02:53Now overfitting is a problem that we address in all learning methods.
02:58You must have seen me using the word overfitting in all my previous classes.
03:03So overfitting occurs very generally; it is a common problem.
03:08Since a decision tree of sufficient size can implement any Boolean function,
03:15there is a danger of overfitting here as well, because a Boolean function is
03:20definitely 0 and 1, and whether there is a lot of inconsistent data or a
03:26minimum of it, the same problem occurs for a decision tree, especially if the
03:33training set is very small. That is, even if a decision tree is synthesized to
03:39classify all the members of the training set correctly, it might perform
03:48poorly on new patterns that were not used to build the decision tree.
03:49So it may work fine for the patterns that we already know and used to develop
03:55the decision tree, but if certain patterns were not analyzed or used in that
04:02process, the tree will not be able to identify a new pattern correctly.
04:10So this is how the overfitting issue relates to the decision tree concept.
04:16Now, to deal with this, we will use various validation methods.
04:20So what is the basic concept of a validation method?
04:24The most straightforward way to estimate how well a hypothesized function
04:31performs on new data is to test it on a test set.
04:35So basically the idea is this: you take a test set, you apply the hypothesized
04:41function to it, and you see how accurate the results are on that
04:48test set.
04:50So this is how the performance is understood and compared.
04:54Now consider comparing several learning systems, for example different
04:59decision trees.
05:01What we do is select the best-performing decision tree and neglect the
05:07others.
05:08But if we select the one that performs best on the test set, then such a
05:16comparison amounts to training on the test data,
05:19because we are effectively using the test data to choose the model, and the
05:26choice is tuned to that test data.
05:30Now, training on the test data enlarges the training set, with a consequent
05:36expected improvement in generalization, but is there still a danger here?
05:44Yes, there is still a danger of overfitting, because the comparison itself
05:49fits the test set.
05:50So overfitting can occur in this way when comparing
05:57different learning systems.
06:01Now, another technique here is to split the training data set, okay.
06:06We split the training data set so that, say, two-thirds is used for training
06:13and the remaining one-third is kept for estimating the generalization
06:17performance.
06:23But if someone asks you on what basis you split the data into two-thirds and
06:25one-third, do you have an answer?
06:30This kind of split is essentially arbitrary, so while the technique can help
06:37us up to some extent, it is not ideal for every training data set, because we
06:42do not have a principled reason to explain why we divided the data set into
06:49two-thirds and one-third.
06:54So this is the difficulty with splitting the data two-thirds to one-third.
06:56Moreover, splitting reduces the size of the training set, and thereby
07:03increases the possibility of overfitting.
07:04When you split the data unevenly, or in any form of your choice, you will find
07:12that either the training data set is very large and the test data very small,
07:17or the training set is very small and the test data very large.
07:23So all of this can lead to a problem of overfitting.
07:27So this is the validation concept here.
07:30So how the training data can be split in an efficient manner is what we learn
07:37through validation methods.
07:39The first one is cross-validation.
07:41Cross-validation is the standard method used to overcome the overfitting issue.
07:47Here we divide the training set E into K mutually exclusive and exhaustive
07:55equal-size subsets E1, E2, and so on.
08:00Remember, the term equal size is very important here.
08:04For each subset Ei, we train on the union of the other subsets.
08:10So we hold out one subset and train on the union of all the remaining subsets.
08:16We repeat this holding-out process with each subset in turn, and each time we
08:23determine the error rate, written lowercase ei, on the held-out subset Ei.
08:30The error rate here is defined as the number of classification errors made on
08:38Ei divided by the number of patterns that occur in Ei.
08:42An estimate of the error rate that can be expected on new patterns, for a
08:48classifier trained on all the patterns in the training set E, is then the
08:54average of the ei.
09:01So this is how you identify the error rate present in the training set: you
09:07divide the training set equally, train on the union of the other subsets, and
09:13calculate the error rate on each held-out subset.
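The K-fold procedure above can be sketched in Python. The helper names (`train_fn`, `classify_fn`) and the toy majority-class learner are illustrative assumptions; the sketch also assumes K divides the number of patterns evenly, matching the equal-size requirement.

```python
from collections import Counter

def k_fold_error_rate(patterns, labels, k, train_fn, classify_fn):
    """Divide the training set into K equal-size subsets E_1..E_K; for each
    E_i, train on the union of the other subsets, count the classification
    errors made on E_i, and return the average of the error rates e_i."""
    n = len(patterns)
    fold_size = n // k                 # equal size: assumes k divides n
    error_rates = []
    for i in range(k):
        lo, hi = i * fold_size, (i + 1) * fold_size
        held_x, held_y = patterns[lo:hi], labels[lo:hi]
        rest_x = patterns[:lo] + patterns[hi:]   # union of the other subsets
        rest_y = labels[:lo] + labels[hi:]
        model = train_fn(rest_x, rest_y)
        errors = sum(1 for x, y in zip(held_x, held_y)
                     if classify_fn(model, x) != y)
        error_rates.append(errors / fold_size)   # e_i for this subset
    return sum(error_rates) / k

# Toy learner: always predict the most frequent training label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]

def predict_majority(model, x):
    return model

# All labels agree, so every fold's error rate is 0.
print(k_fold_error_rate(list(range(6)), [1] * 6, 3,
                        train_majority, predict_majority))  # 0.0
```

In practice the patterns would be shuffled before slicing into folds so each subset is representative.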
09:17The next one is the leave-one-out validation technique, which works along the
09:24same lines as cross-validation.
09:25Here K equals the number of patterns that occur in the training set E, and
09:32each Ei consists of a single pattern.
09:33That is, we take K equal to the number of patterns in E, so that every Ei has
09:44exactly one pattern.
09:45When testing is done on each Ei, that is, on each single pattern, an error may
09:51or may not be generated.
11:01If an error is generated, the total number of mistakes is counted, and it
11:07is divided by K, the number of patterns, to get the estimated error rate.
11:15This type of validation is more expensive, because a lot of computation
11:21takes place, but it is useful because we get an accurate estimated
11:29error rate: we see the error made on each individual pattern.
11:35So this is the second technique here.
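Leave-one-out is just the limiting case where K equals the number of patterns; here is a minimal sketch, paired with a hypothetical 1-nearest-neighbour learner on one-dimensional patterns as the classifier.

```python
def leave_one_out_error_rate(patterns, labels, train_fn, classify_fn):
    """K = n: each E_i is a single pattern.  Count the total mistakes and
    divide by K, the number of patterns, to estimate the error rate."""
    n = len(patterns)
    mistakes = 0
    for i in range(n):
        rest_x = patterns[:i] + patterns[i + 1:]   # everything but pattern i
        rest_y = labels[:i] + labels[i + 1:]
        model = train_fn(rest_x, rest_y)
        if classify_fn(model, patterns[i]) != labels[i]:
            mistakes += 1
    return mistakes / n

# Toy 1-nearest-neighbour classifier on one-dimensional patterns.
def train_1nn(xs, ys):
    return list(zip(xs, ys))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

# Two well-separated clusters: every held-out pattern is classified correctly.
print(leave_one_out_error_rate([1, 2, 10, 11], [0, 0, 1, 1],
                               train_1nn, predict_1nn))  # 0.0
```

The loop trains n separate models, which is exactly the computational expense the lecture mentions.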
11:37So, to summarize how we avoid overfitting in the training data sets with
11:44validation methods: we use two techniques, namely cross-validation
11:51and leave-one-out validation, which are similar. In
11:57cross-validation we find the error rate by holding out each equal-size
12:02subset in turn and training on the union of the rest.
12:06In the case of leave-one-out we also divide the data, but each subset
12:12Ei is a single pattern, and based on each held-out pattern
12:19we find the error rate.
12:22Now, how is this overfitting issue avoided in a decision tree?
12:29Near the tips of the decision tree there may be only a few patterns per node:
12:36each node near the tips will have few patterns, so for these nodes we are
12:43selecting a test based on a very small sample, and thus we are likely to be
12:50overfitting. This is how the overfitting issue arises in a decision tree.
12:56The problem can be dealt with by terminating the test-generating procedure
13:00before all patterns are perfectly split into separate categories.
13:06That is, before the patterns are split into separate categories we terminate
13:13the test-generating procedure, so a leaf node may contain patterns of more
13:20than one class, okay. To give an example: if a leaf node contains patterns of
13:28more than one class, we can decide in favor of the class with the most
13:34patterns at that leaf, okay.
13:40This procedure will result in a few errors, but it avoids the big errors:
13:46we cannot avoid the small errors, and often accepting a small number of errors
13:53on the training set results in fewer errors on the testing set.
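The majority-class decision at a mixed leaf can be stated in a few lines; the function name is a hypothetical illustration, not the lecture's notation.

```python
from collections import Counter

def leaf_decision(leaf_labels):
    """A leaf may hold patterns of more than one class.  Decide in favour
    of the class with the most patterns and report how many training
    errors that decision accepts at this leaf."""
    counts = Counter(leaf_labels)
    majority_class, majority_count = counts.most_common(1)[0]
    accepted_errors = len(leaf_labels) - majority_count
    return majority_class, accepted_errors

# Three 'A' patterns and one 'B' pattern reach this leaf: predict 'A',
# accepting one training error.
print(leaf_decision(['A', 'A', 'A', 'B']))  # ('A', 1)
```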
13:57So we use these cross-validation techniques to determine when to stop
14:05splitting the nodes, okay.
14:06That is, once this overfitting problem is understood, we use the
14:14cross-validation technique to decide when node splitting should be stopped.
14:18So this is how the overfitting problem is avoided in the decision tree: we
14:25know that each node can hold one or more patterns, those
14:31patterns are taken into consideration, we terminate the
14:37test-generating procedure early, we consider the patterns of each class at a
14:44leaf, and we accept a small number of errors by following this procedure,
14:52okay.
14:53It is not that the errors are eliminated completely or become
14:57negligible; we accept a few misclassified training patterns, and then we
15:02use the cross-validation technique to decide when to stop the further
15:09splitting of the nodes.
15:16Then, if the cross-validation error increases as a consequence of a
15:23node split, we are not going to
15:26split that node any further.
15:32Now, one has to be careful about when to stop, because underfitting can also
15:38result, and underfitting usually leads to more errors on the test set than
15:44overfitting does; stopping too early leads to this more problematic underfitting.
15:50There is a general rule that the lowest error rate attainable by a subtree of
15:58a fully expanded tree can be no less than half the error rate of the fully
16:05expanded tree. So, rather than stopping the growth of the decision tree, one
16:11might grow it to its full size
16:19and then prune away leaf nodes and their ancestors until the cross-validation
16:21accuracy no longer increases.
16:28So basically we do not stop the growth of the decision tree here: we
16:34let it grow completely, and then we apply a pruning technique, removing
16:41leaf nodes and ancestor nodes as long as the cross-validation accuracy
16:44does not decrease, stopping once it no longer increases.
16:49This technique is called post-pruning, okay.
16:55So, to keep the cross-validation error from increasing, we apply the technique
17:01called post-pruning, where we let the decision tree grow fully to its
17:08expanded size and then remove leaf nodes and ancestor nodes until the
17:14cross-validation accuracy no longer improves. This is the post-pruning technique.
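The grow-then-prune loop can be sketched as follows. The tree representation (a `Node` with a tested feature, children indexed by feature value, and a stored majority class) is an assumption made for this sketch, not the lecture's notation; `val_set` plays the role of the cross-validation data.

```python
class Node:
    """Internal node: tests `feature`, with `children` keyed by its value.
    Leaf: `children` is None and `label` is the predicted class.
    `majority` records the majority training class under this node."""
    def __init__(self, feature=None, children=None, label=None, majority=None):
        self.feature, self.children = feature, children
        self.label, self.majority = label, majority

def classify(node, x):
    while node.children is not None:
        node = node.children[x[node.feature]]
    return node.label

def accuracy(root, val_set):
    return sum(classify(root, x) == y for x, y in val_set) / len(val_set)

def post_prune(node, root, val_set):
    """Bottom-up: tentatively collapse each subtree to a majority-class
    leaf, and keep the prune only if validation accuracy does not drop."""
    if node.children is None:
        return
    for child in node.children.values():
        post_prune(child, root, val_set)
    before = accuracy(root, val_set)
    saved = node.children
    node.children, node.label = None, node.majority    # tentative prune
    if accuracy(root, val_set) < before:               # accuracy dropped:
        node.children, node.label = saved, None        # undo the prune

# A redundant subtree whose two leaves both predict 'A' gets collapsed;
# the root split, which separates 'A' from 'B', survives.
redundant = Node(feature=1, majority='A',
                 children={0: Node(label='A'), 1: Node(label='A')})
root = Node(feature=0, majority='A',
            children={0: redundant, 1: Node(label='B')})
val_set = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'B')]
post_prune(root, root, val_set)
print(redundant.children is None, root.children is None)  # True False
```

The bottom-up order matters: lower subtrees are simplified first, which can make their ancestors prunable in turn.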
17:17Now, the next one is the minimum description length method. Tree growing and
17:22pruning here is based on the MDL principle.
17:26The idea is that the simplest decision tree that can predict the classes of
17:32the training patterns is the best one, okay: the decision tree helps us
17:38predict the classes of the training patterns, and we choose the simplest one
17:44that does so.
17:45If the tree is small and accurately classifies all of the patterns, it might
17:50be more economical to transmit the tree than to transmit the labels.
17:55That is, if the tree is small and accurate, we transfer the entire tree rather
18:01than the labels themselves.
18:02So, in general, the number of bits here is equal to t plus d, where t is the
18:08length of the message required to transmit the tree and d is the length of the
18:14message required to transmit the labels of the patterns misclassified by the
18:19tree.
18:26So the tree associated with the smallest value of t plus d is best, okay: the
18:32tree where the t plus d value is smallest is the most economical tree.
18:33The MDL method follows the same spirit as Occam's razor, where we keep only
18:40the required structure and prune away what is unnecessary; that is the same
18:47principle being used here in the MDL method.
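Choosing the tree with the smallest t + d can be written directly; the candidate trees and their bit counts below are made-up numbers for illustration.

```python
def mdl_cost(t, d):
    """t: bits to transmit the tree; d: bits to transmit the labels of the
    patterns the tree misclassifies.  The best tree minimizes t + d."""
    return t + d

# Hypothetical candidates: a small but inaccurate tree, a medium tree,
# and a large tree that classifies everything correctly.
candidates = {'small': (10, 30), 'medium': (25, 3), 'large': (60, 0)}
best = min(candidates, key=lambda name: mdl_cost(*candidates[name]))
print(best)  # medium  (t + d = 28 beats 40 and 60)
```

Note how the large, error-free tree still loses: its own description cost outweighs the labels it saves, which is the Occam's-razor trade-off in miniature.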
18:51Now, noise in the data: noise means that one must inevitably be ready to
18:57accept some number of errors, depending on the noise level.
19:01We should be aware that there will always be some number of errors.
19:05Now, refusing to tolerate errors on the training set when there is noise
19:11leads to the problem of fitting the noise, okay.
19:13If you are not ready to accept any errors when the data is noisy, the model
19:18ends up fitting the noise itself, treating those noisy errors as
19:23if they were real structure in the data.
19:28Dealing with noise then requires, of course, that when you handle noise you
19:35accept some errors at the leaf nodes, okay; this is a consequence, just as is
19:38the fact that there are small numbers of patterns at the leaf nodes.
19:45So basically in a decision tree you have to accept some small amount of noise,
19:49just as we accept that leaf nodes may hold mixed patterns.
19:54So noise in the data cannot be completely removed when you use a decision tree; there will always be a small rate of noise in the decision tree setting.
21:19So with this we complete this section on overfitting and evaluation techniques
21:24in the decision tree algorithm.