Introduction to Decision Trees

Transcript
00:00Hello everyone.
00:13We are moving on into the next session, that is Decision Trees, where we are learning about
00:18how the decision tree algorithm works as a classification algorithm.
00:24It is just like how a human being thinks through a decision-making process.
00:28So, we will go ahead and see how it works.
00:32A decision tree is nothing but a supervised learning algorithm which is used for both
00:36classification and regression problems, but mostly it is used for classification.
00:42It is structured just like a tree, where we have a data set, we have branches,
00:49and we have leaf nodes.
00:51The data set sits at the root node, and from there the decision-making process,
00:57which is called branching, gives us decision nodes, and finally the outcome or output
01:03is called the leaf node.
01:05Now, in a decision tree we have two most important nodes, which are the decision node and the
01:12leaf node.
01:13Decision nodes are there to make decisions based on our problem, and they can have multiple
01:19branches, whereas a leaf node is nothing but the output or outcome and it cannot be
01:26divided further.
01:28So, in a decision tree the tests are basically performed on the basis of the features
01:35of a given data set.
01:38So, the graphical representation is definitely in the form of a tree.
01:45It is like how we plant a seed, then from the seed a plant grows, and from the plant
01:53the tree develops.
01:55So, the same concept is used here.
01:58We have a problem.
01:59How do humans think about that problem?
02:01When there is a problem, they try to find a solution, and that solution will involve
02:07two or three decision-making steps.
02:09We will ask our friends, we will ask our family, we will get their opinions, but finally only
02:15one thing is decided before moving ahead.
02:18In the same way, we will have a root node which holds the data set, and from there decisions
02:25are made based on yes or no and other conditions, and we will have a lot of decision-
02:31making nodes.
02:32From those decision-making nodes we will reach a leaf node, which holds the output.
02:38So, that is how the graphical representation of the decision tree takes a tree-like structure,
02:45and the algorithm used here is CART, which is the classification and regression tree
02:51algorithm, and the decision making is simply based on a yes or no concept.
02:58So, this is what the decision tree figure looks like.
03:02We will have a decision node, which is the root node here.
03:06The decision node holds the required data set features; then from there we will
03:11have subdivisions, that is, subdivisions into further decision nodes, and from there we will
03:16have the required output.
03:18So, this subdivision from the root node is called a sub-tree.
03:27Now, why do we use a decision tree?
03:29A decision tree usually mimics how a human thinks: when we have a problem
03:37that we do not know how to solve,
03:42what will we do?
03:43We will ask our family members, we will ask our close friends, we will ask our colleagues
03:49or anyone else, we will get two or three solutions, and based on our capability we decide
03:57on one important factor and we try to solve our problem.
04:01So, the decision tree works on the same idea, and it is easier to understand and
04:07arrive at a solution.
04:10It has a problem, it gets the data set required for it, and from there the decision-making
04:15process starts and, finally, an output is produced.
04:19So, it is very similar to how humans think, it is easier to understand, and the logic
04:29behind the decision tree is a tree-like structure which a person can understand easily; that
04:35is why we use decision trees.
04:39Now, there are certain terminologies you need to be familiar with when you hear the decision
04:43tree concept.
04:44The first one is the root node; the root node is where the decision tree starts,
04:50and the first division into decision nodes starts from the root node.
04:55It represents the entire data set; the root node holds the entire data set, and it is
05:02further divided into two or more sets.
05:08The next one is the leaf node, which is the final output node, and the tree cannot be further
05:14divided from the leaf node.
05:18Now splitting: splitting here is the process of dividing a decision
05:26node or the root node into sub-nodes; this process is called splitting.
05:31Now branch or sub-tree: a branch or sub-tree is nothing but a tree formed from a decision
05:38node or a root node.
05:40If a tree is formed from a root node, that is called a sub-tree or branch.
05:47Next is pruning; pruning here is the process of removing unwanted branches, just like
05:53when we have a problem and for that problem we have four solutions: we decide on
06:00one solution and the remaining solutions are removed.
06:04The same technique is applied in the case of pruning: we are removing the unwanted
06:09branches from the tree.
06:12Next is the parent or child node: the root node of the tree is called the parent
06:17node and the other nodes are called child nodes.
06:21So these are the terminologies used for decision trees.
06:26Now, how does the algorithm work here? The first step is that it begins with a tree,
06:34that is, we have a root node, which we can name S, and it
06:41contains the complete data set; there is a root node, and that root node holds the complete
06:47data set.
06:48From there we find the best attribute from the data set using
06:55the attribute selection measure, that is ASM; using the ASM technique we are trying to find the
07:02best attribute.
07:04Then S, that is the root node, is divided into subsets that contain the possible values
07:11of the best attribute for our problem.
07:15Step 4 is generating the decision tree node which contains the best attribute,
07:21and the tree is divided further in step 4 based on
07:28that attribute.
07:30Step 5 is nothing but repeating steps 3 and 4 recursively until we get the final
07:39output, that is the leaf node, which cannot be further divided.
07:44So this is how the decision tree algorithm works: we have a root node which holds the
07:50complete, entire data set; from the root node we are making
07:56decision nodes; the decision nodes get the best attribute using the ASM technique,
08:03and this branching or subdividing of the decision nodes takes place until we get the
08:10desired output; for this, the repetitive process of steps 3 and 4 takes place.
08:17So this is how the algorithm works.
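To make these steps concrete, here is a rough Python sketch of the recursive process, assuming the data set is a small list of (feature-dict, label) pairs and using Gini impurity as one possible attribute selection measure (information gain and the Gini index are discussed below):

```python
def gini(labels):
    """One possible impurity measure: Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))

def best_attribute(rows, attributes):
    """ASM step: pick the attribute whose split gives the lowest weighted impurity."""
    def split_impurity(attr):
        total = len(rows)
        score = 0.0
        for value in {features[attr] for features, _ in rows}:
            group = [label for features, label in rows if features[attr] == value]
            score += (len(group) / total) * gini(group)
        return score
    return min(attributes, key=split_impurity)

def build_tree(rows, attributes):
    """Recursively grow the tree from (feature-dict, label) pairs."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attributes:            # leaf node: cannot be divided further
        return max(set(labels), key=labels.count)
    best = best_attribute(rows, attributes)                # steps 2-3: find the best attribute via ASM
    branches = {}
    for value in {features[best] for features, _ in rows}: # step 4: one branch per attribute value
        subset = [(f, label) for f, label in rows if f[best] == value]
        branches[value] = build_tree(subset, [a for a in attributes if a != best])  # step 5: recurse
    return {best: branches}
```

Every nested dictionary returned by build_tree is a decision node keyed by the chosen attribute, and every plain label at the bottom is a leaf node.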
08:21Now, the ASM technique, that is the attribute selection measure, is the process where we find
08:27the best attribute; here we have two techniques, namely information gain and Gini index.
08:34So before getting into the information gain and Gini index, we need to know what is entropy.
08:41Entropy here is the metric which is used to measure the impurity in a given attribute;
08:48it mainly quantifies the randomness that occurs in the data.
08:54So we will have an impurity, and the measurement of that impurity is called entropy, and
09:00the formula for it is Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no),
09:10where P is the probability of yes or no, and
09:17that is how the formula works.
09:20So S is the set of samples here, and the probability of yes and the probability of no are
09:26written as P(yes) and P(no).
09:28So this is how you find out the impurity, that is the entropy, of the given attribute.
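As a quick illustration of that entropy formula, a minimal Python sketch (the sample labels below are made up purely for illustration) could be:

```python
import math

def entropy(labels):
    """Entropy of a list of yes/no labels: -P(yes)*log2(P(yes)) - P(no)*log2(P(no))."""
    total = len(labels)
    return -sum((labels.count(v) / total) * math.log2(labels.count(v) / total)
                for v in set(labels))

# Example: 9 "yes" and 5 "no" samples gives an entropy of about 0.94,
# while a perfectly pure set (all "yes") gives an entropy of 0.
sample = ["yes"] * 9 + ["no"] * 5
print(entropy(sample))  # ~0.940
```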
09:35Now coming to information gain; from the phrase information gain itself you will understand
09:41what it means.
09:42We need maximum information from the given data set; that is the information gain.
09:49Information gain is the measurement of the change in entropy after the segmentation of a data
09:54set based on an attribute.
11:23It calculates how much information a feature provides us about a class; according to the
11:29value of information gain, we split the node and build the decision tree.
11:36Now, the decision tree algorithm always tries to maximize the value of information gain,
11:43that is, the node or attribute having the highest information gain is split first.
11:50So the node which gives the maximum information splits first, and how is the information gain
11:57calculated?
11:58It is Information Gain = Entropy(S) - (weighted average) * Entropy(each feature).
12:06This is how the information gain of that particular attribute is calculated.
12:12So this is how the information gain technique works: you want to gain the maximum information,
12:18and the node that gives the maximum information gain is split further into decision tree nodes.
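As a small sketch of that calculation, the groups below stand for a hypothetical attribute that splits the samples into two branches:

```python
import math

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(v) / total) * math.log2(labels.count(v) / total)
                for v in set(labels))

def information_gain(parent, groups):
    """Entropy(S) minus the weighted average of the entropy of each child group."""
    weighted = sum(len(g) / len(parent) * entropy(g) for g in groups)
    return entropy(parent) - weighted

# Hypothetical attribute that splits 14 samples into two branches.
parent = ["yes"] * 9 + ["no"] * 5
branch_a = ["yes"] * 6 + ["no"] * 2
branch_b = ["yes"] * 3 + ["no"] * 3
print(information_gain(parent, [branch_a, branch_b]))  # ~0.048
```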
12:25Next is the Gini index; it is a measure of impurity or purity used while creating a decision
12:33tree in the CART algorithm.
12:35An attribute with a low Gini index should be preferred as compared to one with a high Gini
12:41index.
12:42So, with information gain we are using the attribute which gives the maximum information, and with
12:47the Gini index we are using the attribute which has the lowest Gini index.
12:52It also creates binary splits.
12:55So the Gini index can also be used when we need binary splits, that is 0 and 1,
13:03and this is the formula for the Gini index.
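For reference, the standard Gini impurity formula used in CART is Gini(S) = 1 - Σ_j (P_j)², where P_j is the proportion of samples in S that belong to class j; a node whose samples all belong to one class has a Gini index of 0.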
13:07So this is how the best attribute is selected using the attribute selection measure.
13:17Now coming to pruning, that is the removal of unwanted branches or unwanted nodes: pruning
13:23is the process of deleting unnecessary nodes from a tree in order to get the optimal desired
13:31tree.
13:32If we go for a lot of divisions we will have a big tree, but we
13:38do not need the entire tree.
13:40So for that, we prune the unwanted branches or unwanted decision nodes.
13:47Why do we not want a big tree? Because a tree that is too large increases
13:54the risk of overfitting, while a small tree will not
14:02capture all the important features of the data set.
14:06So if you go for a large one it will lead to overfitting, and if you go for a small one
14:11it will fail to capture all the required information in the data set.
14:17Therefore, a technique that decreases the size of the learning tree without reducing the
14:22accuracy is called
14:30pruning.
14:32For this we have two types, cost complexity pruning and reduced error pruning; I
14:38am not going into detail here, this is just for you to understand what the pruning technique
14:42is.
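As one concrete example of cost complexity pruning, scikit-learn's decision tree exposes a ccp_alpha parameter; a rough sketch (the Iris data set here is only a stand-in) might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree keeps splitting until its leaves are pure, so it can overfit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost complexity pruning: a larger ccp_alpha removes more of the weakest branches.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

print(full_tree.get_depth(), pruned_tree.get_depth())               # pruned tree is shallower
print(full_tree.score(X_test, y_test), pruned_tree.score(X_test, y_test))
```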
14:44Now the implementation; the implementation follows the same process: we use a data
14:48pre-processing step here, we fit the decision tree to the training
14:54set, and then we predict the
15:00test results; for the accuracy we again use a confusion matrix
15:05here, and finally we visualize the result.
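A minimal end-to-end sketch of those steps with scikit-learn (again using the Iris data set purely as a placeholder) might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# 1. Data pre-processing: load the data and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Fit the decision tree to the training set ("entropy" uses information gain; "gini" is the default).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

# 3. Predict the test results.
y_pred = clf.predict(X_test)

# 4. Check the accuracy with a confusion matrix.
print(confusion_matrix(y_test, y_pred))

# 5. Visualise the fitted tree.
plot_tree(clf, filled=True)
plt.show()
```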
15:10Now the advantages: it is simple to understand as it follows the same process which a human
15:16follows while making any decision in real life, it can be very useful for solving decision-
15:23related problems, it helps to think about all the possible outcomes for a problem, and
15:28there is much less requirement for data cleaning compared to other algorithms.
15:35So those are the advantages; the disadvantages are that it has a lot of layers, which might make
15:42the decision tree complex, and it may have an overfitting issue, which can be resolved;
15:48if there is an overfitting issue we can use the random forest algorithm; also, for more class
15:54labels the computational complexity of the decision tree may increase.
15:58So those are the disadvantages of the decision tree.
16:03So with this we complete this section, thank you so much.