Introduction to Decision Trees

Transcript
00:00Hello everyone.
00:13We are moving on into the next session, that is Decision Trees, where we are learning about
00:18how the decision tree algorithm works as a classification algorithm.
00:24It is just like how a human being thinks through a decision-making process.
00:28So, we will go ahead and see how it works.
00:32A decision tree is nothing but a supervised learning algorithm which is used for both
00:36classification and regression problems, but mostly it is used for classification.
00:42It is structured just like a tree, where we have a data set, we have branches,
00:49and we have leaf nodes.
00:51The data set sits at the root node, and from there the decision-making process,
00:57which is called branching, gives us decision nodes, and finally the outcome or output
01:03is called the leaf node.
01:05Now, in a decision tree we have two most important nodes, which are the decision node and the
01:12leaf node.
01:13Decision nodes are there to make decisions based on our problem, and they can have multiple
01:19branches, whereas a leaf node is nothing but the output or outcome and it cannot be
01:26divided further.
01:28So, in a decision tree the tests are basically performed on the basis of the features
01:35of a given data set.
01:38So, the graphical representation is definitely in the form of a tree.
01:45It is like how we plant a seed, then from the seed a plant grows, and from the plant
01:53the tree develops.
01:55So, the same concept is used here.
01:58We have a problem.
01:59How do humans think about that problem?
02:01When there is a problem, they try to find a solution, and that solution will involve
02:07two or three decision-making steps.
02:09We will ask our friends, we will ask our family, we will get their opinions, but finally only
02:15one thing is decided before moving ahead.
02:18In the same way, we will have a root node which holds the data set, and from there decisions
02:25are made based on yes or no and other conditions, and we will have a lot of decision-
02:31making nodes.
02:32From those decision-making nodes we will reach a leaf node, which holds the output.
02:38So, that is how the graphical representation of the decision tree takes a tree-like structure,
02:45and the algorithm used here is CART, which is the classification and regression tree
02:51algorithm, and the decision making is simply based on a yes or no concept.
02:58So, this is what the decision tree figure looks like.
03:02We will have a decision node, which is the root node here.
03:06The decision node holds the required data set features; then from there we will
03:11have subdivisions, that is, subdivisions into further decision nodes, and from there we will
03:16have the required output.
03:18So, this subdivision from the root node is called a sub-tree.
03:27Now, why do we use a decision tree?
03:29A decision tree usually mimics how a human thinks: when we have a problem
03:37that we do not know how to solve,
03:42what will we do?
03:43We will ask our family members, we will ask our close friends, we will ask our colleagues
03:49or anyone else, we will get two or three solutions, and based on our capability we decide
03:57on one important factor and we try to solve our problem.
04:01So, the decision tree works on the same idea, and it is easier to understand and
04:07arrive at a solution.
04:10It has a problem, it gets the data set required for it, and from there the decision-making
04:15process starts and, finally, an output is produced.
04:19So, it is very similar to how humans think, it is easier to understand, and the logic
04:29behind the decision tree is a tree-like structure which a person can understand easily; that
04:35is why we use decision trees.
04:39Now, there are certain terminologies you need to be familiar with when you hear the decision
04:43tree concept.
04:44The first one is the root node; the root node is where the decision tree starts,
04:50and the first division into decision nodes starts from the root node.
04:55It represents the entire data set; the root node holds the entire data set, and it is
05:02further divided into two or more sets.
05:08The next one is the leaf node, which is the final output node, and the tree cannot be further
05:14divided from the leaf node.
05:18Now splitting: splitting here is the process of dividing a decision
05:26node or the root node into sub-nodes; this process is called splitting.
05:31Now branch or sub-tree: a branch or sub-tree is nothing but a tree formed from a decision
05:38node or a root node.
05:40If a tree is formed from a root node, that is called a sub-tree or branch.
05:47Next is pruning; pruning here is the process of removing unwanted branches, just like
05:53when we have a problem and for that problem we have four solutions: we decide on
06:00one solution and the remaining solutions are removed.
06:04The same technique is applied in the case of pruning: we are removing the unwanted
06:09branches from the tree.
06:12Next is the parent or child node: the root node of the tree is called the parent
06:17node and the other nodes are called child nodes.
06:21So these are the terminologies used for decision trees.
06:26Now, how does the algorithm work here? The first step is that it begins with a tree,
06:34that is, we have a root node, which we can name S, and it
06:41contains the complete data set; there is a root node, and that root node holds the complete
06:47data set.
06:48From there we find the best attribute from the data set using
06:55the attribute selection measure, that is ASM; using the ASM technique we are trying to find the
07:02best attribute.
07:04Then S, that is the root node, is divided into subsets that contain the possible values
07:11of the best attribute for our problem.
07:15Step 4 is generating the decision tree node which contains the best attribute,
07:21and the tree is divided further in step 4 based on
07:28that attribute.
07:30Step 5 is nothing but repeating steps 3 and 4 recursively until we get the final
07:39output, that is the leaf node, which cannot be further divided.
07:44So this is how the decision tree algorithm works: we have a root node which holds the
07:50complete, entire data set; from the root node we are making
07:56decision nodes; the decision nodes get the best attribute using the ASM technique,
08:03and this branching or subdividing of the decision nodes takes place until we get the
08:10desired output; for this, the repetitive process of steps 3 and 4 takes place.
08:17So this is how the algorithm works.
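To make these steps concrete, here is a rough Python sketch of the recursive process, assuming the data set is a small list of (feature-dict, label) pairs and using Gini impurity as one possible attribute selection measure (information gain and the Gini index are discussed below):

```python
def gini(labels):
    """One possible impurity measure: Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))

def best_attribute(rows, attributes):
    """ASM step: pick the attribute whose split gives the lowest weighted impurity."""
    def split_impurity(attr):
        total = len(rows)
        score = 0.0
        for value in {features[attr] for features, _ in rows}:
            group = [label for features, label in rows if features[attr] == value]
            score += (len(group) / total) * gini(group)
        return score
    return min(attributes, key=split_impurity)

def build_tree(rows, attributes):
    """Recursively grow the tree from (feature-dict, label) pairs."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attributes:            # leaf node: cannot be divided further
        return max(set(labels), key=labels.count)
    best = best_attribute(rows, attributes)                # steps 2-3: find the best attribute via ASM
    branches = {}
    for value in {features[best] for features, _ in rows}: # step 4: one branch per attribute value
        subset = [(f, label) for f, label in rows if f[best] == value]
        branches[value] = build_tree(subset, [a for a in attributes if a != best])  # step 5: recurse
    return {best: branches}
```

Every nested dictionary returned by build_tree is a decision node keyed by the chosen attribute, and every plain label at the bottom is a leaf node.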
08:21Now, the ASM technique, that is the attribute selection measure, is the process where we find
08:27the best attribute; here we have two techniques, namely information gain and Gini index.
08:34So before getting into the information gain and Gini index, we need to know what is entropy.
08:41Entropy here is the metric which is used to measure the impurity in a given attribute;
08:48it mainly quantifies the randomness that occurs in the data.
08:54So we will have an impurity, and the measurement of that impurity is called entropy, and
09:00the formula for it is Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no),
09:10where P is the probability of yes or no, and
09:17that is how the formula works.
09:20So S is the set of samples here, and the probability of yes and the probability of no are
09:26written as P(yes) and P(no).
09:28So this is how you find out the impurity, that is the entropy, of the given attribute.
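As a quick illustration of that entropy formula, a minimal Python sketch (the sample labels below are made up purely for illustration) could be:

```python
import math

def entropy(labels):
    """Entropy of a list of yes/no labels: -P(yes)*log2(P(yes)) - P(no)*log2(P(no))."""
    total = len(labels)
    return -sum((labels.count(v) / total) * math.log2(labels.count(v) / total)
                for v in set(labels))

# Example: 9 "yes" and 5 "no" samples gives an entropy of about 0.94,
# while a perfectly pure set (all "yes") gives an entropy of 0.
sample = ["yes"] * 9 + ["no"] * 5
print(entropy(sample))  # ~0.940
```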
09:35Now coming to information gain; from the phrase information gain itself you will understand
09:41what it means.
09:42We need maximum information from the given data set; that is the information gain.
09:49Information gain is the measurement of the change in entropy after the segmentation of a data
09:54set based on an attribute.
11:23It calculates how much information a feature provides us about a class; according to the
11:29value of information gain, we split the node and build the decision tree.
11:36Now, the decision tree algorithm always tries to maximize the value of information gain,
11:43that is, the node or attribute having the highest information gain is split first.
11:50So the node which gives the maximum information splits first, and how is the information gain
11:57calculated?
11:58It is Information Gain = Entropy(S) - (weighted average) * Entropy(each feature).
12:06This is how the information gain of that particular attribute is calculated.
12:12So this is how the information gain technique works: you want to gain the maximum information,
12:18and the node that gives the maximum information gain is split further into decision tree nodes.
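As a small sketch of that calculation, the groups below stand for a hypothetical attribute that splits the samples into two branches:

```python
import math

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(v) / total) * math.log2(labels.count(v) / total)
                for v in set(labels))

def information_gain(parent, groups):
    """Entropy(S) minus the weighted average of the entropy of each child group."""
    weighted = sum(len(g) / len(parent) * entropy(g) for g in groups)
    return entropy(parent) - weighted

# Hypothetical attribute that splits 14 samples into two branches.
parent = ["yes"] * 9 + ["no"] * 5
branch_a = ["yes"] * 6 + ["no"] * 2
branch_b = ["yes"] * 3 + ["no"] * 3
print(information_gain(parent, [branch_a, branch_b]))  # ~0.048
```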
12:25Next is the Gini index; it is a measure of impurity or purity used while creating a decision
12:33tree in the CART algorithm.
12:35An attribute with a low Gini index should be preferred as compared to one with a high Gini
12:41index.
12:42So, with information gain we are using the attribute which gives the maximum information, and with
12:47the Gini index we are using the attribute which has the lowest Gini index.
12:52It also creates binary splits.
12:55So the Gini index can also be used when we need binary splits, that is 0 and 1,
13:03and this is the formula for the Gini index.
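For reference, the standard Gini impurity formula used in CART is Gini(S) = 1 - Σ_j (P_j)², where P_j is the proportion of samples in S that belong to class j; a node whose samples all belong to one class has a Gini index of 0.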
13:07So this is how the best attribute is selected using the attribute selection measure.
13:17Now coming to pruning, that is the removal of unwanted branches or unwanted nodes: pruning
13:23is the process of deleting unnecessary nodes from a tree in order to get the optimal desired
13:31tree.
13:32If we go for a lot of divisions we will have a big tree, but we
13:38do not need the entire tree.
13:40So for that, we prune the unwanted branches or unwanted decision nodes.
13:47Why do we not want a big tree? Because a tree that is too large increases
13:54the risk of overfitting, while a small tree will not
14:02capture all the important features of the data set.
14:06So if you go for a large one it will lead to overfitting, and if you go for a small one
14:11it will fail to capture all the required information in the data set.
14:17Therefore, a technique that decreases the size of the learning tree without reducing the
14:22accuracy is called
14:30pruning.
14:32For this we have two types, cost complexity pruning and reduced error pruning; I
14:38am not going into detail here, this is just for you to understand what the pruning technique
14:42is.
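As one concrete example of cost complexity pruning, scikit-learn's decision tree exposes a ccp_alpha parameter; a rough sketch (the Iris data set here is only a stand-in) might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree keeps splitting until its leaves are pure, so it can overfit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost complexity pruning: a larger ccp_alpha removes more of the weakest branches.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

print(full_tree.get_depth(), pruned_tree.get_depth())               # pruned tree is shallower
print(full_tree.score(X_test, y_test), pruned_tree.score(X_test, y_test))
```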
14:44Now the implementation; the implementation follows the same process: we use a data
14:48pre-processing step here, we fit the decision tree to the training
14:54set, and then we predict the
15:00test results; for the accuracy we again use a confusion matrix
15:05here, and finally we visualize the result.
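A minimal end-to-end sketch of those steps with scikit-learn (again using the Iris data set purely as a placeholder) might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# 1. Data pre-processing: load the data and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Fit the decision tree to the training set ("entropy" uses information gain; "gini" is the default).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

# 3. Predict the test results.
y_pred = clf.predict(X_test)

# 4. Check the accuracy with a confusion matrix.
print(confusion_matrix(y_test, y_pred))

# 5. Visualise the fitted tree.
plot_tree(clf, filled=True)
plt.show()
```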
15:10Now the advantages: it is simple to understand as it follows the same process which a human
15:16follows while making any decision in real life, it can be very useful for solving decision-
15:23related problems, it helps to think about all the possible outcomes for a problem, and
15:28there is much less requirement for data cleaning compared to other algorithms.
15:35So those are the advantages; the disadvantages are that it has a lot of layers, which might make
15:42the decision tree complex, and it may have an overfitting issue, which can be resolved;
15:48if there is an overfitting issue we can use the random forest algorithm; also, for more class
15:54labels the computational complexity of the decision tree may increase.
15:58So those are the disadvantages of the decision tree.
16:03So with this we complete this section, thank you so much.