AI_DL_Assignment / 11. Assessing Model Performance /2. Understanding the Confusion Matrix.srt
1
00:00:00,510 --> 00:00:00,940
Hey.
2
00:00:00,960 --> 00:00:05,520
And welcome back to chapter eleven point one where we go into the confusion matrix and then calculate
3
00:00:05,520 --> 00:00:07,180
precision and recall.
4
00:00:07,190 --> 00:00:08,530
So let's get started.
5
00:00:08,880 --> 00:00:15,480
So before I dive into the Python notebook, where we'll actually walk you through an example using scikit-learn's confusion matrix
6
00:00:15,480 --> 00:00:16,240
function.
7
00:00:16,530 --> 00:00:19,310
Let's take a look at what the confusion matrix actually looks like.
8
00:00:19,350 --> 00:00:22,700
Now typically, as I've shown you before,
9
00:00:22,980 --> 00:00:29,490
you have basically true positives, true negatives, false positives and false negatives.
10
00:00:29,610 --> 00:00:31,510
And that was in a binary class analysis.
11
00:00:31,550 --> 00:00:37,400
Now I'm going to go a step further and help you understand this concept using a multiclass problem, and
12
00:00:37,400 --> 00:00:41,010
we're using actual real-world results for the MNIST dataset.
13
00:00:41,250 --> 00:00:45,410
So this big matrix kind of looks a bit strange at first.
14
00:00:45,450 --> 00:00:48,720
However, you do pick up initially that there is a pattern right here.
15
00:00:48,900 --> 00:00:50,170
There is a diagonal here.
16
00:00:50,190 --> 00:00:53,470
These are very large numbers and then there are small numbers on the outskirts.
17
00:00:53,760 --> 00:00:55,740
What do these numbers actually mean.
18
00:00:56,250 --> 00:00:57,770
So let's take a look now.
19
00:00:58,230 --> 00:00:59,110
I've made it simple.
20
00:00:59,130 --> 00:01:00,820
We know we're looking at the MNIST dataset.
21
00:01:00,840 --> 00:01:04,210
So we have 10 classes, that's 0 to 9.
22
00:01:04,290 --> 00:01:05,100
Likewise here.
23
00:01:05,190 --> 00:01:08,320
And what this column here is, is the predicted value.
24
00:01:08,340 --> 00:01:15,230
So these numbers here mean that the classifier predicted zero and the true value was actually zero.
25
00:01:15,240 --> 00:01:17,180
So that's why it is a big number here.
26
00:01:17,440 --> 00:01:20,170
So it predicted one, and the true value was actually 1.
27
00:01:20,370 --> 00:01:23,810
So having large numbers in this diagonal is a good thing.
28
00:01:24,000 --> 00:01:29,550
Having large numbers outside of it is bad. And we can see we have some large numbers here.
29
00:01:29,550 --> 00:01:30,360
We have an 11.
30
00:01:30,360 --> 00:01:32,940
We have a six and a five here.
31
00:01:33,220 --> 00:01:36,950
So now let's take a look at these numbers, which I've highlighted here.
32
00:01:37,220 --> 00:01:38,790
So what do they actually mean.
33
00:01:38,790 --> 00:01:44,350
It means that our classifier predicted 2 when it was actually a 7.
34
00:01:44,460 --> 00:01:49,920
So the classifier is confusing sevens with twos: it is seeing a seven but it's classifying it as a two, which
35
00:01:49,920 --> 00:01:50,660
is wrong.
36
00:01:50,940 --> 00:01:56,070
And likewise for sixes and zeroes, and nines and fours here.
37
00:01:56,970 --> 00:01:58,680
Sorry this one here.
38
00:01:59,120 --> 00:02:00,990
And this one as well.
39
00:02:00,990 --> 00:02:08,550
So what I mean is, let's look at nines and fours: the classifier predicted four, but five times it was
40
00:02:08,550 --> 00:02:10,160
actually a 9.
41
00:02:10,260 --> 00:02:11,660
So we can see the biggest problem
42
00:02:11,670 --> 00:02:17,400
our MNIST classifier is facing is confusing twos and sevens, which is actually a real problem for a
43
00:02:17,400 --> 00:02:18,630
lot of humans.
44
00:02:18,660 --> 00:02:20,070
My handwriting isn't very good.
45
00:02:20,070 --> 00:02:21,690
I'll be the first one to admit that.
46
00:02:21,810 --> 00:02:25,700
And lots of times when I'm looking at numbers I write, I'm like, is that a 2?
47
00:02:25,700 --> 00:02:26,550
Or is it a 7?
48
00:02:26,550 --> 00:02:34,050
So we can see our classifier is sort of leaning, generally, like a human would lean, interpreting or misinterpreting
49
00:02:34,080 --> 00:02:36,840
results just like a human like ourselves would.
50
00:02:37,080 --> 00:02:41,860
So let's actually work out our recall value based on this real-world data.
51
00:02:42,210 --> 00:02:43,770
So let's take a look at the number seven.
52
00:02:43,860 --> 00:02:44,350
All right.
53
00:02:44,550 --> 00:02:47,040
So this is the true classes for number 7 here.
54
00:02:47,040 --> 00:02:49,220
So we saw our classifier got it right
55
00:02:49,410 --> 00:02:52,800
one thousand and ten times; that's our true positives here.
56
00:02:53,130 --> 00:02:55,020
So how do we get the number of false negatives.
57
00:02:55,020 --> 00:02:56,970
Now, for the number of false negatives:
58
00:02:56,970 --> 00:03:03,330
basically, how many times our classifier predicted a number other than the number that was supposed
59
00:03:03,330 --> 00:03:04,250
to be a 7.
60
00:03:04,500 --> 00:03:06,530
So you can see all these times here.
61
00:03:06,570 --> 00:03:11,730
Numbers were supposed to be a seven but were actually predicted to be a different class, like 0, 1, 2
62
00:03:11,730 --> 00:03:12,960
especially.
63
00:03:12,960 --> 00:03:14,700
So let's sum this up.
64
00:03:14,700 --> 00:03:18,980
This row here, everything here except the 1,010, and that will give you 18.
65
00:03:19,230 --> 00:03:21,090
And that's exactly what we calculate here.
66
00:03:21,100 --> 00:03:25,310
And we get ninety-eight point two four percent, and that's our recall.
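The recall calculation just walked through can be sketched in a couple of lines. The row values below are illustrative stand-ins; only the 1,010 true positives and the total of 18 false negatives come from the example.

```python
import numpy as np

# Hypothetical confusion-matrix row for class 7: the diagonal entry (index 7)
# holds the true positives; the other entries are sevens predicted as other digits.
row_for_7 = np.array([0, 3, 6, 1, 1, 0, 1, 1010, 2, 4])

true_positives = row_for_7[7]                        # 1010 correct sevens
false_negatives = row_for_7.sum() - true_positives   # 18 missed sevens

recall = true_positives / (true_positives + false_negatives)
print(f"recall for class 7: {recall:.2%}")
```

With these values recall comes out around 98.2 percent, matching the figure worked out on the slide.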
67
00:03:25,370 --> 00:03:29,810
And now let's move on to precision.
68
00:03:29,830 --> 00:03:34,660
So looking at precision, we know it's the number of correct predictions over how many predictions of that
69
00:03:34,660 --> 00:03:36,650
class were made on the test data set.
70
00:03:36,730 --> 00:03:38,770
That's another way of saying what this is here:
71
00:03:38,980 --> 00:03:43,630
true positives over true positives plus our false positives.
72
00:03:43,630 --> 00:03:45,230
So again let's look at number seven.
73
00:03:45,310 --> 00:03:45,870
OK.
74
00:03:46,030 --> 00:03:47,240
So now we go the other way.
75
00:03:47,290 --> 00:03:48,900
This is interesting.
76
00:03:48,970 --> 00:03:51,610
So our true positives again:
77
00:03:51,840 --> 00:03:52,980
it's 1,010.
78
00:03:53,320 --> 00:03:57,400
And what about false positives? Our false positives here:
79
00:03:57,400 --> 00:04:05,610
basically, all the times the classifier was predicting something to be a 7 when it was actually a 0, a 2, a
80
00:04:05,650 --> 00:04:07,080
3, above or below.
81
00:04:07,450 --> 00:04:14,620
So those are the false positives for sevens, and we just sum this up; everything here gives us a thousand
82
00:04:14,620 --> 00:04:15,240
seventeen.
83
00:04:15,300 --> 00:04:18,060
One, two, three, and then three plus four.
84
00:04:18,060 --> 00:04:25,560
And that's exactly how we get ninety-nine point one percent. So basically we don't actually
85
00:04:25,560 --> 00:04:26,710
have to do it manually.
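Precision works the same way, but down the column instead of across the row. Again a sketch with made-up off-diagonal counts; only the 1,010 true positives come from the example.

```python
import numpy as np

# Hypothetical confusion-matrix column for class 7: the diagonal entry is the
# true positives; every other entry is a different digit wrongly predicted as 7.
col_for_7 = np.array([1, 2, 3, 0, 1, 0, 0, 1010, 0, 2])

true_positives = col_for_7[7]                        # 1010
false_positives = col_for_7.sum() - true_positives   # other digits called a 7

precision = true_positives / (true_positives + false_positives)
print(f"precision for class 7: {precision:.1%}")
```

With these stand-in counts the result lands around 99.1 percent, the same ballpark as the slide.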
86
00:04:26,910 --> 00:04:32,370
So scikit-learn actually generates a report for us automatically that gives us recall,
87
00:04:32,370 --> 00:04:35,000
precision, F1 and support.
88
00:04:35,010 --> 00:04:37,390
I think you remember, and you guys know what F1 is.
89
00:04:37,560 --> 00:04:41,040
I haven't actually dealt with support but I'll talk about it now in the next slide.
90
00:04:41,340 --> 00:04:46,140
But you can see we have precision, we have recall and we have our F1 score here.
91
00:04:46,190 --> 00:04:47,350
All right.
92
00:04:47,640 --> 00:04:49,500
Now actually I can talk about support right now.
93
00:04:49,500 --> 00:04:50,820
It's actually quite easy.
94
00:04:50,850 --> 00:04:53,180
Support is basically you see the numbers here.
95
00:04:53,460 --> 00:04:54,530
One thousand and twenty-eight.
96
00:04:54,540 --> 00:04:55,840
What was that.
97
00:04:55,980 --> 00:05:01,920
Let's go back to it here. 1,028 is basically true positives plus false negatives here.
98
00:05:02,220 --> 00:05:05,430
So support just gives us that sum here in the column.
99
00:05:05,430 --> 00:05:09,630
So if you look at column 0, let's go back to it.
100
00:05:10,080 --> 00:05:12,760
Column zero would be everything here.
101
00:05:13,760 --> 00:05:14,130
Sorry.
102
00:05:14,170 --> 00:05:14,920
Everything in this row.
103
00:05:14,980 --> 00:05:15,970
Added up here.
104
00:05:15,970 --> 00:05:20,200
So this is 977 plus 1 plus 2: 980.
105
00:05:20,200 --> 00:05:23,550
And that gives us our support here.
106
00:05:23,550 --> 00:05:24,490
Support is basically useful for seeing
107
00:05:24,490 --> 00:05:31,540
how many times our classifier is actually missing data, essentially missing classifications. Because
108
00:05:31,630 --> 00:05:39,560
think about it intuitively: we have nine hundred and eighty zeroes represented here in this report.
109
00:05:39,560 --> 00:05:46,350
All right, now what it is telling us here is basically this:
110
00:05:46,390 --> 00:05:48,790
How many zeros were in our report.
111
00:05:48,790 --> 00:05:53,290
So now we can actually use that as a basis to gauge what is happening here.
112
00:05:53,320 --> 00:05:57,650
So you can also see if there are any class imbalances in the data here.
113
00:05:57,950 --> 00:06:03,430
So, what about this? Imbalances are essentially something you can usually check before you even reach
114
00:06:03,430 --> 00:06:03,870
this point.
115
00:06:03,880 --> 00:06:06,810
You can easily check it when you have your test and training data.
116
00:06:07,150 --> 00:06:15,520
Just check to see the quantities, the counts of data of each class of
117
00:06:15,520 --> 00:06:15,780
data.
118
00:06:15,820 --> 00:06:20,040
that you have in your dataset.
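That per-class count check can be sketched like this; the `y_train` array here is a hypothetical stand-in for your own training or test labels.

```python
import numpy as np

# Hypothetical label array; in practice this would be your training or test labels.
y_train = np.array([0, 1, 7, 7, 2, 0, 7, 9, 1, 0])

# Count the occurrences of each class -- the same quantity the classification
# report calls "support". Large differences between counts signal an imbalance.
classes, counts = np.unique(y_train, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {c}: {n} examples")
```

For a balanced dataset like MNIST the counts come out roughly equal per class; a heavily skewed count is worth addressing before training.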
119
00:06:20,060 --> 00:06:23,020
So how do we analyze our classification report?
120
00:06:23,030 --> 00:06:29,830
So basically we can just quickly interpret something here. High recall with low precision: what does that mean?
121
00:06:30,080 --> 00:06:35,700
Let me tell you what this tells us: most of the positive examples are being correctly recognized.
122
00:06:35,840 --> 00:06:40,350
That means not a lot of false negatives, but there are a lot of false positives.
123
00:06:40,670 --> 00:06:43,330
And that means a lot of the other classes have been predicted
124
00:06:43,400 --> 00:06:51,450
as the class in question. Alternatively, we can have low recall with high precision.
125
00:06:51,720 --> 00:06:52,630
What does that mean.
126
00:06:52,650 --> 00:06:58,640
It means our classifier is missing a lot of the positive examples, it has a high false negative rate, but
127
00:06:58,720 --> 00:07:03,760
those it does predict as positive are indeed positive, so there are few false positives.
128
00:07:04,110 --> 00:07:10,570
So we can use our classification report to sort of gauge what's actually happening.
129
00:07:10,710 --> 00:07:13,220
In this example everything looks pretty good.
130
00:07:13,250 --> 00:07:13,740
All right.
131
00:07:13,920 --> 00:07:18,720
But later on we'll look at some examples where we generate these reports, and you can actually analyze
132
00:07:18,720 --> 00:07:22,350
and figure out which class our classifier is having trouble with.
133
00:07:24,110 --> 00:07:28,360
So let's quickly take a look at the code to generate this confusion matrix.
134
00:07:28,360 --> 00:07:29,000
All right.
135
00:07:29,300 --> 00:07:35,760
And later on, we're actually going to look at misclassified data. For now, this is the generic MNIST training
136
00:07:36,130 --> 00:07:36,470
code.
137
00:07:36,500 --> 00:07:40,480
You've seen it a few times before; it's used in a lot of my examples.
138
00:07:40,910 --> 00:07:43,700
There's one thing I wanted to show you here in this file.
139
00:07:43,790 --> 00:07:50,720
Basically, we store our history here. Some people have asked me, how can I save my history
140
00:07:50,720 --> 00:07:51,200
file.
141
00:07:51,500 --> 00:07:56,900
and look at it again? Because I've spent hours, or maybe days or a week, training a classifier, and
142
00:07:56,900 --> 00:08:01,090
I want to actually see the plots again later when I load the file.
143
00:08:01,430 --> 00:08:04,330
Yes, you can: you can use a Python module called pickle.
144
00:08:04,700 --> 00:08:09,980
And basically what it does here is just store our data as a pickle file. A pickle file is basically an array
145
00:08:09,980 --> 00:08:12,360
of data as a method of storage.
146
00:08:12,410 --> 00:08:16,920
I'm not going to get into the details now, but just know it's a way we can store files.
147
00:08:17,570 --> 00:08:22,910
So what we do here, we just use pickle to create a file. The 'wb' is basically us saying we're
148
00:08:22,910 --> 00:08:27,260
going to create this pickle file and then we just dump this file.
149
00:08:27,260 --> 00:08:28,880
this is our history, into the history file
150
00:08:28,880 --> 00:08:33,240
we want to save, and then close the file and it's done.
151
00:08:33,620 --> 00:08:39,170
And similarly, if we want to look at this file, we simply just load it back here, and here we
152
00:08:39,170 --> 00:08:39,390
go.
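The save-and-reload pattern described here can be sketched as follows. The dictionary contents and the filename are assumptions; in Keras, the real dictionary would come from `model.fit(...).history`.

```python
import pickle

# Stand-in for a Keras training history (in practice: model.fit(...).history).
history = {"loss": [0.35], "acc": [0.90], "val_loss": [0.30], "val_acc": [0.92]}

# "wb" tells Python we are writing a new binary file; dump stores the dictionary.
with open("history.pkl", "wb") as f:
    pickle.dump(history, f)

# Later -- even in a fresh session -- load it back and re-plot without retraining.
with open("history.pkl", "rb") as f:
    reloaded = pickle.load(f)

print(reloaded["val_acc"])  # same values we saved
```

Using `with` blocks closes the file automatically, which replaces the explicit close step mentioned in the video.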
153
00:08:39,420 --> 00:08:44,360
Well, here we have it all, just for one epoch; if there were more than one epoch it would look a lot bigger.
154
00:08:44,690 --> 00:08:47,540
But basically it's a dictionary file, or a JSON file,
155
00:08:47,540 --> 00:08:50,110
for those of you who come from a JavaScript background.
156
00:08:50,810 --> 00:08:52,760
And basically this is how it looks.
157
00:08:52,760 --> 00:08:57,460
We have loss, accuracy, validation accuracy, validation loss, and values for each here.
158
00:08:58,500 --> 00:09:01,380
These are the keys and these are the values.
159
00:09:01,380 --> 00:09:08,370
So now we can get some plots, but these plots, obviously, for one epoch, are pretty much not fun to look
160
00:09:08,370 --> 00:09:08,730
at.
161
00:09:08,760 --> 00:09:15,040
Just one point here in the accuracy chart, one point here and one point here, so one can't actually say
162
00:09:15,040 --> 00:09:17,240
it's really quite good.
163
00:09:17,820 --> 00:09:19,050
This is what I wanted to show you.
164
00:09:19,230 --> 00:09:22,560
This here is our confusion matrix and classification report.
165
00:09:22,560 --> 00:09:25,400
So we import from sklearn.metrics
166
00:09:25,560 --> 00:09:27,330
both of these functions here.
167
00:09:27,600 --> 00:09:30,850
And basically what we do is we just get our predictions here.
168
00:09:30,960 --> 00:09:38,010
So we run x_test, our test data or validation data, through our model's predict_classes, and basically
169
00:09:38,010 --> 00:09:43,870
we just print out the classification report. The classification_report function just takes two arguments:
170
00:09:43,950 --> 00:09:49,490
the test labels, our y_test labels here, and the predictions.
171
00:09:49,500 --> 00:09:54,640
So basically, what we're doing here is comparing labels to labels.
172
00:09:54,870 --> 00:10:00,570
And the reason we have to use the argmax function is because our labels were one-hot encoded before.
173
00:10:00,630 --> 00:10:06,420
So it's not like for like; we actually have to now convert them back into basically a one-for-one type
174
00:10:06,420 --> 00:10:07,500
matching.
175
00:10:07,530 --> 00:10:11,010
So this is basically what our confusion matrix gives us.
176
00:10:11,130 --> 00:10:16,500
And basically, this is what the classification report looks like, which we saw in our slides.
177
00:10:16,840 --> 00:10:17,890
Our averages are here.
178
00:10:17,910 --> 00:10:23,310
They don't really tell us that much as such, but with maybe far more interesting datasets, of
179
00:10:23,300 --> 00:10:28,560
course those values would differ. And the confusion matrix is done here.
180
00:10:28,980 --> 00:10:33,720
It's basically the same thing, the same exact arguments as above, and we get it here.
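Putting the last few steps together, a minimal sketch might look like the following. The label arrays are fabricated stand-ins for the notebook's y_test and model outputs; the argmax calls undo the one-hot encoding exactly as described above.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical one-hot test labels and model outputs; the real notebook would
# use y_test and the model's predictions on x_test instead.
y_test_onehot = np.eye(10)[[7, 2, 1, 0, 4, 7]]
y_prob = np.eye(10)[[7, 2, 1, 0, 9, 7]]  # pretend the model called one 4 a 9

# argmax converts one-hot rows back to plain label integers, so we compare
# labels to labels, like for like.
y_true = np.argmax(y_test_onehot, axis=1)
y_pred = np.argmax(y_prob, axis=1)

print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```

Both functions take the same two arguments, true labels then predictions, which is why the confusion matrix call mirrors the classification report call.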
181
00:10:34,350 --> 00:10:38,940
So that's it for confusion matrices and our misclassification analysis.