AI_DL_Assignment / 11. Assessing Model Performance /2. Understanding the Confusion Matrix.srt
1
00:00:00,510 --> 00:00:00,940
Hey.
2
00:00:00,960 --> 00:00:05,520
And welcome back to chapter eleven point one where we go into the confusion matrix and then calculate
3
00:00:05,520 --> 00:00:07,180
precision and recall.
4
00:00:07,190 --> 00:00:08,530
So let's get started.
5
00:00:08,880 --> 00:00:15,480
So before I dive into the Python notebook, where we'll actually walk you through an example using scikit-learn's confusion matrix
6
00:00:15,480 --> 00:00:16,240
function.
7
00:00:16,530 --> 00:00:19,310
Let's take a look at what the confusion matrix actually looks like.
8
00:00:19,350 --> 00:00:22,700
Now typically, as I've shown you before,
9
00:00:22,980 --> 00:00:29,490
you have basically true positives, true negatives, false positives and false negatives.
10
00:00:29,610 --> 00:00:31,510
And that was in a binary class analysis.
11
00:00:31,550 --> 00:00:37,400
Now I'm going to go a step further and help you understand this concept using a multiclass problem, and
12
00:00:37,400 --> 00:00:41,010
we're using actual real-world results for the MNIST dataset.
13
00:00:41,250 --> 00:00:45,410
So this big matrix kind of looks a bit strange at first.
14
00:00:45,450 --> 00:00:48,720
However, you do pick up initially that there is a pattern right here.
15
00:00:48,900 --> 00:00:50,170
There is a diagonal here.
16
00:00:50,190 --> 00:00:53,470
These are very large numbers and then there are small numbers on the outskirts.
17
00:00:53,760 --> 00:00:55,740
What do these numbers actually mean.
18
00:00:56,250 --> 00:00:57,770
So let's take a look now.
19
00:00:58,230 --> 00:00:59,110
I've made it simple.
20
00:00:59,130 --> 00:01:00,820
We know we're looking at the MNIST dataset.
21
00:01:00,840 --> 00:01:04,210
So we have 10 classes, that's 0 to 9.
22
00:01:04,290 --> 00:01:05,100
Likewise here.
23
00:01:05,190 --> 00:01:08,320
And what this column here is, is the predicted value.
24
00:01:08,340 --> 00:01:15,230
So these numbers here mean that the classifier predicted zero and the true value was actually zero.
25
00:01:15,240 --> 00:01:17,180
So that's why it is a big number here.
26
00:01:17,440 --> 00:01:20,170
So it predicted one, and the true value was actually 1.
27
00:01:20,370 --> 00:01:23,810
So having large numbers in this diagonal is a good thing.
28
00:01:24,000 --> 00:01:29,550
Having large numbers outside of it is bad. And we can see we have some large numbers here.
29
00:01:29,550 --> 00:01:30,360
We have an 11.
30
00:01:30,360 --> 00:01:32,940
We have a six and a five here.
31
00:01:33,220 --> 00:01:36,950
So now let's take a look at these numbers, which I've highlighted here.
32
00:01:37,220 --> 00:01:38,790
So what do they actually mean.
33
00:01:38,790 --> 00:01:44,350
It means that our classifier predicted 2 when it was actually a 7.
34
00:01:44,460 --> 00:01:49,920
So the classifier is confusing sevens with twos: it is seeing a seven but it's classifying it as a two, which
35
00:01:49,920 --> 00:01:50,660
is wrong.
36
00:01:50,940 --> 00:01:56,070
And likewise for sixes and zeroes, and nines and fours here.
37
00:01:56,970 --> 00:01:58,680
Sorry this one here.
38
00:01:59,120 --> 00:02:00,990
And this one as well.
39
00:02:00,990 --> 00:02:08,550
So what I mean is, let's look at nines and fours: the classifier predicted four, but five times it was
40
00:02:08,550 --> 00:02:10,160
actually a 9.
41
00:02:10,260 --> 00:02:11,660
So we can see the biggest problem
42
00:02:11,670 --> 00:02:17,400
our MNIST classifier is facing is confusing twos and sevens, which is actually a real problem for a
43
00:02:17,400 --> 00:02:18,630
lot of humans.
44
00:02:18,660 --> 00:02:20,070
My handwriting isn't very good.
45
00:02:20,070 --> 00:02:21,690
I'll be the first one to admit that.
46
00:02:21,810 --> 00:02:25,700
And lots of times when I'm looking at numbers I write, I'm like, is that a 2?
47
00:02:25,700 --> 00:02:26,550
Or is it a 7?
48
00:02:26,550 --> 00:02:34,050
So we can see our classifier is sort of leaning, generally, like a human would lean, interpreting or misinterpreting
49
00:02:34,080 --> 00:02:36,840
results just like a human like ourselves would.
50
00:02:37,080 --> 00:02:41,860
So let's actually work out our recall value based on this real-world data.
51
00:02:42,210 --> 00:02:43,770
So let's take a look at the number seven.
52
00:02:43,860 --> 00:02:44,350
All right.
53
00:02:44,550 --> 00:02:47,040
So this is the true classes for number 7 here.
54
00:02:47,040 --> 00:02:49,220
So we saw our classifier got it right
55
00:02:49,410 --> 00:02:52,800
one thousand and ten times; that's our true positives here.
56
00:02:53,130 --> 00:02:55,020
So how do we get the number of false negatives.
57
00:02:55,020 --> 00:02:56,970
Now, for the number of false negatives:
58
00:02:56,970 --> 00:03:03,330
basically, how many times our classifier predicted a number other than the number that was supposed
59
00:03:03,330 --> 00:03:04,250
to be a 7.
60
00:03:04,500 --> 00:03:06,530
So you can see all these times here.
61
00:03:06,570 --> 00:03:11,730
Numbers were supposed to be a seven but were actually predicted to be a different class, like 0, 1, 2
62
00:03:11,730 --> 00:03:12,960
especially.
63
00:03:12,960 --> 00:03:14,700
So let's sum this up.
64
00:03:14,700 --> 00:03:18,980
This row here, everything here except the 1,010, and that will give you 18.
65
00:03:19,230 --> 00:03:21,090
And that's exactly what we calculate here.
66
00:03:21,100 --> 00:03:25,310
And we get ninety-eight point two four percent, and that's our recall.
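The recall calculation just walked through can be sketched in a couple of lines. The row values below are illustrative stand-ins; only the 1,010 true positives and the total of 18 false negatives come from the example.

```python
import numpy as np

# Hypothetical confusion-matrix row for class 7: the diagonal entry (index 7)
# holds the true positives; the other entries are sevens predicted as other digits.
row_for_7 = np.array([0, 3, 6, 1, 1, 0, 1, 1010, 2, 4])

true_positives = row_for_7[7]                        # 1010 correct sevens
false_negatives = row_for_7.sum() - true_positives   # 18 missed sevens

recall = true_positives / (true_positives + false_negatives)
print(f"recall for class 7: {recall:.2%}")
```

With these values recall comes out around 98.2 percent, matching the figure worked out on the slide.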
67
00:03:25,370 --> 00:03:29,810
And now let's move on to precision.
68
00:03:29,830 --> 00:03:34,660
So looking at precision, we know it's the number of correct predictions over how many predictions of that
69
00:03:34,660 --> 00:03:36,650
class were made on the test data set.
70
00:03:36,730 --> 00:03:38,770
That's another way of saying what this is here:
71
00:03:38,980 --> 00:03:43,630
true positives over true positives plus our false positives.
72
00:03:43,630 --> 00:03:45,230
So again let's look at number seven.
73
00:03:45,310 --> 00:03:45,870
OK.
74
00:03:46,030 --> 00:03:47,240
So now we go the other way.
75
00:03:47,290 --> 00:03:48,900
This is interesting.
76
00:03:48,970 --> 00:03:51,610
So our true positives again:
77
00:03:51,840 --> 00:03:52,980
it's 1,010.
78
00:03:53,320 --> 00:03:57,400
And what about false positives? Our false positives here:
79
00:03:57,400 --> 00:04:05,610
basically, all the times the classifier was predicting something to be a 7 when it was actually a 0, a 2, a
80
00:04:05,650 --> 00:04:07,080
3, above or below.
81
00:04:07,450 --> 00:04:14,620
So those are the false positives for sevens, and we just sum this up; everything here gives us a thousand
82
00:04:14,620 --> 00:04:15,240
seventeen.
83
00:04:15,300 --> 00:04:18,060
One, two, three, and then three plus four.
84
00:04:18,060 --> 00:04:25,560
And that's exactly how we get ninety-nine point one percent. So basically we don't actually
85
00:04:25,560 --> 00:04:26,710
have to do it manually.
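Precision works the same way, but down the column instead of across the row. Again a sketch with made-up off-diagonal counts; only the 1,010 true positives come from the example.

```python
import numpy as np

# Hypothetical confusion-matrix column for class 7: the diagonal entry is the
# true positives; every other entry is a different digit wrongly predicted as 7.
col_for_7 = np.array([1, 2, 3, 0, 1, 0, 0, 1010, 0, 2])

true_positives = col_for_7[7]                        # 1010
false_positives = col_for_7.sum() - true_positives   # other digits called a 7

precision = true_positives / (true_positives + false_positives)
print(f"precision for class 7: {precision:.1%}")
```

With these stand-in counts the result lands around 99.1 percent, the same ballpark as the slide.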
86
00:04:26,910 --> 00:04:32,370
So scikit-learn actually generates a report for us automatically that gives us recall,
87
00:04:32,370 --> 00:04:35,000
precision, F1 and support.
88
00:04:35,010 --> 00:04:37,390
I think you remember, and you guys know what F1 is.
89
00:04:37,560 --> 00:04:41,040
I haven't actually dealt with support but I'll talk about it now in the next slide.
90
00:04:41,340 --> 00:04:46,140
But you can see we have precision, we have recall and we have our F1 score here.
91
00:04:46,190 --> 00:04:47,350
All right.
92
00:04:47,640 --> 00:04:49,500
Now actually I can talk about support right now.
93
00:04:49,500 --> 00:04:50,820
It's actually quite easy.
94
00:04:50,850 --> 00:04:53,180
Support is basically you see the numbers here.
95
00:04:53,460 --> 00:04:54,530
One thousand and twenty-eight.
96
00:04:54,540 --> 00:04:55,840
What was that.
97
00:04:55,980 --> 00:05:01,920
Let's go back to it here. 1,028 is basically true positives plus false negatives here.
98
00:05:02,220 --> 00:05:05,430
So support just gives us that sum here in the column.
99
00:05:05,430 --> 00:05:09,630
So if you look at column 0, let's go back to it.
100
00:05:10,080 --> 00:05:12,760
Column zero would be everything here.
101
00:05:13,760 --> 00:05:14,130
Sorry.
102
00:05:14,170 --> 00:05:14,920
Everything in this row.
103
00:05:14,980 --> 00:05:15,970
Added up here.
104
00:05:15,970 --> 00:05:20,200
So this is 977 plus 1 plus 2: 980.
105
00:05:20,200 --> 00:05:23,550
And that gives us our support here.
106
00:05:23,550 --> 00:05:24,490
Support is basically useful for seeing
107
00:05:24,490 --> 00:05:31,540
how many times our classifier is actually missing data, essentially missing classifications. Because
108
00:05:31,630 --> 00:05:39,560
think about it intuitively: we have nine hundred and eighty zeroes represented here in this report.
109
00:05:39,560 --> 00:05:46,350
All right, now what it is telling us here is basically this:
110
00:05:46,390 --> 00:05:48,790
How many zeros were in our report.
111
00:05:48,790 --> 00:05:53,290
So now we can actually use that as a basis to gauge what is happening here.
112
00:05:53,320 --> 00:05:57,650
So you can also see if there are any class imbalances in the data here.
113
00:05:57,950 --> 00:06:03,430
So, what about this? Imbalances are essentially something you can usually check before you even reach
114
00:06:03,430 --> 00:06:03,870
this point.
115
00:06:03,880 --> 00:06:06,810
You can easily check it when you have your test and training data.
116
00:06:07,150 --> 00:06:15,520
Just check to see the quantities, the counts of data of each class of
117
00:06:15,520 --> 00:06:15,780
data.
118
00:06:15,820 --> 00:06:20,040
that you have in your dataset.
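That per-class count check can be sketched like this; the `y_train` array here is a hypothetical stand-in for your own training or test labels.

```python
import numpy as np

# Hypothetical label array; in practice this would be your training or test labels.
y_train = np.array([0, 1, 7, 7, 2, 0, 7, 9, 1, 0])

# Count the occurrences of each class -- the same quantity the classification
# report calls "support". Large differences between counts signal an imbalance.
classes, counts = np.unique(y_train, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {c}: {n} examples")
```

For a balanced dataset like MNIST the counts come out roughly equal per class; a heavily skewed count is worth addressing before training.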
119
00:06:20,060 --> 00:06:23,020
So how do we analyze our classification report?
120
00:06:23,030 --> 00:06:29,830
So basically we can just quickly interpret something here. High recall with low precision: what does that mean?
121
00:06:30,080 --> 00:06:35,700
Let me tell you what this tells us: most of the positive examples are being correctly recognized.
122
00:06:35,840 --> 00:06:40,350
That means not a lot of false negatives, but there are a lot of false positives.
123
00:06:40,670 --> 00:06:43,330
And that means a lot of the other classes have been predicted
124
00:06:43,400 --> 00:06:51,450
as the class in question. Alternatively, we can have low recall with high precision.
125
00:06:51,720 --> 00:06:52,630
What does that mean.
126
00:06:52,650 --> 00:06:58,640
It means our classifier is missing a lot of the positive examples, it has a high false negative rate, but
127
00:06:58,720 --> 00:07:03,760
those it does predict as positive are indeed positive, so there are few false positives.
128
00:07:04,110 --> 00:07:10,570
So we can use our classification report to sort of gauge what's actually happening.
129
00:07:10,710 --> 00:07:13,220
In this example everything looks pretty good.
130
00:07:13,250 --> 00:07:13,740
All right.
131
00:07:13,920 --> 00:07:18,720
But later on we'll look at some examples where we generate these reports, and you can actually analyze
132
00:07:18,720 --> 00:07:22,350
and figure out which class our classifier is having trouble with.
133
00:07:24,110 --> 00:07:28,360
So let's quickly take a look at the code to generate this confusion matrix.
134
00:07:28,360 --> 00:07:29,000
All right.
135
00:07:29,300 --> 00:07:35,760
And later on, we're actually going to look at misclassified data. For now, this is the generic MNIST training
136
00:07:36,130 --> 00:07:36,470
code.
137
00:07:36,500 --> 00:07:40,480
You've seen it a few times before; it's used in a lot of my examples.
138
00:07:40,910 --> 00:07:43,700
There's one thing I wanted to show you here in this file.
139
00:07:43,790 --> 00:07:50,720
Basically, we store our history here. Some people have asked me, how can I save my history
140
00:07:50,720 --> 00:07:51,200
file.
141
00:07:51,500 --> 00:07:56,900
and look at it again? Because I've spent hours, or maybe days or a week, training a classifier, and
142
00:07:56,900 --> 00:08:01,090
I want to actually see the plots again later when I load the file.
143
00:08:01,430 --> 00:08:04,330
Yes, you can: you can use a Python module called pickle.
144
00:08:04,700 --> 00:08:09,980
And basically what it does here is just store our data as a pickle file. A pickle file is basically an array
145
00:08:09,980 --> 00:08:12,360
of data as a method of storage.
146
00:08:12,410 --> 00:08:16,920
I'm not going to get into the details now, but just know it's a way we can store files.
147
00:08:17,570 --> 00:08:22,910
So what we do here, we just use pickle to create a file. The 'wb' is basically us saying we're
148
00:08:22,910 --> 00:08:27,260
going to create this pickle file and then we just dump this file.
149
00:08:27,260 --> 00:08:28,880
this is our history, into the history file
150
00:08:28,880 --> 00:08:33,240
we want to save, and then close the file and it's done.
151
00:08:33,620 --> 00:08:39,170
And similarly, if we want to look at this file, we simply just load it back here, and here we
152
00:08:39,170 --> 00:08:39,390
go.
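The save-and-reload pattern described here can be sketched as follows. The dictionary contents and the filename are assumptions; in Keras, the real dictionary would come from `model.fit(...).history`.

```python
import pickle

# Stand-in for a Keras training history (in practice: model.fit(...).history).
history = {"loss": [0.35], "acc": [0.90], "val_loss": [0.30], "val_acc": [0.92]}

# "wb" tells Python we are writing a new binary file; dump stores the dictionary.
with open("history.pkl", "wb") as f:
    pickle.dump(history, f)

# Later -- even in a fresh session -- load it back and re-plot without retraining.
with open("history.pkl", "rb") as f:
    reloaded = pickle.load(f)

print(reloaded["val_acc"])  # same values we saved
```

Using `with` blocks closes the file automatically, which replaces the explicit close step mentioned in the video.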
153
00:08:39,420 --> 00:08:44,360
Well, here we have it all, just for one epoch; if there were more than one epoch it would look a lot bigger.
154
00:08:44,690 --> 00:08:47,540
But basically it's a dictionary file, or a JSON file,
155
00:08:47,540 --> 00:08:50,110
for those of you who come from a JavaScript background.
156
00:08:50,810 --> 00:08:52,760
And basically this is how it looks.
157
00:08:52,760 --> 00:08:57,460
We have loss, accuracy, validation accuracy, validation loss, and values for each here.
158
00:08:58,500 --> 00:09:01,380
These are the keys and these are the values.
159
00:09:01,380 --> 00:09:08,370
So now we can get some plots, but these plots, obviously, for one epoch, are pretty much not fun to look
160
00:09:08,370 --> 00:09:08,730
at.
161
00:09:08,760 --> 00:09:15,040
Just one point here in the accuracy chart, one point here and one point here, so one can't actually say
162
00:09:15,040 --> 00:09:17,240
it's really quite good.
163
00:09:17,820 --> 00:09:19,050
This is what I wanted to show you.
164
00:09:19,230 --> 00:09:22,560
This here is our confusion matrix and classification report.
165
00:09:22,560 --> 00:09:25,400
So we import from sklearn.metrics
166
00:09:25,560 --> 00:09:27,330
both of these functions here.
167
00:09:27,600 --> 00:09:30,850
And basically what we do is we just get our predictions here.
168
00:09:30,960 --> 00:09:38,010
So we run x_test, our test data or validation data, through our model's predict_classes, and basically
169
00:09:38,010 --> 00:09:43,870
we just print out the classification report. The classification_report function just takes two arguments:
170
00:09:43,950 --> 00:09:49,490
the test labels, our y_test labels here, and the predictions.
171
00:09:49,500 --> 00:09:54,640
So basically, what we're doing here is comparing labels to labels.
172
00:09:54,870 --> 00:10:00,570
And the reason we have to use the argmax function is because our labels were one-hot encoded before.
173
00:10:00,630 --> 00:10:06,420
So it's not like for like; we actually have to now convert them back into basically a one-for-one type
174
00:10:06,420 --> 00:10:07,500
matching.
175
00:10:07,530 --> 00:10:11,010
So this is basically what our confusion matrix gives us.
176
00:10:11,130 --> 00:10:16,500
And basically, this is what the classification report looks like, which we saw in our slides.
177
00:10:16,840 --> 00:10:17,890
Our averages are here.
178
00:10:17,910 --> 00:10:23,310
They don't really tell us that much as such, but with maybe far more interesting datasets, of
179
00:10:23,300 --> 00:10:28,560
course those values would differ. And the confusion matrix is done here.
180
00:10:28,980 --> 00:10:33,720
It's basically the same thing, the same exact arguments as above, and we get it here.
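Putting the last few steps together, a minimal sketch might look like the following. The label arrays are fabricated stand-ins for the notebook's y_test and model outputs; the argmax calls undo the one-hot encoding exactly as described above.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical one-hot test labels and model outputs; the real notebook would
# use y_test and the model's predictions on x_test instead.
y_test_onehot = np.eye(10)[[7, 2, 1, 0, 4, 7]]
y_prob = np.eye(10)[[7, 2, 1, 0, 9, 7]]  # pretend the model called one 4 a 9

# argmax converts one-hot rows back to plain label integers, so we compare
# labels to labels, like for like.
y_true = np.argmax(y_test_onehot, axis=1)
y_pred = np.argmax(y_prob, axis=1)

print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```

Both functions take the same two arguments, true labels then predictions, which is why the confusion matrix call mirrors the classification report call.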
181
00:10:34,350 --> 00:10:38,940
So that's it for confusion matrices and our misclassification analysis.