AI_DL_Assignment / 6. Neural Networks Explained /9. Regularization, Overfitting, Generalization and Test Datasets.srt
| 1 | |
| 00:00:00,390 --> 00:00:00,710 | |
| OK. | |
| 2 | |
| 00:00:00,750 --> 00:00:02,420 | |
| So welcome to this section. | |
| 3 | |
| 00:00:02,550 --> 00:00:05,970 | |
| I actually just corrected the numbering from the last section; I had 6.7 here. | |
| 4 | |
| 00:00:06,060 --> 00:00:06,910 | |
| My bad. | |
| 5 | |
| 00:00:07,320 --> 00:00:11,500 | |
| So this section deals with regularization, which is a very important concept. | |
| 6 | |
| 00:00:11,520 --> 00:00:16,740 | |
| Also, you're going to understand what overfitting is, why it's bad, and why we need to have a model that | |
| 7 | |
| 00:00:16,920 --> 00:00:21,090 | |
| generalizes well, and you'll understand basically what a test dataset is. | |
| 8 | |
| 00:00:21,090 --> 00:00:25,110 | |
| I think I mentioned it previously but I'll go into it a bit more here. | |
| 9 | |
| 00:00:25,500 --> 00:00:30,920 | |
| Effectively, we want to know when and how our trained model becomes good. | |
| 10 | |
| 00:00:32,310 --> 00:00:33,920 | |
| So what makes a good model? | |
| 11 | |
| 00:00:33,930 --> 00:00:40,270 | |
| Now, this is a very basic explanation of what makes a good model: a good model is accurate, generalizes | |
| 12 | |
| 00:00:40,350 --> 00:00:44,050 | |
| well, and does not overfit. The last two kind of mean the same thing; | |
| 13 | |
| 00:00:44,080 --> 00:00:46,130 | |
| you'll understand that shortly. | |
| 14 | |
| 00:00:46,290 --> 00:00:52,150 | |
| And I deliberately made it slightly vague, because accuracy all depends on the domain | |
| 15 | |
| 00:00:52,160 --> 00:00:55,660 | |
| you're looking at. Sometimes you want 99.99 percent accuracy; | |
| 16 | |
| 00:00:55,830 --> 00:00:57,050 | |
| sometimes | |
| 17 | |
| 00:00:57,180 --> 00:00:59,060 | |
| you can be happy with 80 percent accuracy. | |
| 18 | |
| 00:00:59,070 --> 00:01:00,490 | |
| It all depends on the application. | |
| 19 | |
| 00:01:03,020 --> 00:01:05,780 | |
| So let's look at the models here. | |
| 20 | |
| 00:01:05,900 --> 00:01:08,990 | |
| Let's look at these two classes, one in green and one in blue. | |
| 21 | |
| 00:01:09,410 --> 00:01:17,360 | |
| And this is Model A, Model B, and Model C. The red line here is basically the decision boundary for each | |
| 22 | |
| 00:01:17,540 --> 00:01:19,010 | |
| dataset here. | |
| 23 | |
| 00:01:19,330 --> 00:01:21,540 | |
| For each model, I should say. | |
| 24 | |
| 00:01:23,280 --> 00:01:29,430 | |
| So, how do you know which model is good? Intuitively, what would you say is the best model | |
| 25 | |
| 00:01:29,430 --> 00:01:29,980 | |
| here. | |
| 26 | |
| 00:01:30,030 --> 00:01:32,630 | |
| Now let's look at Model A closely. | |
| 27 | |
| 00:01:32,890 --> 00:01:36,350 | |
| Model A actually separates all the data accurately. | |
| 28 | |
| 00:01:36,510 --> 00:01:43,890 | |
| It sees a blue ball over here and adjusts its decision boundary to encapsulate it. Model B, | |
| 29 | |
| 00:01:43,890 --> 00:01:49,530 | |
| basically, doesn't do that. Model B draws a nice smooth curve here; it doesn't push itself | |
| 30 | |
| 00:01:49,560 --> 00:01:54,480 | |
| all the way out here to capture this blue ball, and it forms a nice clean decision boundary | |
| 31 | |
| 00:01:54,480 --> 00:01:55,150 | |
| here. | |
| 32 | |
| 00:01:55,260 --> 00:02:01,680 | |
| Model C takes a much more simplistic approach, giving you a straight line separating these classes | |
| 33 | |
| 00:02:01,680 --> 00:02:02,050 | |
| here. | |
| 34 | |
| 00:02:02,230 --> 00:02:05,790 | |
| Now, what would you say is the best model here? | |
| 35 | |
| 00:02:06,150 --> 00:02:13,080 | |
| I would say B, and I'll tell you why. Even though B doesn't capture the blue ball here, as you can | |
| 36 | |
| 00:02:13,080 --> 00:02:17,580 | |
| see from the nature of this data, the blue ball is technically in the green zone. | |
| 37 | |
| 00:02:18,050 --> 00:02:18,320 | |
| Yeah. | |
| 38 | |
| 00:02:18,390 --> 00:02:20,310 | |
| So this ball here is an anomaly. | |
| 39 | |
| 00:02:20,310 --> 00:02:21,920 | |
| It's basically an outlier. | |
| 40 | |
| 00:02:22,260 --> 00:02:25,790 | |
| Chances are it may have ended up here from being mislabeled. | |
| 41 | |
| 00:02:25,800 --> 00:02:29,470 | |
| Maybe it was supposed to be a green ball, or maybe it's just highly unusual. | |
| 42 | |
| 00:02:29,670 --> 00:02:35,430 | |
| And generally, we don't want our models to cover this blue ball here. | |
| 43 | |
| 00:02:35,640 --> 00:02:43,960 | |
| This is called overfitting, and it's bad because, in the most likely scenario, | |
| 44 | |
| 00:02:44,080 --> 00:02:45,470 | |
| green balls are going to be right here. | |
| 45 | |
| 00:02:45,540 --> 00:02:48,090 | |
| So what happens when a green ball is here, i.e. an unseen | |
| 46 | |
| 00:02:48,110 --> 00:02:54,030 | |
| green ball in the future? When we feed this green ball's x-y coordinates, which is right here, | |
| 47 | |
| 00:02:54,530 --> 00:02:55,860 | |
| into this model. | |
| 48 | |
| 00:02:55,860 --> 00:02:59,630 | |
| This overfitted model is going to label it as blue, | |
| 49 | |
| 00:02:59,730 --> 00:03:06,600 | |
| unfortunately, when in reality, as you see here from this nice clean Model B boundary, it is supposed to | |
| 50 | |
| 00:03:06,600 --> 00:03:09,330 | |
| be in the green class. | |
| 51 | |
| 00:03:09,570 --> 00:03:16,080 | |
| So that's an example of a model that is overfit, probably too complicated for its own good, as opposed | |
| 52 | |
| 00:03:16,080 --> 00:03:19,610 | |
| to Model B, which generalizes well. Model C, | |
| 53 | |
| 00:03:19,650 --> 00:03:23,380 | |
| on the other hand, is way too general, and it's not going to be a good model. | |
| 54 | |
| 00:03:27,480 --> 00:03:32,190 | |
| As I explained before, this model, Model C, misses this ideal of balance. | |
| 55 | |
| 00:03:32,190 --> 00:03:33,990 | |
| And this is what's called underfitting. | |
| 56 | |
| 00:03:34,110 --> 00:03:35,250 | |
| It doesn't fit the data | |
| 57 | |
| 00:03:35,430 --> 00:03:38,930 | |
| well; it just gives you a generic boundary and tells you, | |
| 58 | |
| 00:03:39,210 --> 00:03:43,020 | |
| "Yeah, I tried my best." But it's underfitting. | |
| 59 | |
| 00:03:43,100 --> 00:03:47,550 | |
| So let's go into overfitting. Overfitting is what leads to poor models, | |
| 60 | |
| 00:03:47,570 --> 00:03:54,650 | |
| and that's one of the most common problems faced in machine learning; it's a problem I face continuously | |
| 61 | |
| 00:03:54,650 --> 00:03:57,500 | |
| when training my convolutional neural nets. | |
| 62 | |
| 00:03:57,890 --> 00:04:01,570 | |
| It always happens; the model always tends to overfit on the training data. | |
| 63 | |
| 00:04:01,940 --> 00:04:04,960 | |
| And we will experience that later on in this course. | |
| 64 | |
| 00:04:05,390 --> 00:04:12,450 | |
| But what it means, basically, is that overfitting is when our model fits the training data too perfectly, as in | |
| 65 | |
| 00:04:12,710 --> 00:04:19,370 | |
| it has very high accuracy on the data it was trained on, maybe even high 90s or 99.9 | |
| 66 | |
| 00:04:19,370 --> 00:04:20,820 | |
| something. | |
| 67 | |
| 00:04:20,820 --> 00:04:27,740 | |
| However, on the test dataset, which is the unseen data, it is going to perform poorly, because it has | |
| 68 | |
| 00:04:27,740 --> 00:04:33,710 | |
| basically modeled itself after the training data but can't generalize well to data it hasn't seen. | |
| 69 | |
| 00:04:34,190 --> 00:04:39,100 | |
| So what happens when you try to pass in a new point at this position here? | |
| 70 | |
| 00:04:39,290 --> 00:04:40,910 | |
| Exactly what I mentioned before. | |
| 71 | |
| 00:04:41,450 --> 00:04:45,290 | |
| Its true color is supposed to be green, but it will be classified as blue. | |
| 72 | |
| 00:04:45,590 --> 00:04:48,020 | |
| Models don't need to be overly complex to be good; | |
| 73 | |
| 00:04:48,030 --> 00:04:54,330 | |
| they need to generalize well. So how do you know if your model is overfit? | |
| 74 | |
| 00:04:54,820 --> 00:04:59,130 | |
| Well, that's why I mentioned test data previously, at the beginning of the slides. | |
| 75 | |
| 00:04:59,350 --> 00:05:05,420 | |
| We need to test a model on test data, and in all machine learning algorithms | |
| 76 | |
| 00:05:05,560 --> 00:05:08,040 | |
| we always use a test dataset. | |
| 77 | |
| 00:05:08,560 --> 00:05:15,230 | |
| And basically, if we have an entire dataset of, say, 1,000 images, we take 700 and we train on | |
| 78 | |
| 00:05:15,250 --> 00:05:20,310 | |
| those 700 labeled images, and we reserve 300 as test data. | |
| 79 | |
| 00:05:20,440 --> 00:05:27,930 | |
| Test data is critical because it tells us how well our algorithm or model performs on data | |
| 80 | |
| 00:05:28,030 --> 00:05:29,680 | |
| the model has never seen before. | |
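To make this concrete, here is a minimal sketch of such a split in Python. scikit-learn's `train_test_split` is an assumption (the lecture doesn't name a tool), and the 70/30 ratio mirrors the 700/300 example above:

```python
# Minimal sketch: hold out a test set the model never sees during training.
# Dummy data stands in for the 1,000 labeled images from the example.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 64)        # 1,000 samples, 64 features each
y = np.random.randint(0, 2, 1000)   # dummy binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # reserve 30% as unseen test data

print(X_train.shape, X_test.shape)  # (700, 64) (300, 64)
```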
| 81 | |
| 00:05:31,970 --> 00:05:36,120 | |
| This is a very common case of overfitting: 95 percent plus training accuracy, | |
| 82 | |
| 00:05:36,260 --> 00:05:38,690 | |
| but on test data you get like 70 percent. | |
| 83 | |
| 00:05:38,690 --> 00:05:41,500 | |
| It's a perfect example of overfitting. | |
| 84 | |
| 00:05:42,280 --> 00:05:44,260 | |
| So, overfitting shown graphically. | |
| 85 | |
| 00:05:44,330 --> 00:05:45,810 | |
| Here's what happens. | |
| 86 | |
| 00:05:45,890 --> 00:05:48,390 | |
| Remember I mentioned epochs? | |
| 87 | |
| 00:05:48,920 --> 00:05:54,040 | |
| In this chapter I'll discuss exactly what epochs are. | |
| 88 | |
| 00:05:54,320 --> 00:06:00,750 | |
| Basically, every time we send the full training dataset through our training algorithm, we | |
| 89 | |
| 00:06:00,860 --> 00:06:06,680 | |
| have completed one epoch, and we sometimes need to train for maybe hundreds of epochs to get a good model. | |
| 90 | |
| 00:06:06,770 --> 00:06:07,790 | |
| Usually that's not the case. | |
| 91 | |
| 00:06:07,790 --> 00:06:13,490 | |
| Usually you can get away with training for 20 to 100 epochs, but generally that's what we have to do to get the | |
| 92 | |
| 00:06:13,490 --> 00:06:15,960 | |
| best models. | |
| 93 | |
| 00:06:15,980 --> 00:06:22,120 | |
| So this is an illustration of what overfitting looks like. | |
| 94 | |
| 00:06:22,130 --> 00:06:29,360 | |
| So look at the training loss here in red, and the accuracy: the loss is going down quite well, and accuracy is going up | |
| 95 | |
| 00:06:29,360 --> 00:06:30,620 | |
| close to 100 percent. | |
| 96 | |
| 00:06:30,640 --> 00:06:37,370 | |
| The scale for accuracy is on this side, loss is on this side, and epochs are on the x-axis. Now look at the | |
| 97 | |
| 00:06:37,370 --> 00:06:43,750 | |
| test loss: it fluctuates between 1 and 1.5 and actually goes up at the end. | |
| 98 | |
| 00:06:43,820 --> 00:06:44,970 | |
| It's not good at all. | |
| 99 | |
| 00:06:45,380 --> 00:06:51,700 | |
| And look at our test accuracy: it's hovering at abysmal rates, below 50 percent, | |
| 100 | |
| 00:06:52,190 --> 00:06:57,800 | |
| while the training accuracy is at 100 percent. This is actually extreme overfitting; to | |
| 101 | |
| 00:06:57,800 --> 00:07:00,240 | |
| be fair, it doesn't usually have to get this bad. | |
| 102 | |
| 00:07:00,320 --> 00:07:02,370 | |
| At least I've never gotten it to be this bad. | |
| 103 | |
| 00:07:02,510 --> 00:07:05,850 | |
| But it is a good example of what we actually see happening in the real world: | |
| 104 | |
| 00:07:06,290 --> 00:07:13,100 | |
| good training accuracy, poor test accuracy, and that's overfitting. | |
| 105 | |
| 00:07:13,120 --> 00:07:15,090 | |
| So how do we avoid overfitting? | |
| 106 | |
| 00:07:15,360 --> 00:07:16,630 | |
| There are many techniques to avoid it, | |
| 107 | |
| 00:07:16,630 --> 00:07:19,030 | |
| and I'll discuss them shortly, | |
| 108 | |
| 00:07:19,060 --> 00:07:26,200 | |
| in the slides in this section. Now, overfitting is a consequence of our weights being tuned to | |
| 109 | |
| 00:07:26,200 --> 00:07:32,080 | |
| fit our training data too closely, in a way such that they don't perform well on test data. | |
| 110 | |
| 00:07:32,740 --> 00:07:39,310 | |
| We may have a decent model that is just too sensitive to the training data. | |
| 111 | |
| 00:07:39,340 --> 00:07:40,990 | |
| So is there a way to fix this? | |
| 112 | |
| 00:07:41,040 --> 00:07:42,420 | |
| By "too sensitive", | |
| 113 | |
| 00:07:42,430 --> 00:07:46,750 | |
| I mean it's just optimized exclusively for the training data. | |
| 114 | |
| 00:07:49,660 --> 00:07:55,320 | |
| So we can avoid overfitting by using a smaller, less deep model. | |
| 115 | |
| 00:07:55,820 --> 00:08:01,720 | |
| Deeper models can sometimes find features in noise, or interpret noise as important; this blue ball was an example of noise | |
| 116 | |
| 00:08:01,720 --> 00:08:02,210 | |
| here. | |
| 117 | |
| 00:08:03,000 --> 00:08:08,790 | |
| It clearly wasn't that important, yet a deep model will actually try to figure out and model this | |
| 118 | |
| 00:08:10,810 --> 00:08:11,740 | |
| data point. | |
| 119 | |
| 00:08:11,830 --> 00:08:17,920 | |
| That's because deeper networks have the ability to memorize more complicated features, | |
| 120 | |
| 00:08:18,850 --> 00:08:21,580 | |
| and that's called memorization capacity. | |
| 121 | |
| 00:08:24,960 --> 00:08:28,230 | |
| But there's another way, and that's called regularization. | |
| 122 | |
| 00:08:28,470 --> 00:08:35,550 | |
| I don't often recommend using smaller models to get better results, because | |
| 123 | |
| 00:08:35,550 --> 00:08:40,440 | |
| there are other ways around that, and you actually always want a model deep enough to | |
| 124 | |
| 00:08:40,440 --> 00:08:42,890 | |
| represent complicated patterns in your data. | |
| 125 | |
| 00:08:43,260 --> 00:08:47,750 | |
| So let's see what techniques we can use to regularize. | |
| 126 | |
| 00:08:47,820 --> 00:08:49,400 | |
| So what is regularization? | |
| 127 | |
| 00:08:49,400 --> 00:08:53,750 | |
| It's a method of making our model more general with respect to a dataset. | |
| 128 | |
| 00:08:53,760 --> 00:08:59,880 | |
| So basically, regularization will take a model that produces a decision boundary like this and | |
| 129 | |
| 00:08:59,880 --> 00:09:02,870 | |
| tweak it so that it becomes like this. | |
| 130 | |
| 00:09:02,970 --> 00:09:05,150 | |
| Now, that's not exactly how it actually does it, | |
| 131 | |
| 00:09:05,150 --> 00:09:07,990 | |
| but that's the actual aim of regularization: | |
| 132 | |
| 00:09:08,010 --> 00:09:11,240 | |
| we basically want to get a model like this and not like this. | |
| 133 | |
| 00:09:11,280 --> 00:09:13,000 | |
| And let's find out how we do that. | |
| 134 | |
| 00:09:14,750 --> 00:09:18,090 | |
| So there are a few types of regularization. There are actually more than this, | |
| 135 | |
| 00:09:18,110 --> 00:09:19,800 | |
| but these are the basic types: | |
| 136 | |
| 00:09:19,800 --> 00:09:28,010 | |
| there's L1/L2 regularization, cross-validation, early stopping, dropout, and data augmentation. | |
| 137 | |
| 00:09:28,040 --> 00:09:31,030 | |
| So let's look at L1/L2 regularization. | |
| 138 | |
| 00:09:31,070 --> 00:09:37,640 | |
| These are techniques we use to penalize large weights. Large weights manifest themselves | |
| 139 | |
| 00:09:37,640 --> 00:09:43,660 | |
| as abrupt changes in our model's decision boundary, and penalizing them effectively makes them small. | |
| 140 | |
| 00:09:44,240 --> 00:09:45,850 | |
| L2 is known as ridge regression; | |
| 141 | |
| 00:09:45,870 --> 00:09:47,900 | |
| L1 is lasso regression. | |
| 142 | |
| 00:09:48,240 --> 00:09:50,570 | |
| Now, there is a lot more theory behind these techniques; | |
| 143 | |
| 00:09:50,720 --> 00:09:54,110 | |
| I'm just basically showing you the formulas for what they actually are. | |
| 144 | |
| 00:09:54,590 --> 00:10:02,060 | |
| So as you can see, this is basically what L2 is here. | |
| 145 | |
| 00:10:02,200 --> 00:10:03,760 | |
| This is the MSE here, | |
| 146 | |
| 00:10:03,950 --> 00:10:06,690 | |
| but we're actually adding something to it here. | |
| 147 | |
| 00:10:06,830 --> 00:10:08,340 | |
| What are we adding? | |
| 148 | |
| 00:10:08,360 --> 00:10:14,630 | |
| We're applying a constant here, lambda, times the sum of the weights squared. L1 does not square; | |
| 149 | |
| 00:10:14,690 --> 00:10:16,940 | |
| it's just the sum of the absolute values of the weights. | |
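Written out, the two penalized losses described here take this standard form (reconstructed from the description; the slide's exact notation may differ):

```latex
% L2 (ridge): MSE plus lambda times the sum of squared weights
L_{L2} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j} w_j^2

% L1 (lasso): MSE plus lambda times the sum of absolute weights
L_{L1} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j} \lvert w_j \rvert
```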
| 150 | |
| 00:10:17,390 --> 00:10:26,180 | |
| So this lambda parameter controls the strength of the penalty we apply via backpropagation, i.e. how much the penalty is applied | |
| 151 | |
| 00:10:26,340 --> 00:10:28,320 | |
| to the weights. | |
| 152 | |
| 00:10:28,350 --> 00:10:34,150 | |
| So the difference between them, basically, is that L1 brings the weights of unimportant features to zero, | |
| 153 | |
| 00:10:34,920 --> 00:10:37,220 | |
| thus acting as a feature selection algorithm. | |
| 154 | |
| 00:10:37,290 --> 00:10:42,720 | |
| You may not know what feature selection is, but it's basically | |
| 155 | |
| 00:10:43,110 --> 00:10:43,960 | |
| trying to find out, | |
| 156 | |
| 00:10:44,010 --> 00:10:48,230 | |
| if we have like 20 inputs, which input is most important to our model. | |
| 157 | |
| 00:10:48,240 --> 00:10:52,160 | |
| It's used in other types of models as well. | |
| 158 | |
| 00:10:52,560 --> 00:10:56,550 | |
| Whereas L2 penalizes larger weights even more, but doesn't bring them down to zero. | |
| 159 | |
| 00:10:56,710 --> 00:10:57,630 | |
| OK. | |
| 160 | |
| 00:10:58,260 --> 00:11:04,950 | |
| So effectively, what we're doing here, remember, is that L1/L2 prevents weights from being too large, so that we don't | |
| 161 | |
| 00:11:04,950 --> 00:11:11,830 | |
| have abrupt changes in our model. Abrupt changes basically mean things like this; let's go back here. | |
| 162 | |
| 00:11:13,360 --> 00:11:18,080 | |
| We basically try to smooth this, instead of having these abrupt gradient changes here. | |
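As a hedged sketch of how this looks in practice: in Keras (used later in this course), the penalty is attached per layer via `kernel_regularizer`; the lambda of 0.01 below is illustrative, not a recommendation.

```python
# Minimal sketch: L1/L2 weight penalties on Dense layers in Keras.
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,),
                 kernel_regularizer=regularizers.l2(0.01)),  # ridge-style penalty
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l1(0.01)),  # lasso-style penalty
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```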
| 163 | |
| 00:11:21,290 --> 00:11:23,610 | |
| So let's go into cross-validation now. | |
| 164 | |
| 00:11:23,630 --> 00:11:29,720 | |
| Cross-validation is something I rarely ever use these days. I used to use | |
| 165 | |
| 00:11:29,720 --> 00:11:30,140 | |
| it a lot | |
| 166 | |
| 00:11:30,140 --> 00:11:35,000 | |
| doing previous machine learning work, but in deep learning I don't use it often. If you want to know | |
| 167 | |
| 00:11:35,000 --> 00:11:43,160 | |
| what it is, it's quite simple, actually. Basically, cross-validation, and here that means k-fold cross-validation, | |
| 168 | |
| 00:11:43,730 --> 00:11:49,650 | |
| is the way we split our dataset, our training dataset, into different folds, and we train on | |
| 169 | |
| 00:11:49,730 --> 00:11:54,000 | |
| those folds and basically test on the other folds afterward. | |
| 170 | |
| 00:11:54,500 --> 00:12:00,600 | |
| So let's look at this: this here is our test fold, or validation fold, and the rest is the training set. | |
| 171 | |
| 00:12:00,740 --> 00:12:08,650 | |
| So what happens is that we train on these four folds here and test on this one; then we train on these four and | |
| 172 | |
| 00:12:08,750 --> 00:12:09,070 | |
| test on the next. | |
| 173 | |
| 00:12:09,080 --> 00:12:15,190 | |
| What this means is that we don't actually have any unseen data for this model. | |
| 174 | |
| 00:12:15,260 --> 00:12:21,410 | |
| What we're doing is just continuously training on segments | |
| 175 | |
| 00:12:21,410 --> 00:12:25,320 | |
| of the data and testing on a different segment. | |
| 176 | |
| 00:12:25,370 --> 00:12:30,630 | |
| It does reduce overfitting, yes, but it also slows down the training process. | |
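A minimal sketch of the k-fold procedure just described, assuming scikit-learn's `KFold` (the lecture doesn't name a library):

```python
# Minimal sketch: 5-fold cross-validation. Each iteration trains on four
# folds and evaluates on the held-out fifth.
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(100, 10)       # dummy features
y = np.random.randint(0, 2, 100)  # dummy labels

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # model.fit(X_train, y_train) and model.evaluate(X_test, y_test) go here
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```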
| 177 | |
| 00:12:32,210 --> 00:12:36,030 | |
| Now, early stopping is something we can actually do automatically in Keras. | |
| 178 | |
| 00:12:36,080 --> 00:12:43,070 | |
| Basically, we can set something in Keras that says: if our loss stops decreasing, stop training. | |
| 179 | |
| 00:12:43,290 --> 00:12:50,600 | |
| So if we set our model to train for 100 epochs, but at epoch number twenty-something the loss stops decreasing, | |
| 180 | |
| 00:12:51,080 --> 00:12:55,030 | |
| it's going to stop training somewhere around here, when it realizes that, and it's going to give you this | |
| 181 | |
| 00:12:55,040 --> 00:12:55,670 | |
| model here. | |
| 182 | |
| 00:12:55,670 --> 00:12:58,670 | |
| The best one with the best loss. | |
| 183 | |
| 00:12:58,700 --> 00:13:05,840 | |
| The reason we do this early stopping is that sometimes, if we keep continually training over a large number of | |
| 184 | |
| 00:13:05,900 --> 00:13:08,430 | |
| epochs on our training data, | |
| 185 | |
| 00:13:08,510 --> 00:13:11,840 | |
| the model tends to basically overfit on that data. | |
| 186 | |
| 00:13:11,840 --> 00:13:14,540 | |
| So we need to actually stop it early sometimes, | |
| 187 | |
| 00:13:14,760 --> 00:13:18,610 | |
| so that overfitting does not occur. | |
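In Keras this is the `EarlyStopping` callback; a minimal sketch, with an illustrative `patience` value:

```python
# Minimal sketch: stop training once the validation loss stops improving,
# and keep the weights from the epoch with the best loss.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)

# model.fit(X_train, y_train,
#           validation_data=(X_test, y_test),
#           epochs=100,            # ask for 100; it may stop much earlier
#           callbacks=[early_stop])
```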
| 188 | |
| 00:13:20,040 --> 00:13:21,670 | |
| Now let's talk about dropout. | |
| 189 | |
| 00:13:21,710 --> 00:13:26,000 | |
| It is actually very easy to implement and extremely useful. | |
| 190 | |
| 00:13:26,520 --> 00:13:28,610 | |
| So dropout refers to dropping nodes, | |
| 191 | |
| 00:13:28,700 --> 00:13:33,290 | |
| hidden and visible, in a neural network, with the aim of reducing overfitting. | |
| 192 | |
| 00:13:33,530 --> 00:13:35,240 | |
| What do I mean by dropping nodes? | |
| 193 | |
| 00:13:35,460 --> 00:13:40,830 | |
| What it means is that in the training process, certain parts of the network are ignored during the forward | |
| 194 | |
| 00:13:40,830 --> 00:13:42,680 | |
| and backward passes. | |
| 195 | |
| 00:13:42,720 --> 00:13:48,660 | |
| This is a way of actually making the network have some redundancy, in a way, but what it also does is | |
| 196 | |
| 00:13:48,660 --> 00:13:53,100 | |
| add regularization to our network. | |
| 197 | |
| 00:13:53,160 --> 00:13:59,030 | |
| It helps reduce the interdependency between neurons as well. | |
| 198 | |
| 00:13:59,450 --> 00:14:04,840 | |
| So it actually leads to more robust and meaningful features. In dropout, | |
| 199 | |
| 00:14:04,910 --> 00:14:10,580 | |
| there's one parameter we use, called p, and p is the probability that nodes are kept or dropped out | |
| 200 | |
| 00:14:11,270 --> 00:14:12,930 | |
| in the training process. | |
| 201 | |
| 00:14:12,980 --> 00:14:18,620 | |
| One of the consequences of using dropout is that it almost doubles the training time needed to converge | |
| 202 | |
| 00:14:18,620 --> 00:14:21,090 | |
| during training. | |
| 203 | |
| 00:14:21,110 --> 00:14:23,960 | |
| So this is a good illustration of dropout. | |
| 204 | |
| 00:14:24,020 --> 00:14:26,680 | |
| This is a standard neural network when we train it, | |
| 205 | |
| 00:14:26,990 --> 00:14:30,670 | |
| and after applying dropout, let's say with a fairly high value, | |
| 206 | |
| 00:14:31,100 --> 00:14:33,060 | |
| these nodes here are ignored, | |
| 207 | |
| 00:14:33,650 --> 00:14:38,040 | |
| and these nodes are used in training. | |
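A minimal dropout sketch in Keras. One caveat: Keras's `rate` argument is the probability a unit is dropped, while the lecture describes p as a keep-or-drop probability, so check your framework's convention.

```python
# Minimal sketch: Dropout layers between Dense layers. Dropout is active
# only during training; it is automatically disabled at inference time.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(20,)),
    layers.Dropout(0.5),               # a fairly high rate, as in the illustration
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])
```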
| 208 | |
| 00:14:38,140 --> 00:14:44,170 | |
| And lastly, this is not the last method of regularization, but it is the last one I'll teach in this course, because | |
| 209 | |
| 00:14:44,170 --> 00:14:48,690 | |
| the others are actually quite exotic and not commonly used. | |
| 210 | |
| 00:14:48,730 --> 00:14:51,890 | |
| This one is called data augmentation. | |
| 211 | |
| 00:14:51,890 --> 00:14:55,230 | |
| Remember I said you need lots of data to train a network? | |
| 212 | |
| 00:14:55,450 --> 00:15:02,800 | |
| What if you don't? Well, in computer vision especially, the data lends itself naturally to augmentation. | |
| 213 | |
| 00:15:02,800 --> 00:15:06,570 | |
| What we do is take our dataset; say we have one picture of a dog. | |
| 214 | |
| 00:15:06,910 --> 00:15:09,910 | |
| How about we just make some manipulations to this image? | |
| 215 | |
| 00:15:09,910 --> 00:15:13,050 | |
| We can rotate it. | |
| 216 | |
| 00:15:13,390 --> 00:15:19,450 | |
| I think this one is just mirrored here, and this one is actually zoomed in a bit as well. | |
| 217 | |
| 00:15:19,960 --> 00:15:24,400 | |
| So that is how we can actually expand the dataset. | |
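A minimal sketch of these manipulations using Keras's `ImageDataGenerator` (one common option; the parameter values are illustrative):

```python
# Minimal sketch: generate rotated / mirrored / zoomed variants of training
# images, matching the dog-photo manipulations described above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,      # random rotations up to 30 degrees
    horizontal_flip=True,   # random mirroring
    zoom_range=0.2,         # random zoom by up to 20%
)

# X_train: array of images with shape (n, height, width, channels)
# for batch_x, batch_y in datagen.flow(X_train, y_train, batch_size=32):
#     ...  # each batch contains randomly augmented copies
```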