AI_DL_Assignment / 15. Transfer Learning Build a Flower & Monkey Breed Classifier /2. What is Transfer Learning and Fine Tuning.srt
| 1 | |
| 00:00:01,080 --> 00:00:06,390 | |
| And welcome to Chapter 15.1, where I explain to you what exactly transfer learning and | |
| 2 | |
| 00:00:06,390 --> 00:00:07,490 | |
| fine tuning are. | |
| 3 | |
| 00:00:07,980 --> 00:00:13,620 | |
| So as we know from before, training complicated and deep CNNs is very slow. | |
| 4 | |
| 00:00:13,620 --> 00:00:21,120 | |
| AlexNet and VGG in particular are deep, parameter-laden networks; VGG alone has 138 million parameters, | |
| 5 | |
| 00:00:21,600 --> 00:00:26,600 | |
| and ResNet50 has 50 layers despite having fewer parameters. | |
| 6 | |
| 00:00:26,620 --> 00:00:29,180 | |
| Still a lot of layers take some time to train. | |
| 7 | |
| 00:00:29,700 --> 00:00:35,580 | |
| So for networks that attain relatively excellent performance on ImageNet, training them from scratch is | |
| 8 | |
| 00:00:35,730 --> 00:00:37,450 | |
| definitely not recommended. | |
| 9 | |
| 00:00:37,650 --> 00:00:39,950 | |
| You are not going to get anywhere close to good results. | |
| 10 | |
| 00:00:39,960 --> 00:00:47,310 | |
| Even training for a moment on the CPU is tedious; these CNNs are often trained for a couple of weeks or more | |
| 11 | |
| 00:00:47,370 --> 00:00:49,160 | |
| using arrays of GPUs. | |
| 12 | |
| 00:00:49,350 --> 00:00:54,390 | |
| That's to tell you how complicated it is and how long it takes to get good results on ImageNet. | |
| 13 | |
| 00:00:54,480 --> 00:01:00,930 | |
| So what if there was a way we could reuse those pre-trained models to make our own classifiers? As we've | |
| 14 | |
| 00:01:00,930 --> 00:01:06,270 | |
| seen, Keras actually ships with models pre-trained on ImageNet, and those models are the models we showed | |
| 15 | |
| 00:01:06,270 --> 00:01:07,780 | |
| you before in Chapter 14. | |
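(A quick aside on what loading one of those shipped models looks like. This is a minimal sketch, assuming TensorFlow's bundled Keras; the `include_top=False` flag and the 224x224 input shape are just common choices for illustration, and the no-top VGG16 weights are downloaded on first use.)

```python
from tensorflow.keras.applications import VGG16

# Keras ships models pre-trained on ImageNet. include_top=False drops the
# original 1000-class classifier head so a new one can be attached later.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
print(len(base.layers), base.count_params())
```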
| 16 | |
| 00:01:08,070 --> 00:01:12,530 | |
| And these weights are already tuned to detect dozens of low, mid, and high level features. | |
| 17 | |
| 00:01:12,570 --> 00:01:17,450 | |
| What if we could use these already trained networks now to build our own classifiers? | |
| 18 | |
| 00:01:17,480 --> 00:01:24,780 | |
| Well, we can. Introducing transfer learning and fine tuning: this solves the problem | |
| 19 | |
| 00:01:24,810 --> 00:01:26,450 | |
| we just explained. | |
| 20 | |
| 00:01:27,110 --> 00:01:29,900 | |
| So let's talk a bit about transfer learning and fine tuning now. | |
| 21 | |
| 00:01:30,700 --> 00:01:36,920 | |
| So fine tuning: the concept of fine tuning is often, and justifiably, confused with transfer learning. | |
| 22 | |
| 00:01:37,140 --> 00:01:39,880 | |
| However, it merely is a type of transfer learning. | |
| 23 | |
| 00:01:40,200 --> 00:01:43,140 | |
| Fine tuning is where we take a pre-trained deep CNN. | |
| 24 | |
| 00:01:43,140 --> 00:01:46,730 | |
| So one of those like ResNet or VGG, as we've seen before. | |
| 25 | |
| 00:01:47,070 --> 00:01:54,840 | |
| And we use this already trained (typically on ImageNet) model to aid a new image classification task. Typically | |
| 26 | |
| 00:01:54,840 --> 00:02:00,450 | |
| in fine tuning we are taking an already trained CNN and training it on a new data set. | |
| 27 | |
| 00:02:00,690 --> 00:02:01,090 | |
| OK. | |
| 28 | |
| 00:02:01,350 --> 00:02:05,010 | |
| So basically what I'm saying, which I'll explain shortly, is | |
| 29 | |
| 00:02:08,170 --> 00:02:08,670 | |
| sort of | |
| 30 | |
| 00:02:12,500 --> 00:02:14,640 | |
| So let's take a look at these concepts. | |
| 31 | |
| 00:02:14,660 --> 00:02:20,210 | |
| Firstly, let's talk about fine tuning. Now, the concept of fine tuning is often, and very justifiably, | |
| 32 | |
| 00:02:20,210 --> 00:02:25,530 | |
| confused with transfer learning, and that's because it's very similar and is merely a type of transfer learning. | |
| 33 | |
| 00:02:25,790 --> 00:02:30,500 | |
| Fine tuning is where we take a pre-trained deep CNN and we use this model. | |
| 34 | |
| 00:02:30,540 --> 00:02:35,240 | |
| It's already been trained, most likely on ImageNet, and we basically use it to assist | |
| 35 | |
| 00:02:35,330 --> 00:02:37,340 | |
| with new image classification tasks. | |
| 36 | |
| 00:02:37,340 --> 00:02:43,810 | |
| And typically in fine tuning we are taking an already trained CNN and training it on a new data set. | |
| 37 | |
| 00:02:43,820 --> 00:02:50,300 | |
| So what we do is we then freeze the lower layers of this model and I'll illustrate to you what this | |
| 38 | |
| 00:02:50,300 --> 00:02:51,210 | |
| means shortly. | |
| 39 | |
| 00:02:51,470 --> 00:02:54,920 | |
| And we train only the top or fully connected layers. | |
| 40 | |
| 00:02:55,610 --> 00:02:57,480 | |
| And that's how we actually train a | |
| 41 | |
| 00:02:59,270 --> 00:03:03,600 | |
| new model here. | |
| 42 | |
| 00:02:59,270 --> 00:03:03,600 | |
| So effectively we're just replacing the classifier part of a trained model. | |
| 43 | |
| 00:03:04,070 --> 00:03:08,450 | |
| And sometimes you can actually go back and unfreeze the lower weights and train them again to get even | |
| 44 | |
| 00:03:08,450 --> 00:03:10,300 | |
| better performance. | |
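(To make freezing concrete, here is a minimal sketch. The tiny network below is a made-up stand-in for illustration; a real pre-trained CNN is far deeper. Setting `trainable = False` on every layer below the head leaves only the new classifier's weights to be updated.)

```python
from tensorflow.keras import layers, models

# A tiny stand-in for a pre-trained deep CNN (real ones are much deeper).
model = models.Sequential([
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),  # the "top" we actually train
])
model.build(input_shape=(None, 32, 32, 3))

# Freeze everything except the fully connected head.
for layer in model.layers[:-1]:
    layer.trainable = False

# Only the Dense layer's kernel and bias remain trainable.
print(len(model.trainable_weights))
```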
| 45 | |
| 00:03:10,310 --> 00:03:13,740 | |
| So let me explain this to you and illustrate exactly what's happening. | |
| 46 | |
| 00:03:13,890 --> 00:03:17,340 | |
| So imagine this is a deep CNN (a real one is much deeper than this), | |
| 47 | |
| 00:03:17,510 --> 00:03:20,900 | |
| but imagine this is a deep CNN that's already been trained. | |
| 48 | |
| 00:03:21,230 --> 00:03:21,960 | |
| All right. | |
| 49 | |
| 00:03:22,040 --> 00:03:27,680 | |
| So when I say we freeze the layers, we're freezing all the convolutional layers here, between | |
| 50 | |
| 00:03:27,680 --> 00:03:30,750 | |
| the input and up to the fully connected layer here. | |
| 51 | |
| 00:03:30,980 --> 00:03:36,440 | |
| So we imagine these have already been trained, and they're very good at picking up high, low, | |
| 52 | |
| 00:03:36,440 --> 00:03:38,120 | |
| and mid-level features. | |
| 53 | |
| 00:03:38,120 --> 00:03:45,110 | |
| So what we do now is we just basically change the classes that we want for our model, and we basically | |
| 54 | |
| 00:03:45,110 --> 00:03:49,390 | |
| just manipulate the top layer and train it on our data set now. | |
| 55 | |
| 00:03:49,820 --> 00:03:56,390 | |
| So this is the frozen part here, and this is the part we have basically modified and unfrozen, | |
| 56 | |
| 00:03:56,510 --> 00:03:59,210 | |
| and we are going to train this part separately. | |
| 57 | |
| 00:03:59,840 --> 00:04:06,770 | |
| So in fine tuning, in most CNNs the first few convolutional layers learn low level features, as explained. | |
| 58 | |
| 00:04:07,080 --> 00:04:15,130 | |
| Those are things like edges, textures, color blobs, and that kind of stuff. | |
| 59 | |
| 00:04:15,740 --> 00:04:20,010 | |
| And as we progress through the network, it learns more high and mid-level features. | |
| 60 | |
| 00:04:20,180 --> 00:04:26,750 | |
| So in fine tuning we just keep the low level layers frozen, and we can also retrain the high level features | |
| 61 | |
| 00:04:26,750 --> 00:04:32,800 | |
| as well. So there are a few little steps here; I pretty much just went through them for you. | |
| 62 | |
| 00:04:33,050 --> 00:04:40,460 | |
| But basically we freeze layers, we add or modify the fully connected layer, and we use a very tiny | |
| 63 | |
| 00:04:40,460 --> 00:04:43,520 | |
| learning rate, and we just initiate training again. | |
| 64 | |
| 00:04:43,880 --> 00:04:49,530 | |
| It's quite easy to do in Keras; we'll get to that code shortly, and it's quite powerful. | |
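(Sketched in code, those steps, freeze the base, add or modify the fully connected head, recompile with a tiny learning rate, look roughly like this. This is not the exact code from the upcoming lessons; the MobileNet base and the 10-class head are assumptions for illustration.)

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Step 1: load a pre-trained base without its ImageNet head, and freeze it.
base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False

# Step 2: add a new fully connected head sized for our own classes (10 here).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])

# Step 3: recompile with a very tiny learning rate, then call model.fit(...)
# on the new data set.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The tiny learning rate is what makes later unfreezing safe: large updates would quickly wreck the carefully pre-trained filters.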
| 65 | |
| 00:04:49,540 --> 00:04:59,790 | |
| By using these already well-trained models, we can get superbly good accuracy on new image tasks. | |
| 66 | |
| 00:04:59,810 --> 00:05:05,480 | |
| So what about transfer learning now? As you've seen, in fine tuning we have taken an already pre-trained network | |
| 67 | |
| 00:05:06,080 --> 00:05:12,420 | |
| and trained it, or segments of it, on some new data for a new image classification task. | |
| 68 | |
| 00:05:13,100 --> 00:05:13,940 | |
| All right. | |
| 69 | |
| 00:05:13,940 --> 00:05:17,230 | |
| So transfer learning is pretty much almost the same thing. | |
| 70 | |
| 00:05:17,540 --> 00:05:22,380 | |
| And a lot of researchers and a lot of people in the industry use these terms interchangeably. | |
| 71 | |
| 00:05:22,480 --> 00:05:28,700 | |
| However, what transfer learning really implies is that we're taking the knowledge from a pre-trained | |
| 72 | |
| 00:05:28,700 --> 00:05:34,490 | |
| network and basically applying it to a similar task, and therefore not really retraining much of the | |
| 73 | |
| 00:05:34,490 --> 00:05:35,600 | |
| network. | |
| 74 | |
| 00:05:35,600 --> 00:05:42,750 | |
| So what that means, effectively: let's go back to the diagram for fine tuning. | |
| 75 | |
| 00:05:42,860 --> 00:05:49,600 | |
| The reason why we call it fine tuning is that we can actually retrain these layers here. | |
| 76 | |
| 00:05:49,920 --> 00:05:56,160 | |
| So in both transfer learning and in fine tuning we're basically unfreezing the top layer here and modifying | |
| 77 | |
| 00:05:56,160 --> 00:05:57,660 | |
| it for our classes. | |
| 78 | |
| 00:05:57,930 --> 00:06:02,090 | |
| But in fine tuning we tend to go back and retrain these layers here. | |
| 79 | |
| 00:06:02,450 --> 00:06:06,630 | |
| That's pretty much the core difference. | |
| 80 | |
| 00:06:06,630 --> 00:06:09,000 | |
| So here's a quick quote from the Deep Learning book. | |
| 81 | |
| 00:06:09,000 --> 00:06:13,860 | |
| I'm pretty sure you can click this link on the PDF slides that I give you. Basically: "transfer learning | |
| 82 | |
| 00:06:13,920 --> 00:06:19,680 | |
| and domain adaptation refer to the situation where what has been learned | |
| 83 | |
| 00:06:19,710 --> 00:06:25,560 | |
| in one setting is exploited to improve generalization in another setting." | |
| 84 | |
| 00:06:25,620 --> 00:06:28,600 | |
| That's effectively what transfer learning means. | |
| 85 | |
| 00:06:28,800 --> 00:06:30,900 | |
| And we're going to do some practical examples now. | |
| 86 | |
| 00:06:30,950 --> 00:06:35,290 | |
| We're going to use MobileNet to create a monkey breed classifier. | |
| 87 | |
| 00:06:35,580 --> 00:06:39,160 | |
| And then we're going to use VGG to create a flower classifier. | |
| 88 | |
| 00:06:39,540 --> 00:06:42,610 | |
| So stay tuned and we are going to have some fun with these models. | |
| 89 | |
| 00:06:42,640 --> 00:06:43,140 | |
| I guarantee. | |