1
00:00:01,080 --> 00:00:06,390
And welcome to Chapter 15, point one, where I explain to you what exactly transfer learning and

2
00:00:06,390 --> 00:00:07,490
fine-tuning are.

3
00:00:07,980 --> 00:00:13,620
So as we know from before, training complicated and deep CNNs is very slow.

4
00:00:13,620 --> 00:00:21,120
AlexNet and VGG in particular are deep, parameter-laden networks. VGG alone has 138 million parameters,

5
00:00:21,600 --> 00:00:26,600
and ResNet-50 has 50 hidden layers despite having fewer parameters.

6
00:00:26,620 --> 00:00:29,180
Still, that many layers take some time to train.

7
00:00:29,700 --> 00:00:35,580
So while these networks attain relatively excellent performance on ImageNet, training them yourself is

8
00:00:35,730 --> 00:00:37,450
definitely not recommended.

9
00:00:37,650 --> 00:00:39,950
You are not going to get anywhere close to good results.

10
00:00:39,960 --> 00:00:47,310
Even training for a month on the CPU is useless; these CNNs are often trained for a couple of weeks or more

11
00:00:47,370 --> 00:00:49,160
using arrays of GPUs.

12
00:00:49,350 --> 00:00:54,390
That's to tell you how complicated it is and how long it takes to get good results on ImageNet.

13
00:00:54,480 --> 00:01:00,930
So what if there was a way we could reuse those pre-trained models to make our own classifiers? As we've

14
00:01:00,930 --> 00:01:06,270
seen, Keras actually ships with pre-trained models on ImageNet, and those models are the models we showed

15
00:01:06,270 --> 00:01:07,780
you before in Chapter 14.

16
00:01:08,070 --> 00:01:12,530
And these networks are already tuned to detect dozens of low-, mid- and high-level features.

17
00:01:12,570 --> 00:01:17,450
What if we could use these already trained networks now to build our own classifiers?

18
00:01:17,480 --> 00:01:24,780
Well, we can: introducing transfer learning and fine-tuning. This solves the problem

19
00:01:24,810 --> 00:01:26,450
we just explained.

20
00:01:27,110 --> 00:01:29,900
So let's talk a bit about transfer learning and fine-tuning now.

21
00:01:30,700 --> 00:01:36,920
So, fine-tuning: the concept of fine-tuning is often, and justifiably, confused with transfer learning.

22
00:01:37,140 --> 00:01:39,880
However, it merely is a type of transfer learning.

23
00:01:40,200 --> 00:01:43,140
Fine-tuning is where we take a pre-trained deep CNN,

24
00:01:43,140 --> 00:01:46,730
so one of those like ResNet or VGG as we've seen before,

25
00:01:47,070 --> 00:01:54,840
and we use the already trained model, typically trained on ImageNet, to aid us in a new image classification task. Typically,

26
00:01:54,840 --> 00:02:00,450
in fine-tuning we are taking an already trained CNN and training it on a new dataset.

27
00:02:00,690 --> 00:02:01,090
OK.

28
00:02:01,350 --> 00:02:05,010
So basically, what I'm saying, which I'll explain shortly, is

29
00:02:08,170 --> 00:02:08,670
sort of

30
00:02:12,500 --> 00:02:14,640
so let's take a look at these concepts.

31
00:02:14,660 --> 00:02:20,210
Firstly, let's talk about fine-tuning. Now, the concept of fine-tuning is often, and very justifiably,

32
00:02:20,210 --> 00:02:25,530
confused with transfer learning, and that's because it's very similar and is merely a type of transfer learning.

33
00:02:25,790 --> 00:02:30,500
Fine-tuning is where we take a pre-trained deep CNN and we use this model,

34
00:02:30,540 --> 00:02:35,240
which has already been trained, most likely on ImageNet, to basically assist us with

35
00:02:35,330 --> 00:02:37,340
a new image classification task.
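Before moving on, here is a minimal sketch of what loading one of those pre-trained Keras models looks like. I'm assuming the tensorflow.keras import path and picking VGG16 with 224x224 inputs purely for illustration; the other keras.applications models load the same way.

from tensorflow.keras.applications import VGG16

# Load VGG16 with its ImageNet weights, but drop the top
# fully connected classifier (include_top=False) so that a new
# head can be attached for a new task later.
base_model = VGG16(weights="imagenet",
                   include_top=False,
                   input_shape=(224, 224, 3))

base_model.summary()  # the convolutional feature extractor we will reuse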
36
00:02:37,340 --> 00:02:43,810
And typically in fine-tuning we are taking an already trained CNN and training it on a new dataset.

37
00:02:43,820 --> 00:02:50,300
So what we do is we then freeze the lower layers of this model, and I'll illustrate to you what this

38
00:02:50,300 --> 00:02:51,210
means shortly.

39
00:02:51,470 --> 00:02:54,920
And we train only the top, or fully connected, layers.

40
00:02:55,610 --> 00:02:57,480
And that's how we actually train

41
00:02:57,630 --> 00:02:58,910
our new model here.

42
00:02:59,270 --> 00:03:03,600
So effectively we're just replacing the classifier part of an already trained model.

43
00:03:04,070 --> 00:03:08,450
And sometimes you can actually go back and unfreeze some of the lower weights and train them again to get even

44
00:03:08,450 --> 00:03:10,300
better performance.

45
00:03:10,310 --> 00:03:13,740
So let me illustrate to you exactly what's happening.

46
00:03:13,890 --> 00:03:17,340
So imagine this is a deep CNN; a real one would be much deeper than this,

47
00:03:17,510 --> 00:03:20,900
but imagine this is a deep CNN that's already been trained.

48
00:03:21,230 --> 00:03:21,960
All right.

49
00:03:22,040 --> 00:03:27,680
So when I say we freeze the layers, we're freezing all the convolutional layers here, between

50
00:03:27,680 --> 00:03:30,750
the input and up to the fully connected layer.

51
00:03:30,980 --> 00:03:36,440
So imagine these have already been trained, and they're very good at picking up high-, low-

52
00:03:36,440 --> 00:03:38,120
and mid-level features.

53
00:03:38,120 --> 00:03:45,110
So what we do now is we just basically change the classes that we want for our model, and we basically

54
00:03:45,110 --> 00:03:49,390
just manipulate the top layer and train it on our dataset now.

55
00:03:49,820 --> 00:03:56,390
So this is the frozen part here, and this is the part we have basically modified and unfrozen

56
00:03:56,510 --> 00:03:59,210
and are going to train separately.

57
00:03:59,840 --> 00:04:06,770
So in fine-tuning: in most CNNs the first few convolutional layers learn low-level features, as explained.

58
00:04:07,080 --> 00:04:15,130
Those are things like edges, textures, color blobs, and so on.

59
00:04:15,740 --> 00:04:20,010
And as we progress through a network, it learns more high- and mid-level features.

60
00:04:20,180 --> 00:04:26,750
So in fine-tuning we just keep the low levels frozen, and we can also train the high-level features

61
00:04:26,750 --> 00:04:32,800
as well. So there are a few steps here; I pretty much just went through them for you.

62
00:04:33,050 --> 00:04:40,460
But basically we freeze layers, we add or modify the final fully connected layer, we use a very tiny

63
00:04:40,460 --> 00:04:43,520
learning rate, and we just initiate training again.

64
00:04:43,880 --> 00:04:49,530
It's quite easy to do in Keras (we'll get to that code shortly) and it's quite powerful.

65
00:04:49,540 --> 00:04:59,790
By using these already well-trained models, we can get superbly good accuracy on new image datasets.

66
00:04:59,810 --> 00:05:05,480
So what about transfer learning now? As you've seen, in fine-tuning we have taken an already pre-trained network

67
00:05:06,080 --> 00:05:12,420
and trained it, or segments of it, on some new data for a new image classification task.

68
00:05:13,100 --> 00:05:13,940
All right.

69
00:05:13,940 --> 00:05:17,230
So transfer learning is pretty much almost the same thing.
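Those steps map almost line for line onto Keras code. Here is a sketch that continues from the base_model loaded in the snippet above; the 256-unit head, the 10-class output, and the 1e-4 learning rate are illustrative placeholders, not values from this course.

from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Step 1: freeze every layer of the pre-trained convolutional base.
for layer in base_model.layers:
    layer.trainable = False

# Step 2: add a new fully connected head for our own classes.
x = Flatten()(base_model.output)
x = Dense(256, activation="relu")(x)
predictions = Dense(10, activation="softmax")(x)  # 10 = placeholder class count
model = Model(inputs=base_model.input, outputs=predictions)

# Step 3: compile with a very tiny learning rate so we don't wreck the
# pre-trained weights, then initiate training again on the new dataset.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Optionally, unfreeze the last few convolutional layers afterwards and
# keep training with an even smaller learning rate for extra accuracy.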
70
00:05:17,540 --> 00:05:22,380
And a lot of researchers and a lot of people in the industry use these terms interchangeably.

71
00:05:22,480 --> 00:05:28,700
However, what transfer learning really implies is that we're taking the knowledge from a pre-trained

72
00:05:28,700 --> 00:05:34,490
network and basically applying it to a similar task, and therefore not really retraining much of the

73
00:05:34,490 --> 00:05:35,600
network.

74
00:05:35,600 --> 00:05:42,750
So what that means effectively is, let's go back and look at this diagram. In fine-tuning,

75
00:05:42,860 --> 00:05:49,600
the reason why we call it fine-tuning is that we can actually train these layers here.

76
00:05:49,920 --> 00:05:56,160
So in both transfer learning and fine-tuning we're basically unfreezing the top layer here and modifying

77
00:05:56,160 --> 00:05:57,660
it for our classes.

78
00:05:57,930 --> 00:06:02,090
But in fine-tuning we can then go back and train these layers here.

79
00:06:02,450 --> 00:06:06,630
That's pretty much the core difference.

80
00:06:06,630 --> 00:06:09,000
So here's a quick quote from the Deep Learning book.

81
00:06:09,000 --> 00:06:13,860
I'm pretty sure you can click this link on the PDF slide that I give you. Basically: "Transfer learning

82
00:06:13,920 --> 00:06:19,680
and domain adaptation refer to the situation where what has been learned

83
00:06:19,710 --> 00:06:25,560
in one setting is exploited to improve generalization in another setting."

84
00:06:25,620 --> 00:06:28,600
That's effectively what transfer learning means.

85
00:06:28,800 --> 00:06:30,900
And we're going to do some practical examples now.

86
00:06:30,950 --> 00:06:35,290
We're going to use MobileNet to create a monkey breed classifier.

87
00:06:35,580 --> 00:06:39,160
And then we're going to use VGG to create a flower classifier.

88
00:06:39,540 --> 00:06:42,610
So stay tuned, and we are going to have some fun with these models.

89
00:06:42,640 --> 00:06:43,140
I guarantee.
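As a preview, here is roughly what the MobileNet monkey-breed classifier will look like in this transfer-learning style. The 10-class output and the pooling head are my assumptions about the upcoming dataset, not the exact code from the next lessons; swapping MobileNet for VGG16 gives the flower classifier the same way.

from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Pre-trained MobileNet base with its ImageNet classifier removed.
base = MobileNet(weights="imagenet",
                 include_top=False,
                 input_shape=(224, 224, 3))

# Transfer learning: keep the learned features frozen.
for layer in base.layers:
    layer.trainable = False

# New head; 10 outputs is a placeholder for the monkey-breed classes.
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(10, activation="softmax")(x)
model = Model(base.input, outputs)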