1
00:00:01,080 --> 00:00:06,390
And welcome to Chapter 15, point one, where I explain to you what exactly transfer learning and

2
00:00:06,390 --> 00:00:07,490
fine-tuning are.

3
00:00:07,980 --> 00:00:13,620
So as we know from before, training complicated and deep CNNs is very slow.

4
00:00:13,620 --> 00:00:21,120
AlexNet and VGG in particular are deep, parameter-laden networks. VGG alone has 138 million parameters,

5
00:00:21,600 --> 00:00:26,600
and ResNet-50 has 50 hidden layers despite having fewer parameters.

6
00:00:26,620 --> 00:00:29,180
Still, that many layers take some time to train.

7
00:00:29,700 --> 00:00:35,580
So while these networks attain relatively excellent performance on ImageNet, training them yourself is

8
00:00:35,730 --> 00:00:37,450
definitely not recommended.

9
00:00:37,650 --> 00:00:39,950
You are not going to get anywhere close to good results.

10
00:00:39,960 --> 00:00:47,310
Even training for a month on the CPU is useless; these CNNs are often trained for a couple of weeks or more

11
00:00:47,370 --> 00:00:49,160
using arrays of GPUs.

12
00:00:49,350 --> 00:00:54,390
That's to tell you how complicated it is and how long it takes to get good results on ImageNet.

13
00:00:54,480 --> 00:01:00,930
So what if there was a way we could reuse those pre-trained models to make our own classifiers? As we've

14
00:01:00,930 --> 00:01:06,270
seen, Keras actually ships with pre-trained models on ImageNet, and those models are the models we showed

15
00:01:06,270 --> 00:01:07,780
you before in Chapter 14.

16
00:01:08,070 --> 00:01:12,530
And these networks are already tuned to detect dozens of low-, mid- and high-level features.

17
00:01:12,570 --> 00:01:17,450
What if we could use these already trained networks now to build our own classifiers?

18
00:01:17,480 --> 00:01:24,780
Well, we can: introducing transfer learning and fine-tuning. This solves the problem

19
00:01:24,810 --> 00:01:26,450
we just explained.

20
00:01:27,110 --> 00:01:29,900
So let's talk a bit about transfer learning and fine-tuning now.

21
00:01:30,700 --> 00:01:36,920
So, fine-tuning: the concept of fine-tuning is often, and justifiably, confused with transfer learning.

22
00:01:37,140 --> 00:01:39,880
However, it merely is a type of transfer learning.

23
00:01:40,200 --> 00:01:43,140
Fine-tuning is where we take a pre-trained deep CNN,

24
00:01:43,140 --> 00:01:46,730
so one of those like ResNet or VGG as we've seen before,

25
00:01:47,070 --> 00:01:54,840
and we use the already trained model, typically trained on ImageNet, to aid us in a new image classification task. Typically,

26
00:01:54,840 --> 00:02:00,450
in fine-tuning we are taking an already trained CNN and training it on a new dataset.

27
00:02:00,690 --> 00:02:01,090
OK.

28
00:02:01,350 --> 00:02:05,010
So basically, what I'm saying, which I'll explain shortly, is

29
00:02:08,170 --> 00:02:08,670
sort of

30
00:02:12,500 --> 00:02:14,640
so let's take a look at these concepts.

31
00:02:14,660 --> 00:02:20,210
Firstly, let's talk about fine-tuning. Now, the concept of fine-tuning is often, and very justifiably,

32
00:02:20,210 --> 00:02:25,530
confused with transfer learning, and that's because it's very similar and is merely a type of transfer learning.

33
00:02:25,790 --> 00:02:30,500
Fine-tuning is where we take a pre-trained deep CNN and we use this model,

34
00:02:30,540 --> 00:02:35,240
which has already been trained, most likely on ImageNet, to basically assist us with

35
00:02:35,330 --> 00:02:37,340
a new image classification task.
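Before moving on, here is a minimal sketch of what loading one of those pre-trained Keras models looks like. I'm assuming the tensorflow.keras import path and picking VGG16 with 224x224 inputs purely for illustration; the other keras.applications models load the same way.

from tensorflow.keras.applications import VGG16

# Load VGG16 with its ImageNet weights, but drop the top
# fully connected classifier (include_top=False) so that a new
# head can be attached for a new task later.
base_model = VGG16(weights="imagenet",
                   include_top=False,
                   input_shape=(224, 224, 3))

base_model.summary()  # the convolutional feature extractor we will reuse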
36
00:02:37,340 --> 00:02:43,810
And typically in fine-tuning we are taking an already trained CNN and training it on a new dataset.

37
00:02:43,820 --> 00:02:50,300
So what we do is we then freeze the lower layers of this model, and I'll illustrate to you what this

38
00:02:50,300 --> 00:02:51,210
means shortly.

39
00:02:51,470 --> 00:02:54,920
And we train only the top, or fully connected, layers.

40
00:02:55,610 --> 00:02:57,480
And that's how we actually train

41
00:02:57,630 --> 00:02:58,910
our new model here.

42
00:02:59,270 --> 00:03:03,600
So effectively we're just replacing the classifier part of an already trained model.

43
00:03:04,070 --> 00:03:08,450
And sometimes you can actually go back and unfreeze some of the lower weights and train them again to get even

44
00:03:08,450 --> 00:03:10,300
better performance.

45
00:03:10,310 --> 00:03:13,740
So let me illustrate to you exactly what's happening.

46
00:03:13,890 --> 00:03:17,340
So imagine this is a deep CNN; a real one would be much deeper than this,

47
00:03:17,510 --> 00:03:20,900
but imagine this is a deep CNN that's already been trained.

48
00:03:21,230 --> 00:03:21,960
All right.

49
00:03:22,040 --> 00:03:27,680
So when I say we freeze the layers, we're freezing all the convolutional layers here, between

50
00:03:27,680 --> 00:03:30,750
the input and up to the fully connected layer.

51
00:03:30,980 --> 00:03:36,440
So imagine these have already been trained, and they're very good at picking up high-, low-

52
00:03:36,440 --> 00:03:38,120
and mid-level features.

53
00:03:38,120 --> 00:03:45,110
So what we do now is we just basically change the classes that we want for our model, and we basically

54
00:03:45,110 --> 00:03:49,390
just manipulate the top layer and train it on our dataset now.

55
00:03:49,820 --> 00:03:56,390
So this is the frozen part here, and this is the part we have basically modified and unfrozen

56
00:03:56,510 --> 00:03:59,210
and are going to train separately.

57
00:03:59,840 --> 00:04:06,770
So in fine-tuning: in most CNNs the first few convolutional layers learn low-level features, as explained.

58
00:04:07,080 --> 00:04:15,130
Those are things like edges, textures, color blobs, and so on.

59
00:04:15,740 --> 00:04:20,010
And as we progress through a network, it learns more high- and mid-level features.

60
00:04:20,180 --> 00:04:26,750
So in fine-tuning we just keep the low levels frozen, and we can also train the high-level features

61
00:04:26,750 --> 00:04:32,800
as well. So there are a few steps here; I pretty much just went through them for you.

62
00:04:33,050 --> 00:04:40,460
But basically we freeze layers, we add or modify the final fully connected layer, we use a very tiny

63
00:04:40,460 --> 00:04:43,520
learning rate, and we just initiate training again.

64
00:04:43,880 --> 00:04:49,530
It's quite easy to do in Keras (we'll get to that code shortly) and it's quite powerful.

65
00:04:49,540 --> 00:04:59,790
By using these already well-trained models, we can get superbly good accuracy on new image datasets.

66
00:04:59,810 --> 00:05:05,480
So what about transfer learning now? As you've seen, in fine-tuning we have taken an already pre-trained network

67
00:05:06,080 --> 00:05:12,420
and trained it, or segments of it, on some new data for a new image classification task.

68
00:05:13,100 --> 00:05:13,940
All right.

69
00:05:13,940 --> 00:05:17,230
So transfer learning is pretty much almost the same thing.
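Those steps map almost line for line onto Keras code. Here is a sketch that continues from the base_model loaded in the snippet above; the 256-unit head, the 10-class output, and the 1e-4 learning rate are illustrative placeholders, not values from this course.

from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Step 1: freeze every layer of the pre-trained convolutional base.
for layer in base_model.layers:
    layer.trainable = False

# Step 2: add a new fully connected head for our own classes.
x = Flatten()(base_model.output)
x = Dense(256, activation="relu")(x)
predictions = Dense(10, activation="softmax")(x)  # 10 = placeholder class count
model = Model(inputs=base_model.input, outputs=predictions)

# Step 3: compile with a very tiny learning rate so we don't wreck the
# pre-trained weights, then initiate training again on the new dataset.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Optionally, unfreeze the last few convolutional layers afterwards and
# keep training with an even smaller learning rate for extra accuracy.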
70
00:05:17,540 --> 00:05:22,380
And a lot of researchers and a lot of people in the industry use these terms interchangeably.

71
00:05:22,480 --> 00:05:28,700
However, what transfer learning really implies is that we're taking the knowledge from a pre-trained

72
00:05:28,700 --> 00:05:34,490
network and basically applying it to a similar task, and therefore not really retraining much of the

73
00:05:34,490 --> 00:05:35,600
network.

74
00:05:35,600 --> 00:05:42,750
So what that means effectively is, let's go back and look at this diagram. In fine-tuning,

75
00:05:42,860 --> 00:05:49,600
the reason why we call it fine-tuning is that we can actually train these layers here.

76
00:05:49,920 --> 00:05:56,160
So in both transfer learning and fine-tuning we're basically unfreezing the top layer here and modifying

77
00:05:56,160 --> 00:05:57,660
it for our classes.

78
00:05:57,930 --> 00:06:02,090
But in fine-tuning we can then go back and train these layers here.

79
00:06:02,450 --> 00:06:06,630
That's pretty much the core difference.

80
00:06:06,630 --> 00:06:09,000
So here's a quick quote from the Deep Learning book.

81
00:06:09,000 --> 00:06:13,860
I'm pretty sure you can click this link on the PDF slide that I give you. Basically: "Transfer learning

82
00:06:13,920 --> 00:06:19,680
and domain adaptation refer to the situation where what has been learned

83
00:06:19,710 --> 00:06:25,560
in one setting is exploited to improve generalization in another setting."

84
00:06:25,620 --> 00:06:28,600
That's effectively what transfer learning means.

85
00:06:28,800 --> 00:06:30,900
And we're going to do some practical examples now.

86
00:06:30,950 --> 00:06:35,290
We're going to use MobileNet to create a monkey breed classifier.

87
00:06:35,580 --> 00:06:39,160
And then we're going to use VGG to create a flower classifier.

88
00:06:39,540 --> 00:06:42,610
So stay tuned, and we are going to have some fun with these models.

89
00:06:42,640 --> 00:06:43,140
I guarantee.
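As a preview, here is roughly what the MobileNet monkey-breed classifier will look like in this transfer-learning style. The 10-class output and the pooling head are my assumptions about the upcoming dataset, not the exact code from the next lessons; swapping MobileNet for VGG16 gives the flower classifier the same way.

from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Pre-trained MobileNet base with its ImageNet classifier removed.
base = MobileNet(weights="imagenet",
                 include_top=False,
                 input_shape=(224, 224, 3))

# Transfer learning: keep the learned features frozen.
for layer in base.layers:
    layer.trainable = False

# New head; 10 outputs is a placeholder for the monkey-breed classes.
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(10, activation="softmax")(x)
model = Model(base.input, outputs)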