1
00:00:01,080 --> 00:00:06,390
Welcome to Chapter 15.1, where I explain to you what exactly transfer learning and
2
00:00:06,390 --> 00:00:07,490
fine tuning are.
3
00:00:07,980 --> 00:00:13,620
So as we know from before, training complicated and deep CNNs is very slow.
4
00:00:13,620 --> 00:00:21,120
AlexNet and VGG in particular are deep, parameter-laden networks; VGG16 alone has 138 million parameters,
5
00:00:21,600 --> 00:00:26,600
and ResNet50 has 50 layers, despite having fewer parameters.
6
00:00:26,620 --> 00:00:29,180
Still, that many layers take some time to train.
7
00:00:29,700 --> 00:00:35,580
So whilst these networks attain relatively excellent performance on ImageNet, training them anew is
8
00:00:35,730 --> 00:00:37,450
definitely not recommended.
9
00:00:37,650 --> 00:00:39,950
You are not going to get anywhere close to good results.
10
00:00:39,960 --> 00:00:47,310
even training for months on a CPU. These CNNs are often trained for a couple of weeks or more
11
00:00:47,370 --> 00:00:49,160
using arrays of GPUs.
12
00:00:49,350 --> 00:00:54,390
That's to tell you how complicated it is and how long it takes to get good results on ImageNet.
13
00:00:54,480 --> 00:01:00,930
So what if there was a way we could reuse those pre-trained models to make our own classifiers? As we've
14
00:01:00,930 --> 00:01:06,270
seen, Keras actually ships with models pre-trained on ImageNet, and those models were the models we showed
15
00:01:06,270 --> 00:01:07,780
you before in Chapter 14.
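
A minimal sketch of what loading one of these pre-trained ImageNet models looks like, assuming TensorFlow 2.x's bundled Keras (the course's own code may differ):

```python
# Sketch: load VGG16 with its ImageNet weights and classify an image.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np

# Full model, including the 1000-class ImageNet classifier head.
model = VGG16(weights="imagenet", include_top=True)

# A dummy 224x224 RGB image stands in for a real one here.
img = np.random.rand(1, 224, 224, 3) * 255.0
preds = model.predict(preprocess_input(img))
print(decode_predictions(preds, top=3))  # top-3 ImageNet labels
```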
16
00:01:08,070 --> 00:01:12,530
And these weights are already tuned to detect dozens of low, mid, and high level features.
17
00:01:12,570 --> 00:01:17,450
What if we could use these already trained networks now to build our own classifiers?
18
00:01:17,480 --> 00:01:24,780
Well, we can. Introducing transfer learning and fine tuning: this solves the problem
19
00:01:24,810 --> 00:01:26,450
we just explained.
20
00:01:27,110 --> 00:01:29,900
So let's talk a bit about transfer learning and fine tuning now.
21
00:01:30,700 --> 00:01:36,920
So fine tuning: the concept of fine tuning is often, and justifiably, confused with transfer learning.
22
00:01:37,140 --> 00:01:39,880
However, it is merely a type of transfer learning.
23
00:01:40,200 --> 00:01:43,140
Fine tuning is where we take a pre-trained deep CNN.
24
00:01:43,140 --> 00:01:46,730
So one of those like ResNet or VGG, as we've seen before.
25
00:01:47,070 --> 00:01:54,840
And we use this already trained (typically on ImageNet) model to do a new image classification task. Typically
26
00:01:54,840 --> 00:02:00,450
in fine tuning we are taking an already trained CNN and tuning it on a new dataset.
27
00:02:00,690 --> 00:02:01,090
OK.
28
00:02:01,350 --> 00:02:05,010
So basically what I'm saying, which I'll explain shortly, is
29
00:02:08,170 --> 00:02:08,670
start of
30
00:02:12,500 --> 00:02:14,640
So let's take a look at these concepts.
31
00:02:14,660 --> 00:02:20,210
Firstly, let's talk about fine tuning. Now, the concept of fine tuning is often, and very justifiably,
32
00:02:20,210 --> 00:02:25,530
confused with transfer learning, and that's because it's very similar and is merely a type of transfer learning.
33
00:02:25,790 --> 00:02:30,500
Fine tuning is where we take a pre-trained deep CNN and we use this model.
34
00:02:30,540 --> 00:02:35,240
It's already been trained, most likely on ImageNet, and we basically use it for
35
00:02:35,330 --> 00:02:37,340
a new image classification task.
36
00:02:37,340 --> 00:02:43,810
And typically in fine tuning we are taking an already trained CNN and tuning it on a new dataset.
37
00:02:43,820 --> 00:02:50,300
So what we do is we then freeze the lower layers of this model and I'll illustrate to you what this
38
00:02:50,300 --> 00:02:51,210
means shortly.
39
00:02:51,470 --> 00:02:54,920
And we train only the top or fully connected layers.
40
00:02:55,610 --> 00:02:57,480
And that's how we actually train a
41
00:02:57,630 --> 00:02:58,910
new model here.
42
00:02:59,270 --> 00:03:03,600
So effectively we're just replacing the classifier part of an already trained model.
43
00:03:04,070 --> 00:03:08,450
And sometimes you can actually go back, unfreeze the lower layers, and train them again to get even
44
00:03:08,450 --> 00:03:10,300
better performance.
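
A minimal sketch of this freeze-and-replace step in Keras, assuming TensorFlow 2.x; num_classes is a placeholder for your new dataset's class count:

```python
# Sketch: freeze the pre-trained convolutional base, add a new classifier head.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

num_classes = 10  # placeholder: classes in the new dataset

# Convolutional base only; include_top=False drops the ImageNet classifier.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze every layer in the base so the ImageNet weights stay fixed.
for layer in base.layers:
    layer.trainable = False

# New trainable fully connected head for our own classes.
x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
model = models.Model(inputs=base.input, outputs=outputs)
```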
45
00:03:10,310 --> 00:03:13,740
So let me illustrate to you exactly what's happening.
46
00:03:13,890 --> 00:03:17,340
So imagine this is a deep CNN; a real one would be much deeper than this.
47
00:03:17,510 --> 00:03:20,900
But imagine this is a deep CNN that's already been trained.
48
00:03:21,230 --> 00:03:21,960
All right.
49
00:03:22,040 --> 00:03:27,680
So when I say we freeze the layers, we're freezing all the convolutional layers here, between
50
00:03:27,680 --> 00:03:30,750
the input and up to the fully connected layer here.
51
00:03:30,980 --> 00:03:36,440
So imagine these weights have already been trained and they're very good at picking up high, low,
52
00:03:36,440 --> 00:03:38,120
and mid-level features.
53
00:03:38,120 --> 00:03:45,110
So what we do now is we just basically change the classes that we want for our model, and we basically
54
00:03:45,110 --> 00:03:49,390
just manipulate the top layer and train it on our dataset now.
55
00:03:49,820 --> 00:03:56,390
So this is the frozen part here, and this is the part we have basically modified, unfrozen,
56
00:03:56,510 --> 00:03:59,210
and are going to train separately.
57
00:03:59,840 --> 00:04:06,770
So in fine tuning: in most CNNs, the first few convolutional layers learn low level features, as explained.
58
00:04:07,080 --> 00:04:15,130
Those are things like edges, textures, color blobs, and so on.
59
00:04:15,740 --> 00:04:20,010
And as we progress through the network, it learns more high and mid-level features.
60
00:04:20,180 --> 00:04:26,750
So in fine tuning we just keep the low level layers frozen, and we can also retrain the high level features
61
00:04:26,750 --> 00:04:32,800
as well. So these are the steps; I pretty much just went through this for you.
62
00:04:33,050 --> 00:04:40,460
But basically we freeze layers, we add or modify the fully connected layer, we use a very tiny
63
00:04:40,460 --> 00:04:43,520
learning rate, and we just initiate training again.
64
00:04:43,880 --> 00:04:49,530
It's quite easy to do in Keras; we'll get to that code shortly, and it's quite powerful.
65
00:04:49,540 --> 00:04:59,790
By using these already well-trained models, we can get superbly good accuracy on new image tasks.
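
Continuing the sketch above, those steps (freeze, new fully connected layer, tiny learning rate, retrain) might look like this; train_images and train_labels are placeholders for your own data:

```python
# Sketch: recompile with a very small learning rate and resume training.
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-5),  # tiny LR so we don't wreck the pre-trained weights
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, epochs=5)  # placeholder dataset
```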
66
00:04:59,810 --> 00:05:05,480
So what about transfer learning now? As you've seen in fine tuning, we have taken an already pre-trained network
67
00:05:06,080 --> 00:05:12,420
and trained it, or segments of it, on some new data for a new image classification task.
68
00:05:13,100 --> 00:05:13,940
All right.
69
00:05:13,940 --> 00:05:17,230
So transfer learning is pretty much almost the same thing.
70
00:05:17,540 --> 00:05:22,380
And a lot of researchers and a lot of people in the industry use these terms interchangeably.
71
00:05:22,480 --> 00:05:28,700
However, what transfer learning really implies is that we're taking the knowledge from a pre-trained
72
00:05:28,700 --> 00:05:34,490
network and basically applying it to a similar task, and therefore not really retraining much of the
73
00:05:34,490 --> 00:05:35,600
network.
74
00:05:35,600 --> 00:05:42,750
So what that means effectively is this: let's go back to that diagram. In fine tuning,
75
00:05:42,860 --> 00:05:49,600
the reason why we call it fine tuning is that we can actually retrain these layers here.
76
00:05:49,920 --> 00:05:56,160
So in both transfer learning and fine tuning we're basically unfreezing the top layer here and modifying
77
00:05:56,160 --> 00:05:57,660
it for our classes.
78
00:05:57,930 --> 00:06:02,090
But in fine tuning we can then go back and retrain these layers here.
79
00:06:02,450 --> 00:06:06,630
That's pretty much the core difference.
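
To make that difference concrete, a sketch continuing from the code above: pure transfer learning keeps the entire base frozen, while fine tuning also unfreezes some upper convolutional layers (VGG16's "block5" layer-name prefix is real; choosing that particular block is our assumption):

```python
# Sketch: for fine tuning, unfreeze only VGG16's last convolutional block
# (block5_*); the earlier, low-level layers stay frozen.
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")

# Trainable flags changed, so recompile, again with a tiny learning rate.
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```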
80
00:06:06,630 --> 00:06:09,000
So here's a quick quote from the Deep Learning book.
81
00:06:09,000 --> 00:06:13,860
I'm pretty sure you can click this link on the PDF slides that I give you. Basically: transfer learning
82
00:06:13,920 --> 00:06:19,680
and domain adaptation refer to the situation where what has been learned
83
00:06:19,710 --> 00:06:25,560
in one setting is now exploited to improve generalization in another setting.
84
00:06:25,620 --> 00:06:28,600
That's effectively what transfer learning means.
85
00:06:28,800 --> 00:06:30,900
And we're going to do some practical examples now.
86
00:06:30,950 --> 00:06:35,290
We're going to use MobileNet to create a monkey breed classifier.
87
00:06:35,580 --> 00:06:39,160
And then we're going to use VGG to create a flower classifier.
88
00:06:39,540 --> 00:06:42,610
So stay tuned and we are going to have some fun with these models.
89
00:06:42,640 --> 00:06:43,140
I guarantee.
|