| 1
|
| 00:00:00,570 --> 00:00:07,440
|
| Hi, and welcome to chapter 7.1, where we introduce the concept of convolutional neural nets, basically
|
|
|
| 2
|
| 00:00:07,440 --> 00:00:14,590
|
| CNNs, as I'll be referring to them throughout this course. So why are they needed?
|
|
|
| 3
|
| 00:00:14,620 --> 00:00:20,170
|
| We spent a while discussing neural nets previously, and you may be wondering why we would have spent so much
|
|
|
| 4
|
| 00:00:20,170 --> 00:00:26,260
|
| time on neural nets if we're now just going to suddenly discard them and go into CNNs. Well, understanding
|
|
|
| 5
|
| 00:00:26,260 --> 00:00:32,140
|
| neural nets is critical to understanding CNNs, as they basically form the same foundation for all deep
|
|
|
| 6
|
| 00:00:32,140 --> 00:00:37,990
|
| learning types of networks. All of them basically require you to understand stochastic gradient descent,
|
|
|
| 7
|
| 00:00:38,710 --> 00:00:43,880
|
| backpropagation, the training process, batches, iterations, all of those things.
|
|
|
| 8
|
| 00:00:43,930 --> 00:00:50,250
|
| Basically, a CNN is just a different form of neural net, and you'll find out how and why.
|
|
|
| 9
|
| 00:00:51,010 --> 00:00:57,400
|
| So why CNNs? Mainly because neural networks don't scale well to image data.
|
|
|
| 10
|
| 00:00:59,550 --> 00:01:05,010
|
| Remember, in our intro slides we discussed how images are stored, which was basically this.
|
|
|
| 11
|
| 00:01:05,010 --> 00:01:08,000
|
| You have a grid; let's say this is a 10 by 10 grid here.
|
|
|
| 12
|
| 00:01:08,280 --> 00:01:13,170
|
| So we have 100 inputs here, technically, and each input has different colors.
|
|
|
| 13
|
| 00:01:13,200 --> 00:01:18,570
|
| So it is a lot of data, and there is the grayscale version of this as well.
|
|
|
| 14
|
| 00:01:18,790 --> 00:01:22,220
|
| Our job would just be this.
|
|
|
| 15
|
| 00:01:22,280 --> 00:01:25,730
|
| So, as I said, neural nets don't scale well to image data.
|
|
|
| 16
|
| 00:01:25,740 --> 00:01:26,920
|
| And why is that?
|
|
|
| 17
|
| 00:01:27,070 --> 00:01:32,660
|
| Let's consider a color image, a small image, 64 by 64 pixels.
|
|
|
| 18
|
| 00:01:32,670 --> 00:01:35,620
|
| So how many inputs is that? 64
|
|
|
| 19
|
| 00:01:35,630 --> 00:01:44,230
|
| by 64 times three. That's twelve thousand inputs already for what is a tiny image here.
|
|
|
| 20
|
| 00:01:44,410 --> 00:01:50,120
|
| So our input layer will have at least twelve thousand weights, and that is large.
|
|
|
| 21
|
| 00:01:50,380 --> 00:01:52,370
|
| And basically, if we go even higher,
|
|
|
| 22
|
| 00:01:52,370 --> 00:01:58,920
|
| we have some ninety-nine thousand weights in the input layer, and that's not even considering how many,
|
|
|
| 23
|
| 00:01:59,020 --> 00:02:02,330
|
| how many more parameters we have in the hidden layers as well.
|
|
|
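The input count quoted above can be checked with a few lines of Python. This is a minimal sketch; the helper name `input_count` is hypothetical and is used only for illustration.

```python
# Count the raw input values of an image fed value-by-value into a network.
# `input_count` is a hypothetical helper, not part of any library.
def input_count(height, width, channels=3):
    """One input per pixel per color channel."""
    return height * width * channels

small = input_count(64, 64)    # the 64 x 64 color image from the lecture
grid = input_count(10, 10, 1)  # the 10 x 10 grid, one value per cell

print(small)  # 12288, the "twelve thousand" inputs mentioned above
print(grid)   # 100
```

The exact figure is 12,288, which the lecture rounds to twelve thousand.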
| 24
|
| 00:02:02,380 --> 00:02:08,150
|
| So we need to find... basically, a CNN doesn't actually reduce the weights at the beginning and the end.
|
|
|
| 25
|
| 00:02:08,160 --> 00:02:14,710
|
| But what it does do is find a representation internally, in the hidden layers, to basically take
|
|
|
| 26
|
| 00:02:14,710 --> 00:02:22,660
|
| advantage of how images are formed so that we can actually make a neural net much more effective on
|
|
|
| 27
|
| 00:02:22,690 --> 00:02:23,390
|
| image data.
|
|
|
| 28
|
| 00:02:23,700 --> 00:02:25,380
|
| Let's see how this is.
|
|
|
| 29
|
| 00:02:25,780 --> 00:02:31,740
|
| So, introducing CNNs: this is effectively what it is here.
|
|
|
| 30
|
| 00:02:32,200 --> 00:02:35,570
|
| I'm going to go through each of these in detail in later slides.
|
|
|
| 31
|
| 00:02:35,590 --> 00:02:39,790
|
| But for now, conceptually, this is what it is.
|
|
|
| 32
|
| 00:02:39,820 --> 00:02:42,790
|
| Remember, in neural nets we had an input layer and hidden layers.
|
|
|
| 33
|
| 00:02:43,030 --> 00:02:49,330
|
| Well, these are the hidden layers in a convolutional neural net. What happens first is that we have
|
|
|
| 34
|
| 00:02:49,330 --> 00:02:54,630
|
| what is called a convolution, where a ReLU activation unit is applied to the convolution here.
|
|
|
| 35
|
| 00:02:55,210 --> 00:03:00,280
|
| And this convolution slides across the image here, producing values here.
|
|
|
| 36
|
| 00:03:00,490 --> 00:03:02,680
|
| Don't worry if you don't understand this just yet.
|
|
|
| 37
|
| 00:03:02,680 --> 00:03:06,990
|
| I'm going to go into detail in each one later on.
|
|
|
| 38
|
| 00:03:07,270 --> 00:03:10,150
|
| So just feel free to just basically look at this.
|
|
|
| 39
|
| 00:03:10,150 --> 00:03:13,050
|
| Get familiar with the terms and the diagram, how it looks.
|
|
|
| 40
|
| 00:03:13,210 --> 00:03:15,420
|
| But you don't actually have to know what these are yet.
|
|
|
| 41
|
| 00:03:15,460 --> 00:03:18,760
|
| So this is what convolutional neural nets look like.
|
|
|
| 42
|
| 00:03:19,850 --> 00:03:22,170
|
| So why 3D layers?
|
|
|
| 43
|
| 00:03:22,250 --> 00:03:25,000
|
| I didn't mention 3D layers here before.
|
|
|
| 44
|
| 00:03:25,310 --> 00:03:30,710
|
| But you can see here that there are depths of layers, stacks going this way and this way.
|
|
|
| 45
|
| 00:03:30,980 --> 00:03:34,720
|
| Whereas neural nets are represented in more of a flat diagram here.
|
|
|
| 46
|
| 00:03:35,090 --> 00:03:44,240
|
| So this allows us to use convolutions to learn image features, and by learning image features... you'll get
|
|
|
| 47
|
| 00:03:44,260 --> 00:03:46,810
|
| a sense of what image features are very soon.
|
|
|
| 48
|
| 00:03:48,230 --> 00:03:53,450
|
| And therefore this allows us to use far fewer weights in a deep network, allowing for significantly faster
|
|
|
| 49
|
| 00:03:53,450 --> 00:03:55,490
|
| training and a lot fewer parameters too.
|
|
|
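To make the "a lot fewer parameters" claim concrete, here is a rough sketch comparing a fully connected layer with a convolutional layer on the same 64 by 64 color image. The layer sizes (100 hidden units, 32 filters of size 3 by 3) are assumptions chosen for illustration and do not come from the lecture; the saving comes from each small filter being reused at every image position.

```python
# Rough parameter counts; both helper names are hypothetical.
def dense_params(n_inputs, n_units):
    """Fully connected layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv_params(kernel, in_channels, n_filters):
    """Convolutional layer: each filter is kernel x kernel x in_channels, plus one bias."""
    return kernel * kernel * in_channels * n_filters + n_filters

n_inputs = 64 * 64 * 3                 # 12288 inputs for the small color image
dense = dense_params(n_inputs, 100)    # assumed: 100 hidden units
conv = conv_params(3, 3, 32)           # assumed: 32 filters of size 3 x 3 over 3 channels

print(dense)  # 1228900
print(conv)   # 896
```

Note that the convolutional count does not depend on the image size at all, only on the filter size, the input depth, and the number of filters.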
| 50
|
| 00:03:57,440 --> 00:04:03,890
|
| So this is an arrangement of how neural nets use a 3D volume, and when I say 3D volume I'm referring
|
|
|
| 51
|
| 00:04:03,890 --> 00:04:11,360
|
| to the input being in three dimensions here, because we have height, width, and a depth, the color depth here.
|
|
|
| 52
|
| 00:04:11,430 --> 00:04:12,820
|
| This is for a color image here.
|
|
|
| 53
|
| 00:04:13,020 --> 00:04:20,050
|
| So the input, going back to it here, is effectively a three-dimensional input that is fed into here.
|
|
|
| 54
|
| 00:04:20,390 --> 00:04:22,510
|
| And these are all in different dimensions as well.
|
|
|
| 55
|
| 00:04:28,350 --> 00:04:32,590
|
| So, just to effectively go over the layers we often have in a CNN:
|
|
|
| 56
|
| 00:04:32,760 --> 00:04:40,140
|
| We have the input layer, the convolutional layer, the ReLU layer, pooling, and the fully connected layer
|
|
|
| 57
|
| 00:04:40,320 --> 00:04:40,760
|
| here.
|
|
|
| 58
|
| 00:04:41,130 --> 00:04:44,560
|
| That's it, basically. They can get more complex later on.
|
|
|
| 59
|
| 00:04:44,790 --> 00:04:48,420
|
| But these are the core layers of a CNN.
|
|
|
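As a sketch of how the core layers fit together, the snippet below traces the shape of a 3D volume (height, width, depth) through a convolution, a ReLU, and a pooling step before it is flattened for a fully connected layer. All sizes are assumptions for illustration: 16 filters of size 3 by 3, no padding, stride 1, and 2 by 2 pooling. The function names are hypothetical.

```python
# Trace (height, width, depth) through the core CNN layers.
def conv_shape(shape, n_filters, kernel=3):
    """Convolution with no padding and stride 1: depth becomes the filter count."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, n_filters)

def relu_shape(shape):
    """ReLU is applied element by element, so the shape is unchanged."""
    return shape

def pool_shape(shape, size=2):
    """Non-overlapping pooling shrinks height and width, keeps depth."""
    h, w, d = shape
    return (h // size, w // size, d)

def flatten_size(shape):
    """Number of values handed to the fully connected layer."""
    h, w, d = shape
    return h * w * d

shape = (64, 64, 3)             # input layer: small color image
shape = conv_shape(shape, 16)   # convolutional layer -> (62, 62, 16)
shape = relu_shape(shape)       # ReLU layer          -> (62, 62, 16)
shape = pool_shape(shape)       # pooling layer       -> (31, 31, 16)
flat = flatten_size(shape)      # fully connected layer sees 15376 values

print(shape, flat)
```

Stacking more convolutional layers simply repeats the first three steps before the final flatten.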
| 60
|
| 00:04:48,450 --> 00:04:50,350
|
| So why is it called a CNN?
|
|
|
| 61
|
| 00:04:50,430 --> 00:04:51,130
|
| Well, it's all
|
|
|
| 62
|
| 00:04:51,310 --> 00:04:53,750
|
| in the convolutional layer here.
|
|
|
| 63
|
| 00:04:53,940 --> 00:04:59,100
|
| That's this one here, and also this one here. It comes in sequences as well; you can have...
|
|
|
| 64
|
| 00:04:59,310 --> 00:05:04,750
|
| This is how we add depth to a CNN: by having multiple stacked convolutional layers.
|
|
|
| 65
|
| 00:05:05,070 --> 00:05:08,960
|
| Now convolution is what allows us to actually learn image features.
|
|
|
| 66
|
| 00:05:09,300 --> 00:05:12,540
|
| And I will explain to you again what image features are.
|
|
|
| 67
|
| 00:05:13,160 --> 00:05:19,590
|
| But basically they're what our classifier uses to sort of detect what's in an image. But what exactly is a
|
|
|
| 68
|
| 00:05:19,590 --> 00:05:20,450
|
| convolution?
|
|
|
| 69
|
| 00:05:20,730 --> 00:05:23,710
|
| Well, let's find out in chapter 7.2.
|
| |