1
00:00:00,670 --> 00:00:06,470
OK, so in 7.2 we are going to learn what convolutions are and what image features are.
2
00:00:06,630 --> 00:00:08,960
So let's get started.
3
00:00:09,490 --> 00:00:11,460
So before we dive into convolutions.
4
00:00:11,470 --> 00:00:14,300
Let's take a look at what image features actually are.
5
00:00:14,830 --> 00:00:21,280
So when I say image features, I'm talking about interesting things in an image. It's kind of a vague term,
6
00:00:21,450 --> 00:00:25,590
but it basically encapsulates things like edges, colors, patterns, and shapes.
7
00:00:25,600 --> 00:00:26,690
This is a dog here.
8
00:00:26,700 --> 00:00:27,430
Its edges have been
9
00:00:27,520 --> 00:00:30,490
extracted using a Canny edge detector.
10
00:00:30,910 --> 00:00:38,200
So an image feature is basically just that: one narrow thing that we find interesting in
11
00:00:38,200 --> 00:00:43,410
an image. By narrow I mean a type of category, like edges or colors.
12
00:00:43,440 --> 00:00:53,270
This could just as easily have been a brown color, or blue, or whatever, or a shape. So before
13
00:00:53,270 --> 00:00:58,650
CNNs came into the picture, scientists did feature engineering manually.
14
00:00:58,940 --> 00:01:05,390
You see how I just mentioned that these are edge extractors, or color extractors, or patterns or shapes.
15
00:01:05,390 --> 00:01:10,730
Now what we did, scientists did, and I actually had to do this at one time, was extract different features
16
00:01:10,730 --> 00:01:17,300
such as histograms of oriented gradients, color histograms, pixel intensities, and structural image features.
17
00:01:17,630 --> 00:01:18,740
Many different things.
18
00:01:18,860 --> 00:01:23,930
And it was tedious to actually do this feature engineering, because a lot of times you're just kind of
19
00:01:23,930 --> 00:01:27,890
messing around trying different things and you don't even know what works.
20
00:01:27,920 --> 00:01:33,410
And in the end, because you don't have a good, complicated model that learns non-linear representations,
21
00:01:33,440 --> 00:01:39,190
well, you still end up getting basically not that great accuracy.
22
00:01:41,250 --> 00:01:47,490
So here are some examples of filters learned in this publication here.
23
00:01:47,490 --> 00:01:53,700
You can see here multiple edge detectors in black and white, and various stripes all
24
00:01:53,700 --> 00:01:54,550
over here,
25
00:01:54,780 --> 00:01:56,400
different color patterns together.
26
00:01:56,590 --> 00:02:03,180
Now these are image features, exactly what I'm talking about when I say image features. But what do they have
27
00:02:03,180 --> 00:02:04,680
to do with convolutions?
28
00:02:04,890 --> 00:02:08,160
So what are convolutions? Now, a convolution.
29
00:02:08,160 --> 00:02:14,010
Before I even tell you how it relates to features: a convolution is effectively a mathematical term that
30
00:02:14,010 --> 00:02:18,600
describes a process of combining two functions to produce a third function.
31
00:02:18,600 --> 00:02:24,810
Now that sounds kind of vague, until I tell you the third function is a feature map, and a feature map is effectively
32
00:02:24,810 --> 00:02:25,870
these things here.
33
00:02:26,220 --> 00:02:30,820
So now imagine we're applying a convolution to an image.
34
00:02:30,860 --> 00:02:38,750
So that's combining two functions: we apply a convolution to an image to get a feature map.
35
00:02:38,840 --> 00:02:44,560
So convolution is the action of using a filter, or kernel. We use both terms interchangeably in this course and
36
00:02:44,560 --> 00:02:47,180
in the research literature.
37
00:02:47,190 --> 00:02:53,300
So the filter is applied to the input, in our case the input being the input image, to produce the convolution.
38
00:02:53,370 --> 00:02:55,110
This is basically the convolutional process.
39
00:02:55,110 --> 00:02:57,240
Now let me just go back to the slide here.
40
00:02:57,390 --> 00:03:03,870
So I just want to reiterate that in our case the input, which is the first function here, is applied
41
00:03:03,960 --> 00:03:06,450
to a second function called the filter.
42
00:03:06,750 --> 00:03:15,530
And that gives a feature map. So the convolution process is basically executed by sliding the filter, that's
43
00:03:15,530 --> 00:03:22,820
the filter function, over the input image, and this sliding process is basically a simple element-wise
44
00:03:22,910 --> 00:03:27,940
multiplication, or dot product, that produces a third function.
45
00:03:27,950 --> 00:03:28,990
So how is it done.
46
00:03:29,150 --> 00:03:32,090
So imagine this is basically an input image.
47
00:03:32,090 --> 00:03:38,330
This is 2D; it's not truly 2D in reality, but this is for explanation purposes. And this is a convolution
48
00:03:38,330 --> 00:03:41,570
filter here, with some values in a smaller matrix.
49
00:03:41,840 --> 00:03:43,870
And this is the output feature map.
50
00:03:43,880 --> 00:03:47,220
So what's going to happen in the convolution process.
51
00:03:47,340 --> 00:03:58,070
Well, we're going to basically slide this filter over the image, go back here, over this area here, and then again
52
00:03:58,280 --> 00:04:00,490
and again, and you'll see it slide slowly.
53
00:04:00,500 --> 00:04:06,020
So when I say convolve, that basically means we multiply them here.
54
00:04:06,200 --> 00:04:12,050
So as you can see, the values are 1 0 1 1 0 0 0 0 1 1.
55
00:04:12,050 --> 00:04:20,090
And these values here, with 0 1 0 1 0 above or below. I actually used these values mainly for
56
00:04:20,090 --> 00:04:21,760
simplicity purposes.
57
00:04:21,760 --> 00:04:26,070
I've chosen these values here to make this calculation far easier for us.
58
00:04:26,090 --> 00:04:33,980
So by multiplying these two together we get 0 times 1, 1 times 0, 0 times 1.
59
00:04:33,980 --> 00:04:37,490
You'll see it here: 1 times 0, 0 times 1, 1 times 0.
60
00:04:37,730 --> 00:04:42,040
And so on and so on, and we just add it all up and we get two.
61
00:04:42,280 --> 00:04:46,050
And that forms our first output feature value in this box here.
62
00:04:46,400 --> 00:04:54,350
So how many times can this three-by-three matrix be slid, or how many times are we going to slide it,
63
00:04:54,470 --> 00:04:56,220
over this,
64
00:04:56,220 --> 00:04:58,310
this image here?
65
00:04:58,310 --> 00:05:00,500
So imagine this grid here.
66
00:05:00,740 --> 00:05:05,990
We have one box here, and we can shift it again here, two, just like this here.
67
00:05:06,350 --> 00:05:08,020
And then three again.
68
00:05:08,420 --> 00:05:15,260
So by sliding it up, this box, we now fill in a second value of the feature matrix. And you don't have to add
69
00:05:15,260 --> 00:05:21,100
one by one, but imagine it as one, two here, and then starting again at the second row here.
70
00:05:21,410 --> 00:05:28,940
So we have one here, two here, three here, and then again four, five, six.
71
00:05:28,940 --> 00:05:29,530
All right.
72
00:05:29,780 --> 00:05:34,030
So we have basically enough values.
73
00:05:34,300 --> 00:05:39,030
So we have in each row one, two, three, and three times it can be slid across.
74
00:05:39,050 --> 00:05:40,880
We have nine values in all.
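The count the narration arrives at follows from the standard output-size formula for a valid (no-padding) convolution, output = (N − K) / stride + 1. A quick sketch in Python (the function name is my own, not from the lecture):

```python
def output_size(n: int, k: int, stride: int = 1) -> int:
    """Number of positions a k-wide kernel can take when slid
    over an n-wide input with the given stride and no padding."""
    return (n - k) // stride + 1

# A 3x3 kernel over a 5x5 image: 3 positions per row, 3 rows.
side = output_size(5, 3)
print(side * side)  # 9 slide positions in all
```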
75
00:05:41,300 --> 00:05:47,450
So we can actually fill out this entire thing by sliding it across nine times.
76
00:05:47,600 --> 00:05:49,200
That's how we build those features.
77
00:05:49,400 --> 00:05:56,630
So by using what we call a three-by-three filter, or convolution kernel, we produce a three-by-three feature
78
00:05:56,630 --> 00:06:00,260
map, which it fills in here.
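The sliding process just described can be sketched in NumPy. The input and kernel values below are invented for illustration, not the exact numbers on the slide:

```python
import numpy as np

# 5x5 binary "image" (values invented for illustration)
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

# 3x3 convolution kernel
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

# Slide the kernel over the image: at each of the 3x3 = 9 positions,
# multiply element-wise and sum to get one feature-map value.
feature_map = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map.shape)  # a 3x3 feature map from a 5x5 input
```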
79
00:06:00,770 --> 00:06:03,270
Now you understand this process.
80
00:06:03,350 --> 00:06:04,930
Basically, it's simple, right?
81
00:06:05,030 --> 00:06:11,870
But what exactly are the effects of doing this, and why is this important? So firstly.
82
00:06:12,050 --> 00:06:22,700
Depending on the values of the kernel, the kernel being this blue box here, on convolution we produce
83
00:06:22,700 --> 00:06:27,860
different feature maps, obviously, because we can have different kernels with different values, and they'll all
84
00:06:27,860 --> 00:06:29,720
produce different feature maps.
85
00:06:29,720 --> 00:06:36,050
Plain and obvious as it sounds, as we just saw, convolving with different kernels produces interesting
86
00:06:36,050 --> 00:06:38,890
feature maps that can be used to detect different features.
87
00:06:38,900 --> 00:06:40,240
This is what makes it important.
88
00:06:40,310 --> 00:06:49,670
So imagine we have several filters here, each with different sets of values, and we're sliding them
89
00:06:49,700 --> 00:06:50,290
over here.
90
00:06:50,290 --> 00:06:51,990
We're producing different feature maps.
91
00:06:52,250 --> 00:06:57,890
So what this means is that we've now processed the input image into basically features that have been
92
00:06:57,890 --> 00:06:58,710
extracted.
93
00:06:59,980 --> 00:07:00,920
So let's keep going.
94
00:07:02,000 --> 00:07:08,570
So it's important to know that convolution keeps a spatial relationship between pixels by learning image features
95
00:07:08,630 --> 00:07:11,120
over the small segments we pass over.
96
00:07:11,120 --> 00:07:17,530
This means that the convolution output, even though it's reduced in size here, still sort of retains some of the
97
00:07:17,540 --> 00:07:18,470
spatial information
98
00:07:18,470 --> 00:07:22,400
in this large image, just now in a more compressed form.
99
00:07:25,770 --> 00:07:32,070
So these are all examples of kernels here. Basically, the identity kernel does nothing.
100
00:07:32,250 --> 00:07:38,460
We have edge detection kernels: simply having these values in the kernel changes an input image into this, which
101
00:07:38,740 --> 00:07:40,590
is quite, quite remarkable.
102
00:07:40,590 --> 00:07:44,660
But you can actually write some code, or try it in OpenCV, and see for yourself.
103
00:07:44,670 --> 00:07:50,910
You can specify kernels, define your own kernels in OpenCV, and run and perform convolutions to produce
104
00:07:51,050 --> 00:07:53,910
blurs, sharpened images.
105
00:07:53,920 --> 00:07:55,730
Edge detection. It's actually pretty cool.
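As a concrete sketch of the OpenCV idea (without requiring OpenCV itself), here is what defining your own kernel and convolving looks like in NumPy; the sharpen kernel below is a common choice, and `convolve2d` is a hand-rolled stand-in for OpenCV's `cv2.filter2D`:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid convolution: slide the kernel over the image with
    stride 1 and no padding, summing element-wise products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A classic sharpen kernel, like those shown on the slide.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

gray = np.random.rand(28, 28)      # stand-in grayscale image
sharpened = convolve2d(gray, sharpen)
print(sharpened.shape)             # (26, 26) without padding
```

With OpenCV installed, the roughly equivalent call is `cv2.filter2D(gray, -1, sharpen)`, which also pads the borders so the output keeps the input size.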
106
00:07:56,090 --> 00:08:03,000
So let's take a look at an example of a convolution kernel applied to an image that
107
00:08:03,000 --> 00:08:04,690
extracts features here.
108
00:08:04,770 --> 00:08:11,970
So this is an example GIF I've taken from, and you can actually see how, when they apply this and slide it
109
00:08:11,970 --> 00:08:15,750
across the image, what the actual convolution filter output looks like.
110
00:08:15,960 --> 00:08:17,670
So this is the edge detector here.
111
00:08:17,750 --> 00:08:19,300
And the other one there.
112
00:08:19,350 --> 00:08:20,760
It's actually pretty cool.
113
00:08:20,760 --> 00:08:22,390
Look at it again there.
114
00:08:23,250 --> 00:08:26,500
And the other one too. They're awesome.
115
00:08:28,100 --> 00:08:30,690
So now, as you know, that was just one filter.
116
00:08:30,690 --> 00:08:35,520
So we need many filters in our CNNs, at least within reason.
117
00:08:35,520 --> 00:08:40,290
You don't want to do too much, although there's nothing actually wrong with doing too much. It just increases
118
00:08:40,290 --> 00:08:47,250
your training time and model complexity, and it may be redundant depending on your image data set.
119
00:08:47,280 --> 00:08:50,040
So let's assume we're using 12 filters.
120
00:08:50,040 --> 00:08:56,180
How do we actually visualize how that looks in a CNN?
121
00:08:56,190 --> 00:09:04,460
So imagine we have an image of size 28 by 28, and three dimensions: red, green, and blue.
122
00:09:04,530 --> 00:09:06,360
So that's why it has some depth here.
123
00:09:06,960 --> 00:09:11,930
And this is a convolutional cell, which is basically the size here,
124
00:09:12,000 --> 00:09:18,270
one by one by one. That's actually the output size of the convolutional filter.
125
00:09:18,290 --> 00:09:21,310
Each grid cell here is our convolutional filter box.
126
00:09:21,320 --> 00:09:22,160
All right.
127
00:09:22,310 --> 00:09:25,690
So we're actually doing a one to one mapping of a convolutional filter.
128
00:09:26,120 --> 00:09:32,390
So it's also 28, 28 by 28 by one, but now we're using 12 filters here.
129
00:09:32,510 --> 00:09:41,540
So each yellow block here represents a single convolutional filter, and there are 12 blocks stacked here.
130
00:09:41,900 --> 00:09:47,580
So what happens is that, for each filter, we slide it across and fill in our values here,
131
00:09:48,790 --> 00:09:55,020
basically 12 times, and we get a box of convolutions, or a box of filters here.
132
00:09:55,060 --> 00:09:58,820
If you come up close, you'll see this is a box of feature maps.
133
00:09:58,840 --> 00:10:01,380
This is our whole convolution kernel matrix.
134
00:10:01,510 --> 00:10:05,920
And in case you're wondering, because it actually just slipped my mind when I was explaining this to
135
00:10:05,920 --> 00:10:10,230
you, because I did this slide a couple of weeks before explaining it in this video.
136
00:10:10,330 --> 00:10:18,430
Now you noticed that before we had a filter that was, say, three by three, and it produced a smaller convolution,
137
00:10:18,490 --> 00:10:21,160
a smaller feature map here.
138
00:10:21,160 --> 00:10:25,310
However, in this example I'm producing a feature map of basically the same size.
139
00:10:25,330 --> 00:10:32,230
And this is actually what we need to do in most cases. You don't have to, but I'll actually explain to
140
00:10:32,230 --> 00:10:34,880
you how we actually end up with the same size image later on.
141
00:10:34,990 --> 00:10:40,330
But for now, just assume we run this, let's say this is a three-by-three or five-by-five convolution here;
142
00:10:40,870 --> 00:10:47,440
we get the output here, and we fill in our matrix here for each feature map.
143
00:10:47,560 --> 00:10:52,660
So as you can see, this is how the filters look stacked up visually. You can see it quite clearly there.
144
00:10:54,050 --> 00:11:00,620
So the output of all the convolutions from this layer, after applying 12 filters of size three by three by three
145
00:11:01,310 --> 00:11:04,830
to an image which was 28 by 28 by three:
146
00:11:04,840 --> 00:11:08,750
we produce 12 feature maps, also called activation maps.
147
00:11:08,750 --> 00:11:15,060
Now these outputs are stacked together and treated as one big 3D matrix of output size 28 by 28
148
00:11:15,080 --> 00:11:16,170
by 12.
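That stacking can be sketched as follows: 12 filters of size 3×3×3, each applied with "same" zero padding over a 28×28×3 image, yield a 28×28×12 output. A minimal NumPy version (the loop-based layer and its name are my own sketch, not an efficient or official implementation):

```python
import numpy as np

def conv_layer_same(image, kernels):
    """Apply each 3x3xC kernel over the zero-padded image
    ('same' padding, stride 1) and stack the resulting
    feature maps along the depth axis."""
    h, w, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))  # pad H and W by 1
    out = np.zeros((h, w, len(kernels)))
    for k, kernel in enumerate(kernels):
        for i in range(h):
            for j in range(w):
                out[i, j, k] = np.sum(padded[i:i+3, j:j+3, :] * kernel)
    return out

image = np.random.rand(28, 28, 3)                       # 28x28 RGB input
kernels = [np.random.rand(3, 3, 3) for _ in range(12)]  # 12 filters
activations = conv_layer_same(image, kernels)
print(activations.shape)  # (28, 28, 12): one map per filter
```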
149
00:11:16,520 --> 00:11:20,190
And this is important, so let's go back to this.
150
00:11:20,390 --> 00:11:27,630
This now forms the input, this big matrix here, to our next layer in the CNN.
151
00:11:27,650 --> 00:11:32,780
So now let's talk more about what these feature maps, or activation maps, actually are and how they represent
152
00:11:32,810 --> 00:11:34,190
image features.
153
00:11:34,220 --> 00:11:42,380
So now, each cell, and by cell I mean each one-by-one point in the activation map matrix, is considered
154
00:11:42,410 --> 00:11:45,100
basically a feature extractor, or a single neuron.
155
00:11:45,470 --> 00:11:50,470
And that single neuron is basically looking at a specific region as it slides over the image.
156
00:11:50,470 --> 00:11:54,500
A specific feature, I should say, as it slides over the image.
157
00:11:54,500 --> 00:12:01,460
So we have basically a feature map of 28 by 28, like we just did, and that feature map basically
158
00:12:01,460 --> 00:12:07,790
has each neuron, each cell, activate depending on what it sees in the image.
159
00:12:08,330 --> 00:12:15,030
And in the beginning of your neural network, or CNN I should say, the early convolutional layers
160
00:12:15,170 --> 00:12:22,400
are basically low-level feature detectors, and low-level feature detectors are basically looking
161
00:12:22,400 --> 00:12:24,650
for simple things in images. Simple things,
162
00:12:24,650 --> 00:12:29,730
meaning like maybe edges, maybe specific colors, maybe a blob here and there.
163
00:12:29,750 --> 00:12:36,200
However, if we have consecutive, concatenated convolutional layers, as in deep networks with rows
164
00:12:36,650 --> 00:12:43,400
of convolutional layers, we can start detecting more specialized features, like the face of a cat, or the
165
00:12:43,400 --> 00:12:46,160
shape of a bicycle, or the shape of a face.
166
00:12:46,250 --> 00:12:52,610
So that's how CNNs actually use these convolutional feature maps to detect features.
167
00:12:52,800 --> 00:12:58,450
So as you've seen so far, we just used a standard, arbitrary filter size of three by three.
168
00:12:58,860 --> 00:13:01,200
But can we use other sizes?
169
00:13:01,210 --> 00:13:08,310
And how do they affect the convolution size and the feature maps and other parts of the convolutional neural
170
00:13:08,310 --> 00:13:09,410
net?
171
00:13:09,420 --> 00:13:13,290
So basically, that's called tweaking the hyperparameters.
172
00:13:13,290 --> 00:13:18,870
So in the next section, Section 7.3, we'll look at depth, stride, and padding.