AI_DL_Assignment / 7. Convolutional Neural Networks (CNNs) Explained /3. Convolutions & Image Features.srt
| 1 | |
| 00:00:00,670 --> 00:00:06,470 | |
| OK so in 7.2 we are going to learn what convolutions are and what image features are. | |
| 2 | |
| 00:00:06,630 --> 00:00:08,960 | |
| So let's get started. | |
| 3 | |
| 00:00:09,490 --> 00:00:11,460 | |
| So before we dive into convolutions. | |
| 4 | |
| 00:00:11,470 --> 00:00:14,300 | |
| Let's take a look at what image features actually are. | |
| 5 | |
| 00:00:14,830 --> 00:00:21,280 | |
| So when I say image features, I'm talking about interesting things in an image. It's kind of a vague term, | |
| 6 | |
| 00:00:21,450 --> 00:00:25,590 | |
| but it basically encapsulates things like edges, colors, patterns and shapes. | |
| 7 | |
| 00:00:25,600 --> 00:00:26,690 | |
| This is a dog here. | |
| 8 | |
| 00:00:26,700 --> 00:00:27,430 | |
| It has been processed. | |
| 9 | |
| 00:00:27,520 --> 00:00:30,490 | |
| The edges have been extracted using a Canny edge detector. | |
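For reference, edge features like the ones on the dog image can be extracted in a couple of lines with OpenCV's Canny detector. This is a minimal sketch, not the exact code behind the slide; the file name and the two thresholds are placeholder values.

```python
import cv2

# Load the image in grayscale; "dog.jpg" is a placeholder file name.
img = cv2.imread("dog.jpg", cv2.IMREAD_GRAYSCALE)

# Canny edge detection; the low/high thresholds (100, 200) are arbitrary
# example values and usually need tuning per image.
edges = cv2.Canny(img, 100, 200)

cv2.imwrite("dog_edges.png", edges)
```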
| 10 | |
| 00:00:30,910 --> 00:00:38,200 | |
| So an image feature is basically just that, just one narrow thing that we find interesting in | |
| 11 | |
| 00:00:38,200 --> 00:00:43,410 | |
| an image. By narrow I mean like a type of category, like edges or colors. | |
| 12 | |
| 00:00:43,440 --> 00:00:53,270 | |
| This could have easily been a brown color also, or blue, or whatever other color or shape. So before | |
| 13 | |
| 00:00:53,270 --> 00:00:58,650 | |
| CNNs came into the picture, scientists did feature engineering manually. | |
| 14 | |
| 00:00:58,940 --> 00:01:05,390 | |
| You see how I just mentioned that these are edge, color, pattern or shape extractors. | |
| 15 | |
| 00:01:05,390 --> 00:01:10,730 | |
| Now what scientists did, and I actually had to do this at one time, was extract different features | |
| 16 | |
| 00:01:10,730 --> 00:01:17,300 | |
| such as histograms of oriented gradients, color histograms, and other structural image descriptors. | |
| 17 | |
| 00:01:17,630 --> 00:01:18,740 | |
| Many different things. | |
| 18 | |
| 00:01:18,860 --> 00:01:23,930 | |
| And it was tedious to actually do this feature engineering, because a lot of times you're just kind of | |
| 19 | |
| 00:01:23,930 --> 00:01:27,890 | |
| messing around trying different things and you don't even know what works. | |
| 20 | |
| 00:01:27,920 --> 00:01:33,410 | |
| And in the end, because you don't have a good, complex model that learns non-linear representations | |
| 21 | |
| 00:01:33,440 --> 00:01:39,190 | |
| well, you still end up getting basically not that great accuracy. | |
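As an aside, this kind of hand-crafted feature extraction is still easy to try today. Below is a minimal sketch using scikit-image's HOG descriptor plus a simple color histogram; the image path and parameter values are placeholder assumptions, not anything specific to this course.

```python
import numpy as np
from skimage import io, color
from skimage.feature import hog

# "example.jpg" is a placeholder path.
img = io.imread("example.jpg")

# Histogram of Oriented Gradients (HOG) on the grayscale image.
hog_features = hog(color.rgb2gray(img),
                   orientations=9,
                   pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

# A simple color histogram: 32 bins per RGB channel, concatenated.
color_hist = np.concatenate(
    [np.histogram(img[..., c], bins=32, range=(0, 255))[0] for c in range(3)]
)

# These hand-crafted vectors would then be fed to a classical classifier
# (e.g. an SVM), which is exactly the tedious pipeline described above.
features = np.concatenate([hog_features, color_hist])
print(features.shape)
```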
| 22 | |
| 00:01:41,250 --> 00:01:47,490 | |
| So these are decent examples of filters learned, taken from this publication here. | |
| 23 | |
| 00:01:47,490 --> 00:01:53,700 | |
| You can see here multiple edge detectors, in black and white, with stripes at various angles all | |
| 24 | |
| 00:01:53,700 --> 00:01:54,550 | |
| over here. | |
| 25 | |
| 00:01:54,780 --> 00:01:56,400 | |
| Different color patterns together. | |
| 26 | |
| 00:01:56,590 --> 00:02:03,180 | |
| Now all of these are image features, exactly what I'm talking about with image features. But what does it have | |
| 27 | |
| 00:02:03,180 --> 00:02:04,680 | |
| to do with convolutions now? | |
| 28 | |
| 00:02:04,890 --> 00:02:08,160 | |
| So what are convolutions? Now, a convolution: | |
| 29 | |
| 00:02:08,160 --> 00:02:14,010 | |
| before I even tell you how it relates to features, a convolution is effectively a mathematical tool that | |
| 30 | |
| 00:02:14,010 --> 00:02:18,600 | |
| describes a process of combining two functions to produce a third function. | |
| 31 | |
| 00:02:18,600 --> 00:02:24,810 | |
| Now that sounds kind of vague until I tell you that the third function is a feature map, and a feature map is effectively | |
| 32 | |
| 00:02:24,810 --> 00:02:25,870 | |
| these things here. | |
| 33 | |
| 00:02:26,220 --> 00:02:30,820 | |
| So now imagine we're applying a convolution to an image. | |
| 34 | |
| 00:02:30,860 --> 00:02:38,750 | |
| So that's applying a process of two functions: we apply a convolution to an image to get a feature map. | |
| 35 | |
| 00:02:38,840 --> 00:02:44,560 | |
| So convolution is the action of using a filter or kernel (we use both terms interchangeably in this course and | |
| 36 | |
| 00:02:44,560 --> 00:02:47,180 | |
| in research and theory). | |
| 37 | |
| 00:02:47,190 --> 00:02:53,300 | |
| So it's applied to the input, in our case the input being the input image, and convolved with it. | |
| 38 | |
| 00:02:53,370 --> 00:02:55,110 | |
| This is basically the convolutional process. | |
| 39 | |
| 00:02:55,110 --> 00:02:57,240 | |
| Now let me just go back to the slide here. | |
| 40 | |
| 00:02:57,390 --> 00:03:03,870 | |
| So I just want to reiterate that in our case the input, which is the first function here, the input, is applied | |
| 41 | |
| 00:03:03,960 --> 00:03:06,450 | |
| to another function called the filter. | |
| 42 | |
| 00:03:06,750 --> 00:03:15,530 | |
| And that gives a feature map. So the convolution process is basically executed by sliding the filter, that's | |
| 43 | |
| 00:03:15,530 --> 00:03:22,820 | |
| the filter function, over the input image, and this sliding process is basically a simple multiplication, | |
| 44 | |
| 00:03:22,910 --> 00:03:27,940 | |
| an element-wise multiplication and sum, or dot product, to produce a third function. | |
| 45 | |
| 00:03:27,950 --> 00:03:28,990 | |
| So how is it done. | |
| 46 | |
| 00:03:29,150 --> 00:03:32,090 | |
| So imagine this is basically an input image. | |
| 47 | |
| 00:03:32,090 --> 00:03:38,330 | |
| This is a 2D is not truly in reality but this is for explanation purposes and this is a convolution | |
| 48 | |
| 00:03:38,330 --> 00:03:41,570 | |
| filter here, with some values in a smaller matrix. | |
| 49 | |
| 00:03:41,840 --> 00:03:43,870 | |
| And this is the output feature map. | |
| 50 | |
| 00:03:43,880 --> 00:03:47,220 | |
| So what's going to happen in the convolution process. | |
| 51 | |
| 00:03:47,340 --> 00:03:58,070 | |
| Well, we're going to basically slide this filter here over this area here, and then again | |
| 52 | |
| 00:03:58,280 --> 00:04:00,490 | |
| and again, and you'll see it slide slowly. | |
| 53 | |
| 00:04:00,500 --> 00:04:06,020 | |
| So when I say convolve, that basically means we multiply them here. | |
| 54 | |
| 00:04:06,200 --> 00:04:12,050 | |
| So as you can see, the values are 1 0 1, 1 0 0, 0 0 1, 1. | |
| 55 | |
| 00:04:12,050 --> 00:04:20,090 | |
| And these values here, 0 1 0, 1 0, above and below. I actually didn't use realistic values, mainly for | |
| 56 | |
| 00:04:20,090 --> 00:04:21,760 | |
| simplicity purposes. | |
| 57 | |
| 00:04:21,760 --> 00:04:26,070 | |
| I switched them to these values here to make this calculation far easier for us. | |
| 58 | |
| 00:04:26,090 --> 00:04:33,980 | |
| So by multiplying these two together we get 0 times 1, 1 times 0, 0 times 1. | |
| 59 | |
| 00:04:33,980 --> 00:04:37,490 | |
| You'll see it here: 1 times 0, 0 times 1, 1 times 0. | |
| 60 | |
| 00:04:37,730 --> 00:04:42,040 | |
| And so on and so on, and we just add it all up and we get 2. | |
| 61 | |
| 00:04:42,280 --> 00:04:46,050 | |
| And that forms our first output feature value, in this box here. | |
| 62 | |
| 00:04:46,400 --> 00:04:54,350 | |
| So how many times can this three by three matrix be slid, or rather how many times is it going to be slid | |
| 63 | |
| 00:04:54,470 --> 00:04:56,220 | |
| over this. | |
| 64 | |
| 00:04:56,220 --> 00:04:58,310 | |
| This image here. | |
| 65 | |
| 00:04:58,310 --> 00:05:00,500 | |
| So imagine this grid here. | |
| 66 | |
| 00:05:00,740 --> 00:05:05,990 | |
| We have one box here, and we can shift it again here, to two, just like this here. | |
| 67 | |
| 00:05:06,350 --> 00:05:08,020 | |
| And then three again. | |
| 68 | |
| 00:05:08,420 --> 00:05:15,260 | |
| So by sliding it up to this box, we now fill in a second value of the feature matrix. And you don't have to add | |
| 69 | |
| 00:05:15,260 --> 00:05:21,100 | |
| them one by one, but imagine counting one, two, three here, and then starting again at the second row here. | |
| 70 | |
| 00:05:21,410 --> 00:05:28,940 | |
| So we have one here, two here, three here, and then again four, five, six. | |
| 71 | |
| 00:05:28,940 --> 00:05:29,530 | |
| All right. | |
| 72 | |
| 00:05:29,780 --> 00:05:34,030 | |
| So we have basically enough values. | |
| 73 | |
| 00:05:34,300 --> 00:05:39,030 | |
| So we have in each row one, two, three, and three times it can be slid across. | |
| 74 | |
| 00:05:39,050 --> 00:05:40,880 | |
| We have nine values in all. | |
| 75 | |
| 00:05:41,300 --> 00:05:47,450 | |
| So we can actually fill out this entire thing by sliding it across nine times. | |
| 76 | |
| 00:05:47,600 --> 00:05:49,200 | |
| That's how we build those features. | |
| 77 | |
| 00:05:49,400 --> 00:05:56,630 | |
| So by using what I would call a three by three filter, a convolution kernel, we produce a feature map, three by | |
| 78 | |
| 00:05:56,630 --> 00:06:00,260 | |
| three, where it produces the features here. | |
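To make the sliding-and-multiplying concrete, here is a minimal NumPy sketch of the same idea: a 5x5 binary image convolved (strictly, cross-correlated, which is what CNNs actually compute) with a 3x3 kernel of zeros and ones, giving a 3x3 feature map. The specific values are made up for illustration and are not the exact numbers on the slide.

```python
import numpy as np

# A small binary "image" and a binary kernel, chosen only for illustration.
image = np.array([[1, 0, 1, 1, 0],
                  [0, 0, 0, 1, 1],
                  [1, 1, 0, 0, 1],
                  [0, 1, 1, 0, 0],
                  [1, 0, 0, 1, 1]])

kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

# Slide the 3x3 kernel over every 3x3 patch of the 5x5 image (stride 1,
# no padding), multiply element-wise and sum: that single number becomes
# one cell of the output feature map.
out_h = image.shape[0] - kernel.shape[0] + 1   # 5 - 3 + 1 = 3
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map)   # 3x3: the kernel was slid 9 times in total
```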
| 79 | |
| 00:06:00,770 --> 00:06:03,270 | |
| Now you understand this process. | |
| 80 | |
| 00:06:03,350 --> 00:06:04,930 | |
| Basically it's simple, no? | |
| 81 | |
| 00:06:05,030 --> 00:06:11,870 | |
| But what exactly are the effects of doing this, and why is this important? So firstly, | |
| 82 | |
| 00:06:12,050 --> 00:06:22,700 | |
| depending on the values of the kernel (the kernel being this blue box here), on convolution we produce | |
| 83 | |
| 00:06:22,700 --> 00:06:27,860 | |
| different maps, obviously, because we can have different kernels with different values and they'll all | |
| 84 | |
| 00:06:27,860 --> 00:06:29,720 | |
| produce different feature maps. | |
| 85 | |
| 00:06:29,720 --> 00:06:36,050 | |
| So simple as it seems, as we just saw, convolving with different kernels produces interesting | |
| 86 | |
| 00:06:36,050 --> 00:06:38,890 | |
| feature maps that can be used to detect different features. | |
| 87 | |
| 00:06:38,900 --> 00:06:40,240 | |
| This is what makes it important. | |
| 88 | |
| 00:06:40,310 --> 00:06:49,670 | |
| So imagine we have several filters here, each with a different set of values, and we're sliding them | |
| 89 | |
| 00:06:49,700 --> 00:06:50,290 | |
| over here. | |
| 90 | |
| 00:06:50,290 --> 00:06:51,990 | |
| We're producing different feature maps. | |
| 91 | |
| 00:06:52,250 --> 00:06:57,890 | |
| So what this means now is that we've processed the input image into, basically, features that have been | |
| 92 | |
| 00:06:57,890 --> 00:06:58,710 | |
| extracted. | |
| 93 | |
| 00:06:59,980 --> 00:07:00,920 | |
| So let's keep going. | |
| 94 | |
| 00:07:02,000 --> 00:07:08,570 | |
| So it's important to know that the convolution keeps a spatial relationship between pixels by learning image features | |
| 95 | |
| 00:07:08,630 --> 00:07:11,120 | |
| over the small segments we pass over. | |
| 96 | |
| 00:07:11,120 --> 00:07:17,530 | |
| This means that the convolution output, even though it's reduced in size here, still sort of retains some of the | |
| 97 | |
| 00:07:17,540 --> 00:07:18,470 | |
| spatial information. | |
| 98 | |
| 00:07:18,470 --> 00:07:22,400 | |
| in this large image, just now it's in a more compressed form. | |
| 99 | |
| 00:07:25,770 --> 00:07:32,070 | |
| So these are all examples of kernels here. Basically, the identity kernel does nothing. | |
| 100 | |
| 00:07:32,250 --> 00:07:38,460 | |
| We have edge detection kernels, and that simply having these values in the kernel changes an input image into this | |
| 101 | |
| 00:07:38,740 --> 00:07:40,590 | |
| is quite quite remarkable. | |
| 102 | |
| 00:07:40,590 --> 00:07:44,660 | |
| But you can actually write some code, or try it in OpenCV, and see for yourself. | |
| 103 | |
| 00:07:44,670 --> 00:07:50,910 | |
| You can specify kernels, define your own kernels in OpenCV, and run them to perform convolutions and produce | |
| 104 | |
| 00:07:51,050 --> 00:07:53,910 | |
| blurred or sharpened images. | |
| 105 | |
| 00:07:53,920 --> 00:07:55,730 | |
| Edge detection is actually pretty cool. | |
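If you want to try this yourself, OpenCV's filter2D applies any hand-written kernel to an image. A minimal sketch follows; the image path is a placeholder and the sharpen and edge kernels are the common textbook ones, not values from the slide.

```python
import cv2
import numpy as np

img = cv2.imread("example.jpg")  # placeholder path

# Classic sharpening kernel.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)

# Simple edge-detection (Laplacian-like) kernel.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=np.float32)

# ddepth=-1 keeps the output image depth the same as the input.
sharpened_img = cv2.filter2D(img, -1, sharpen)
edge_img = cv2.filter2D(img, -1, edge_kernel)

cv2.imwrite("sharpened.png", sharpened_img)
cv2.imwrite("edges.png", edge_img)
```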
| 106 | |
| 00:07:56,090 --> 00:08:03,000 | |
| So let's take a look at an example of a convolution kernel applied to an image that | |
| 107 | |
| 00:08:03,000 --> 00:08:04,690 | |
| extracts features here. | |
| 108 | |
| 00:08:04,770 --> 00:08:11,970 | |
| So this is an example GIF I've taken; you can actually see how, when they apply this filter and slide it | |
| 109 | |
| 00:08:11,970 --> 00:08:15,750 | |
| across the image, what the actual convolution filter output looks like. | |
| 110 | |
| 00:08:15,960 --> 00:08:17,670 | |
| So this is the edge detection here. | |
| 111 | |
| 00:08:17,750 --> 00:08:19,300 | |
| And the other one. | |
| 112 | |
| 00:08:19,350 --> 00:08:20,760 | |
| It's actually pretty cool. | |
| 113 | |
| 00:08:20,760 --> 00:08:22,390 | |
| Look at it again there. | |
| 114 | |
| 00:08:23,250 --> 00:08:26,500 | |
| And the other one too. They're awesome. | |
| 115 | |
| 00:08:28,100 --> 00:08:30,690 | |
| So now, as you know, that was just one filter. | |
| 116 | |
| 00:08:30,690 --> 00:08:35,520 | |
| So we need many filters in our CNNs, at least within reason. | |
| 117 | |
| 00:08:35,520 --> 00:08:40,290 | |
| You don't want to do too many, although there's nothing actually wrong with doing too many; it just increases | |
| 118 | |
| 00:08:40,290 --> 00:08:47,250 | |
| your training time and model complexity and it may be redundant depending on your image data set. | |
| 119 | |
| 00:08:47,280 --> 00:08:50,040 | |
| So let's assume we're using 12 filters. | |
| 120 | |
| 00:08:50,040 --> 00:08:56,180 | |
| How do we actually visualize what the CNN actually looks like here? | |
| 121 | |
| 00:08:56,190 --> 00:09:04,460 | |
| So imagine we have an image of size 28 by 28 by three, three dimensions: red, green and blue. | |
| 122 | |
| 00:09:04,530 --> 00:09:06,360 | |
| So that's why it has some depth here. | |
| 123 | |
| 00:09:06,960 --> 00:09:11,930 | |
| And this is the convolution output, which is basically of this size here, | |
| 124 | |
| 00:09:12,000 --> 00:09:18,270 | |
| one by one by one. That's actually the output size of the convolutional filter. | |
| 125 | |
| 00:09:18,290 --> 00:09:21,310 | |
| Each grid cell, this is our convolutional filter box here. | |
| 126 | |
| 00:09:21,320 --> 00:09:22,160 | |
| All right. | |
| 127 | |
| 00:09:22,310 --> 00:09:25,690 | |
| So we're actually doing a one to one mapping of a convolutional filter. | |
| 128 | |
| 00:09:26,120 --> 00:09:32,390 | |
| So the output is also 28 by 28, 28 by 28 by 1, but now we're using 12 filters here. | |
| 129 | |
| 00:09:32,510 --> 00:09:41,540 | |
| So each yellow block here represents a single convolutional filter, and there are 12 blocks stacked here. | |
| 130 | |
| 00:09:41,900 --> 00:09:47,580 | |
| So what happens is that, for each filter, we slide it across and fill in our values here. | |
| 131 | |
| 00:09:48,790 --> 00:09:55,020 | |
| And we do that basically 12 times, and we get a box of convolutions, or a box of filter outputs, here. | |
| 132 | |
| 00:09:55,060 --> 00:09:58,820 | |
| Feature maps, you see; this is a box of feature maps. | |
| 133 | |
| 00:09:58,840 --> 00:10:01,380 | |
| This is our whole convolution output matrix. | |
| 134 | |
| 00:10:01,510 --> 00:10:05,920 | |
| And in case you're wondering because it actually just slipped my mind when I was explaining this to | |
| 135 | |
| 00:10:05,920 --> 00:10:10,230 | |
| you, because I made this slide a couple of weeks before explaining it in this video. | |
| 136 | |
| 00:10:10,330 --> 00:10:18,430 | |
| Now you noticed that before, we had a filter that was, say, three by three, and we produced a small convolution, | |
| 137 | |
| 00:10:18,490 --> 00:10:21,160 | |
| a smaller feature map here. | |
| 138 | |
| 00:10:21,160 --> 00:10:25,310 | |
| However, in this example I'm producing basically the same size feature map. | |
| 139 | |
| 00:10:25,330 --> 00:10:32,230 | |
| And this is actually what we do in most cases. You don't have to, but I'll actually explain to | |
| 140 | |
| 00:10:32,230 --> 00:10:34,880 | |
| you how we actually end up with the same size image later on. | |
| 141 | |
| 00:10:34,990 --> 00:10:40,330 | |
| But for now, just assume we run this. Let's say this is a three by three or a five by five convolution here; | |
| 142 | |
| 00:10:40,870 --> 00:10:47,440 | |
| we get the output here and we fill in our matrix here, our feature map. | |
| 143 | |
| 00:10:47,560 --> 00:10:52,660 | |
| So as you can see, this is how the filters look stacked up visually; you can see it quite clearly there. | |
| 144 | |
| 00:10:54,050 --> 00:11:00,620 | |
| So the outputs of our convolutions from the last layer: so after applying 12 filters of size three by three by three | |
| 145 | |
| 00:11:01,310 --> 00:11:04,830 | |
| to an image which was 28 by 28 by three, | |
| 146 | |
| 00:11:04,840 --> 00:11:08,750 | |
| we produce 12 feature maps, also called activation maps. | |
| 147 | |
| 00:11:08,750 --> 00:11:15,060 | |
| Now these outputs are stacked together and treated as one big 3D matrix of output size 28 by 28 | |
| 148 | |
| 00:11:15,080 --> 00:11:16,170 | |
| by 12. | |
| 149 | |
| 00:11:16,520 --> 00:11:20,190 | |
| And this is important. Let's go back to this. | |
| 150 | |
| 00:11:20,390 --> 00:11:27,630 | |
| This now forms the input, this big matrix here, to our next layer in the CNN. | |
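As a sanity check on the shapes just described, here is a minimal Keras sketch (assuming TensorFlow/Keras, which this course may or may not use): a single convolutional layer with 12 filters of size 3x3 and 'same' padding turns a 28x28x3 input into a 28x28x12 stack of activation maps.

```python
import numpy as np
import tensorflow as tf

# One fake RGB image, 28x28x3, with a leading batch dimension.
x = np.random.rand(1, 28, 28, 3).astype("float32")

# 12 filters, each 3x3 (by 3 channels under the hood); 'same' padding keeps
# the spatial size at 28x28, so the outputs stack to 28x28x12.
conv = tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3),
                              padding="same", activation="relu")

feature_maps = conv(x)
print(feature_maps.shape)   # (1, 28, 28, 12)
```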
| 151 | |
| 00:11:27,650 --> 00:11:32,780 | |
| So now let's talk more about what these feature maps, or activation maps, actually are and how they represent | |
| 152 | |
| 00:11:32,810 --> 00:11:34,190 | |
| image features. | |
| 153 | |
| 00:11:34,220 --> 00:11:42,380 | |
| So now each cell in it, a cell meaning each one by one point in your activation map matrix, is considered | |
| 154 | |
| 00:11:42,410 --> 00:11:45,100 | |
| basically a feature extractor or a single neuron. | |
| 155 | |
| 00:11:45,470 --> 00:11:50,470 | |
| And that single neuron is basically looking at a specific region as it slides over the image. | |
| 156 | |
| 00:11:50,470 --> 00:11:54,500 | |
| A specific feature, I should say, as it slides over the image. | |
| 157 | |
| 00:11:54,500 --> 00:12:01,460 | |
| So we have basically a feature map of 28 by 28, like we just made. That feature map basically | |
| 158 | |
| 00:12:01,460 --> 00:12:07,790 | |
| has each neuron, each cell, basically activating depending on what it sees in the image. | |
| 159 | |
| 00:12:08,330 --> 00:12:15,030 | |
| And at the beginning of your neural network, your CNN I should say, the early convolutional layers | |
| 160 | |
| 00:12:15,170 --> 00:12:22,400 | |
| are basically low level feature detectors, and low level feature detectors are basically looking | |
| 161 | |
| 00:12:22,400 --> 00:12:24,650 | |
| for simple things in images, simple things. | |
| 162 | |
| 00:12:24,650 --> 00:12:29,730 | |
| Meaning, like, maybe edges, maybe specific colors, maybe a blob here and there. | |
| 163 | |
| 00:12:29,750 --> 00:12:36,200 | |
| However, if we have consecutive, concatenated convolutional layers, as in deep networks with rows | |
| 164 | |
| 00:12:36,650 --> 00:12:43,400 | |
| of convolutional layers, we can start detecting more specialized features, like the face of a cat, or the | |
| 165 | |
| 00:12:43,400 --> 00:12:46,160 | |
| shape of a bicycle, or the shape of a face. | |
| 166 | |
| 00:12:46,250 --> 00:12:52,610 | |
| So that's how CNNs actually use these convolutional feature maps to detect features. | |
| 167 | |
| 00:12:52,800 --> 00:12:58,450 | |
| So as you've seen so far, we just used a standard, somewhat arbitrary filter size of three by three. | |
| 168 | |
| 00:12:58,860 --> 00:13:01,200 | |
| But can we use other sizes? | |
| 169 | |
| 00:13:01,210 --> 00:13:08,310 | |
| And how do they affect the convolution size and the feature maps in other parts of the convolutional neural | |
| 170 | |
| 00:13:08,310 --> 00:13:09,410 | |
| net? | |
| 171 | |
| 00:13:09,420 --> 00:13:13,290 | |
| So basically that's called tweaking the hyperparameters. | |
| 172 | |
| 00:13:13,290 --> 00:13:18,870 | |
| So in the next section, Section 7.3, we look at depth, stride and padding. | |