| 1 | |
| 00:00:00,960 --> 00:00:03,370 | |
| So welcome back to chapter eight point one. | |
| 2 | |
| 00:00:03,390 --> 00:00:06,190 | |
| We introduce you to Keras. | |
| 3 | |
| 00:00:08,000 --> 00:00:09,660 | |
| So what exactly is Keras? | |
| 4 | |
| 00:00:09,660 --> 00:00:15,060 | |
| I've mentioned it countless times on this course but I haven't actually told you precisely what it is. | |
| 5 | |
| 00:00:15,200 --> 00:00:21,170 | |
| So Keras is a high-level neural network API for Python, and it makes constructing neural networks | |
| 6 | |
| 00:00:21,260 --> 00:00:25,370 | |
| of all types, not just CNNs but all types of neural networks. | |
| 7 | |
| 00:00:25,490 --> 00:00:31,760 | |
| It makes it extremely easy and extremely modular to just add layers, swap stuff in and out, change different things, | |
| 8 | |
| 00:00:32,480 --> 00:00:36,770 | |
| configure your loss functions, configure your different types of activation functions. | |
| 9 | |
| 00:00:36,770 --> 00:00:38,720 | |
| It's quite nice. | |
| 10 | |
| 00:00:38,870 --> 00:00:45,410 | |
| It has the ability to use TensorFlow, CNTK (which is used for natural language processing) and Theano, which | |
| 11 | |
| 00:00:45,440 --> 00:00:47,180 | |
| I used quite a bit back in the day. | |
| 12 | |
| 00:00:47,360 --> 00:00:51,690 | |
| However, I've now moved to TensorFlow and I haven't looked back, not because Theano is bad. | |
| 13 | |
| 00:00:51,930 --> 00:00:59,000 | |
| It's just that it has stopped being updated; the project is pretty much done and closed, and | |
| 14 | |
| 00:00:59,000 --> 00:01:02,710 | |
| everyone is pretty much adopting TensorFlow now. | |
| 15 | |
| 00:01:03,680 --> 00:01:09,860 | |
| And Keras was developed by François Chollet, who has had tremendous success in making deep learning much | |
| 16 | |
| 00:01:09,860 --> 00:01:11,390 | |
| more accessible to the masses. | |
| 17 | |
| 00:01:12,680 --> 00:01:16,820 | |
| So what is TensorFlow? As I said, Keras uses TensorFlow as a backend. | |
| 18 | |
| 00:01:16,830 --> 00:01:18,400 | |
| But what exactly is a backend? | |
| 19 | |
| 00:01:18,530 --> 00:01:19,370 | |
| OK. | |
| 20 | |
| 00:01:19,620 --> 00:01:27,000 | |
| So TensorFlow is an open-source library that was created by the Google Brain team | |
| 21 | |
| 00:01:27,030 --> 00:01:33,530 | |
| in 2015, though it was probably being used internally at Google for many years prior to 2015. | |
| 22 | |
| 00:01:34,020 --> 00:01:39,750 | |
| And basically it's an extremely powerful, extremely efficient and fast deep learning framework that is used | |
| 23 | |
| 00:01:39,750 --> 00:01:46,080 | |
| for high-performance numerical computation across a variety of platforms such as CPUs, GPUs and TPUs. | |
| 24 | |
| 00:01:46,580 --> 00:01:52,320 | |
| Basically, these engineers at Google Brain developed a superfast library, | |
| 25 | |
| 00:01:52,650 --> 00:01:57,160 | |
| similar to NumPy, but incorporated it into and built it around deep learning. | |
| 26 | |
| 00:01:57,180 --> 00:02:05,190 | |
| So you have all these deep learning functions that are part of the TensorFlow framework. It actually | |
| 27 | |
| 00:02:05,190 --> 00:02:11,580 | |
| has a Python API, and it is pretty much accessible and easy to use, but Keras is much easier | |
| 28 | |
| 00:02:11,580 --> 00:02:12,200 | |
| to use. | |
| 29 | |
| 00:02:13,500 --> 00:02:20,390 | |
| So why use Keras instead of pure TensorFlow? As I said, Keras is extremely easy to use; it follows basically | |
| 30 | |
| 00:02:20,390 --> 00:02:24,640 | |
| a Pythonic style of coding and it is extremely modular. | |
| 31 | |
| 00:02:24,860 --> 00:02:30,350 | |
| You don't even have to be a proficient programmer to use Keras. Its modularity | |
| 32 | |
| 00:02:30,350 --> 00:02:36,010 | |
| means that we can actually just start trying different things in our neural nets or CNNs: we can | |
| 33 | |
| 00:02:36,130 --> 00:02:42,670 | |
| easily change loss functions, optimizers, initialization schemes and activation functions, try different | |
| 34 | |
| 00:02:43,030 --> 00:02:48,580 | |
| regularization schemes, introduce more layers, or reduce the number of filters. | |
| 35 | |
| 00:02:48,580 --> 00:02:53,080 | |
| All of those things are super easy to do inside of Keras. | |
| 36 | |
| 00:02:53,080 --> 00:02:56,710 | |
| So it allows us to build these powerful neural nets quickly and efficiently. | |
| 37 | |
| 00:02:57,130 --> 00:03:01,350 | |
| And obviously it works in Python, and Python is one of my favorite, | |
| 38 | |
| 00:03:01,360 --> 00:03:06,370 | |
| actually it is my favorite programming language ever, because it makes so many complicated things | |
| 39 | |
| 00:03:06,370 --> 00:03:13,010 | |
| super easy. TensorFlow is definitely not as user friendly as Keras is. | |
| 40 | |
| 00:03:13,090 --> 00:03:18,510 | |
| I've used it a little bit, to be fair, but I was using Theano at that point, and Theano actually was quite | |
| 41 | |
| 00:03:18,510 --> 00:03:18,980 | |
| hard. | |
| 42 | |
| 00:03:19,170 --> 00:03:21,620 | |
| So TensorFlow seemed easy to me at that point. | |
| 43 | |
| 00:03:21,690 --> 00:03:24,870 | |
| However, nothing beats Keras for ease of use. | |
| 44 | |
| 00:03:24,990 --> 00:03:29,880 | |
| So unless you're doing complicated models or academic research, or are basically looking for some sort of | |
| 45 | |
| 00:03:29,880 --> 00:03:35,420 | |
| high-performance setup, you don't necessarily need to use pure TensorFlow for anything. | |
| 46 | |
| 00:03:35,430 --> 00:03:38,630 | |
| So now you're ready to see some actual Keras code. | |
| 47 | |
| 00:03:38,640 --> 00:03:40,300 | |
| Well it's actually pretty good. | |
| 48 | |
| 00:03:40,620 --> 00:03:46,980 | |
| But we're using the Keras library, and this is how we actually construct and build models inside of Python. | |
| 49 | |
| 00:03:47,580 --> 00:03:54,210 | |
| So firstly we import the Sequential model from Keras. Basically, the Sequential model is the main type | |
| 50 | |
| 00:03:54,210 --> 00:04:00,000 | |
| of model you'll be building in Keras; basically all the CNNs and neural nets I've shown you before | |
| 51 | |
| 00:04:00,060 --> 00:04:01,730 | |
| were sequential models. | |
| 52 | |
| 00:04:01,980 --> 00:04:05,340 | |
| If you're doing something exotic then it will probably not be sequential. | |
| 53 | |
| 00:04:05,340 --> 00:04:09,480 | |
| That is rare, and probably beyond the scope of this course. | |
| 54 | |
| 00:04:09,480 --> 00:04:12,110 | |
| Right now it's basically academic research. | |
| 55 | |
| 00:04:12,480 --> 00:04:13,130 | |
| So let's move on. | |
| 56 | |
| 00:04:13,140 --> 00:04:14,750 | |
| So we defined our model here. | |
| 57 | |
| 00:04:14,760 --> 00:04:23,060 | |
| We initialize it by running this line, Sequential() with these two brackets, and now let's add some convolutional | |
| 58 | |
| 00:04:23,060 --> 00:04:24,500 | |
| layers to it. | |
| 59 | |
| 00:04:24,500 --> 00:04:28,150 | |
| So before we do that, we have to import these layers from keras.layers. | |
| 60 | |
| 00:04:28,370 --> 00:04:34,380 | |
| So from keras.layers we import Dense, Dropout, Flatten, Conv2D and MaxPooling2D. | |
| 61 | |
| 00:04:34,400 --> 00:04:37,710 | |
| Now, I don't use Dropout here, but it is used in the model | |
| 62 | |
| 00:04:37,730 --> 00:04:39,220 | |
| we are actually going to code, | |
| 63 | |
| 00:04:39,290 --> 00:04:40,110 | |
| so I left it in, | |
| 64 | |
| 00:04:40,170 --> 00:04:42,930 | |
| so you guys know how it differs from what's here. | |
| 65 | |
| 00:04:42,940 --> 00:04:43,610 | |
| OK. | |
| 66 | |
| 00:04:44,090 --> 00:04:48,370 | |
| So following the model definition, we're adding our first layer here. | |
| 67 | |
| 00:04:48,410 --> 00:04:50,550 | |
| So firstly it's a Conv2D. | |
| 68 | |
| 00:04:50,690 --> 00:04:51,560 | |
| That's what we call it. | |
| 69 | |
| 00:04:51,590 --> 00:04:56,030 | |
| And now we have open brackets here and we have some parameters to fill out. | |
| 70 | |
| 00:04:56,030 --> 00:04:57,830 | |
| So let's go to these parameters. | |
| 71 | |
| 00:04:57,840 --> 00:04:59,300 | |
| This one is the number 32. | |
| 72 | |
| 00:04:59,360 --> 00:05:05,890 | |
| Then the kernel size, which we can see here, then the activation type, ReLU, and then input_shape equals input_shape. | |
| 73 | |
| 00:05:05,930 --> 00:05:08,360 | |
| That's a bit confusing I'll explain it to you shortly. | |
| 74 | |
| 00:05:08,600 --> 00:05:14,660 | |
| So let's go through each one. The first one here, the 32, specifies the number of | |
| 75 | |
| 00:05:14,660 --> 00:05:16,030 | |
| kernels or filters. | |
| 76 | |
| 00:05:16,340 --> 00:05:24,710 | |
| So in our first layer we're using 32 filters of kernel size three by three, with a ReLU activation function, | |
| 77 | |
| 00:05:25,500 --> 00:05:31,730 | |
| and the input shape is called input_shape. That's because previously, above this code, we created | |
| 78 | |
| 00:05:32,370 --> 00:05:38,400 | |
| an input_shape variable, I should say, that was 28 by 28 by 1. | |
| 79 | |
| 00:05:38,720 --> 00:05:40,240 | |
| So that's the input shape we use. | |
| 80 | |
| 00:05:40,550 --> 00:05:45,890 | |
| So I just left it like that here for convenience, because we tend to define it | |
| 81 | |
| 00:05:45,980 --> 00:05:49,440 | |
| outside of the scope of this declaration here. | |
| 82 | |
| 00:05:49,910 --> 00:05:51,230 | |
| Now to add another layer. | |
| 83 | |
| 00:05:51,590 --> 00:05:52,990 | |
| It's as simple as model.add(). | |
| 84 | |
| 00:05:53,080 --> 00:05:55,280 | |
| And Keras is so easy to use: | |
| 85 | |
| 00:05:55,280 --> 00:06:00,290 | |
| just keep using model.add() and the layers stack easily on top of each other, starting with the first at | |
| 86 | |
| 00:06:00,290 --> 00:06:01,650 | |
| the bottom. | |
| 87 | |
| 00:06:01,670 --> 00:06:05,450 | |
| So now we add another convolutional layer, Conv2D with 64 filters, | |
| 88 | |
| 00:06:05,490 --> 00:06:08,610 | |
| same kernel size. And note, we don't need to specify kernel_size | |
| 89 | |
| 00:06:08,620 --> 00:06:12,660 | |
| every time we do it; it knows that the second parameter is the kernel size. | |
| 90 | |
| 00:06:12,690 --> 00:06:13,770 | |
| So it's three by three. | |
| 91 | |
| 00:06:13,790 --> 00:06:15,760 | |
| And again the activation is ReLU. | |
| 92 | |
| 00:06:16,280 --> 00:06:19,570 | |
| And now you've noticed we don't need to specify an input shape here. | |
| 93 | |
| 00:06:19,910 --> 00:06:20,690 | |
| And do you know why. | |
| 94 | |
| 00:06:20,780 --> 00:06:23,840 | |
| That's because this layer is directly connected to this layer. | |
| 95 | |
| 00:06:23,930 --> 00:06:27,380 | |
| So it takes the output of this layer, and it knows the output. | |
| 96 | |
| 00:06:27,380 --> 00:06:31,290 | |
| So the output of this layer is effectively the input into this layer. | |
| 97 | |
| 00:06:31,670 --> 00:06:34,150 | |
| So we no longer have to declare inputs anymore. | |
| 98 | |
| 00:06:35,740 --> 00:06:37,710 | |
| So now we can add a max pooling layer. | |
| 99 | |
| 00:06:38,000 --> 00:06:40,820 | |
| As I said before, we are going to use max pooling of two by two. | |
| 100 | |
| 00:06:41,140 --> 00:06:42,170 | |
| So that's simple here. | |
| 101 | |
| 00:06:42,170 --> 00:06:49,340 | |
| We have model.add, then MaxPooling2D, and we specify the pool size: open brackets, two by two, close brackets. | |
| 102 | |
| 00:06:49,340 --> 00:06:53,720 | |
| Close brackets for this here, close brackets for this here, and then we do a Flatten. | |
| 103 | |
| 00:06:53,830 --> 00:06:56,220 | |
| I haven't discussed Flatten officially. | |
| 104 | |
| 00:06:56,220 --> 00:06:57,300 | |
| It's basically a function | |
| 105 | |
| 00:06:57,300 --> 00:07:01,350 | |
| that we use to feed the dense or fully connected layer. | |
| 106 | |
| 00:07:01,710 --> 00:07:05,540 | |
| You'll see it visually in the next diagram on the following slide. | |
| 107 | |
| 00:07:05,740 --> 00:07:11,940 | |
| Basically, we then add a dense layer here with 128 units, all activated by ReLU. | |
| 108 | |
| 00:07:12,300 --> 00:07:16,760 | |
| And this is connected to another dense layer here, which outputs the number of classes. | |
| 109 | |
| 00:07:16,830 --> 00:07:23,510 | |
| The classes in this dataset number 10, because we're using the MNIST dataset, digits 0 to 9, and we use | |
| 110 | |
| 00:07:23,510 --> 00:07:30,360 | |
| a softmax activation to get, basically, the probabilities. I hope you think this is quite | |
| 111 | |
| 00:07:30,360 --> 00:07:32,510 | |
| simple, because to me this is quite basic. | |
| 112 | |
| 00:07:32,610 --> 00:07:37,190 | |
| It may take a while to get familiar with how these things work, but you will get used to it in this | |
| 113 | |
| 00:07:37,200 --> 00:07:38,790 | |
| course I guarantee it. | |
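For reference, the model walked through above can be sketched in Keras roughly like this. It is a minimal sketch, not the course's final code: the dropout rate of 0.25 is an assumed placeholder value, since the transcript only says dropout appears in the later project code.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

input_shape = (28, 28, 1)  # 28 x 28 greyscale MNIST images
num_classes = 10           # digits 0 to 9

model = Sequential()
# First conv layer: 32 filters of kernel size 3x3, ReLU activation.
# Only this first layer needs input_shape; later layers infer their input.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
# Second conv layer: 64 filters, again 3x3 with ReLU.
model.add(Conv2D(64, (3, 3), activation='relu'))
# Max pooling of 2x2 halves the spatial dimensions: 24x24 -> 12x12.
model.add(MaxPooling2D(pool_size=(2, 2)))
# Dropout rate here is an assumed placeholder, shown for completeness.
model.add(Dropout(0.25))
# Flatten the 12x12x64 volume into a single row of 9,216 values.
model.add(Flatten())
# Fully connected layer with 128 ReLU units.
model.add(Dense(128, activation='relu'))
# Output layer: one node per class, softmax for probabilities.
model.add(Dense(num_classes, activation='softmax'))
```

Calling `model.summary()` on this prints the layer-by-layer output shapes, which should match the diagram discussed next.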
| 114 | |
| 00:07:38,790 --> 00:07:40,800 | |
| So let's take a look at what we've built here. | |
| 115 | |
| 00:07:41,230 --> 00:07:41,540 | |
| OK. | |
| 116 | |
| 00:07:41,580 --> 00:07:44,880 | |
| So this is actually what we have built so far. | |
| 117 | |
| 00:07:44,880 --> 00:07:50,090 | |
| So we have an input image here, 28 by 28 by 1; 1 because it's greyscale. | |
| 118 | |
| 00:07:50,190 --> 00:07:54,440 | |
| If it was a color image, an RGB image, it would be a depth of three here. | |
| 119 | |
| 00:07:54,780 --> 00:08:02,250 | |
| So as you saw before, we have 32 filters here, connected to another conv layer with 64 filters here. | |
| 120 | |
| 00:08:02,490 --> 00:08:08,850 | |
| And you may have noticed the size of the image shrank here: it became 26 by 26, and then | |
| 121 | |
| 00:08:08,850 --> 00:08:10,490 | |
| 24 by 24. | |
| 122 | |
| 00:08:10,590 --> 00:08:12,770 | |
| And that's because we didn't use any zero padding. | |
| 123 | |
| 00:08:12,840 --> 00:08:15,980 | |
| And I'll show you guys later on in the code how to do zero padding. | |
| 124 | |
| 00:08:15,990 --> 00:08:16,910 | |
| It's quite simple. | |
| 125 | |
| 00:08:17,100 --> 00:08:22,450 | |
| But for now, just remember: when you don't use zero padding, your output convolutional feature | |
| 126 | |
| 00:08:22,450 --> 00:08:25,530 | |
| map size reduces from the input image size. | |
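The shrinkage described above follows the standard formula for convolutions without padding; here is a small illustrative calculation (a sketch for intuition, not part of the course code):

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# MNIST input is 28x28; two 3x3 convolutions with no zero padding:
after_conv1 = conv_output_size(28, 3)            # 28 -> 26
after_conv2 = conv_output_size(after_conv1, 3)   # 26 -> 24
print(after_conv1, after_conv2)  # 26 24

# With zero padding of 1 on each side, a 3x3 kernel preserves the size:
print(conv_output_size(28, 3, padding=1))  # 28
```

This is why the feature maps become 26 by 26 and then 24 by 24 in the diagram.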
| 127 | |
| 00:08:25,530 --> 00:08:28,950 | |
| So we have these two conv layers currently stacked here. | |
| 128 | |
| 00:08:28,950 --> 00:08:33,000 | |
| Then we have our max pooling, which basically shrinks this by half. | |
| 129 | |
| 00:08:33,180 --> 00:08:34,360 | |
| I have a dropout here. | |
| 130 | |
| 00:08:34,500 --> 00:08:39,330 | |
| We didn't actually use dropout before in the code, but in the actual code, when we start the project, | |
| 131 | |
| 00:08:39,330 --> 00:08:44,720 | |
| I'll show you quickly how to implement dropout in one line, super easy. | |
| 132 | |
| 00:08:44,820 --> 00:08:48,260 | |
| What I wanted to show you was that we have a Flatten layer here. | |
| 133 | |
| 00:08:48,490 --> 00:08:52,890 | |
| Note the Flatten layer: if you go back to here, it's just Flatten with brackets. | |
| 134 | |
| 00:08:52,950 --> 00:08:54,420 | |
| And what does it actually do. | |
| 135 | |
| 00:08:54,450 --> 00:09:00,960 | |
| Flatten basically takes this multidimensional matrix, 64 by 12 by 12, and basically turns it | |
| 136 | |
| 00:09:00,960 --> 00:09:07,430 | |
| into a row of nine thousand two hundred and sixteen columns. | |
| 137 | |
| 00:09:07,470 --> 00:09:11,770 | |
| So what it means here is that we just flattened this matrix. | |
| 138 | |
| 00:09:11,820 --> 00:09:18,540 | |
| So instead of having 12 by 12 by 64, imagine you just built an entire long row, where it's the first | |
| 139 | |
| 00:09:18,540 --> 00:09:24,410 | |
| 12 here, then the second 12, and so on consecutively: basically one long row. | |
| 140 | |
| 00:09:24,870 --> 00:09:28,810 | |
| And that becomes this output box here. | |
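The flattening arithmetic above can be checked directly; this small NumPy sketch (illustrative only, using a dummy volume) shows a 12 by 12 by 64 block turning into one row of 9,216 values:

```python
import numpy as np

# A dummy feature-map volume shaped like the max-pooled output: 12 x 12 x 64.
volume = np.arange(12 * 12 * 64).reshape(12, 12, 64)

# Flatten turns the multidimensional matrix into a single long row.
flat = volume.reshape(-1)
print(flat.shape)  # (9216,)
```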
| 141 | |
| 00:09:29,100 --> 00:09:35,180 | |
| And now this output box here is fed into the fully connected layer here, with 128 nodes. | |
| 142 | |
| 00:09:35,430 --> 00:09:36,440 | |
| As stated up here. | |
| 143 | |
| 00:09:36,690 --> 00:09:37,690 | |
| So we defined it here. | |
| 144 | |
| 00:09:37,710 --> 00:09:40,760 | |
| Each node is connected to a ReLU activation unit. | |
| 145 | |
| 00:09:41,220 --> 00:09:48,810 | |
| And then finally, we connect this to our final dense layer with a softmax activation function, which outputs | |
| 146 | |
| 00:09:48,810 --> 00:09:54,210 | |
| to 10 nodes; 10 nodes because our MNIST dataset has 10 classes. | |
| 147 | |
| 00:09:54,300 --> 00:09:56,290 | |
| So that's how we get our probabilities here. | |
| 148 | |
| 00:09:56,700 --> 00:09:59,130 | |
| So it's not illustrated here but later on you will see it. | |
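To make the softmax step concrete, here is a tiny NumPy sketch showing how 10 raw output scores become class probabilities that sum to 1. The scores are made-up numbers for illustration, not real model outputs:

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

# Made-up raw outputs for the 10 MNIST classes (digits 0 to 9).
scores = np.array([1.2, 0.3, -0.8, 2.5, 0.0, 1.1, -1.5, 0.7, 0.2, 3.0])
probs = softmax(scores)

print(round(float(probs.sum()), 6))  # 1.0
print(int(probs.argmax()))           # 9, the digit with the highest score
```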
| 149 | |
| 00:09:59,130 --> 00:10:03,140 | |
| So now that we've built our model we are ready to compile this model. | |
| 150 | |
| 00:10:03,390 --> 00:10:08,640 | |
| So compiling simply creates an object that stores the model we've just created, and we can specify our | |
| 151 | |
| 00:10:08,640 --> 00:10:13,350 | |
| loss algorithm and optimizer, and define the performance metric that we want to look at. | |
| 152 | |
| 00:10:13,440 --> 00:10:18,090 | |
| Additionally, we can specify parameters for the optimizer, such as learning rates and momentum. | |
| 153 | |
| 00:10:18,090 --> 00:10:22,630 | |
| So this is a simple model.compile code here. | |
| 154 | |
| 00:10:22,980 --> 00:10:29,610 | |
| We have categorical cross entropy defined as our loss type, we have our optimizer SGD, being stochastic | |
| 155 | |
| 00:10:29,610 --> 00:10:33,990 | |
| gradient descent, and we have our metric to look at defined as accuracy. | |
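To see what the learning-rate and momentum parameters actually do inside SGD, here is a minimal sketch of one momentum update on a single weight. The weight, gradient, learning rate and momentum values are arbitrary illustrations:

```python
def sgd_momentum_step(weight, gradient, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v = m*v - lr*g; w = w + v."""
    velocity = momentum * velocity - lr * gradient
    weight = weight + velocity
    return weight, velocity

w, v = 0.5, 0.0
# Two consecutive steps with the same gradient: momentum builds up velocity,
# so the second step moves the weight further than the first.
w, v = sgd_momentum_step(w, 2.0, v)   # v = -0.02,  w = 0.48
w, v = sgd_momentum_step(w, 2.0, v)   # v = -0.038, w = 0.442
print(round(w, 3), round(v, 3))
```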
| 156 | |
| 00:10:36,370 --> 00:10:39,000 | |
| So how do we fit our model now? | |
| 157 | |
| 00:10:39,010 --> 00:10:44,770 | |
| So, following basically what scikit-learn, which is the most established and popular machine learning | |
| 158 | |
| 00:10:44,770 --> 00:10:48,540 | |
| library in Python, does, carried through to TensorFlow and Keras, | |
| 159 | |
| 00:10:48,610 --> 00:10:52,350 | |
| basically we use model.fit, where we feed it the training data | |
| 160 | |
| 00:10:52,600 --> 00:10:56,130 | |
| and the training labels: x_train and y_train, that's what they are. | |
| 161 | |
| 00:10:56,140 --> 00:11:02,360 | |
| You'll see them in the code soon. We also specify a number of epochs and a batch size. | |
| 162 | |
| 00:11:02,560 --> 00:11:05,720 | |
| Batch size doesn't impact learning that significantly. | |
| 163 | |
| 00:11:05,860 --> 00:11:11,470 | |
| However, you basically should use the largest batch size possible that your memory allows. | |
| 164 | |
| 00:11:11,500 --> 00:11:17,420 | |
| So you can experiment and try it; you'll know, because if you try too large a batch size, your kernel | |
| 165 | |
| 00:11:17,440 --> 00:11:18,250 | |
| will crash. | |
| 166 | |
| 00:11:18,250 --> 00:11:22,640 | |
| So generally I tend to avoid anything that would make my kernel crash. | |
| 167 | |
| 00:11:22,750 --> 00:11:29,030 | |
| So I always use basically a batch size of 32 for pretty much all smaller images, or 16 for larger images. | |
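As a concrete example of what batch size means for one epoch, here is a quick calculation. MNIST's 60,000 training images are from the standard dataset; the rest is simple arithmetic:

```python
import math

def batches_per_epoch(num_samples, batch_size):
    # The last batch may be smaller, hence the ceiling division.
    return math.ceil(num_samples / batch_size)

# MNIST has 60,000 training images.
print(batches_per_epoch(60_000, 32))   # 1875 weight updates per epoch
print(batches_per_epoch(60_000, 16))   # 3750 updates: smaller batches mean more steps
```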
| 168 | |
| 00:11:31,450 --> 00:11:35,250 | |
| And once we do that, we can now evaluate and generate predictions afterward. | |
| 169 | |
| 00:11:35,290 --> 00:11:41,830 | |
| So by running model.evaluate and feeding it the x_test data and y_test labels, with a batch size, we can | |
| 170 | |
| 00:11:41,830 --> 00:11:47,140 | |
| get this metrics object, and then we can use that to actually look at | |
| 171 | |
| 00:11:47,140 --> 00:11:53,580 | |
| different graphs and infer interesting performance information from our model. And if we ever wanted to | |
| 172 | |
| 00:11:53,590 --> 00:11:59,410 | |
| predict an individual point, like we have an image and we want to get the actual class it belongs to, | |
| 173 | |
| 00:12:00,040 --> 00:12:04,070 | |
| we can use model.predict. model.predict allows us to feed one image at a time. | |
| 174 | |
| 00:12:04,130 --> 00:12:07,070 | |
| We can feed the entire dataset here as well. | |
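Putting the compile, fit, evaluate and predict steps together, here is a hedged end-to-end sketch. It uses a deliberately tiny model and random stand-in data (not the real MNIST download) purely so it runs in seconds; the learning rate, momentum and data shapes are illustrative choices:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import SGD

# Random stand-in data shaped like MNIST: 28x28x1 images, 10 classes.
rng = np.random.default_rng(0)
x_train = rng.random((64, 28, 28, 1)).astype("float32")
y_train = np.eye(10)[rng.integers(0, 10, 64)]  # one-hot labels
x_test = rng.random((16, 28, 28, 1)).astype("float32")
y_test = np.eye(10)[rng.integers(0, 10, 16)]

# A deliberately tiny model so the sketch trains quickly.
model = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(10, activation="softmax"),
])

# Loss, optimizer (with learning rate and momentum) and metric, as described.
model.compile(loss="categorical_crossentropy",
              optimizer=SGD(learning_rate=0.01, momentum=0.9),
              metrics=["accuracy"])

# Fit on the training data for one epoch with a batch size of 32.
model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=0)

# Evaluate returns the loss and accuracy for the test set.
loss, acc = model.evaluate(x_test, y_test, batch_size=32, verbose=0)

# Predict one image at a time (or the whole dataset at once).
probs = model.predict(x_test[:1], verbose=0)
print(probs.shape)  # (1, 10)
```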
| 175 | |
| 00:12:07,840 --> 00:12:09,660 | |
| So that's it. Let's get started. | |
| 176 | |
| 00:12:09,690 --> 00:12:13,030 | |
| So let's build our own handwritten digit classifier. | |