So welcome back to chapter 8.1, where we introduce you to Keras.

So what exactly is Keras? I've mentioned it countless times on this course, but I haven't actually told you precisely what it is. Keras is a high-level neural network API for Python, and it makes constructing neural networks (all types, not just CNNs) extremely easy and extremely modular: you can easily add layers, swap things in and out, change different things, configure your loss functions, and configure different types of activation functions. It's quite nice. It can run on top of TensorFlow, CNTK, which is used a fair bit for natural language processing, and Theano, which I used quite a bit back in the day. However, I've now moved to TensorFlow and I haven't looked back, not because Theano is bad, but because Theano has stopped being updated: the project has pretty much been wound down, and everyone is pretty much adopting TensorFlow now. Keras was developed by François Chollet, and it has been a tremendous success in making deep learning much more accessible to the masses.

So what is TensorFlow? I said Keras uses TensorFlow as a backend, but what exactly is a backend? OK.
So TensorFlow is an open-source library that was created by the Google Brain team in 2015, though it was probably being used inside Google internally for many years prior to 2015. Basically it's an extremely powerful, extremely efficient and fast deep learning framework that is used for high-performance numerical computation across a variety of platforms such as CPUs, GPUs and TPUs. These engineers at Google Brain basically developed a super-fast library, similar to NumPy, but incorporated it into and built it around deep learning, so you have all these deep learning functions that are part of the TensorFlow framework. TensorFlow actually has a Python API, and it is pretty accessible and easy to use, but Keras is much easier to use.

So why use Keras instead of pure TensorFlow? As I said, Keras is extremely easy to use, as it follows a basically Pythonic style of coding, and it is extremely modular. You don't even have to be a proficient programmer to use Keras. Being modular means that we can just start trying different things with our neural nets or CNNs: easily change loss functions, optimizers, initialization schemes and activation functions, try different regularization schemes, introduce more layers, or reduce the number of filters. All of those things are super easy inside of Keras.
So it allows us to build these powerful neural nets quickly and efficiently. And obviously it works in Python, and Python is one of my favourite... actually, it is my favourite programming language ever, because it makes so many complicated things super easy. TensorFlow is definitely not as user friendly. And what about Torch? I've used it a little bit, to be fair, but I was using Theano at that point, and Theano was actually quite hard, so TensorFlow seemed easy to me at that point. However, nothing beats Keras for ease of use. So unless you're doing complicated models or academic research, or you're basically after some sort of high-performance setup, you don't necessarily need to use pure TensorFlow for anything.

So now you're ready to see some actual Keras code. Well, it's actually pretty simple. We're using the Keras library, and this is how we actually construct and build models inside of Python. So firstly we import the Sequential model from Keras. The Sequential model is the main type of model you'll be building in Keras; basically all the CNNs and neural nets I've shown you before were sequential models. If you're doing something exotic, then it will probably not be sequential, but that is rare and probably beyond the scope of this course; right now that's basically academic research.
So let's move on. So we define our model here: we initialize it by running this line, Sequential with these two brackets, and now let's add some convolutional layers to it. Before we do that, we have to import these layers from keras.layers. So from keras.layers we import Dense, Dropout, Flatten, Conv2D and MaxPooling2D. Now, I don't have Dropout here, but it is used in the model we're actually going to code, so I left it in, so you guys know how that differs from here. OK.

So following model.add, this is where we're adding our first layer. Firstly it's a Conv2D layer; that's what we call it. And now we have open brackets here, and we have some parameters to fill out. So let's go through these parameters. This one is the number 32, then the kernel size, which we can see here, then the type of activation, and we use input_shape equals input_shape. That's a bit confusing; I'll explain it to you shortly. So let's go through each one. The first one here, the 32, specifies the number of kernels or filters.
So in our first layer, we're using 32 filters of kernel size 3 by 3, with a ReLU activation function. The input shape is set with input_shape equals input_shape, and that's because previously, above this code, we declared, we created, an input_shape parameter (variable, I should say) that was 28 by 28 by 1. So that's the input shape we use. I just left it out here for convenience, because we tend to define it outside the scope of this declaration here.

Now, to add another layer, it's as simple as model.add, and this is why Keras is so easy to use. You just keep using model.add, and the layers stack easily on top of each other, starting with the first at the bottom. So now we add another convolutional layer: 64 is the number of filters, and note that we don't need to specify kernel_size by name every time; Keras knows that the second parameter is the kernel size, so it's 3 by 3. And again the activation is ReLU. And now you've noticed we don't need to specify an input shape here. Do you know why? That's because this layer is directly connected to the previous layer, so it takes the output of that layer, and it knows that the output of that layer is effectively the input into this layer. So we no longer have to declare inputs anymore.
So now we can add max pooling. As I said before, we're going to use max pooling of 2 by 2. So that's simple: here we have model.add, then MaxPooling2D, and we specify the pool size, open brackets, 2 by 2, close brackets for this here, close brackets for that there. And then we do a Flatten. I haven't discussed Flatten yet; it's basically a function we use to feed the dense, or fully connected, layer. You'll see it visually in the diagram on the following slide. Basically, we then add a dense layer here with 128 units, all activated by ReLU, and this is connected to another dense layer, which outputs num_classes. The number of classes in this dataset is 10, because we're using the MNIST dataset, digits 0 to 9, and we use a softmax activation to get, basically, the probabilities. I hope you find this quite simple, because to me this is quite basic. It may take a while to get familiar with how these things work, but you will get used to it in this course, I guarantee it.

So let's take a look at what we've built here. OK, so this is actually what we have built so far. We have an input image here, 28 by 28 by 1; 1 because it's greyscale. If it was a colour image, an RGB image, it would be 3 deep here.
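The layer stack just described can be written out end to end. This is a minimal sketch using the tf.keras API, with the filter counts, kernel sizes, pool size and unit counts from the lecture; the `input_shape` and `num_classes` variables are defined inline here so the snippet stands alone.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

input_shape = (28, 28, 1)   # greyscale MNIST images
num_classes = 10            # digits 0 to 9

model = Sequential()
# First convolutional layer: 32 filters, 3x3 kernels, ReLU activation.
# Only the first layer needs input_shape; later layers infer their input
# from the layer before them.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())                      # 64 x 12 x 12 -> one 9216-wide row
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
```

Calling `model.summary()` on this prints each layer with its output shape, which follows the walkthrough: 26 by 26 after the first convolution, 24 by 24 after the second, 12 by 12 after pooling, then a flat 9,216-wide vector.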
So as you saw before, we have 32 filters here, connected to another conv layer with 64 filters here. And you may have noticed the size of the image shrink here, or shrank here: it became 26 by 26, and then 24 by 24. That's because we didn't use any zero padding. I'll show you guys later on in the code how to do zero padding; it's quite simple. But for now, just remember that when you don't use zero padding, your convolutional feature map size reduces from the input image size. So we have these two feature map volumes currently stacked here. Then we have a max pooling, which basically shrinks this by half. I have a dropout layer here, but we didn't actually use dropout before in the code; in the actual code, when we start our project, I'll show you quickly how to implement dropout in one line, super easy. What I wanted to show you was the Flatten layer here. Note the Flatten layer: if you go back here, it's just Flatten with brackets. And what does it actually do? Flatten basically takes this three-dimensional matrix, 64 by 12 by 12, and basically turns it into a row of nine thousand two hundred and sixteen columns. So what it means here is that we just flatten this matrix.
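The numbers quoted above (28 to 26 to 24, then 12 after pooling, then 9,216 after Flatten) all follow one simple rule: a convolution with no zero padding outputs input size minus kernel size plus 1 along each dimension, and 2 by 2 max pooling halves each dimension. A quick check in plain Python:

```python
def conv_out(size, kernel):
    """Output width/height of a zero-padding-free ('valid') convolution."""
    return size - kernel + 1

size = 28
size = conv_out(size, 3)   # first 3x3 conv: 28 -> 26
size = conv_out(size, 3)   # second 3x3 conv: 26 -> 24
size //= 2                 # 2x2 max pooling halves it: 24 -> 12

flattened = 64 * size * size   # 64 feature maps, each 12x12
print(size, flattened)         # prints: 12 9216
```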
So instead of having 12 by 12 by 64, imagine you just built one entire long row, where it's the first 12 here, then the second 12, and then just consecutively on, basically one long row. And that becomes this output box here. Now this output box here is fed into the fully connected layer here, with 128 nodes, as stated here. So as we defined it here, each node is connected to a ReLU activation unit. And then finally, we connect this to our final dense layer with the softmax activation function, and it outputs 10 nodes; 10 nodes because our MNIST dataset has 10 classes. So that's how we get our probabilities here. It's not illustrated here, but later on you will see it.

So now that we've built our model, we are ready to compile this model. Compiling simply creates an object that stores the model we've just created, and we can specify our loss algorithm and optimizer, and define the performance metric that we want to look at. Additionally, we can specify parameters for the optimizer, such as learning rates and momentum. So this is the simple model.compile code here. We have categorical cross-entropy defined as our loss type, we have the optimizer SGD, that being stochastic gradient descent, and we have the metric we look at being defined as accuracy.

So how do we fit our model now?
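A compile call of the kind described might look like the following sketch. The tiny one-layer stand-in model is only there to make the snippet self-contained, and the learning-rate and momentum values are illustrative examples, not values taken from the lecture.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Stand-in model so the compile call has something to attach to.
model = Sequential([Dense(10, activation='softmax', input_shape=(9216,))])

# Loss, optimizer and metric as described; the learning rate and momentum
# passed to SGD are example settings you would tune yourself.
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01, momentum=0.9),
              metrics=['accuracy'])
```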
So following, basically, what scikit-learn does, which was the most established and popular machine learning library in Python prior to TensorFlow and Keras, we basically do model.fit, and we feed it the training data and the training labels: x_train and y_train, that's what they are; you'll see them in the code soon. We specify a number of epochs and a batch size. Batch size doesn't impact learning that significantly; however, you should basically use the largest batch size that your memory allows. So you can experiment and try it; you'll know if you try too large a batch size, because your kernel will crash. So generally, to avoid having my kernel crash, I always use a batch size of 32 for pretty much smaller images, or 16 for larger images.

And once we do that, we can now evaluate and generate predictions afterwards. By running model.evaluate and feeding it x_test and the y_test labels, with a batch size, we can get these metrics, the metrics object, and then we can use that to actually look at different graphs and find interesting performance information from our model. And if we ever wanted to predict an individual point, say we have an image and we want to get the actual class it belongs to, we can use model.predict. model.predict allows us to feed one image at a time.
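The fit, evaluate and predict calls described above can be sketched as follows. The random arrays are stand-ins so the snippet runs on its own; in the real project, x_train, y_train, x_test and y_test would come from the MNIST dataset.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Random stand-in data: 64 flattened "images", labels one-hot over 10 classes.
x_train = np.random.rand(64, 784).astype('float32')
y_train = to_categorical(np.random.randint(0, 10, 64), num_classes=10)
x_test, y_test = x_train[:16], y_train[:16]

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])

# Train, then score on held-out data, with a batch size of 32.
model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=0)
loss, acc = model.evaluate(x_test, y_test, batch_size=32, verbose=0)

# Predict a single image: the output row is a 10-way probability vector.
preds = model.predict(x_test[:1], verbose=0)
```

Because the last layer is a softmax, each row of `preds` sums to 1, and `np.argmax(preds, axis=1)` gives the predicted class.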
We can feed the entire dataset here as well. So that's it; let's get started. Let's build our own handwritten digit classifier.