AI_DL_Assignment / 8. Build CNNs in Python using Keras /2. Introduction to Keras & Tensorflow.srt
1
00:00:00,960 --> 00:00:03,370
So welcome back to chapter eight point one.
2
00:00:03,390 --> 00:00:06,190
We introduce you to Keras and TensorFlow.
3
00:00:08,000 --> 00:00:09,660
So what exactly is Keras?
4
00:00:09,660 --> 00:00:15,060
I've mentioned it countless times on this course but I haven't actually told you precisely what it is.
5
00:00:15,200 --> 00:00:21,170
So Keras is a high-level neural network API for Python, and it makes constructing neural networks
6
00:00:21,260 --> 00:00:25,370
of all types, not just CNNs.
7
00:00:25,490 --> 00:00:31,760
It makes it extremely easy and extremely modular to add layers, swap things in and out, and change different things:
8
00:00:32,480 --> 00:00:36,770
configure your loss functions, configure your different types of activation functions.
9
00:00:36,770 --> 00:00:38,720
It's quite nice.
10
00:00:38,870 --> 00:00:45,410
It has the ability to use TensorFlow, CNTK (which is used for natural language processing), and Theano, which
11
00:00:45,440 --> 00:00:47,180
I used quite a bit back in the day.
12
00:00:47,360 --> 00:00:51,690
However, I've now moved to TensorFlow and I haven't looked back; not because Theano is bad.
13
00:00:51,930 --> 00:00:59,000
It's basically just stopped being updated; the project is pretty much done and closed, and
14
00:00:59,000 --> 00:01:02,710
everyone is pretty much adopting TensorFlow now.
15
00:01:03,680 --> 00:01:09,860
Keras was developed by François Chollet, and it has been a tremendous success in making deep learning much
16
00:01:09,860 --> 00:01:11,390
more accessible to the masses.
17
00:01:12,680 --> 00:01:16,820
So what is TensorFlow? As I said, Keras uses TensorFlow as a backend.
18
00:01:16,830 --> 00:01:18,400
But what exactly is a backend?
19
00:01:18,530 --> 00:01:19,370
OK.
20
00:01:19,620 --> 00:01:27,000
So TensorFlow is an open-source library that was created by the Google Brain team
21
00:01:27,030 --> 00:01:33,530
in 2015; it was probably being used internally at Google for many years prior to 2015.
22
00:01:34,020 --> 00:01:39,750
And basically it's an extremely powerful, extremely efficient and fast deep learning framework that is used
23
00:01:39,750 --> 00:01:46,080
for high-performance numerical computation across a variety of platforms such as CPUs, GPUs, and TPUs.
24
00:01:46,580 --> 00:01:52,320
Basically, these engineers, these guys at Google Brain, developed a super-fast library
25
00:01:52,650 --> 00:01:57,160
similar to NumPy, but incorporated it and built it around deep learning.
26
00:01:57,180 --> 00:02:05,190
So you have all these deep learning functions that are part of the TensorFlow framework. TensorFlow actually
27
00:02:05,190 --> 00:02:11,580
has a Python API, and it is pretty accessible and easy to use, but Keras is much easier
28
00:02:11,580 --> 00:02:12,200
to use.
29
00:02:13,500 --> 00:02:20,390
So why use Keras instead of pure TensorFlow? As I said, Keras is extremely easy to use, as it follows basically
30
00:02:20,390 --> 00:02:24,640
a Pythonic style of coding, and it is extremely modular.
31
00:02:24,860 --> 00:02:30,350
You don't even have to be a proficient programmer to use Keras. This modularity
32
00:02:30,350 --> 00:02:36,010
means that we can actually just try different things on our neural nets or CNNs: we can
33
00:02:36,130 --> 00:02:42,670
easily change loss functions, optimizers, initialization schemes, and activation functions, and try different regularization
34
00:02:43,030 --> 00:02:48,580
schemes, introduce more layers, or reduce the number of filters.
35
00:02:48,580 --> 00:02:53,080
All of those things are super easy inside of Keras.
36
00:02:53,080 --> 00:02:56,710
So it allows us to build these powerful neural nets quickly and efficiently.
37
00:02:57,130 --> 00:03:01,350
And obviously it works in Python, and Python is one of my favorites.
38
00:03:01,360 --> 00:03:06,370
Actually, it is my favorite programming language ever, because it makes so many complicated things
39
00:03:06,370 --> 00:03:13,010
super easy. TensorFlow is definitely not as user-friendly as Keras.
40
00:03:13,090 --> 00:03:18,510
I've used it a little bit, to be fair, but I was using Theano at that point, and Theano actually was quite
41
00:03:18,510 --> 00:03:18,980
hard.
42
00:03:19,170 --> 00:03:21,620
So TensorFlow seemed easy to me at that point.
43
00:03:21,690 --> 00:03:24,870
However, nothing beats Keras for ease of use.
44
00:03:24,990 --> 00:03:29,880
So unless you're doing complicated models or academic research, or are basically looking for some sort of
45
00:03:29,880 --> 00:03:35,420
high-performance setup, you don't necessarily need to use pure TensorFlow for anything.
46
00:03:35,430 --> 00:03:38,630
So now you're ready to see some actual Keras code.
47
00:03:38,640 --> 00:03:40,300
Well, it's actually pretty simple.
48
00:03:40,620 --> 00:03:46,980
We're using the Keras library, and this is how we actually construct and build models inside of Python.
49
00:03:47,580 --> 00:03:54,210
So firstly, we import the Sequential model from Keras. Basically, the Sequential model is the main type
50
00:03:54,210 --> 00:04:00,000
of model you'll be building in Keras; basically, all the CNNs and neural nets I've shown you before
51
00:04:00,060 --> 00:04:01,730
were sequential models.
52
00:04:01,980 --> 00:04:05,340
If you're doing something exotic, then it will probably not be sequential.
53
00:04:05,340 --> 00:04:09,480
That is rare and probably beyond the scope of this course.
54
00:04:09,480 --> 00:04:12,110
Right now it's basically academic research.
55
00:04:12,480 --> 00:04:13,130
So let's move on.
56
00:04:13,140 --> 00:04:14,750
So we defined our model here.
57
00:04:14,760 --> 00:04:23,060
We initialize it by running this line, Sequential with the two brackets, and now let's add some convolutional
58
00:04:23,060 --> 00:04:24,500
layers to it.
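As a minimal sketch of the code being described, assuming the standalone keras package (in modern TensorFlow 2 the same names live under tensorflow.keras):

    from keras.models import Sequential

    # An empty sequential model; layers will be stacked onto it with model.add
    model = Sequential()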
59
00:04:24,500 --> 00:04:28,150
So before we do that, we have to import these layers from Keras.
60
00:04:28,370 --> 00:04:34,380
So from keras.layers we import Dense, Dropout, Flatten, Conv2D, and MaxPooling2D.
61
00:04:34,400 --> 00:04:37,710
Now, I don't have Dropout here on the slide, but it's used in the model
62
00:04:37,730 --> 00:04:39,220
we are actually going to code.
63
00:04:39,290 --> 00:04:40,110
So I left it in.
64
00:04:40,170 --> 00:04:42,930
So you guys know how it's different to what's here.
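For reference, the import line being described is, as a sketch (with Dropout included, as the speaker says):

    from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D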
65
00:04:42,940 --> 00:04:43,610
OK.
66
00:04:44,090 --> 00:04:48,370
So firstly, model.add: this is where we're adding our first layer here.
67
00:04:48,410 --> 00:04:50,550
The first layer is a Conv2D.
68
00:04:50,690 --> 00:04:51,560
That's what we call it.
69
00:04:51,590 --> 00:04:56,030
And now we have open brackets here and we have some parameters to fill out.
70
00:04:56,030 --> 00:04:57,830
So let's go through these parameters.
71
00:04:57,840 --> 00:04:59,300
This first one is the number 32.
72
00:04:59,360 --> 00:05:05,890
Then the kernel size, which we can see here, then the activation type, and we use input_shape equals input_shape.
73
00:05:05,930 --> 00:05:08,360
That's a bit confusing; I'll explain it to you shortly.
74
00:05:08,600 --> 00:05:14,660
So let's go through each one. The first one here, the 32, specifies the number of
75
00:05:14,660 --> 00:05:16,030
kernels or filters.
76
00:05:16,340 --> 00:05:24,710
So in our first layer we're using 32 filters of kernel size 3 by 3, with a ReLU activation function, and
77
00:05:25,500 --> 00:05:31,730
an input shape called input_shape. That's because previously, above this code, we created
78
00:05:32,370 --> 00:05:38,400
an input_shape variable, I should say, that was 28 by 28 by 1.
79
00:05:38,720 --> 00:05:40,240
So that's the input shape we use.
80
00:05:40,550 --> 00:05:45,890
So I just left it like this here for convenience, because we tend to leave it defined
81
00:05:45,980 --> 00:05:49,440
outside of the scope of this declaration here.
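A sketch of that first layer, assuming 32 filters and the standard Keras MNIST setup described above:

    input_shape = (28, 28, 1)  # declared earlier in the code: 28 x 28 grayscale images

    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))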
82
00:05:49,910 --> 00:05:51,230
Now to add another layer.
83
00:05:51,590 --> 00:05:52,990
It's as simple as model.add.
84
00:05:53,080 --> 00:05:55,280
And Keras is so easy to use:
85
00:05:55,280 --> 00:06:00,290
just keep using model.add, and the layers stack easily on top of each other, starting with the first at
86
00:06:00,290 --> 00:06:01,650
the bottom.
87
00:06:01,670 --> 00:06:05,450
So now we add another convolutional layer, a Conv2D with 64 filters.
88
00:06:05,490 --> 00:06:08,610
Same kernel size, and note we don't need to specify kernel_size
89
00:06:08,620 --> 00:06:12,660
every time we do it; Keras knows that the second parameter is the kernel size.
90
00:06:12,690 --> 00:06:13,770
So it's 3 by 3.
91
00:06:13,790 --> 00:06:15,760
And again the activation is ReLU.
92
00:06:16,280 --> 00:06:19,570
And now you've noticed we don't need to specify an input shape here.
93
00:06:19,910 --> 00:06:20,690
And do you know why?
94
00:06:20,780 --> 00:06:23,840
That's because this layer is directly connected to the previous layer.
95
00:06:23,930 --> 00:06:27,380
So it takes the output of that layer, and it knows the output shape.
96
00:06:27,380 --> 00:06:31,290
So the output of this layer is effectively the input into this layer.
97
00:06:31,670 --> 00:06:34,150
So we no longer have to declare inputs anymore.
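So the second convolutional layer needs no input shape, and the kernel size can be passed positionally; a sketch:

    # Input shape is inferred from the previous layer's output
    model.add(Conv2D(64, (3, 3), activation='relu'))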
98
00:06:35,740 --> 00:06:37,710
So now we can add a max pooling layer.
99
00:06:38,000 --> 00:06:40,820
As I said before, we are going to use max pooling of 2 by 2.
100
00:06:41,140 --> 00:06:42,170
So that's simple here.
101
00:06:42,170 --> 00:06:49,340
We have model.add, MaxPooling2D, and specify the pool size: open brackets, 2 by 2, close brackets.
102
00:06:49,340 --> 00:06:53,720
Close brackets for this here, close brackets for this here, and then we do a Flatten.
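The pooling line being read out is, as a sketch:

    model.add(MaxPooling2D(pool_size=(2, 2)))  # halves the width and height of the feature maps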
103
00:06:53,830 --> 00:06:56,220
I haven't actually discussed Flatten.
104
00:06:56,220 --> 00:06:57,300
It's basically a function.
105
00:06:57,300 --> 00:07:01,350
that we use to feed the dense or fully connected layer.
106
00:07:01,710 --> 00:07:05,540
You'll see it visually in the next diagram on the following slide.
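The flatten line itself is just:

    model.add(Flatten())  # 3D feature maps -> one long 1D vector for the dense layers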
107
00:07:05,740 --> 00:07:11,940
Basically, we then add a dense layer here with 128 units, all activated by ReLU.
108
00:07:12,300 --> 00:07:16,760
And this is connected to another dense layer here, which outputs the number of classes.
109
00:07:16,830 --> 00:07:23,510
The number of classes in this dataset is 10, because we're using the MNIST dataset, digits 0 to 9, and we use
110
00:07:23,510 --> 00:07:30,360
a softmax activation to get, basically, the probabilities. I hope you think this is quite
111
00:07:30,360 --> 00:07:32,510
simple, because to me this is quite basic.
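A sketch of those two dense layers, with num_classes set to 10 for MNIST as described:

    num_classes = 10  # MNIST digits 0 to 9

    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))  # outputs class probabilities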
112
00:07:32,610 --> 00:07:37,190
It may take a while to get familiar with how these things work, but you will get used to it in this
113
00:07:37,200 --> 00:07:38,790
course, I guarantee it.
114
00:07:38,790 --> 00:07:40,800
So let's take a look at what we've built here.
115
00:07:41,230 --> 00:07:41,540
OK.
116
00:07:41,580 --> 00:07:44,880
So this is actually what we have built so far.
117
00:07:44,880 --> 00:07:50,090
So we have an input image here, 28 by 28 by 1; 1 because it's grayscale.
118
00:07:50,190 --> 00:07:54,440
If it was a color image, an RGB image, it would be a depth of 3 here.
119
00:07:54,780 --> 00:08:02,250
So as you saw before, we have 32 filters here, connected to another conv layer with 64 filters here.
120
00:08:02,490 --> 00:08:08,850
And you may have noticed the size of the image shrank here: it became 26 by 26 and then
121
00:08:08,850 --> 00:08:10,490
24 by 24.
122
00:08:10,590 --> 00:08:12,770
And that's because we didn't use any zero padding.
123
00:08:12,840 --> 00:08:15,980
And I'll show you guys later on in code how to do zero padding.
124
00:08:15,990 --> 00:08:16,910
It's quite simple.
125
00:08:17,100 --> 00:08:22,450
But for now, just remember: when you don't use zero padding, your convolutional feature
126
00:08:22,450 --> 00:08:25,530
map size reduces from the input image size.
127
00:08:25,530 --> 00:08:28,950
So we have these two conv outputs currently stacked here.
128
00:08:28,950 --> 00:08:33,000
Then we have our max pooling, which basically shrinks this by half.
129
00:08:33,180 --> 00:08:34,360
I have a dropout layer here.
130
00:08:34,500 --> 00:08:39,330
We didn't actually use dropout before in the code, but in the actual code, when we start the project,
131
00:08:39,330 --> 00:08:44,720
I'll show you quickly how to actually implement dropout in one line; super easy.
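For reference, dropout really is a single line; the 0.25 rate here is an illustrative value, not necessarily the one used in the course project:

    model.add(Dropout(0.25))  # randomly zeroes 25% of activations during training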
132
00:08:44,820 --> 00:08:48,260
What I wanted to show you here was that we have the flatten layer here.
133
00:08:48,490 --> 00:08:52,890
For the flatten layer, if you go back here, it's just Flatten with brackets.
134
00:08:52,950 --> 00:08:54,420
And what does it actually do?
135
00:08:54,450 --> 00:09:00,960
Flatten basically takes this 3D multidimensional matrix, 64 by 12 by 12, and basically turns it
136
00:09:00,960 --> 00:09:07,430
into a row of 9,216 columns (12 x 12 x 64 = 9,216).
137
00:09:07,470 --> 00:09:11,770
So what it means here is that we just flattened this matrix.
138
00:09:11,820 --> 00:09:18,540
So instead of having 12 by 12 by 64, imagine you just built one entire long row, where it's the first
139
00:09:18,540 --> 00:09:24,410
12 here, then the second 12, and so on consecutively; basically a long row.
140
00:09:24,870 --> 00:09:28,810
And that becomes this output box here.
141
00:09:29,100 --> 00:09:35,180
And now this output box here is fed into the fully connected layer here with 128 nodes.
142
00:09:35,430 --> 00:09:36,440
As shown here.
143
00:09:36,690 --> 00:09:37,690
So we defined it here.
144
00:09:37,710 --> 00:09:40,760
Each node is connected to a ReLU activation unit.
145
00:09:41,220 --> 00:09:48,810
And then finally, we connect this to our final dense layer with a softmax activation function, which outputs
146
00:09:48,810 --> 00:09:54,210
to 10 nodes; 10 nodes because our MNIST dataset has 10 classes.
147
00:09:54,300 --> 00:09:56,290
So that's how we get our probabilities here.
148
00:09:56,700 --> 00:09:59,130
So it's not illustrated here but later on you will see it.
149
00:09:59,130 --> 00:10:03,140
So now that we've built our model we are ready to compile this model.
150
00:10:03,390 --> 00:10:08,640
So compiling simply creates an object that stores the model we've just created, and we can specify our
151
00:10:08,640 --> 00:10:13,350
loss function and optimizer, and define the performance metric that we want to look at.
152
00:10:13,440 --> 00:10:18,090
Additionally, we can specify parameters for the optimizer, such as learning rates and momentum.
153
00:10:18,090 --> 00:10:22,630
So this is a simple model.compile code here.
154
00:10:22,980 --> 00:10:29,610
We have categorical cross-entropy defined as our loss type, we have the optimizer SGD, meaning stochastic
155
00:10:29,610 --> 00:10:33,990
gradient descent, and we have the metric to look at defined as accuracy.
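A sketch of that compile call; the SGD import and the learning rate and momentum values are illustrative assumptions, not values from the slide:

    from keras.optimizers import SGD

    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(0.01, momentum=0.9),  # illustrative learning rate and momentum
                  metrics=['accuracy'])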
156
00:10:36,370 --> 00:10:39,000
So how do we fit in our model now.
157
00:10:39,010 --> 00:10:44,770
So following a simple basically what Eski line which is the most established and popular machine learning
158
00:10:44,770 --> 00:10:48,540
library on pite and private through senseful in Paris.
159
00:10:48,610 --> 00:10:52,350
Basically they did as applied model but that would be used to training data.
160
00:10:52,600 --> 00:10:56,130
The training labels extreme and waitron that's what they are.
161
00:10:56,140 --> 00:11:02,360
You'll see them in a code soon specified number of epochs and about say not about size.
162
00:11:02,560 --> 00:11:05,720
This doesn't impact learning that much significantly.
163
00:11:05,860 --> 00:11:11,470
How ever you basically should use a largest bedside size it's possible that your memory allows.
164
00:11:11,500 --> 00:11:17,420
So you can experiment and try it you'll know it if you try it too large a box that size your kernel
165
00:11:17,440 --> 00:11:18,250
will crash.
166
00:11:18,250 --> 00:11:22,640
So generally, I tend to avoid having my kernel crash:
167
00:11:22,750 --> 00:11:29,030
I always use a batch size of 32 for pretty much all smaller images, or 16 for larger images.
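A sketch of the fit call with illustrative values (the epoch count is an assumption; x_train and y_train are the arrays described above):

    model.fit(x_train, y_train,
              batch_size=32,  # 32 for smaller images, as suggested above
              epochs=10)      # illustrative number of epochs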
168
00:11:31,450 --> 00:11:35,250
And once we do that, we can now evaluate and generate predictions afterward.
169
00:11:35,290 --> 00:11:41,830
So by running model.evaluate and feeding it the x_test data and y_test labels with a batch size, we can
170
00:11:41,830 --> 00:11:47,140
get a metrics object, and then we can use that to actually look at
171
00:11:47,140 --> 00:11:53,580
different graphs and find interesting performance information from our model. And if we ever wanted to
172
00:11:53,590 --> 00:11:59,410
predict an individual point, like we have an image and we want to get the actual class it belongs to,
173
00:12:00,040 --> 00:12:04,070
we can use model.predict; model.predict allows us to feed one image at a time.
174
00:12:04,130 --> 00:12:07,070
We can feed the entire dataset here as well.
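A sketch of evaluation and single-image prediction; the [0:1] slice is one way to feed a single image as a batch of one:

    score = model.evaluate(x_test, y_test, batch_size=32)  # returns loss and accuracy
    predictions = model.predict(x_test[0:1])               # class probabilities for one image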
175
00:12:07,840 --> 00:12:09,660
So that's it; let's get started.
176
00:12:09,690 --> 00:12:13,030
So let's build our own handwritten digit classifier.