So welcome back to chapter 8.1, where we introduce you to Keras.

So what exactly is Keras? I've mentioned it countless times on this course, but I haven't actually told you precisely what it is. Keras is a high-level neural network API for Python, and it makes constructing neural networks (all types, not just CNNs) extremely easy and extremely modular: you can easily add layers, swap things in and out, change different things, configure your loss functions, and configure different types of activation functions. It's quite nice. It can run on top of TensorFlow, CNTK, which is used a fair bit for natural language processing, and Theano, which I used quite a bit back in the day. However, I've now moved to TensorFlow and I haven't looked back, not because Theano is bad, but because Theano has stopped being updated: the project has pretty much been wound down, and everyone is pretty much adopting TensorFlow now. Keras was developed by François Chollet, and it has been a tremendous success in making deep learning much more accessible to the masses.

So what is TensorFlow? I said Keras uses TensorFlow as a backend, but what exactly is a backend? OK.
So TensorFlow is an open-source library that was created by the Google Brain team in 2015, though it was probably being used inside Google internally for many years prior to 2015. Basically it's an extremely powerful, extremely efficient and fast deep learning framework that is used for high-performance numerical computation across a variety of platforms such as CPUs, GPUs and TPUs. These engineers at Google Brain basically developed a super-fast library, similar to NumPy, but incorporated it into and built it around deep learning, so you have all these deep learning functions that are part of the TensorFlow framework. TensorFlow actually has a Python API, and it is pretty accessible and easy to use, but Keras is much easier to use.

So why use Keras instead of pure TensorFlow? As I said, Keras is extremely easy to use, as it follows a basically Pythonic style of coding, and it is extremely modular. You don't even have to be a proficient programmer to use Keras. Being modular means that we can just start trying different things with our neural nets or CNNs: easily change loss functions, optimizers, initialization schemes and activation functions, try different regularization schemes, introduce more layers, or reduce the number of filters. All of those things are super easy inside of Keras.
So it allows us to build these powerful neural nets quickly and efficiently. And obviously it works in Python, and Python is one of my favourite... actually, it is my favourite programming language ever, because it makes so many complicated things super easy. TensorFlow is definitely not as user friendly. And what about Torch? I've used it a little bit, to be fair, but I was using Theano at that point, and Theano was actually quite hard, so TensorFlow seemed easy to me at that point. However, nothing beats Keras for ease of use. So unless you're doing complicated models or academic research, or you're basically after some sort of high-performance setup, you don't necessarily need to use pure TensorFlow for anything.

So now you're ready to see some actual Keras code. Well, it's actually pretty simple. We're using the Keras library, and this is how we actually construct and build models inside of Python. So firstly we import the Sequential model from Keras. The Sequential model is the main type of model you'll be building in Keras; basically all the CNNs and neural nets I've shown you before were sequential models. If you're doing something exotic, then it will probably not be sequential, but that is rare and probably beyond the scope of this course; right now that's basically academic research.
So let's move on. So we define our model here: we initialize it by running this line, Sequential with these two brackets, and now let's add some convolutional layers to it. Before we do that, we have to import these layers from keras.layers. So from keras.layers we import Dense, Dropout, Flatten, Conv2D and MaxPooling2D. Now, I don't have Dropout here, but it is used in the model we're actually going to code, so I left it in, so you guys know how that differs from here. OK.

So following model.add, this is where we're adding our first layer. Firstly it's a Conv2D layer; that's what we call it. And now we have open brackets here, and we have some parameters to fill out. So let's go through these parameters. This one is the number 32, then the kernel size, which we can see here, then the type of activation, and we use input_shape equals input_shape. That's a bit confusing; I'll explain it to you shortly. So let's go through each one. The first one here, the 32, specifies the number of kernels or filters.
So in our first layer, we're using 32 filters of kernel size 3 by 3, with a ReLU activation function. The input shape is set with input_shape equals input_shape, and that's because previously, above this code, we declared, we created, an input_shape parameter (variable, I should say) that was 28 by 28 by 1. So that's the input shape we use. I just left it out here for convenience, because we tend to define it outside the scope of this declaration here.

Now, to add another layer, it's as simple as model.add, and this is why Keras is so easy to use. You just keep using model.add, and the layers stack easily on top of each other, starting with the first at the bottom. So now we add another convolutional layer: 64 is the number of filters, and note that we don't need to specify kernel_size by name every time; Keras knows that the second parameter is the kernel size, so it's 3 by 3. And again the activation is ReLU. And now you've noticed we don't need to specify an input shape here. Do you know why? That's because this layer is directly connected to the previous layer, so it takes the output of that layer, and it knows that the output of that layer is effectively the input into this layer. So we no longer have to declare inputs anymore.
So now we can add max pooling. As I said before, we're going to use max pooling of 2 by 2. So that's simple: here we have model.add, then MaxPooling2D, and we specify the pool size, open brackets, 2 by 2, close brackets for this here, close brackets for that there. And then we do a Flatten. I haven't discussed Flatten yet; it's basically a function we use to feed the dense, or fully connected, layer. You'll see it visually in the diagram on the following slide. Basically, we then add a dense layer here with 128 units, all activated by ReLU, and this is connected to another dense layer, which outputs num_classes. The number of classes in this dataset is 10, because we're using the MNIST dataset, digits 0 to 9, and we use a softmax activation to get, basically, the probabilities. I hope you find this quite simple, because to me this is quite basic. It may take a while to get familiar with how these things work, but you will get used to it in this course, I guarantee it.

So let's take a look at what we've built here. OK, so this is actually what we have built so far. We have an input image here, 28 by 28 by 1; 1 because it's greyscale. If it was a colour image, an RGB image, it would be 3 deep here.
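The layer stack just described can be written out end to end. This is a minimal sketch using the tf.keras API, with the filter counts, kernel sizes, pool size and unit counts from the lecture; the `input_shape` and `num_classes` variables are defined inline here so the snippet stands alone.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

input_shape = (28, 28, 1)   # greyscale MNIST images
num_classes = 10            # digits 0 to 9

model = Sequential()
# First convolutional layer: 32 filters, 3x3 kernels, ReLU activation.
# Only the first layer needs input_shape; later layers infer their input
# from the layer before them.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())                      # 64 x 12 x 12 -> one 9216-wide row
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
```

Calling `model.summary()` on this prints each layer with its output shape, which follows the walkthrough: 26 by 26 after the first convolution, 24 by 24 after the second, 12 by 12 after pooling, then a flat 9,216-wide vector.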
So as you saw before, we have 32 filters here, connected to another conv layer with 64 filters here. And you may have noticed the size of the image shrink here, or shrank here: it became 26 by 26, and then 24 by 24. That's because we didn't use any zero padding. I'll show you guys later on in the code how to do zero padding; it's quite simple. But for now, just remember that when you don't use zero padding, your convolutional feature map size reduces from the input image size. So we have these two feature map volumes currently stacked here. Then we have a max pooling, which basically shrinks this by half. I have a dropout layer here, but we didn't actually use dropout before in the code; in the actual code, when we start our project, I'll show you quickly how to implement dropout in one line, super easy. What I wanted to show you was the Flatten layer here. Note the Flatten layer: if you go back here, it's just Flatten with brackets. And what does it actually do? Flatten basically takes this three-dimensional matrix, 64 by 12 by 12, and basically turns it into a row of nine thousand two hundred and sixteen columns. So what it means here is that we just flatten this matrix.
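The numbers quoted above (28 to 26 to 24, then 12 after pooling, then 9,216 after Flatten) all follow one simple rule: a convolution with no zero padding outputs input size minus kernel size plus 1 along each dimension, and 2 by 2 max pooling halves each dimension. A quick check in plain Python:

```python
def conv_out(size, kernel):
    """Output width/height of a zero-padding-free ('valid') convolution."""
    return size - kernel + 1

size = 28
size = conv_out(size, 3)   # first 3x3 conv: 28 -> 26
size = conv_out(size, 3)   # second 3x3 conv: 26 -> 24
size //= 2                 # 2x2 max pooling halves it: 24 -> 12

flattened = 64 * size * size   # 64 feature maps, each 12x12
print(size, flattened)         # prints: 12 9216
```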
So instead of having 12 by 12 by 64, imagine you just built one entire long row, where it's the first 12 here, then the second 12, and then just consecutively on, basically one long row. And that becomes this output box here. Now this output box here is fed into the fully connected layer here, with 128 nodes, as stated here. So as we defined it here, each node is connected to a ReLU activation unit. And then finally, we connect this to our final dense layer with the softmax activation function, and it outputs 10 nodes; 10 nodes because our MNIST dataset has 10 classes. So that's how we get our probabilities here. It's not illustrated here, but later on you will see it.

So now that we've built our model, we are ready to compile this model. Compiling simply creates an object that stores the model we've just created, and we can specify our loss algorithm and optimizer, and define the performance metric that we want to look at. Additionally, we can specify parameters for the optimizer, such as learning rates and momentum. So this is the simple model.compile code here. We have categorical cross-entropy defined as our loss type, we have the optimizer SGD, that being stochastic gradient descent, and we have the metric we look at being defined as accuracy.

So how do we fit our model now?
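A compile call of the kind described might look like the following sketch. The tiny one-layer stand-in model is only there to make the snippet self-contained, and the learning-rate and momentum values are illustrative examples, not values taken from the lecture.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Stand-in model so the compile call has something to attach to.
model = Sequential([Dense(10, activation='softmax', input_shape=(9216,))])

# Loss, optimizer and metric as described; the learning rate and momentum
# passed to SGD are example settings you would tune yourself.
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01, momentum=0.9),
              metrics=['accuracy'])
```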
So following, basically, what scikit-learn does, which was the most established and popular machine learning library in Python prior to TensorFlow and Keras, we basically do model.fit, and we feed it the training data and the training labels: x_train and y_train, that's what they are; you'll see them in the code soon. We specify a number of epochs and a batch size. Batch size doesn't impact learning that significantly; however, you should basically use the largest batch size that your memory allows. So you can experiment and try it; you'll know if you try too large a batch size, because your kernel will crash. So generally, to avoid having my kernel crash, I always use a batch size of 32 for pretty much smaller images, or 16 for larger images.

And once we do that, we can now evaluate and generate predictions afterwards. By running model.evaluate and feeding it x_test and the y_test labels, with a batch size, we can get these metrics, the metrics object, and then we can use that to actually look at different graphs and find interesting performance information from our model. And if we ever wanted to predict an individual point, say we have an image and we want to get the actual class it belongs to, we can use model.predict. model.predict allows us to feed one image at a time.
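The fit, evaluate and predict calls described above can be sketched as follows. The random arrays are stand-ins so the snippet runs on its own; in the real project, x_train, y_train, x_test and y_test would come from the MNIST dataset.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Random stand-in data: 64 flattened "images", labels one-hot over 10 classes.
x_train = np.random.rand(64, 784).astype('float32')
y_train = to_categorical(np.random.randint(0, 10, 64), num_classes=10)
x_test, y_test = x_train[:16], y_train[:16]

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])

# Train, then score on held-out data, with a batch size of 32.
model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=0)
loss, acc = model.evaluate(x_test, y_test, batch_size=32, verbose=0)

# Predict a single image: the output row is a 10-way probability vector.
preds = model.predict(x_test[:1], verbose=0)
```

Because the last layer is a softmax, each row of `preds` sums to 1, and `np.argmax(preds, axis=1)` gives the predicted class.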
We can feed the entire dataset here as well. So that's it; let's get started. Let's build our own handwritten digit classifier.