1
00:00:00,960 --> 00:00:03,370
So welcome back to chapter eight point one.

2
00:00:03,390 --> 00:00:06,190
We introduce you to Keras.

3
00:00:08,000 --> 00:00:09,660
So what exactly is Keras?

4
00:00:09,660 --> 00:00:15,060
I've mentioned it countless times in this course, but I haven't actually told you precisely what it is.

5
00:00:15,200 --> 00:00:21,170
So Keras is a high-level neural network API for Python, and it makes constructing neural networks

6
00:00:21,260 --> 00:00:25,370
of all types, not just CNNs but all types of neural networks.

7
00:00:25,490 --> 00:00:31,760
It makes it extremely easy and extremely modular to add layers, swap in stuff, change different things,

8
00:00:32,480 --> 00:00:36,770
configure your loss functions, configure your different types of activation functions.

9
00:00:36,770 --> 00:00:38,720
It's quite nice.

10
00:00:38,870 --> 00:00:45,410
It has the ability to use TensorFlow, CNTK, which is used for natural language processing, and Theano, which

11
00:00:45,440 --> 00:00:47,180
I used quite a bit back in the day.

12
00:00:47,360 --> 00:00:51,690
However, I've now moved to TensorFlow and I haven't looked back, not because Theano is bad.

13
00:00:51,930 --> 00:00:59,000
It has just basically stopped being updated; the project is pretty much done and closed, and

14
00:00:59,000 --> 00:01:02,710
everyone is pretty much adopting TensorFlow now.

15
00:01:03,680 --> 00:01:09,860
And Keras was developed by François Chollet, and it has been a tremendous success in making deep learning much

16
00:01:09,860 --> 00:01:11,390
more accessible to the masses.

17
00:01:12,680 --> 00:01:16,820
So what is TensorFlow? As I said, Keras uses TensorFlow as a backend.

18
00:01:16,830 --> 00:01:18,400
But what exactly is a backend?

19
00:01:18,530 --> 00:01:19,370
OK.

20
00:01:19,620 --> 00:01:27,000
So TensorFlow is an open-source library that was created by the Google Brain team

21
00:01:27,030 --> 00:01:33,530
in 2015, and it was probably being used internally inside of Google for many years prior to 2015.

22
00:01:34,020 --> 00:01:39,750
And basically it's an extremely powerful, extremely efficient and fast deep learning framework that is used

23
00:01:39,750 --> 00:01:46,080
for high-performance numerical computation across a variety of platforms such as CPUs, GPUs, and TPUs.

24
00:01:46,580 --> 00:01:52,320
Basically, these engineers, these guys at Google Brain, developed a super-fast library

25
00:01:52,650 --> 00:01:57,160
similar to NumPy, but basically incorporated it and built it around deep learning.

26
00:01:57,180 --> 00:02:05,190
So you have all these deep learning functions that are a part of the TensorFlow framework. It actually

27
00:02:05,190 --> 00:02:11,580
has a Python API, and it pretty much is accessible and easy to use, but Keras is much easier

28
00:02:11,580 --> 00:02:12,200
to use.

29
00:02:13,500 --> 00:02:20,390
So why use Keras instead of pure TensorFlow? As I said, Keras is extremely easy to use, as it follows basically

30
00:02:20,390 --> 00:02:24,640
a Pythonic style of coding, and it is extremely modular.

31
00:02:24,860 --> 00:02:30,350
You don't even have to be a proficient programmer to use Keras. This modularity

32
00:02:30,350 --> 00:02:36,010
means that we can actually just start doing different things with our neural nets or CNNs: we can

33
00:02:36,130 --> 00:02:42,670
easily change loss functions, optimizers, initialization schemes, activation functions, try different

34
00:02:43,030 --> 00:02:48,580
regularization schemes, introduce more layers, reduce the number of filters.

35
00:02:48,580 --> 00:02:53,080
All of those things are super easy inside of Keras.

36
00:02:53,080 --> 00:02:56,710
So it allows us to build these powerful neural nets quickly and efficiently.

37
00:02:57,130 --> 00:03:01,350
And obviously it works in Python, and Python is one of my favorites.

38
00:03:01,360 --> 00:03:06,370
Actually, it is my favorite programming language ever, because it makes so many complicated things

39
00:03:06,370 --> 00:03:13,010
super easy. TensorFlow is definitely not as user-friendly when you compare it to Keras.

40
00:03:13,090 --> 00:03:18,510
I've used it a little bit, to be fair, but I was using Theano at that point, and Theano actually was quite

41
00:03:18,510 --> 00:03:18,980
hard.

42
00:03:19,170 --> 00:03:21,620
So TensorFlow seemed easy to me at that point.

43
00:03:21,690 --> 00:03:24,870
However, nothing beats Keras for ease of use.

44
00:03:24,990 --> 00:03:29,880
So unless you're doing complicated models or academic research or basically looking for some sort of

45
00:03:29,880 --> 00:03:35,420
high-performance setup, you don't necessarily need to use pure TensorFlow for anything.

46
00:03:35,430 --> 00:03:38,630
So now you're ready to see some actual Keras code.

47
00:03:38,640 --> 00:03:40,300
Well it's actually pretty good.

48
00:03:40,620 --> 00:03:46,980
So, we're using the Keras library, and this is how we actually construct and build models inside of Python.

49
00:03:47,580 --> 00:03:54,210
So firstly we import the Sequential model from Keras. Basically, the Sequential model is the main type

50
00:03:54,210 --> 00:04:00,000
of model you'll be building in Keras; basically all the CNNs and neural nets I've shown you before

51
00:04:00,060 --> 00:04:01,730
were sequential models.

52
00:04:01,980 --> 00:04:05,340
If you're doing something exotic, then it will probably not be sequential.

53
00:04:05,340 --> 00:04:09,480
That is rare and probably beyond the scope of this course.

54
00:04:09,480 --> 00:04:12,110
Right now it's basically academic research.

55
00:04:12,480 --> 00:04:13,130
So let's move on.

56
00:04:13,140 --> 00:04:14,750
So we defined our model here.

57
00:04:14,760 --> 00:04:23,060
We initialize it by running this line, Sequential() with these two brackets, and now let's add some convolutional

58
00:04:23,060 --> 00:04:24,500
layers to it.

59
00:04:24,500 --> 00:04:28,150
So before we do that, we have to import these layers from keras.layers.

60
00:04:28,370 --> 00:04:34,380
So from keras.layers we import Dense, Dropout, Flatten, Conv2D, and MaxPooling2D.

61
00:04:34,400 --> 00:04:37,710
Now, I don't have a Dropout layer here, but it is used in the model

62
00:04:37,730 --> 00:04:39,220
that we are actually going to code.

63
00:04:39,290 --> 00:04:40,110
So I left it in.

64
00:04:40,170 --> 00:04:42,930
So you guys know how it's different here.

65
00:04:42,940 --> 00:04:43,610
OK.
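
A minimal sketch of the imports and model setup being described, assuming the standalone keras package (in newer installs the same classes live under tensorflow.keras):

    # Sequential model plus the layer types discussed above
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

    # Start with an empty sequential model; layers are stacked onto it with model.add()
    model = Sequential()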

66
00:04:44,090 --> 00:04:48,370
So following this, using model.add, we're adding our first layer here.

67
00:04:48,410 --> 00:04:50,550
Our first layer is a Conv2D.

68
00:04:50,690 --> 00:04:51,560
That's what we call it.

69
00:04:51,590 --> 00:04:56,030
And now we have open brackets here and we have some parameters to fill out.

70
00:04:56,030 --> 00:04:57,830
So let's go through these parameters.

71
00:04:57,840 --> 00:04:59,300
This one is the number 32.

72
00:04:59,360 --> 00:05:05,890
Then the kernel size, which we can see here, then the activation type, ReLU, and we use input_shape equals input_shape.

73
00:05:05,930 --> 00:05:08,360
That's a bit confusing; I'll explain it to you shortly.

74
00:05:08,600 --> 00:05:14,660
So let's go through each one. The first one here, the 32, specifies the number of

75
00:05:14,660 --> 00:05:16,030
kernels or filters.

76
00:05:16,340 --> 00:05:24,710
So in our first layer we're using 32 filters of kernel size three by three, with an activation function using ReLU,

77
00:05:25,500 --> 00:05:31,730
and an input shape, which is set to input_shape, and that's because previously, above this code, we declared, we created

78
00:05:32,370 --> 00:05:38,400
an input_shape parameter, a variable I should say, that was 28 by 28 by 1.

79
00:05:38,720 --> 00:05:40,240
So that's the input_shape we use.

80
00:05:40,550 --> 00:05:45,890
So I just left it out here for convenience, because we tend to leave it out and have it defined

81
00:05:45,980 --> 00:05:49,440
outside of the scope of this declaration here.
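
As a sketch, the first convolutional layer just described might look like this, assuming input_shape was defined earlier in the script as mentioned:

    # Assumed to be declared above this code: 28 x 28 grayscale MNIST images
    input_shape = (28, 28, 1)

    # First layer: 32 filters, 3 x 3 kernels, ReLU activation; only the first
    # layer needs to be told the input shape
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))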

82
00:05:49,910 --> 00:05:51,230
Now to add another layer.

83
00:05:51,590 --> 00:05:52,990
It's as simple as model.add().

84
00:05:53,080 --> 00:05:55,280
And Keras is so easy to use.

85
00:05:55,280 --> 00:06:00,290
Just keep using model.add, and the layers stack easily on top of each other, starting with the first at

86
00:06:00,290 --> 00:06:01,650
the bottom.

87
00:06:01,670 --> 00:06:05,450
So now we add another convolutional layer, with 64 filters this time.

88
00:06:05,490 --> 00:06:08,610
Same kernel sizes, and note we don't need to specify kernel_size

89
00:06:08,620 --> 00:06:12,660
every time; when we do it like this, it knows that the second parameter is the kernel size.

90
00:06:12,690 --> 00:06:13,770
So it's three by three.

91
00:06:13,790 --> 00:06:15,760
And again the activation is ReLU.

92
00:06:16,280 --> 00:06:19,570
And now you've noticed we don't need to specify an input shape here.

93
00:06:19,910 --> 00:06:20,690
And do you know why.

94
00:06:20,780 --> 00:06:23,840
That's because this layer is directly connected to this layer.

95
00:06:23,930 --> 00:06:27,380
So it takes the output of this layer, and it knows the output size.

96
00:06:27,380 --> 00:06:31,290
So the output of this layer is effectively the input into this layer.

97
00:06:31,670 --> 00:06:34,150
So we no longer have to declare inputs anymore.
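
A sketch of the second layer being added here; note there is no input_shape argument, because Keras infers it from the previous layer's output:

    # Second convolutional layer: 64 filters, 3 x 3 kernels, ReLU activation
    model.add(Conv2D(64, (3, 3), activation='relu'))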

98
00:06:35,740 --> 00:06:37,710
So now we can add a max pooling layer.

99
00:06:38,000 --> 00:06:40,820
As I said before, we are going to use a max pooling of two by two.

100
00:06:41,140 --> 00:06:42,170
So that's simple here.

101
00:06:42,170 --> 00:06:49,340
We have model.add, MaxPooling2D, and specify the pool_size: open brackets, two by two, close brackets.

102
00:06:49,340 --> 00:06:53,720
For this here, close brackets for this here, and then we do a Flatten.

103
00:06:53,830 --> 00:06:56,220
I haven't discussed Flatten yet.

104
00:06:56,220 --> 00:06:57,300
It's basically a function.

105
00:06:57,300 --> 00:07:01,350
We use it to feed the dense layer, the fully connected layer.

106
00:07:01,710 --> 00:07:05,540
You'll see it visually in the next diagram on the following slide.

107
00:07:05,740 --> 00:07:11,940
Basically we then add a dense layer here with 128 units, all activated by ReLU.

108
00:07:12,300 --> 00:07:16,760
And this is connected to another dense layer here, which outputs the number of classes.

109
00:07:16,830 --> 00:07:23,510
The number of classes in this dataset is 10, because we're using the MNIST dataset, digits 0 to 9, and we use

110
00:07:23,510 --> 00:07:30,360
a softmax activation to get, basically, the probabilities. Now, I hope you think this is quite

111
00:07:30,360 --> 00:07:32,510
simple because to me this is quite basic.
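
Putting the pooling, flatten, and dense layers just described into code, roughly (num_classes is assumed to be 10 for MNIST; the Dropout calls mentioned as coming later would slot in between these lines):

    # 2 x 2 max pooling halves the spatial size of the feature maps
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Flatten the 3D feature-map volume into one long vector for the dense layers
    model.add(Flatten())

    # Fully connected layer with 128 ReLU-activated units
    model.add(Dense(128, activation='relu'))

    # Output layer: one node per class, softmax gives the class probabilities
    num_classes = 10  # MNIST digits 0-9
    model.add(Dense(num_classes, activation='softmax'))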

112
00:07:32,610 --> 00:07:37,190
It may take you a while to get familiar with how these things work, but you will get used to it in this

113
00:07:37,200 --> 00:07:38,790
course I guarantee it.

114
00:07:38,790 --> 00:07:40,800
So let's take a look at what we've built here.

115
00:07:41,230 --> 00:07:41,540
OK.

116
00:07:41,580 --> 00:07:44,880
So this is actually what we have built so far.

117
00:07:44,880 --> 00:07:50,090
So we have an input image here, 28 by 28 by 1; 1 because it's grayscale.

118
00:07:50,190 --> 00:07:54,440
If it was a color image, an RGB image, it would be a depth of three here.

119
00:07:54,780 --> 00:08:02,250
So as you saw before, we have 32 filters here, connected to another conv layer with 64 filters here.

120
00:08:02,490 --> 00:08:08,850
And you may have noticed the size of the image shrank here: it became 26 by 26 and then

121
00:08:08,850 --> 00:08:10,490
24 by 24.

122
00:08:10,590 --> 00:08:12,770
And that's because we didn't use any zero padding.

123
00:08:12,840 --> 00:08:15,980
And I'll show you guys later on in the code how to do zero padding.

124
00:08:15,990 --> 00:08:16,910
It's quite simple.

125
00:08:17,100 --> 00:08:22,450
But for now, just remember: when you don't use zero padding, our input image or convolutional feature

126
00:08:22,450 --> 00:08:25,530
map size reduces from the input image size.
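
For reference, a hedged sketch of how zero padding is usually requested in Keras (the lecture only promises to show this later): padding='same' pads with zeros so the output keeps the input's width and height, while the default padding='valid' is what made the maps shrink to 26 by 26 and 24 by 24 here.

    # With 'same' padding the 28 x 28 input stays 28 x 28 after the convolution
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=(28, 28, 1)))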

127
00:08:25,530 --> 00:08:28,950
So we have these two feature maps currently stacked here.

128
00:08:28,950 --> 00:08:33,000
Then we have our max pooling, which basically shrinks this by half.

129
00:08:33,180 --> 00:08:34,360
I have a dropout layer here.

130
00:08:34,500 --> 00:08:39,330
We didn't actually use dropout before in this code, but in the actual code when we start our project,

131
00:08:39,330 --> 00:08:44,720
I'll show you quickly how to actually implement dropout in one line; super easy.

132
00:08:44,820 --> 00:08:48,260
What I wanted to show you was that we have the Flatten layer here.

133
00:08:48,490 --> 00:08:52,890
Now the Flatten layer, if you go back to here, is just Flatten with brackets.

134
00:08:52,950 --> 00:08:54,420
And what does it actually do.

135
00:08:54,450 --> 00:09:00,960
Flatten basically takes this three-dimensional matrix, 64 by 12 by 12, and basically turns it

136
00:09:00,960 --> 00:09:07,430
into a row of 9,216 columns.
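
To make the arithmetic explicit: flattening the 12 by 12 by 64 volume gives 12 * 12 * 64 = 9,216 values, which is where that column count comes from.

    # Length of the vector produced by Flatten() on a 12 x 12 x 64 feature-map volume
    print(12 * 12 * 64)  # 9216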

137
00:09:07,470 --> 00:09:11,770
So what it means here is that we just flattened this matrix.

138
00:09:11,820 --> 00:09:18,540
So instead of having 12 by 12 by 64, imagine you just built an entire long row, where it's the first

139
00:09:18,540 --> 00:09:24,410
12 here, then the second 12, and so on consecutively, basically a long row.

140
00:09:24,870 --> 00:09:28,810
And that becomes this output box here.

141
00:09:29,100 --> 00:09:35,180
And now this output box here is fed into the fully connected layer here with 128 nodes.

142
00:09:35,430 --> 00:09:36,440
As I talked about here.

143
00:09:36,690 --> 00:09:37,690
So we defined it here.

144
00:09:37,710 --> 00:09:40,760
Each node was connected to a ReLU activation unit.

145
00:09:41,220 --> 00:09:48,810
And then finally, we connect this to our final dense layer with a softmax activation function, which outputs

146
00:09:48,810 --> 00:09:54,210
to 10 nodes; 10 nodes because our MNIST dataset has 10 classes.

147
00:09:54,300 --> 00:09:56,290
So that's how we get our probabilities here.

148
00:09:56,700 --> 00:09:59,130
So it's not illustrated here but later on you will see it.

149
00:09:59,130 --> 00:10:03,140
So now that we've built our model we are ready to compile this model.

150
00:10:03,390 --> 00:10:08,640
So compiling simply creates an object that stores the model we've just created, and we can specify our

151
00:10:08,640 --> 00:10:13,350
loss algorithm, our optimizer, and define the performance metric that we want to look at.

152
00:10:13,440 --> 00:10:18,090
And additionally, we can specify parameters for the optimizer, such as learning rate and momentum.

153
00:10:18,090 --> 00:10:22,630
So this is a simple model.compile code example here.

154
00:10:22,980 --> 00:10:29,610
We have our categorical cross-entropy defined as the loss type, we have the optimizer SGD, being stochastic

155
00:10:29,610 --> 00:10:33,990
gradient descent, and we have the metric to look at being defined as accuracy.
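
A sketch of the compile step being described; the SGD hyperparameter values shown are illustrative, not taken from this lecture:

    from keras.optimizers import SGD

    # Loss, optimizer, and the metric we want to monitor;
    # learning rate and momentum are optional optimizer parameters
    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=0.01, momentum=0.9),
                  metrics=['accuracy'])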

156
00:10:36,370 --> 00:10:39,000
So how do we fit our model now?

157
00:10:39,010 --> 00:10:44,770
So we follow a simple API, basically what scikit-learn, which is the most established and popular machine learning

158
00:10:44,770 --> 00:10:48,540
library in Python, provides, carried through to TensorFlow and Keras.

159
00:10:48,610 --> 00:10:52,350
Basically, we do model.fit, where we feed it the training data,

160
00:10:52,600 --> 00:10:56,130
the training labels; x_train and y_train, that's what they are.

161
00:10:56,140 --> 00:11:02,360
You'll see them in the code soon. We also specify a number of epochs and a batch size.

162
00:11:02,560 --> 00:11:05,720
The batch size doesn't impact learning that significantly.

163
00:11:05,860 --> 00:11:11,470
However, you basically should use the largest batch size possible that your memory allows.

164
00:11:11,500 --> 00:11:17,420
So you can experiment and try it; you'll know it if you try too large a batch size, as your kernel

165
00:11:17,440 --> 00:11:18,250
will crash.

166
00:11:18,250 --> 00:11:22,640
So generally I tend to avoid pushing it to the point where my kernel crashes.

167
00:11:22,750 --> 00:11:29,030
So I always use basically a batch size of 32 for pretty much smaller images, or 16 for larger images.
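
A sketch of the fit call, assuming the training images and labels are held in arrays named x_train and y_train; the epoch count is just illustrative:

    # Train the model; batch_size of 32 as suggested above for smaller images
    model.fit(x_train, y_train, batch_size=32, epochs=10)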

168
00:11:31,450 --> 00:11:35,250
And once we do that, we can now evaluate and generate predictions afterward.

169
00:11:35,290 --> 00:11:41,830
So by running model.evaluate and feeding it the x_test data and y_test labels with a batch size, we can

170
00:11:41,830 --> 00:11:47,140
get these metrics, the metrics object, and then we can use that to actually look at

171
00:11:47,140 --> 00:11:53,580
different graphs and other interesting performance information from our model. And if we ever wanted to

172
00:11:53,590 --> 00:11:59,410
predict an individual point, like we have an image and you want to get the actual class it belongs to,

173
00:12:00,040 --> 00:12:04,070
you can use model.predict; model.predict allows us to feed one image at a time.

174
00:12:04,130 --> 00:12:07,070
We can feed the entire dataset here as well.
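
And a sketch of the evaluate and predict calls described here, assuming x_test and y_test hold the test images and labels:

    # Evaluate on the test set; returns [loss, accuracy] given the metrics we compiled with
    score = model.evaluate(x_test, y_test, batch_size=32)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

    # Predict class probabilities for one image or for the whole test set at once
    predictions = model.predict(x_test)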

175
00:12:07,840 --> 00:12:09,660
So that's it; let's get started.

176
00:12:09,690 --> 00:12:13,030
So let's build our own handwritten digit classifier.