AI_DL_Assignment / 8. Build CNNs in Python using Keras /2. Introduction to Keras & Tensorflow.srt
1
00:00:00,960 --> 00:00:03,370
So welcome back to chapter eight point one.
2
00:00:03,390 --> 00:00:06,190
We introduce you to Keras and TensorFlow.
3
00:00:08,000 --> 00:00:09,660
So what exactly is Keras?
4
00:00:09,660 --> 00:00:15,060
I've mentioned it countless times on this course but I haven't actually told you precisely what it is.
5
00:00:15,200 --> 00:00:21,170
So Keras is a high-level neural network API for Python, and it makes constructing neural networks
6
00:00:21,260 --> 00:00:25,370
of all types, not just CNNs.
7
00:00:25,490 --> 00:00:31,760
It makes it extremely easy and extremely modular to add layers, swap things in and out, and change different things:
8
00:00:32,480 --> 00:00:36,770
configure your loss functions, configure your different types of activation functions.
9
00:00:36,770 --> 00:00:38,720
It's quite nice.
10
00:00:38,870 --> 00:00:45,410
It has the ability to use TensorFlow, CNTK (which is used for natural language processing), and Theano, which
11
00:00:45,440 --> 00:00:47,180
I used quite a bit back in the day.
12
00:00:47,360 --> 00:00:51,690
However, I've now moved to TensorFlow and I haven't looked back; not because Theano is bad.
13
00:00:51,930 --> 00:00:59,000
It's basically just stopped being updated; the project is pretty much done and closed, and
14
00:00:59,000 --> 00:01:02,710
everyone is pretty much adopting TensorFlow now.
15
00:01:03,680 --> 00:01:09,860
Keras was developed by François Chollet, and it has been a tremendous success in making deep learning much
16
00:01:09,860 --> 00:01:11,390
more accessible to the masses.
17
00:01:12,680 --> 00:01:16,820
So what is TensorFlow? As I said, Keras uses TensorFlow as a backend.
18
00:01:16,830 --> 00:01:18,400
But what exactly is a backend?
19
00:01:18,530 --> 00:01:19,370
OK.
20
00:01:19,620 --> 00:01:27,000
So TensorFlow is an open-source library that was created by the Google Brain team
21
00:01:27,030 --> 00:01:33,530
in 2015; it was probably being used internally at Google for many years prior to 2015.
22
00:01:34,020 --> 00:01:39,750
And basically it's an extremely powerful, extremely efficient and fast deep learning framework that is used
23
00:01:39,750 --> 00:01:46,080
for high-performance numerical computation across a variety of platforms such as CPUs, GPUs, and TPUs.
24
00:01:46,580 --> 00:01:52,320
Basically, these engineers, these guys at Google Brain, developed a super-fast library
25
00:01:52,650 --> 00:01:57,160
similar to NumPy, but incorporated it and built it around deep learning.
26
00:01:57,180 --> 00:02:05,190
So you have all these deep learning functions that are part of the TensorFlow framework. TensorFlow actually
27
00:02:05,190 --> 00:02:11,580
has a Python API, and it is pretty accessible and easy to use, but Keras is much easier
28
00:02:11,580 --> 00:02:12,200
to use.
29
00:02:13,500 --> 00:02:20,390
So why use Keras instead of pure TensorFlow? As I said, Keras is extremely easy to use, as it follows basically
30
00:02:20,390 --> 00:02:24,640
a Pythonic style of coding, and it is extremely modular.
31
00:02:24,860 --> 00:02:30,350
You don't even have to be a proficient programmer to use Keras. This modularity
32
00:02:30,350 --> 00:02:36,010
means that we can actually just try different things on our neural nets or CNNs: we can
33
00:02:36,130 --> 00:02:42,670
easily change loss functions, optimizers, initialization schemes, and activation functions, and try different regularization
34
00:02:43,030 --> 00:02:48,580
schemes, introduce more layers, or reduce the number of filters.
35
00:02:48,580 --> 00:02:53,080
All of those things are super easy inside of Keras.
36
00:02:53,080 --> 00:02:56,710
So it allows us to build these powerful neural nets quickly and efficiently.
37
00:02:57,130 --> 00:03:01,350
And obviously it works in Python, and Python is one of my favorites.
38
00:03:01,360 --> 00:03:06,370
Actually, it is my favorite programming language ever, because it makes so many complicated things
39
00:03:06,370 --> 00:03:13,010
super easy. TensorFlow is definitely not as user-friendly as Keras.
40
00:03:13,090 --> 00:03:18,510
I've used it a little bit, to be fair, but I was using Theano at that point, and Theano actually was quite
41
00:03:18,510 --> 00:03:18,980
hard.
42
00:03:19,170 --> 00:03:21,620
So TensorFlow seemed easy to me at that point.
43
00:03:21,690 --> 00:03:24,870
However, nothing beats Keras for ease of use.
44
00:03:24,990 --> 00:03:29,880
So unless you're doing complicated models or academic research, or are basically looking for some sort of
45
00:03:29,880 --> 00:03:35,420
high-performance setup, you don't necessarily need to use pure TensorFlow for anything.
46
00:03:35,430 --> 00:03:38,630
So now you're ready to see some actual Keras code.
47
00:03:38,640 --> 00:03:40,300
Well, it's actually pretty simple.
48
00:03:40,620 --> 00:03:46,980
We're using the Keras library, and this is how we actually construct and build models inside of Python.
49
00:03:47,580 --> 00:03:54,210
So firstly, we import the Sequential model from Keras. Basically, the Sequential model is the main type
50
00:03:54,210 --> 00:04:00,000
of model you'll be building in Keras; basically, all the CNNs and neural nets I've shown you before
51
00:04:00,060 --> 00:04:01,730
were sequential models.
52
00:04:01,980 --> 00:04:05,340
If you're doing something exotic, then it will probably not be sequential.
53
00:04:05,340 --> 00:04:09,480
That is rare and probably beyond the scope of this course.
54
00:04:09,480 --> 00:04:12,110
Right now it's basically academic research.
55
00:04:12,480 --> 00:04:13,130
So let's move on.
56
00:04:13,140 --> 00:04:14,750
So we defined our model here.
57
00:04:14,760 --> 00:04:23,060
We initialize it by running this line, Sequential with the two brackets, and now let's add some convolutional
58
00:04:23,060 --> 00:04:24,500
layers to it.
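As a minimal sketch of the code being described, assuming the standalone keras package (in modern TensorFlow 2 the same names live under tensorflow.keras):

    from keras.models import Sequential

    # An empty sequential model; layers will be stacked onto it with model.add
    model = Sequential()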
59
00:04:24,500 --> 00:04:28,150
So before we do that, we have to import these layers from Keras.
60
00:04:28,370 --> 00:04:34,380
So from keras.layers we import Dense, Dropout, Flatten, Conv2D, and MaxPooling2D.
61
00:04:34,400 --> 00:04:37,710
Now, I don't have Dropout here on the slide, but it's used in the model
62
00:04:37,730 --> 00:04:39,220
we are actually going to code.
63
00:04:39,290 --> 00:04:40,110
So I left it in.
64
00:04:40,170 --> 00:04:42,930
So you guys know how it's different to what's here.
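For reference, the import line being described is, as a sketch (with Dropout included, as the speaker says):

    from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D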
65
00:04:42,940 --> 00:04:43,610
OK.
66
00:04:44,090 --> 00:04:48,370
So firstly, model.add: this is where we're adding our first layer here.
67
00:04:48,410 --> 00:04:50,550
The first layer is a Conv2D.
68
00:04:50,690 --> 00:04:51,560
That's what we call it.
69
00:04:51,590 --> 00:04:56,030
And now we have open brackets here and we have some parameters to fill out.
70
00:04:56,030 --> 00:04:57,830
So let's go through these parameters.
71
00:04:57,840 --> 00:04:59,300
This first one is the number 32.
72
00:04:59,360 --> 00:05:05,890
Then the kernel size, which we can see here, then the activation type, and we use input_shape equals input_shape.
73
00:05:05,930 --> 00:05:08,360
That's a bit confusing; I'll explain it to you shortly.
74
00:05:08,600 --> 00:05:14,660
So let's go through each one. The first one here, the 32, specifies the number of
75
00:05:14,660 --> 00:05:16,030
kernels or filters.
76
00:05:16,340 --> 00:05:24,710
So in our first layer we're using 32 filters of kernel size 3 by 3, with a ReLU activation function, and
77
00:05:25,500 --> 00:05:31,730
an input shape called input_shape. That's because previously, above this code, we created
78
00:05:32,370 --> 00:05:38,400
an input_shape variable, I should say, that was 28 by 28 by 1.
79
00:05:38,720 --> 00:05:40,240
So that's the input shape we use.
80
00:05:40,550 --> 00:05:45,890
So I just left it like this here for convenience, because we tend to leave it defined
81
00:05:45,980 --> 00:05:49,440
outside of the scope of this declaration here.
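A sketch of that first layer, assuming 32 filters and the standard Keras MNIST setup described above:

    input_shape = (28, 28, 1)  # declared earlier in the code: 28 x 28 grayscale images

    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))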
82
00:05:49,910 --> 00:05:51,230
Now to add another layer.
83
00:05:51,590 --> 00:05:52,990
It's as simple as model.add.
84
00:05:53,080 --> 00:05:55,280
And Keras is so easy to use:
85
00:05:55,280 --> 00:06:00,290
just keep using model.add, and the layers stack easily on top of each other, starting with the first at
86
00:06:00,290 --> 00:06:01,650
the bottom.
87
00:06:01,670 --> 00:06:05,450
So now we add another convolutional layer, a Conv2D with 64 filters.
88
00:06:05,490 --> 00:06:08,610
Same kernel size, and note we don't need to specify kernel_size
89
00:06:08,620 --> 00:06:12,660
every time we do it; Keras knows that the second parameter is the kernel size.
90
00:06:12,690 --> 00:06:13,770
So it's 3 by 3.
91
00:06:13,790 --> 00:06:15,760
And again the activation is ReLU.
92
00:06:16,280 --> 00:06:19,570
And now you've noticed we don't need to specify an input shape here.
93
00:06:19,910 --> 00:06:20,690
And do you know why?
94
00:06:20,780 --> 00:06:23,840
That's because this layer is directly connected to the previous layer.
95
00:06:23,930 --> 00:06:27,380
So it takes the output of that layer, and it knows the output shape.
96
00:06:27,380 --> 00:06:31,290
So the output of this layer is effectively the input into this layer.
97
00:06:31,670 --> 00:06:34,150
So we no longer have to declare inputs anymore.
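So the second convolutional layer needs no input shape, and the kernel size can be passed positionally; a sketch:

    # Input shape is inferred from the previous layer's output
    model.add(Conv2D(64, (3, 3), activation='relu'))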
98
00:06:35,740 --> 00:06:37,710
So now we can add a max pooling layer.
99
00:06:38,000 --> 00:06:40,820
As I said before, we are going to use max pooling of 2 by 2.
100
00:06:41,140 --> 00:06:42,170
So that's simple here.
101
00:06:42,170 --> 00:06:49,340
We have model.add, MaxPooling2D, and specify the pool size: open brackets, 2 by 2, close brackets.
102
00:06:49,340 --> 00:06:53,720
Close brackets for this here, close brackets for this here, and then we do a Flatten.
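The pooling line being read out is, as a sketch:

    model.add(MaxPooling2D(pool_size=(2, 2)))  # halves the width and height of the feature maps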
103
00:06:53,830 --> 00:06:56,220
I haven't actually discussed Flatten.
104
00:06:56,220 --> 00:06:57,300
It's basically a function.
105
00:06:57,300 --> 00:07:01,350
that we use to feed the dense or fully connected layer.
106
00:07:01,710 --> 00:07:05,540
You'll see it visually in the next diagram on the following slide.
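The flatten line itself is just:

    model.add(Flatten())  # 3D feature maps -> one long 1D vector for the dense layers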
107
00:07:05,740 --> 00:07:11,940
Basically, we then add a dense layer here with 128 units, all activated by ReLU.
108
00:07:12,300 --> 00:07:16,760
And this is connected to another dense layer here, which outputs the number of classes.
109
00:07:16,830 --> 00:07:23,510
The number of classes in this dataset is 10, because we're using the MNIST dataset, digits 0 to 9, and we use
110
00:07:23,510 --> 00:07:30,360
a softmax activation to get, basically, the probabilities. I hope you think this is quite
111
00:07:30,360 --> 00:07:32,510
simple, because to me this is quite basic.
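A sketch of those two dense layers, with num_classes set to 10 for MNIST as described:

    num_classes = 10  # MNIST digits 0 to 9

    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))  # outputs class probabilities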
112
00:07:32,610 --> 00:07:37,190
It may take a while to get familiar with how these things work, but you will get used to it in this
113
00:07:37,200 --> 00:07:38,790
course, I guarantee it.
114
00:07:38,790 --> 00:07:40,800
So let's take a look at what we've built here.
115
00:07:41,230 --> 00:07:41,540
OK.
116
00:07:41,580 --> 00:07:44,880
So this is actually what we have built so far.
117
00:07:44,880 --> 00:07:50,090
So we have an input image here, 28 by 28 by 1; 1 because it's grayscale.
118
00:07:50,190 --> 00:07:54,440
If it was a color image, an RGB image, it would be a depth of 3 here.
119
00:07:54,780 --> 00:08:02,250
So as you saw before, we have 32 filters here, connected to another conv layer with 64 filters here.
120
00:08:02,490 --> 00:08:08,850
And you may have noticed the size of the image shrank here: it became 26 by 26 and then
121
00:08:08,850 --> 00:08:10,490
24 by 24.
122
00:08:10,590 --> 00:08:12,770
And that's because we didn't use any zero padding.
123
00:08:12,840 --> 00:08:15,980
And I'll show you guys later on in code how to do zero padding.
124
00:08:15,990 --> 00:08:16,910
It's quite simple.
125
00:08:17,100 --> 00:08:22,450
But for now, just remember: when you don't use zero padding, your convolutional feature
126
00:08:22,450 --> 00:08:25,530
map size reduces from the input image size.
127
00:08:25,530 --> 00:08:28,950
So we have these two conv outputs currently stacked here.
128
00:08:28,950 --> 00:08:33,000
Then we have our max pooling, which basically shrinks this by half.
129
00:08:33,180 --> 00:08:34,360
I have a dropout layer here.
130
00:08:34,500 --> 00:08:39,330
We didn't actually use dropout before in the code, but in the actual code, when we start the project,
131
00:08:39,330 --> 00:08:44,720
I'll show you quickly how to actually implement dropout in one line; super easy.
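For reference, dropout really is a single line; the 0.25 rate here is an illustrative value, not necessarily the one used in the course project:

    model.add(Dropout(0.25))  # randomly zeroes 25% of activations during training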
132
00:08:44,820 --> 00:08:48,260
What I wanted to show you here was that we have the flatten layer here.
133
00:08:48,490 --> 00:08:52,890
For the flatten layer, if you go back here, it's just Flatten with brackets.
134
00:08:52,950 --> 00:08:54,420
And what does it actually do?
135
00:08:54,450 --> 00:09:00,960
Flatten basically takes this 3D multidimensional matrix, 64 by 12 by 12, and basically turns it
136
00:09:00,960 --> 00:09:07,430
into a row of 9,216 columns (12 x 12 x 64 = 9,216).
137
00:09:07,470 --> 00:09:11,770
So what it means here is that we just flattened this matrix.
138
00:09:11,820 --> 00:09:18,540
So instead of having 12 by 12 by 64, imagine you just built one entire long row, where it's the first
139
00:09:18,540 --> 00:09:24,410
12 here, then the second 12, and so on consecutively; basically a long row.
140
00:09:24,870 --> 00:09:28,810
And that becomes this output box here.
141
00:09:29,100 --> 00:09:35,180
And now this output box here is fed into the fully connected layer here with 128 nodes.
142
00:09:35,430 --> 00:09:36,440
As shown here.
143
00:09:36,690 --> 00:09:37,690
So we defined it here.
144
00:09:37,710 --> 00:09:40,760
Each node is connected to a ReLU activation unit.
145
00:09:41,220 --> 00:09:48,810
And then finally, we connect this to our final dense layer with a softmax activation function, which outputs
146
00:09:48,810 --> 00:09:54,210
to 10 nodes; 10 nodes because our MNIST dataset has 10 classes.
147
00:09:54,300 --> 00:09:56,290
So that's how we get our probabilities here.
148
00:09:56,700 --> 00:09:59,130
So it's not illustrated here but later on you will see it.
149
00:09:59,130 --> 00:10:03,140
So now that we've built our model we are ready to compile this model.
150
00:10:03,390 --> 00:10:08,640
So compiling simply creates an object that stores the model we've just created, and we can specify our
151
00:10:08,640 --> 00:10:13,350
loss function and optimizer, and define the performance metric that we want to look at.
152
00:10:13,440 --> 00:10:18,090
Additionally, we can specify parameters for the optimizer, such as learning rates and momentum.
153
00:10:18,090 --> 00:10:22,630
So this is a simple model.compile code here.
154
00:10:22,980 --> 00:10:29,610
We have categorical cross-entropy defined as our loss type, we have the optimizer SGD, meaning stochastic
155
00:10:29,610 --> 00:10:33,990
gradient descent, and we have the metric to look at defined as accuracy.
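A sketch of that compile call; the SGD import and the learning rate and momentum values are illustrative assumptions, not values from the slide:

    from keras.optimizers import SGD

    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(0.01, momentum=0.9),  # illustrative learning rate and momentum
                  metrics=['accuracy'])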
156
00:10:36,370 --> 00:10:39,000
So how do we fit in our model now.
157
00:10:39,010 --> 00:10:44,770
So following a simple basically what Eski line which is the most established and popular machine learning
158
00:10:44,770 --> 00:10:48,540
library on pite and private through senseful in Paris.
159
00:10:48,610 --> 00:10:52,350
Basically they did as applied model but that would be used to training data.
160
00:10:52,600 --> 00:10:56,130
The training labels extreme and waitron that's what they are.
161
00:10:56,140 --> 00:11:02,360
You'll see them in a code soon specified number of epochs and about say not about size.
162
00:11:02,560 --> 00:11:05,720
This doesn't impact learning that much significantly.
163
00:11:05,860 --> 00:11:11,470
How ever you basically should use a largest bedside size it's possible that your memory allows.
164
00:11:11,500 --> 00:11:17,420
So you can experiment and try it you'll know it if you try it too large a box that size your kernel
165
00:11:17,440 --> 00:11:18,250
will crash.
166
00:11:18,250 --> 00:11:22,640
So generally, I tend to avoid having my kernel crash:
167
00:11:22,750 --> 00:11:29,030
I always use a batch size of 32 for pretty much all smaller images, or 16 for larger images.
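A sketch of the fit call with illustrative values (the epoch count is an assumption; x_train and y_train are the arrays described above):

    model.fit(x_train, y_train,
              batch_size=32,  # 32 for smaller images, as suggested above
              epochs=10)      # illustrative number of epochs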
168
00:11:31,450 --> 00:11:35,250
And once we do that, we can now evaluate and generate predictions afterward.
169
00:11:35,290 --> 00:11:41,830
So by running model.evaluate and feeding it the x_test data and y_test labels with a batch size, we can
170
00:11:41,830 --> 00:11:47,140
get a metrics object, and then we can use that to actually look at
171
00:11:47,140 --> 00:11:53,580
different graphs and find interesting performance information from our model. And if we ever wanted to
172
00:11:53,590 --> 00:11:59,410
predict an individual point, like we have an image and we want to get the actual class it belongs to,
173
00:12:00,040 --> 00:12:04,070
we can use model.predict; model.predict allows us to feed one image at a time.
174
00:12:04,130 --> 00:12:07,070
We can feed the entire dataset here as well.
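A sketch of evaluation and single-image prediction; the [0:1] slice is one way to feed a single image as a batch of one:

    score = model.evaluate(x_test, y_test, batch_size=32)  # returns loss and accuracy
    predictions = model.predict(x_test[0:1])               # class probabilities for one image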
175
00:12:07,840 --> 00:12:09,660
So that's it; let's get started.
176
00:12:09,690 --> 00:12:13,030
So let's build our own handwritten digit classifier.