Course
- 1: CS231n Resources
- 2: Stanford CS231n 2017 Summary
- 3: MK Internet of Things
- 4: MK Pengolahan Sinyal Digital
- 5: MK Sistem Kendali Cerdas
- 6: MK Sistem Kendali
- 7: Rangkaian Listrik
- 8: Embedded Systems
- 9: Machine Learning Andrew Ng Quizzes
- 10: Statistic and Probability
- 11: MK Machine Learning
- 12: Electronics
- 13: Free Online Course
- 14: Machine Learning by Andrew Ng Resources
- 15: Instrumentation
- 16: Digital Signal Processing
- 17: MK Sistem Kendali Lanjut
- 18: Machine Learning CS299
- 19: MK Dasar Teknik Elektro
- 20: Kuliah
- 21: Linear Algebra
- 22: MK Matematika Teknik
- 23: Control Design with Frequency Method
- 24: Control Systems
- 25: Fundamentals of Electrical Engineering
- 26: Course
- 27: Computer Science
- 28: Control Systems Resources
- 29: Electronic Resources
- 30: Math Resources
1 - CS231n Resources
CS231n Resources
- CS231n Main Site
- CS231n GitHub Page
- CS231n GitHub Source
- CS231n Schedule
- CS231n Slides
- CS231n YouTube
- CS231n Twitter
- CS231n Korean
How To
Student Notes
- albertpumarola/deep-learning-notes: My CS231n lecture notes
- hnarayanan/CS231n: Working through CS231n: Convolutional Neural Networks for Visual Recognition
- mbadry1/CS231n-2017-Summary
- visionNoob/CS231N_17_KOR_SUB: CS231N 2017 video subtitles translation project for Korean Computer Science students
- Yorko/stanford_cs231n_2019: Solutions and comments to assignments for 2019 Stanford’s course on convolutional neural networks
- maxis42/CS231n: Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition
- LoserSun/cs231n-study-schedule: cs231n learning notes
- zhuole1025/cs231n: notes & assignments for cs231n 2020
- khanhnamle1994/computer-vision: Programming Assignments and Lectures for Stanford’s CS 231: Convolutional Neural Networks for Visual Recognition
Student Forums
CS231 Assignment
- dengfy/cs231n: Convolutional Neural Networks for Visual Recognition in Stanford
- lightaime/cs231n: cs231n assignments solved by https://ghli.org
- Burton2000/CS231n-2017: Completed the CS231n 2017 spring assignments from Stanford university
Courses similar to CS231n
- CS294-129 Designing, Visualizing and Understanding Deep Neural Networks, University of California Berkeley, license Public Domain : pptx
- EECS 498-007 / 598-005: Deep Learning for Computer Vision
- CS 6501 Deep Learning for Visual Recognition : pptx
- CS 6501 - 009 Computational Visual Recognition
- CSE 576 Computer Vision Spring 2020 : pptx
- EECS 442: Computer Vision : pptx
- ECE 6504 Deep Learning for Perception, license free : pptx
- Machine Learning
- CSCI 1430: Introduction to Computer Vision license free : pptx
- Deep Learning in Data Science
- Mathematical Statistics
- Deep learning CS342 UT Austin, license CC-BY-SA;
- CS 498 DL University Illinois
- CAP 5636 - Advanced Artificial Intelligence
- Stanford University: Tensorflow for Deep Learning Research
- ECE 5973-961/983: Artificial Neural Networks and Applications
- ECE 6504 Deep Learning for Perception ppt license: CC-BY
- Intro to Machine Learning - ECE, Virginia Tech - Spring 2015: ECE 5984 ppt license: CC-BY
- Introduction to Deep Learning @CUHK
- CS 7643 Deep Learning ppt license: CC-BY
- Yaoliang Yu ppt
- CS 165 ppt
EECS498 Student Notes
CS294-129 Student Notes
- kavimaluskam/cs294-129-exercise
- arjasethan1/cs294-129: CS294-129 Designing, Visualizing and Understanding Deep Neural Networks
CS498 DL Student Notes
Related Books
2 - Stanford CS231n 2017 Summary
Stanford CS231n 2017 Summary
Table of contents
- Course Info
- 01. Introduction to CNN for visual recognition
- 02. Image classification
- 03. Loss function and optimization
- 04. Introduction to Neural network
- 05. Convolutional neural networks (CNNs)
- 06. Training neural networks I
- 07. Training neural networks II
- 08. Deep learning software
- 09. CNN architectures
- 10. Recurrent Neural networks
- 11. Detection and Segmentation
- 12. Visualizing and Understanding
- 13. Generative models
- 14. Deep reinforcement learning
- 15. Efficient Methods and Hardware for Deep Learning
- 16. Adversarial Examples and Adversarial Training
Course Info
- Website: http://cs231n.stanford.edu/
- Lectures link: https://www.youtube.com/playlist?list=PLC1qU-LWwrF64f4QKQT-Vg5Wr4qEE1Zxk
- Full syllabus link: http://cs231n.stanford.edu/syllabus.html
- Assignments solutions: https://github.com/Burton2000/CS231n-2017
- Number of lectures: 16
- Course description:
Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it on the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), practical engineering tricks for training and fine-tuning the networks and guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.
01. Introduction to CNN for visual recognition
- A brief history of computer vision, from the late 1960s to 2017.
- Computer vision problems include image classification, object localization, object detection, and scene understanding.
- ImageNet is one of the largest image classification datasets currently available.
- Since 2012, convolutional neural networks (CNNs) have won the ImageNet competition every year.
- CNNs are not new: Yann LeCun introduced them in 1998.
02. Image classification
- The image classification problem has many challenges, such as changes in illumination and viewpoint.
- Image classification can be attempted with k-nearest neighbors (KNN), but KNN solves the problem poorly. The properties of KNN are:
  - The hyperparameters of KNN are k and the distance measure.
  - k is the number of neighbors we compare against.
  - Distance measures include:
    - L2 distance (Euclidean distance)
      - Best for non-coordinate points
    - L1 distance (Manhattan distance)
      - Best for coordinate points
- Hyperparameters can be optimized using cross-validation as follows (in our case we are trying to choose k):
  1. Split your dataset into `f` folds.
  2. For each candidate hyperparameter value:
     - Train your algorithm on `f-1` folds and test it on the remaining fold. Repeat this so that every fold is held out once.
  3. Choose the hyperparameters that give the best validation score (averaged over all folds).
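The cross-validation recipe above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference code: the function names `knn_predict` and `cross_validate_k`, the toy two-cluster dataset, and the candidate values of k are all assumptions made for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict labels with k-nearest neighbors under L2 distance."""
    # Pairwise squared L2 distances, shape (num_test, num_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest points
    votes = y_train[nearest]                  # labels of those neighbors
    return np.array([np.bincount(v).argmax() for v in votes])

def cross_validate_k(X, y, k_choices, num_folds=5):
    """Return the mean validation accuracy for each candidate k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    results = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            # Train on every fold except fold i, validate on fold i.
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            pred = knn_predict(X_tr, y_tr, X_folds[i], k)
            accs.append(np.mean(pred == y_folds[i]))
        results[k] = float(np.mean(accs))
    return results

# Toy data: two well-separated 2-D clusters (made up for illustration).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
perm = rng.permutation(100)
X, y = X[perm], y[perm]

scores = cross_validate_k(X, y, k_choices=[1, 3, 5])
best_k = max(scores, key=scores.get)
```

Note that the selection is based on the held-out fold, never on the folds used for training.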
- A linear SVM classifier is an option for solving the image classification problem, but the curse of dimensionality makes it stop improving at some point.
- Logistic regression is also an option, but the image classification problem is non-linear!
- A linear classifier evaluates the following equation: `Y = wX + b`
  - `w` has the same shape as `x`, and `b` has shape 1.
- We can append a 1 to the `X` vector and drop the bias so that: `Y = wX`
  - the shape of `x` is `oldX+1`, and `w` has the same shape as `x`.
- We need to find the `w`'s and `b`'s that make the classifier perform best.
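As a quick check of the bias trick described above, the sketch below verifies that appending a 1 to the input and folding `b` into `W` gives the same scores. The sizes and random values are illustrative assumptions, and `W` here has one row per class, which goes slightly beyond the single-output case in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim = 3, 4                  # illustrative sizes
W = rng.normal(size=(num_classes, dim))
b = rng.normal(size=num_classes)
x = rng.normal(size=dim)

scores = W @ x + b                       # Y = wX + b

W_ext = np.hstack([W, b[:, None]])       # the bias becomes the last column of W
x_ext = np.append(x, 1.0)                # append a constant 1 to the input
scores_ext = W_ext @ x_ext               # Y = wX, with no separate bias

same = np.allclose(scores, scores_ext)
```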
03. Loss function and optimization
- In the last section we talked about linear classifiers, but we didn't discuss how to train the parameters of the model to get the best `w`'s and `b`'s.
- We need a loss function to measure how good or bad our current parameters are.
  - `L[i] = L(f(X[i], W), Y[i])` is the loss on example `i`.
  - `Loss_for_all = 1/N * Sum(L[i])` is the average over all `N` examples.
- Then we look for the parameters that minimize the loss function. This is called optimization.
- Loss function for a linear SVM classifier:
  `L[i] = Sum over all classes j except the correct class y[i] of (max(0, s[j] - s[y[i]] + 1))`
  - We call this the hinge loss.
  - The loss is zero when the correct class scores higher than every other class by at least the margin (1 here); otherwise each class that violates the margin adds a penalty.
  - Example:
    - Given an image whose correct class (cat) scores -96.8 while the two other classes score 437.9 and 61.95, we want to compute the loss of this image.
      `L = max(0, 437.9 - (-96.8) + 1) + max(0, 61.95 - (-96.8) + 1) = max(0, 535.7) + max(0, 159.75) = 695.45`
    - The final loss is 695.45, which is large and reflects that the cat score should be the highest of all classes but is currently the lowest. We need to minimize that loss.
  - It's OK for the margin to be 1, but it's a hyperparameter too.
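The worked example above can be reproduced with a few lines of NumPy. This is a sketch for a single example; the helper name `svm_loss_single` is an assumption, and index 0 is taken to be the correct (cat) class.

```python
import numpy as np

def svm_loss_single(scores, correct_class, margin=1.0):
    """Multiclass hinge loss for one example."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0   # the correct class is excluded from the sum
    return margins.sum()

scores = np.array([-96.8, 437.9, 61.95])   # index 0 is the correct class
loss = svm_loss_single(scores, correct_class=0)

# When the correct class wins by more than the margin, the loss is zero.
zero_loss = svm_loss_single(np.array([10.0, 2.0, 3.0]), correct_class=0)
```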
- If your loss is zero, are the parameters that achieve it unique? No, many different parameter settings give the same zero loss.
- You'll sometimes hear about people instead using the squared hinge loss SVM (or L2-SVM), which penalizes violated margins more strongly (quadratically instead of linearly). The unsquared version is more standard, but on some datasets the squared hinge loss can work better.
- We add regularization to the loss function so that the learned model doesn't overfit the data.
  `Loss = L = 1/N * Sum(Li(f(X[i],W),Y[i])) + lambda * R(W)`
  - where `R` is the regularizer and `lambda` is the regularization strength.
- There are different regularization techniques:

| Regularizer | Equation | Comments |
| --- | --- | --- |
| L2 | `R(W) = Sum(W^2)` | Sum of all the squared weights |
| L1 | `R(W) = Sum(abs(W))` | Sum of the absolute values of all weights |
| Elastic net (L1 + L2) | `R(W) = beta * Sum(W^2) + Sum(abs(W))` | |
| Dropout | | No equation |

- Regularization prefers smaller `W`s over big `W`s.
- L2 regularization is also called weight decay. Biases should not be included in regularization.
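- Softmax loss (like logistic regression, but works for more than 2 classes):
  - Softmax function:

```python
# scores is a NumPy vector with one entry per class; A[L] is the probability of class L
A = np.exp(scores) / np.sum(np.exp(scores))
```

  - The entries of the output vector sum to 1.
  - Softmax loss:

```python
# probs[y[i]] is the softmax probability P(Y = y[i] | X = x[i]) of the correct class
Loss = -np.log(probs[y[i]])
```

  - This is the negative log of the probability of the correct class. We want that probability to be near 1; the log of a probability is at most 0, so the minus sign turns it into a non-negative loss to minimize.
  - Softmax loss is also called cross-entropy loss.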
- Consider this numerical problem when you are computing softmax:

```python
import numpy as np

f = np.array([123.0, 456.0, 789.0]) # example with 3 classes and each having large scores
p = np.exp(f) / np.sum(np.exp(f)) # Bad: np.exp overflows to inf, so p contains nan
# instead: first shift the values of f so that the highest number is 0:
f -= np.max(f) # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer
```
- Optimization:
  - How can we optimize the loss functions we discussed?
  - Strategy one:
    - Try many random parameter settings and keep the one with the lowest loss. This is a bad idea.
  - Strategy two:
    - Follow the slope.
    - Our goal is to compute the gradient of the loss with respect to each parameter we have.
      - Numerical gradient: approximate, slow, easy to write. (But it's useful for debugging.)
      - Analytic gradient: exact, fast, error-prone. (Always used in practice.)
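A gradient check compares the two approaches: the analytic gradient is what we use, and the numerical gradient is the debugging reference. The sketch below does this for the softmax loss of a linear classifier on one example; the function names, the centered-difference step `h`, and the random test data are assumptions chosen for illustration.

```python
import numpy as np

def softmax_loss(W, x, y):
    """Cross-entropy loss of a linear classifier on one example."""
    scores = W @ x
    scores = scores - np.max(scores)               # shift for numerical stability
    probs = np.exp(scores) / np.sum(np.exp(scores))
    return -np.log(probs[y]), probs

def analytic_grad(W, x, y):
    _, probs = softmax_loss(W, x, y)
    dscores = probs.copy()
    dscores[y] -= 1.0                              # dL/dscores = probs - one_hot(y)
    return np.outer(dscores, x)                    # dL/dW

def numerical_grad(W, x, y, h=1e-5):
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        loss_plus, _ = softmax_loss(W, x, y)
        W[idx] = old - h
        loss_minus, _ = softmax_loss(W, x, y)
        W[idx] = old                               # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * h)   # centered difference
    return grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = 1
max_diff = np.max(np.abs(analytic_grad(W, x, y) - numerical_grad(W, x, y)))
```

A tiny `max_diff` suggests the analytic gradient is implemented correctly; a large one usually points to a bug in the backward code.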
- After we compute the gradient of our parameters, we take a gradient descent step:

```python
# W_grad is the gradient of the loss with respect to W
W = W - learning_rate * W_grad
```

  - `learning_rate` is a very important hyperparameter; tune it first, before all the other hyperparameters.
  - Stochastic gradient descent:
    - Instead of using all the data, use a mini-batch of examples (32/64/128 are commonly used) for faster results.
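To make the mini-batch update concrete, here is a small sketch that fits a linear least-squares model with stochastic gradient descent. The dataset, `true_w`, the learning rate, and the batch size are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 3
true_w = np.array([2.0, -1.0, 0.5])              # made-up target weights
X = rng.normal(size=(N, D))
y = X @ true_w + 0.01 * rng.normal(size=N)

w = np.zeros(D)
learning_rate, batch_size = 0.1, 64
for step in range(500):
    idx = rng.choice(N, size=batch_size, replace=False)   # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    err = Xb @ w - yb
    w_grad = 2 * Xb.T @ err / batch_size                   # gradient of the mean squared error
    w = w - learning_rate * w_grad                         # gradient descent step
```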
04. Introduction to Neural network
- Computing the analytic gradient for arbitrarily complex functions:
  - What is a computational graph?
    - It represents any function as a graph of nodes, one per operation.
    - Using a computational graph leads naturally to a technique called back-propagation, even with complex models like CNNs and RNNs.
  - Back-propagation simple example:
    - Suppose we have `f(x,y,z) = (x+y)z`
    - Then the graph can be represented this way:
```
X
  \
   (+)--> q ---(*)--> f
  /           /
Y            /
            /
           /
Z---------/
```
- We introduce an intermediate variable `q` to hold the value of `x+y`.
- Then we have:

```python
q = x + y   # dq/dx = 1 , dq/dy = 1
f = q * z   # df/dq = z , df/dz = q
```

- Then:

```python
# Derivatives in math notation (comments, not executable statements):
# df/dq = z
# df/dz = q
# df/dx = df/dq * dq/dx = z * 1 = z   # Chain rule
# df/dy = df/dq * dq/dy = z * 1 = z   # Chain rule
```
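Plugging numbers into the derivation above makes the forward and backward passes explicit. The input values below are arbitrary illustrative choices.

```python
x, y, z = -2.0, 5.0, -4.0     # example inputs (illustrative values)

# Forward pass
q = x + y                     # q = 3
f = q * z                     # f = -12

# Backward pass (chain rule), starting from df/df = 1
df_dq = z                     # local gradient of the multiply node w.r.t. q
df_dz = q                     # local gradient of the multiply node w.r.t. z
df_dx = df_dq * 1.0           # dq/dx = 1, so df/dx = z
df_dy = df_dq * 1.0           # dq/dy = 1, so df/dy = z
```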
- In a computational graph, we call each operation `f`. For each `f` we calculate the local gradient before back-propagation, and then during back-propagation we use the chain rule to compute the gradients with respect to the loss function.
- In a computational graph you can split each operation into pieces as simple as you want, but then there will be many nodes. If you want fewer nodes, merge operations into larger nodes, as long as you can still compute the gradient of each node.
- A bigger example:
  - Hint: when a node's output feeds two downstream nodes, the gradient flowing back into it is the sum of the two incoming derivatives.
- Modularized implementation: forward/backward API (example multiply code):

```python
class MultiplyGate(object):
    """
    x,y are scalars
    """
    def forward(self, x, y):
        z = x * y
        self.x = x  # Cache
        self.y = y  # Cache
        # We cache x and y because the derivatives depend on them.
        return z

    def backward(self, dz):
        dx = self.y * dz  # dz/dx = y, so the upstream gradient is scaled by y
        dy = self.x * dz  # dz/dy = x
        return [dx, dy]
```
- If you look at a deep learning framework you will find that it follows the modularized implementation, where each class defines a forward and a backward method. For example:
  - Multiplication
  - Max
  - Plus
  - Minus
  - Sigmoid
  - Convolution
- So to define a neural network as a function:
  - (Before) Linear score function: `f = Wx`
  - (Now) 2-layer neural network: `f = W2*max(0,W1*x)`
    - where max is the RELU non-linear function
  - (Now) 3-layer neural network: `f = W3*max(0,W2*max(0,W1*x))`
  - And so on.
- A neural network is a stack of simple operations that together form a complex operation.
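The 2-layer score function `f = W2*max(0,W1*x)` can be written directly in NumPy. The layer sizes and the random weights below are assumptions for illustration; biases are omitted, as in the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, C = 8, 16, 3                       # input dim, hidden units, classes (illustrative)
x = rng.normal(size=D)
W1 = 0.1 * rng.normal(size=(H, D))
W2 = 0.1 * rng.normal(size=(C, H))

h = np.maximum(0, W1 @ x)                # RELU non-linearity
f = W2 @ h                               # class scores, f = W2*max(0, W1*x)
```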
05. Convolutional neural networks (CNNs)
- Neural networks history:
  - The first perceptron machine was developed by Frank Rosenblatt in 1957. It was used to recognize letters of the alphabet. Back-propagation hadn't been developed yet.
  - Adaline/Madaline, an early multilayer perceptron network, was developed by Widrow and Hoff in 1960. Back-propagation hadn't been developed yet.
  - Back-propagation was developed in 1986 by Rumelhart.
  - There followed a period in which nothing new happened with NNs, because of limited computing resources and data.
  - In 2006 Hinton released a paper showing that a deep neural network can be trained by using restricted Boltzmann machines to initialize the weights, followed by back-propagation.
  - The first strong results came in 2012 by Hinton in speech recognition, and from AlexNet, the convolutional neural network that won ImageNet in 2012, also by Hinton's team.
  - After that, NNs became widely used in various applications.
- Convolutional neural networks history:
  - Hubel & Wiesel's experiments on the cat cortex from 1959 to 1968 found that there is a topographical mapping in the cortex and that neurons have a hierarchical organization from simple to complex.
  - In 1998, Yann LeCun published the paper Gradient-based learning applied to document recognition, which introduced convolutional neural networks. It was good at recognizing zip code digits but couldn't handle more complex examples.
  - In 2012 AlexNet used an architecture similar to Yann LeCun's and won the ImageNet challenge. The difference from 1998 is that we now have large datasets, and the power of GPUs solved many performance problems.
- Starting from 2012, CNNs have been used for various tasks (here are some applications):
- Image classification.
- Image retrieval.
- Extracting features using a NN and then do a similarity matching.
- Object detection.
- Segmentation.
- Each pixel in an image takes a label.
- Face recognition.
- Pose recognition.
- Medical images.
- Playing Atari games with reinforcement learning.
- Galaxies classification.
- Street signs recognition.
- Image captioning.
- Deep dream.
- ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture.
- There are a few distinct types of Layers in ConvNet (e.g. CONV/FC/RELU/POOL are by far the most popular)
- Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)
- Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)
- How do convolutional neural networks work?
  - A fully connected layer is a layer in which every neuron is connected to all of the inputs. It is sometimes called a dense layer.
    - If the input shape is `(X, M)`, the weights shape will be `(NoOfHiddenNeurons, X)`.
  - A convolution layer preserves the spatial structure of the input by sliding a filter across the whole image.
    - At each position we take a dot product: `W.T*X + b`. This equation uses the broadcasting technique.
    - So we need to learn the values of `W` and `b`.
    - We usually treat the filter (`W`) as a vector, not a matrix.
  - We call the output of a convolution an activation map. We need multiple activation maps.
    - Example with 6 filters; here are the shapes:
      - Input image `(32,32,3)`
      - Filter size `(5,5,3)`
        - We apply 6 filters. The filter depth must be three because the input has a depth of three.
      - Output of Conv. `(28,28,6)`
        - With one filter it would be `(28,28,1)`
      - After RELU `(28,28,6)`
      - Another 10 filters of size `(5,5,6)`
      - Output of Conv. `(24,24,10)`
  - It turns out that ConvNets learn low-level features in the first layers, then mid-level features, and then high-level features.
  - After the ConvNet layers we can add a linear classifier for a classification task.
  - In convolutional neural networks we usually have several (Conv ==> RELU) blocks followed by a pooling operation that downsamples the activations.
- What is stride when we are doing convolution?
  - When building a conv layer we have to choose the stride. I will explain this with examples.
  - Stride is how many positions the filter skips while sliding. By default it's 1.
  - Given a matrix with shape `(7,7)` and a filter with shape `(3,3)`:
    - If stride is `1` then the output shape will be `(5,5)` `# 2 are dropped`
    - If stride is `2` then the output shape will be `(3,3)` `# 4 are dropped`
    - If stride is `3` it doesn't work.
  - A general formula would be `((N-F)/stride + 1)`
    - If stride is `1` then `O = ((7-3)/1)+1 = 4 + 1 = 5`
    - If stride is `2` then `O = ((7-3)/2)+1 = 2 + 1 = 3`
    - If stride is `3` then `O = ((7-3)/3)+1 = 1.33 + 1 = 2.33` `# doesn't work`
- In practice it's common to zero-pad the border. `# Padding from both sides.`
  - Given a stride of `1`, it's common to pad by `(F-1)/2`, where F is the filter size.
    - Example: `F = 3` ==> zero pad with `1`
    - Example: `F = 5` ==> zero pad with `2`
  - If we pad this way we call it a "same" convolution.
  - Zero padding introduces artificial features at the edges, which is why other padding techniques exist (padding with values other than zeros), but in practice zeros work!
  - We do this to maintain the full size of the input. Without padding, the activations would shrink too fast and we would lose a lot of data.
- Example:
  - If we have an input of shape `(32,32,3)` and ten filters of shape `(5,5)` with stride `1` and pad `2`:
    - The output size will be `(32,32,10)` `# We maintain the size.`
  - Size of parameters per filter `= 5*5*3 + 1 = 76`
  - All parameters `= 76 * 10 = 760`
- The number of filters is usually a power of 2. `# To vectorize well.`
- So here are the parameters for the Conv layer:
  - Number of filters K.
    - Usually a power of 2.
  - Spatial extent F.
    - 3, 5, 7, ...
  - The stride S.
    - Usually 1 or 2 (a larger stride downsamples the output, but differently from pooling).
  - Amount of padding.
    - If we want the output shape to match the input shape, it depends on F: if F is 3 pad with 1, if F is 5 pad with 2, and so on.
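The shape arithmetic above can be captured in two small helpers. This is a sketch with assumed function names; it extends the formula `((N-F)/stride + 1)` with a padding term `P` applied on both sides, giving `(N + 2P - F)/stride + 1`.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution on an n x n input with an f x f filter."""
    span = n + 2 * pad - f
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not fit the input with this stride/padding")
    return span // stride + 1

def conv_param_count(f, in_depth, num_filters):
    """Number of learnable parameters in a conv layer."""
    return (f * f * in_depth + 1) * num_filters   # +1 for the bias of each filter

out_stride1 = conv_output_size(7, 3, stride=1)            # 7x7 input, 3x3 filter
out_stride2 = conv_output_size(7, 3, stride=2)
out_same = conv_output_size(32, 5, stride=1, pad=2)       # "same" convolution
params = conv_param_count(5, in_depth=3, num_filters=10)  # ten 5x5x3 filters
```

Calling `conv_output_size(7, 3, stride=3)` raises a `ValueError`, matching the "doesn't work" case in the text.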
- Pooling makes the representation smaller and more manageable.
- Pooling operates over each activation map independently.
- An example of pooling is max pooling.
  - The parameters of max pooling are the size of the filter and the stride.
    - Example: `2x2` with stride `2` `# Usually the two parameters are the same 2 , 2`
- Another example of pooling is average pooling.
  - In this case it might be learnable.
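A `2x2` max pool with stride `2` can be sketched with a reshape, since the pooling windows do not overlap. The function name and the `(H, W, C)` layout are assumptions for this example.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an array of shape (H, W, C); H and W must be even."""
    h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) so every 2x2 window gets its own axes.
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(x)      # shape (2, 2, 1); each channel is pooled independently
```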
06. Training neural networks I
- As a revision, here are the steps of the mini-batch stochastic gradient descent algorithm:
  - Loop:
    - Sample a batch of data.
    - Forward prop it through the graph (network) and get the loss.
    - Backprop to calculate the gradients.
    - Update the parameters using the gradients.
- Activation functions:
  - Different choices for the activation function include Sigmoid, tanh, RELU, Leaky RELU, Maxout, and ELU.
  - Sigmoid:
    - Squashes the numbers into the range [0,1].
    - Historically popular because it can be interpreted as the firing rate of a neuron.
    - `Sigmoid(x) = 1 / (1 + e^-x)`
    - Problems with sigmoid:
      - Saturated neurons (large positive or negative inputs) kill the gradients.
        - The gradient is near 0 in most cases (for both large and small values), which kills the updates if the graph/network is large.
      - Not zero-centered.
        - It doesn't produce zero-mean data.
      - `exp()` is a bit compute expensive.
        - Just to mention it; we have more complex operations in deep learning, like convolution.
  - Tanh:
    - Squashes the numbers into the range [-1,1].
    - Zero-centered.
    - Saturated neurons still "kill" the gradients.
    - `Tanh(x)` is the equation.
    - Proposed by Yann LeCun in 1991.
  - RELU (Rectified linear unit):
    - `RELU(x) = max(0,x)`
    - Doesn't kill the gradients in the positive region.
      - Only negative inputs have their gradient killed, so the gradient is zero on half of the input range.
    - Computationally efficient.
    - Converges much faster than Sigmoid and Tanh `(6x)`.
    - More biologically plausible than sigmoid.
    - Popularized by Alex Krizhevsky (University of Toronto) in 2012 with AlexNet.
    - Problems:
      - Not zero-centered.
      - If the weights aren't initialized well, maybe 75% of the neurons will be dead, which wastes computation. But it still works. Optimizing this is an active area of research.
      - To reduce the issue mentioned above, people might initialize all the biases to 0.01.
  - Leaky RELU:
    - `leaky_RELU(x) = max(0.01x,x)`
    - Doesn't kill the gradients on either side.
    - Computationally efficient.
    - Converges much faster than Sigmoid and Tanh (6x).
    - Will not die.
    - PRELU replaces the 0.01 with a variable alpha, which is learned as a parameter.
  - Exponential linear units (ELU):

```
ELU(x) = { x                     if x > 0
           alpha * (exp(x) - 1)  if x <= 0
           # alpha is a hyperparameter
}
```

    - It has all the benefits of RELU.
    - Closer to zero-mean outputs, and adds some robustness to noise.
    - Problems:
      - `exp()` is a bit compute expensive.
  - Maxout activations:
    - `maxout(x) = max(w1.T*x + b1, w2.T*x + b2)`
    - Generalizes RELU and Leaky RELU.
    - Doesn't die!
    - Problems:
      - Doubles the number of parameters per neuron.
  - In practice:
    - Use RELU. Be careful with your learning rates.
    - Try out Leaky RELU/Maxout/ELU.
    - Try out tanh, but don't expect much.
    - Don't use sigmoid!
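The activation functions discussed above are short enough to write out in NumPy. This is a sketch with assumed function names and default values (`slope=0.01`, `alpha=1.0`); the last two lines illustrate the saturation argument by evaluating the local gradients of sigmoid and RELU.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    return np.maximum(slope * x, x)

def elu(x, alpha=1.0):
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
# Local gradients: sigmoid saturates at both ends, RELU only for x <= 0.
sigmoid_grad = sigmoid(x) * (1.0 - sigmoid(x))
relu_grad = (x > 0).astype(float)
```

`np.tanh` is available directly in NumPy, so it is not redefined here.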
- Data preprocessing:
  - Normalize the data:
    - Zero-center the data by subtracting the mean of every input: `X -= np.mean(X, axis = 1)`. One of the reasons we do this is that we want the data to contain both positive and negative values, rather than being all positive or all negative.
    - Then divide by the standard deviation: `X /= np.std(X, axis = 1)`. Hint: for images we don't do this.
  - To normalize images:
    - Subtract the mean image (e.g. AlexNet).
      - The mean image has the same shape as the input images.
    - Or subtract the per-channel mean.
      - That is, calculate the mean of each channel over all images. The shape is 3 (3 channels).
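The preprocessing options above can be sketched as follows. Note the layout assumption: here the data matrix has shape `(N, D)` with one example per row, so the per-feature statistics are taken over `axis=0`; images are assumed to be stored as `(N, H, W, 3)`. The random arrays stand in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic data of shape (N, D): N examples, D features.
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
X_centered = X - X.mean(axis=0)                  # zero-center every feature
X_normalized = X_centered / X.std(axis=0)        # unit variance per feature

# Images of shape (N, H, W, 3).
images = rng.uniform(0, 255, size=(10, 32, 32, 3))
mean_image = images.mean(axis=0)                 # shape (32, 32, 3)
channel_mean = images.mean(axis=(0, 1, 2))       # shape (3,)
images_a = images - mean_image                   # subtract the mean image
images_b = images - channel_mean                 # subtract the per-channel mean
```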
- Weight initialization:
  - What happens when we initialize all Ws with zeros?
    - All the neurons will do exactly the same thing. They will have the same gradient and receive the same update.
    - The same happens whenever all the W's of a specific layer are equal.
  - The first idea is to initialize the W's with small random numbers:

```python
# randn draws zero-mean Gaussian values (rand would give only positive numbers)
W = 0.01 * np.random.randn(D, H)
# Works OK for small networks but it makes problems with deeper networks!
```

    - The standard deviation of the activations goes to zero in deeper layers, and the gradients vanish sooner in deep networks.

```python
W = 1 * np.random.randn(D, H)
# Weights this large saturate the activations, which also causes problems in deep networks!
```

    - The network will explode with big numbers!
  - Xavier initialization:

```python
# fan_in / fan_out are the layer's input and output sizes ("in" is a reserved word in Python)
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
```

    - It works because we want the variance of the output to match the variance of the input.
    - But it has an issue: it breaks when you are using RELU.
  - He initialization (solution for the RELU issue):

```python
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
```

    - Solves the issue with RELU. It's recommended when you are using RELU.
  - Proper initialization is an active area of research.
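A small experiment makes the difference between the initializations visible: push random data through a stack of tanh layers and look at the standard deviation of the last layer's activations. The depth, width, seed, and helper names below are arbitrary choices for illustration.

```python
import numpy as np

def last_layer_std(init, num_layers=10, width=500, seed=0):
    """Std of the activations after num_layers tanh layers for a given init."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(1000, width))
    for _ in range(num_layers):
        W = init(rng, width, width)
        h = np.tanh(h @ W)
    return h.std()

def small_init(rng, fan_in, fan_out):
    return 0.01 * rng.normal(size=(fan_in, fan_out))

def xavier_init(rng, fan_in, fan_out):
    return rng.normal(size=(fan_in, fan_out)) / np.sqrt(fan_in)

std_small = last_layer_std(small_init)     # collapses toward zero
std_xavier = last_layer_std(xavier_init)   # stays at a usable scale
```

With the small random init the activations shrink layer by layer, so the gradients on the weights shrink with them; the Xavier scaling keeps the activations from collapsing.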
- Batch normalization:
  - A technique to provide any layer in a neural network with inputs that are zero mean/unit variance.
  - It speeds up training. You want to do this a lot.
    - Introduced by Sergey Ioffe and Christian Szegedy in 2015.
  - We make the activations in each layer Gaussian by calculating the mean and the variance.
  - Usually inserted after fully connected or convolutional layers and before the nonlinearity.
  - Steps (for each output of a layer):
    1. First we compute the mean and variance of the batch for each feature.
    2. We normalize by subtracting the mean and dividing by the square root of (variance + epsilon).
       - epsilon is there so we don't divide by zero.
    3. Then we apply scale and shift variables: `Result = gamma * normalizedX + beta`
       - gamma and beta are learnable parameters.
       - This basically lets the network say "Hey!! I don't want zero mean/unit variance input, give me back the raw input - it's better for me."
       - In other words, it can shift and scale by whatever it wants, not just by the batch mean and variance.
  - The algorithm makes each layer flexible (it chooses which distribution it wants).
  - We initialize the BatchNorm parameters to transform the input to a zero mean/unit variance distribution, but during training they can learn that another distribution is better.
  - During training we need to keep a weighted running average of the mean and variance for each layer (`globalMean` and `globalVariance`), which are used at test time.
  - Benefits of batch normalization:
    - Networks train faster.
    - Allows higher learning rates.
    - Helps reduce the sensitivity to the initial starting weights.
    - Makes more activation functions viable.
    - Provides some regularization.
      - Because we calculate the mean and variance on each batch, which gives a slight regularization effect.
  - In conv layers, we will have one variance and one mean per activation map.
  - Batch normalization has worked best for CONV and regular deep NNs, but for recurrent NNs and reinforcement learning it's still an active research area.
    - It's challenging in reinforcement learning because the batch is small.
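The batch normalization steps can be sketched as a single forward function. This is a minimal illustration for fully connected inputs of shape `(N, D)`; the function name, the `running` dictionary, and the `momentum` and `eps` defaults are assumptions, and the backward pass is omitted.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running, train=True, momentum=0.9, eps=1e-5):
    """x has shape (N, D); running is a dict holding 'mean' and 'var' of shape (D,)."""
    if train:
        mean, var = x.mean(axis=0), x.var(axis=0)
        # Weighted (exponential) average of the batch statistics, kept for test time.
        running["mean"] = momentum * running["mean"] + (1 - momentum) * mean
        running["var"] = momentum * running["var"] + (1 - momentum) * var
    else:
        mean, var = running["mean"], running["var"]
    x_hat = (x - mean) / np.sqrt(var + eps)      # normalize
    return gamma * x_hat + beta                  # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 5))
running = {"mean": np.zeros(5), "var": np.ones(5)}
out = batchnorm_forward(x, gamma=np.ones(5), beta=np.zeros(5), running=running)
```

With `gamma = 1` and `beta = 0` the output is simply the normalized input; training can move these parameters away from that starting point.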
- Babysitting the learning process:
  - Preprocess the data.
  - Choose the architecture.
  - Make a forward pass and check the loss (with regularization disabled). Check that the loss is reasonable.
  - Add regularization; the loss should go up!
  - Disable the regularization again, take a small subset of the data, and try to train until the loss reaches zero.
    - You should be able to overfit small datasets perfectly.
  - Take your full training data and a small regularization, then try some values of the learning rate.
    - If the loss is barely changing, the learning rate is too small.
    - If you get `NAN`, your NN exploded and your learning rate is too high.
    - Find your learning rate range by trying the smallest value that still changes the loss and the largest value that doesn't explode the network.
  - Do hyperparameter optimization to get the best hyperparameter values.
- Hyperparameter optimization:
  - Try a cross-validation strategy.
    - Run for a few epochs, and try to optimize the ranges.
  - It's best to optimize in log space.
  - Adjust your ranges and try again.
  - It's better to try random search instead of grid search (in log space).
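Random search in log space means sampling the exponent uniformly rather than the value itself. The sketch below shows this for a learning rate and a regularization strength; the ranges and the number of trials are illustrative assumptions, and the training run that would score each sample is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparameters():
    learning_rate = 10 ** rng.uniform(-6, -2)    # log-uniform in [1e-6, 1e-2]
    reg = 10 ** rng.uniform(-5, 0)               # log-uniform in [1e-5, 1]
    return learning_rate, reg

trials = [sample_hyperparameters() for _ in range(20)]
```

Each pair in `trials` would then be used for a short training run, and the ranges narrowed around the best results.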
07. Training neural networks II
- Optimization algorithms:
  - Problems with stochastic gradient descent:
    - If the loss changes quickly in one direction and slowly in another (even with only two variables), you get very slow progress along the shallow dimension and jitter along the steep direction. Our NN will have many parameters, so the problem gets worse.
    - Local minima or saddle points:
      - If SGD reaches a local minimum, we get stuck there because the gradient is zero.
      - At saddle points the gradient is also zero, so we get stuck as well.
      - At a saddle point:
        - Moving in some directions makes the loss go up.
        - Moving in other directions makes the loss go down.
        - This happens more often in high dimensions (100 million dimensions, for example).
      - The problem in deep NNs is more about saddle points than about local minima, because deep NNs have many dimensions (parameters).
    - Mini-batch gradients are noisy because the gradient is computed on a mini-batch rather than on the whole dataset.
  - SGD + momentum:
    - Build up velocity as a running mean of gradients:
```python
# Computing weighted average. rho best is in range [0.9 - 0.99]
v[t+1] = rho * v[t] + dx
x[t+1] = x[t] - learningRate * v[t+1]
```

    - `v[0]` is zero.
    - Helps with the saddle point and local minimum problems.
    - It overshoots the minimum and then comes back to it.
  - Nesterov momentum:

```python
dx = compute_gradient(x)
old_v = v
v = rho * v - learning_rate * dx
x += -rho * old_v + (1 + rho) * v
```

    - Doesn't overshoot the minimum, but is slower than SGD + momentum.
  - AdaGrad:

```python
grad_squared = 0
while True:
    dx = compute_gradient(x)
    # here is a problem, the grad_squared isn't decayed (gets so large)
    grad_squared += dx * dx
    x -= (learning_rate * dx) / (np.sqrt(grad_squared) + 1e-7)
```

  - RMSProp:

```python
grad_squared = 0
while True:
    dx = compute_gradient(x)
    # Solves the AdaGrad problem: old squared gradients decay away
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
    x -= (learning_rate * dx) / (np.sqrt(grad_squared) + 1e-7)
```

    - People use this instead of AdaGrad.
  - Adam:
    - Combines momentum and RMSProp by keeping running averages of both the gradients and the squared gradients.
    - It needs a bias correction to fix the first steps, when the running averages start at zero.
    - It is the best technique so far and works well on a lot of problems.
    - `beta1 = 0.9`, `beta2 = 0.999`, and `learning_rate = 1e-3` or `5e-4` is a great starting point for many models!
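The Adam update, including the bias correction, can be sketched as below. The function name `adam_minimize`, the `eps` value, and the toy quadratic are assumptions; the toy run uses a larger learning rate than the suggested starting point only so that it finishes in a few thousand steps.

```python
import numpy as np

def adam_minimize(grad_fn, x0, learning_rate=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, num_steps=5000):
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)          # running mean of gradients (momentum part)
    v = np.zeros_like(x)          # running mean of squared gradients (RMSProp part)
    for t in range(1, num_steps + 1):
        dx = grad_fn(x)
        m = beta1 * m + (1 - beta1) * dx
        v = beta2 * v + (1 - beta2) * dx * dx
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy problem: f(x) = x0^2 + 10 * x1^2, with its minimum at the origin.
def grad(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x_min = adam_minimize(grad, x0=[1.0, 1.0], learning_rate=1e-2, num_steps=2000)
```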
  - Learning rate decay:
    - E.g. decay the learning rate by half every few epochs.
    - It helps the optimizer settle instead of bouncing around the minimum.
    - Learning rate decay is common with SGD + momentum but not common with Adam.
    - Don't use learning rate decay from the start when choosing your hyperparameters. Try without it first and check whether you need decay or not.
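A step decay schedule like the one described can be sketched in one function. The name `step_decay` and the defaults (halve every 10 epochs) are illustrative assumptions.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

schedule = [step_decay(0.1, epoch) for epoch in (0, 9, 10, 25)]
```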
  - All the algorithms discussed above are first-order optimization methods.
  - Second-order optimization:
    - Use the gradient and the Hessian to form a quadratic approximation.
    - Step to the minimum of the approximation.
    - What is nice about this update?
      - It doesn't have a learning rate in some of the versions.
    - But it's impractical for deep learning:
      - The Hessian has O(N^2) elements.
      - Inverting it takes O(N^3).
    - L-BFGS is a version of second-order optimization.
      - Works with full-batch optimization but not with mini-batches.
  - In practice, first use Adam, and if it doesn't work try L-BFGS.
  - Some say all the famous deep architectures use SGD + Nesterov momentum.
- Regularization
- So far we have talked about reducing the training error, but what we care about most is how our model will handle unseen data!
- What if the gap between the training error and the validation error is too large?
- This problem is called high variance (overfitting).
- Model Ensembles:
- Algorithm:
- Train multiple independent models of the same architecture with different initializations.
- At test time average their results.
- It can get you extra 2% performance.
- It reduces the generalization error.
- You can also take several snapshots of a single NN during training, ensemble them, and average their results.
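The test-time averaging step can be sketched as follows. The function name `ensemble_predict` and the probability values are made up for illustration; in practice each array would come from one trained model.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average class probabilities from several models and pick the best class.

    prob_list: list of arrays of shape (num_examples, num_classes).
    """
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs, np.argmax(mean_probs, axis=1)

# Made-up outputs of three models for two examples and three classes
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]])
mean_probs, labels = ensemble_predict([model_a, model_b, model_c])
```

In this toy case the models disagree on both examples, and the averaged probabilities settle each disagreement.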
- Regularization solves the high variance problem. We have talked about L1 and L2 regularization.
- Some regularization techniques are designed only for NNs and can do better.
- Dropout:
- In each forward pass, randomly set some of the neurons to zero. The probability of dropping is a hyperparameter that is 0.5 in most cases.
- So you choose some activations and set them to zero.
- It works because:
- It forces the network to have a redundant representation; it prevents co-adaptation of features!
- If you think about it, dropout trains an ensemble of sub-models inside the same model!
- At test time we multiply the output of each dropout layer by the probability of keeping a unit (the same as the dropout probability when it is 0.5).
- With inverted dropout, the scaling is done at training time instead, so at test time we don’t multiply anything and leave the activations as they are.
- With dropout it takes more time to train.
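A minimal sketch of the inverted-dropout variant, where the rescaling happens during training so the test-time pass is left untouched. The function name `dropout_forward` and its arguments are assumptions for illustration.

```python
import numpy as np

def dropout_forward(x, drop_prob=0.5, train=True, rng=None):
    """Inverted dropout: scale at training time so test time needs no change."""
    if not train or drop_prob == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # Kept units are scaled by 1 / keep_prob so the expected activation is unchanged
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    return x * mask

activations = np.ones((1000, 100))
train_out = dropout_forward(activations, drop_prob=0.5, rng=np.random.default_rng(0))
test_out = dropout_forward(activations, train=False)
```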
- Data augmentation:
- Another technique that acts as regularization.
- Change the data!
- For example flip the image, or rotate it.
- Example in ResNet:
- Training: Sample random crops and scales:
- Pick random L in range [256,480]
- Resize training image, short side = L
- Sample random 224x224 patch.
- Testing: average a fixed set of crops
- Resize image at 5 scales: {224, 256, 384, 480, 640}
- For each size, use 10 224x224 crops: 4 corners + center + flips
- Apply color jitter or PCA
- Translation, rotation, stretching.
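A small sketch of training-time augmentation with a random crop and a random horizontal flip. The function name `random_crop_and_flip` and the placeholder image are assumptions; the resize step is left out to keep the example in plain NumPy.

```python
import numpy as np

def random_crop_and_flip(image, crop_size=224, rng=None):
    """Take a random crop_size x crop_size patch and flip it horizontally half the time."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    patch = image[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch

image = np.zeros((256, 320, 3), dtype=np.uint8)  # stand-in for a resized training image
patch = random_crop_and_flip(image, rng=np.random.default_rng(0))
```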
- DropConnect
- Like dropout, it acts as a regularizer.
- Instead of dropping the activations, we randomly zero out the weights.
- Fractional Max Pooling
- Cool regularization idea. Not commonly used.
- Randomize the regions in which we pool.
- Stochastic depth
- New idea.
- Eliminate layers instead of neurons.
- Has a similar effect to dropout.
- Transfer learning:
- Sometimes your model overfits because the dataset is small, not because of a lack of regularization.
- You need a lot of data if you want to train/use CNNs.
- Steps of transfer learning
- Train on a big dataset that has common features with your dataset. This is called pretraining.
- Freeze all the layers except the last one and feed your small dataset to learn only the last layer.
- You are not limited to retraining the last layer; you can fine-tune any number of layers, depending on how much data you have.
- Guide to using transfer learning:

| | Very similar dataset | Very different dataset |
| ----------------------- | ---------------------------------- | ---------------------------------------- |
| **Very little data** | Use a linear classifier on the top layer | You’re in trouble. Try a linear classifier on features from different stages |
| **Quite a lot of data** | Fine-tune a few layers | Fine-tune a larger number of layers |

- Transfer learning is the norm, not the exception.
08. Deep learning software
- This section changes a lot every year in CS231n due to rapid changes in deep learning software.
- CPU vs GPU
- GPU: the graphics card was developed to render graphics for playing games, making 3D media, etc.
- NVIDIA vs AMD
- Deep learning favors NVIDIA GPUs over AMD because NVIDIA is pushing deep learning research forward and makes its architecture more suitable for deep learning.
- A CPU has fewer cores, but each core is much faster and much more capable; great at sequential tasks. A GPU has more cores, but each core is much slower and “dumber”; great for parallel tasks.
- GPU cores need to work together, and the GPU has its own memory.
- Matrix multiplication is one of the operations that suit GPUs. An M x N output consists of M x N independent dot products that can be computed in parallel.
- The convolution operation can also be parallelized because it consists of independent operations.
- Programming GPUs frameworks:
- CUDA (NVIDIA only)
- Write c-like code that runs directly on the GPU.
- It’s hard to write well-optimized code that runs on the GPU. That’s why NVIDIA provides higher level APIs.
- Higher level APIs: cuBLAS, cuDNN, etc
- cuDNN implements backprop, convolution, recurrent layers, and a lot more for you!
- In practice you won’t write parallel code yourself. You will use code implemented and optimized by others!
- OpenCL
- Similar to CUDA, but runs on any GPU.
- Usually slower.
- Doesn’t have much support yet across deep learning software.
- There are a lot of courses for learning parallel programming.
- If you aren’t careful, training can bottleneck on reading data and transferring it to the GPU. So the solutions are:
- Read all the data into RAM, if possible.
- Use an SSD instead of an HDD.
- Use multiple CPU threads to prefetch data!
- While the GPU is computing, a CPU thread fetches the next data for you.
- A lot of frameworks implement this for you because it’s a little bit painful!
- Deep learning Frameworks
- It’s moving super fast!
- Currently available frameworks:
- Tensorflow (Google)
- Caffe (UC Berkeley)
- Caffe2 (Facebook)
- Torch (NYU / Facebook)
- PyTorch (Facebook)
- Theano (U Montreal)
- Paddle (Baidu)
- CNTK (Microsoft)
- MXNet (Amazon)
- The instructor thinks that you should focus on Tensorflow and PyTorch.
- The point of deep learning frameworks:
- Easily build big computational graphs.
- Easily compute gradients in computational graphs.
- Run it efficiently on GPU (cuDNN - cuBLAS)
- Numpy doesn’t run on GPU.
- Most of the frameworks try to look like NumPy in the forward pass, and then they compute the gradients for you.
- Tensorflow (Google)
- The code has two parts:
- Define the computational graph.
- Run the graph and reuse it many times.
- Tensorflow uses a static graph architecture.
- Tensorflow variables live in the graph, while placeholders are fed on each run.
- The global initializer function initializes the variables that live in the graph.
- Use predefined optimizers and losses.
- You can build full layers with the `layers.dense` function.
- Keras (high level wrapper):
- Keras is a layer on top of Tensorflow that makes common things easy to do.
- So popular!
- Trains a full deep NN in a few lines of code.
- There are a lot of high level wrappers:
- Keras
- TFLearn
- TensorLayer
- tf.layers (ships with Tensorflow)
- tf-Slim (ships with Tensorflow)
- tf.contrib.learn (ships with Tensorflow)
- Sonnet (new from DeepMind)
- Tensorflow has pretrained models that you can use for transfer learning.
- Tensorboard adds logging to record loss and stats. Run the server and get pretty graphs!
- It has distributed code if you want to split your graph across several nodes.
- Tensorflow is actually inspired by Theano. It has the same inspirations and structure.
- PyTorch (Facebook)
- Has three layers of abstraction:
- Tensor: an `ndarray` that runs on the GPU (like NumPy arrays in Tensorflow)
- Variable: node in a computational graph; stores data and gradient (like Tensor, Variable, Placeholder in Tensorflow)
- Module: a NN layer; may store state or learnable weights (like tf.layers in Tensorflow)
- In PyTorch the graph is built in the same loop you are executing, which makes debugging easier. This is called a dynamic graph.
- In PyTorch you can define your own autograd functions by writing forward and backward for tensors. Most of the time they are already implemented for you.
- torch.nn is a high level API like Keras in Tensorflow. You can create models with it and go on from there.
- You can define your own nn module!
- PyTorch also contains optimizers, like Tensorflow.
- It contains a data loader that wraps a Dataset and provides minibatches, shuffling and multithreading.
- PyTorch contains very good, easy-to-use pretrained models.
- PyTorch works with Visdom, which is like Tensorboard, but Tensorboard seems to be more powerful.
- PyTorch is new and still evolving compared to Torch. It’s still in beta.
- PyTorch is best for research.
- Tensorflow builds the graph once, then runs it many times (called a static graph).
- In each PyTorch iteration we build a new graph (called a dynamic graph).
- Static vs dynamic graphs:
- Optimization:
- With static graphs, the framework can optimize the graph for you before it runs.
- Serialization
- Static: once the graph is built, you can serialize it and run it without the code that built it. Ex. use the graph in C++.
- Dynamic: you always need to keep the code around.
- Conditionals
- Easier in dynamic graphs and more complicated in static graphs.
- Loops:
- Easier in dynamic graphs and more complicated in static graphs.
- Tensorflow Fold makes dynamic graphs easier in Tensorflow through dynamic batching.
- Dynamic graph applications include recurrent networks and recursive networks.
- Caffe2 uses static graphs, can train models in Python, and also runs on iOS and Android.
- Tensorflow/Caffe2 are used a lot in production, especially on mobile.
09. CNN architectures
- This section talks about the famous CNN architectures. It focuses on the CNN architectures that have won the ImageNet competition since 2012.
- These architectures include: AlexNet, VGG, GoogLeNet, and ResNet.
- We will also discuss some other interesting architectures as we go.
- The first ConvNet was LeNet-5, by Yann LeCun in 1998.
- The architecture is: `CONV-POOL-CONV-POOL-FC-FC-FC`
- Each conv filter was `5x5`, applied at stride 1.
- Each pool was `2x2`, applied at stride 2.
- It was useful in digit recognition.
- Its key insight was that image features are distributed across the entire image, and that convolutions with learnable parameters are an effective way to extract similar features at multiple locations with few parameters.
- It contains exactly 5 layers with weights (2 convolutional and 3 fully connected).
- In 2010 Dan Claudiu Ciresan and Jurgen Schmidhuber published one of the very first GPU implementations of neural nets. It ran both the forward and backward passes on an NVIDIA GTX 280 graphics processor, for networks of up to 9 layers.
- AlexNet (2012):
- The ConvNet that started the evolution and won ImageNet in 2012.
- The architecture is: `CONV1-MAXPOOL1-NORM1-CONV2-MAXPOOL2-NORM2-CONV3-CONV4-CONV5-MAXPOOL3-FC6-FC7-FC8`
- Contains exactly 8 layers with weights: the first 5 are convolutional and the last 3 are fully connected.
- AlexNet’s top-5 error was `16.4%`.
- For example, if the input is 227 x 227 x 3 then these are the shapes of the outputs at each layer:
- CONV1 (96 11 x 11 filters at stride 4, pad 0)
- Output shape `(55,55,96)`, number of weights is `(11*11*3*96)+96 = 34944`
- MAXPOOL1 (3 x 3 filters applied at stride 2)
- Output shape `(27,27,96)`, no weights
- NORM1
- Output shape `(27,27,96)`, we don’t do this any more
- CONV2 (256 5 x 5 filters at stride 1, pad 2)
- MAXPOOL2 (3 x 3 filters at stride 2)
- NORM2
- CONV3 (384 3 x 3 filters at stride 1, pad 1)
- CONV4 (384 3 x 3 filters at stride 1, pad 1)
- CONV5 (256 3 x 3 filters at stride 1, pad 1)
- MAXPOOL3 (3 x 3 filters at stride 2)
- Output shape `(6,6,256)`
- FC6 (4096)
- FC7 (4096)
- FC8 (1000 neurons for class score)
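The output shapes listed above follow from the usual formula `(W - F + 2P) / S + 1`, where W is the input size, F the filter size, P the padding and S the stride. The helper names below are assumptions for illustration; the layer settings are the ones in the list.

```python
def conv_output_size(input_size, filter_size, stride, pad):
    """Spatial output size of a conv or pooling layer: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * pad) // stride + 1

def conv_num_params(filter_size, in_depth, num_filters):
    """Weights plus one bias per filter."""
    return filter_size * filter_size * in_depth * num_filters + num_filters

conv1 = conv_output_size(227, filter_size=11, stride=4, pad=0)   # 55
pool1 = conv_output_size(conv1, filter_size=3, stride=2, pad=0)  # 27
conv2 = conv_output_size(pool1, filter_size=5, stride=1, pad=2)  # 27
pool2 = conv_output_size(conv2, filter_size=3, stride=2, pad=0)  # 13
conv5 = conv_output_size(pool2, filter_size=3, stride=1, pad=1)  # 13 (same for CONV3 and CONV4)
pool3 = conv_output_size(conv5, filter_size=3, stride=2, pad=0)  # 6
conv1_params = conv_num_params(11, in_depth=3, num_filters=96)   # 34944
```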
- Some other details:
- First use of ReLU.
- Norm layers, which are not used any more.
- Heavy data augmentation
- Dropout `0.5`
- Batch size `128`
- SGD momentum `0.9`
- Learning rate `1e-2`, reduced by a factor of 10 at some iterations
- 7 CNN ensembles!
- AlexNet was trained on GTX 580 GPUs with only 3 GB of memory each, which wasn’t enough to hold the network on one GPU, so the feature maps were split in half across two GPUs. The first AlexNet was distributed!
- It’s still used in transfer learning in a lot of tasks.
- Total number of parameters is `60 million`.
- ZFNet (2013)
- Won in 2013 with error 11.7%
- It has the same general structure as AlexNet, but with slightly changed hyperparameters to get the best output.
- Also contains 8 layers.
- AlexNet but:
- `CONV1`: change from (11 x 11 stride 4) to (7 x 7 stride 2)
- `CONV3,4,5`: instead of 384, 384, 256 filters use 512, 1024, 512
- OverFeat (2013)
- Won the localization task in ImageNet in 2013.
- The paper shows how a multiscale and sliding window approach can be efficiently implemented within a ConvNet, and introduces a deep learning approach to localization that learns to predict object boundaries.
- VGGNet (2014) (Oxford)
- Deeper network with more layers.
- Contains 19 layers.
- Won in 2014, together with GoogLeNet, with error 7.3%.
- Smaller filters with deeper layers.
- The great advantage of VGG was the insight that multiple 3 × 3 convolutions in sequence can emulate the effect of larger receptive fields, for example 5 × 5 and 7 × 7.
- Used the simple 3 x 3 conv all through the network.
- Three stacked 3 x 3 conv layers have the same effective receptive field as one 7 x 7 layer.
- The architecture contains several CONV layers followed by a POOL layer, repeated 5 times, and then the fully connected layers.
- It uses a total memory of 96MB per image for forward propagation alone!
- Most of the memory is in the earlier layers.
- Total number of parameters is 138 million.
- Most of the parameters are in the fully connected layers.
- Its training details are similar to AlexNet’s, such as using momentum and dropout.
- VGG19 is an upgrade of VGG16 that is slightly better but uses more memory.
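The claim about stacked 3 x 3 filters can be checked with a short calculation: each extra stride-1 layer grows the receptive field by `filter_size - 1`, while the stack uses fewer weights than a single large filter. The helper names and the channel count of 256 are assumptions made only for this comparison.

```python
def stacked_receptive_field(num_layers, filter_size=3):
    """Effective receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (filter_size - 1)

def conv_weights(filter_size, channels, num_layers=1):
    """Weights (ignoring biases) for layers with `channels` input and output channels."""
    return num_layers * filter_size * filter_size * channels * channels

channels = 256  # assumed channel count, only for the comparison
rf_two = stacked_receptive_field(2)     # 5
rf_three = stacked_receptive_field(3)   # 7
stack_weights = conv_weights(3, channels, num_layers=3)  # 3 * 9 * C^2 = 27 C^2
single_weights = conv_weights(7, channels)               # 49 C^2
```

The stack also applies a non-linearity after every layer, which a single 7 x 7 layer does not.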
- GoogleNet (2014)
- Deeper network with more layers.
- Contains 22 layers.
- It has the efficient Inception module.
- Only 5 million parameters! 12x fewer than AlexNet.
- Won in 2014, together with VGGNet, with error 6.7%.
- Inception module:
- Design a good local network topology (network within a network (NiN)) and then stack these modules on top of each other.
- It consists of:
- Apply parallel filter operations on the input from the previous layer:
- Multiple convs of sizes (1 x 1, 3 x 3, 5 x 5)
- Adds padding to maintain the sizes.
- Pooling operation (max pooling)
- Adds padding to maintain the sizes.
- Concatenate all filter outputs together depth-wise.
- For example:
- Input for inception module is 28 x 28 x 256
- Then the parallel filters are applied:
- (1 x 1), 128 filters: output shape (28,28,128)
- (3 x 3), 192 filters: output shape (28,28,192)
- (5 x 5), 96 filters: output shape (28,28,96)
- (3 x 3) max pooling: output shape (28,28,256)
- After concatenation this will be `(28,28,672)`
- This design, which we call the naive version, has a high computational complexity.
- The last example will make:
- [1 x 1 conv, 128] ==> 28 * 28 * 128 * 1 * 1 * 256 = 25 Million approx
- [3 x 3 conv, 192] ==> 28 * 28 * 192 * 3 * 3 * 256 = 346 Million approx
- [5 x 5 conv, 96] ==> 28 * 28 * 96 * 5 * 5 * 256 = 482 Million approx
- In total around 854 Million operations!
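The operation counts of the naive module can be reproduced with a few lines. The helper name `conv_ops` is an assumption; the branch sizes are the ones from the example above.

```python
def conv_ops(height, width, num_filters, filter_size, in_depth):
    """Multiply-accumulate count of a conv layer that preserves spatial size."""
    return height * width * num_filters * filter_size * filter_size * in_depth

branches = [(128, 1), (192, 3), (96, 5)]  # (number of filters, filter size)
naive_ops = sum(conv_ops(28, 28, n, f, 256) for n, f in branches)
# The pooling branch keeps all 256 input channels, so the depth only grows
output_depth = sum(n for n, _ in branches) + 256
```

Here `naive_ops` comes out at roughly 854 million and `output_depth` at 672, matching the numbers in the list.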
- Solution: bottleneck layers that use 1x1 convolutions to reduce feature depth.
- Inspired by NiN (Network in Network).
- The bottleneck solution brings the total down to 358M operations on this example, which is good compared with the naive implementation.
- So GoogLeNet stacks this Inception module multiple times to get a full network architecture that can solve the problem without the fully connected layers.
- Just to mention, it uses an average pooling layer at the end before the classification step.
- Full architecture:
- In February 2015 Batch-normalized Inception was introduced as Inception V2. Batch normalization computes the mean and standard deviation of all feature maps at the output of a layer, and normalizes their responses with these values.
- In December 2015 they published the paper “Rethinking the Inception Architecture for Computer Vision”, which explains the older Inception models well and also introduces a new version, V3.
- The first GoogLeNet and VGG came before batch normalization was invented, so they needed some hacks to train the NN and converge well.
- ResNet (2015) (Microsoft Research)
- 152-layer model for ImageNet. Winner with 3.57% error, which is better than human-level performance.
- This is also the very first time that a network of more than a hundred, even 1000, layers was trained.
- Swept all classification and detection competitions in ILSVRC’15 and COCO’15!
- What happens when we continue stacking deeper layers on a “plain” convolutional neural network?
- The deeper model performs worse, but it’s not caused by overfitting!
- Learning stops making progress because deeper NNs are harder to optimize!
- The deeper model should be able to perform at least as well as the shallower model.
- A solution by construction is copying the learned layers from the shallower model and setting the additional layers to identity mappings.
- Residual block:
- Microsoft came up with the residual block, which has this architecture:
* ```python
# Instead of trying to learn a whole new representation, we learn only the residual
Y = (W2 * RELU(W1 * X + b1) + b2) + X
```
* Say you have a network with a depth of N layers. You only want to add a new layer if you get something extra out of adding that layer.
* One way to ensure this new (N+1)th layer learns something new about your network is to also provide the input (x) without any transformation to the output of the (N+1)th layer. This essentially drives the new layer to learn something different from what the input has already encoded.
* The other advantage is that such connections help in handling the vanishing gradient problem in very deep networks.
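A runnable sketch of the residual formula above. It uses fully connected weight matrices in place of the convolutions of the real network, which is a simplification assumed here to keep the example short; the names `relu` and `residual_block` are likewise illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Y = F(x) + x, where F is a small two-layer transform of x."""
    residual = W2 @ relu(W1 @ x + b1) + b2
    return residual + x

dim = 4
x = np.arange(dim, dtype=float)
# With all weights at zero the residual F(x) is zero, so the block is the identity mapping
zeros_W, zeros_b = np.zeros((dim, dim)), np.zeros(dim)
y_identity = residual_block(x, zeros_W, zeros_b, zeros_W, zeros_b)
```

This illustrates why an added block can fall back to the identity: pushing its weights toward zero leaves the input unchanged.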
- With the residual block we can now have a deep NN of any depth without fearing that we can’t optimize the network.
- ResNets with a large number of layers started to use a bottleneck layer, similar to the Inception bottleneck, to reduce the dimensions.
- Full ResNet architecture:
- Stack residual blocks.
- Every residual block has two 3 x 3 conv layers.
- Additional conv layer at the beginning.
- No FC layers at the end (only FC 1000 to output classes)
- Periodically, double the number of filters and downsample spatially using stride 2 (/2 in each dimension)
- Training ResNet in practice:
- Batch Normalization after every CONV layer.
- Xavier/2 initialization from He et al.
- SGD + Momentum (`0.9`)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size `256`
- Weight decay of `1e-5`
- No dropout used.
- Inception-v4: ResNet + Inception, introduced in 2016.
- Comparing the complexity of all the architectures:
- VGG: Highest memory, most operations.
- GoogLeNet: most efficient.
- ResNet improvements:
- (2016) Identity Mappings in Deep Residual Networks
- From the creators of ResNet.
- Gives better performance.
- (2016) Wide Residual Networks
- Argues that residuals are the important factor, not depth
- 50-layer wide ResNet outperforms 152-layer original ResNet
- Increasing width instead of depth is more computationally efficient (parallelizable)
- (2016) Deep Networks with Stochastic Depth
- Motivation: reduce vanishing gradients and training time through short networks during training.
- Randomly drop a subset of layers during each training pass
- Use full deep network at test time.
- Beyond ResNets:
- (2017) FractalNet: Ultra-Deep Neural Networks without Residuals
- Argues that the key is transitioning effectively from shallow to deep, and that residual representations are not necessary.
- Trained with dropping out sub-paths
- Full network at test time.
- (2017) Densely Connected Convolutional Networks
- (2017) SqueezeNet: AlexNet-level Accuracy With 50x Fewer Parameters and <0.5MB Model Size
- Good for production.
- It is a re-hash of many concepts from ResNet and Inception, and shows that, after all, a better architecture design delivers small network sizes and parameter counts without needing complex compression algorithms.
- Conclusion:
- ResNet is the current best default.
- Trend towards extremely deep networks
- In the last couple of years, many models have used shortcut connections like ResNet’s so that gradients flow easily.
10. Recurrent Neural networks
- Vanilla neural networks (“feed-forward neural networks”): an input of fixed size goes through some hidden units and then to the output. We call this a one to one network.
- Recurrent Neural Network (RNN) models:
- One to many
- Example: Image Captioning
- image ==> sequence of words
- Many to One
- Example: Sentiment Classification
- sequence of words ==> sentiment
- Many to many
- Example: Machine Translation
- seq of words in one language ==> seq of words in another language
- Example: Video classification on frame level
- RNNs can also work for Non-Sequence Data (One to One problems)
- It worked in digit classification through taking a series of “glimpses”
- “Multiple Object Recognition with Visual Attention”, ICLR 2015.
- It worked on generating images one piece at a time
- e.g. generating a captcha
- So what is a recurrent neural network?
- A recurrent core cell takes an input x, and the cell has an internal state that is updated each time it reads an input.
- The RNN block should return a vector.
- We can process a sequence of vectors x by applying a recurrence formula at every time step:
* ```python
h[t] = fw (h[t-1], x[t]) # Where fw is some function with parameters W
```
* The same function and the same set of parameters are used at every time step.
- (Vanilla) Recurrent Neural Network:
* ```
h[t] = tanh (W[h,h]*h[t-1] + W[x,h]*x[t]) # Then we save h[t]
y[t] = W[h,y]*h[t]
```
* This is the simplest example of an RNN.
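The vanilla RNN equations above can be run directly in NumPy. This is a sketch under assumed sizes and random weights; the names `rnn_step`, `W_hh`, `W_xh` and `W_hy` simply mirror `W[h,h]`, `W[x,h]` and `W[h,y]` in the formula.

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, W_hy):
    """One time step of a vanilla RNN (biases omitted, as in the formula above)."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x)
    y = W_hy @ h
    return h, y

rng = np.random.default_rng(0)
hidden_size, input_size, output_size = 5, 3, 2   # sizes chosen arbitrarily
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

h = np.zeros(hidden_size)                 # initial hidden state set to zero
sequence = rng.normal(size=(4, input_size))
outputs = []
for x_t in sequence:                      # the same weights are reused at every step
    h, y_t = rnn_step(h, x_t, W_hh, W_xh, W_hy)
    outputs.append(y_t)
outputs = np.stack(outputs)
```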
- An RNN works on a sequence of related data.
- Recurrent NN computational graph:
- `h0` is initialized to zero.
- The gradient of `W` is the sum of all the `W` gradients that have been calculated at each time step!
- A many to many graph:
- The final loss is the sum of the losses at all time steps, and the weights that produce Y are shared across time steps and updated by summing all their gradients!
- A many to one graph:
- A one to many graph:
- Sequence to sequence graph:
- Encoder and decoder philosophy.
- Examples:
- Suppose we are building words using characters. We want a model to predict the next character of a sequence. Let’s say that the characters are only `[h, e, l, o]` and the words are [hello]
- Training:
- Only the third prediction here is true. The loss needs to be optimized.
- We can train the network by feeding the whole word(s).
- Testing time:
- At test time we work character by character. The output character becomes the next input, along with the saved hidden activations.
- This link contains all the code but uses Truncated Backpropagation through time as we will discuss.
- Backpropagation through time: forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
- But if we use the whole sequence it will be very slow, take a lot of memory, and will never converge!
- So in practice people use “Truncated Backpropagation through time”: we run forward and backward through chunks of the sequence instead of the whole sequence.
- We carry the hidden states forward in time forever, but only backpropagate for some smaller number of steps.
- Example on image captioning:
- They use a special end token to finish running.
- The biggest dataset for image captioning is Microsoft COCO.
- Image Captioning with Attention is a project in which the RNN, while generating captions, looks at a specific part of the image rather than the whole image.
- The Image Captioning with Attention technique is also used in the “Visual Question Answering” problem.
- Multilayer RNNs stack several recurrent layers, so that the hidden states of one layer are fed as the inputs of the next. LSTMs are commonly stacked this way.
- The backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM).
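Gradient clipping is commonly done by rescaling the gradient when its norm exceeds a threshold. A minimal sketch; the function name `clip_gradient_norm` and the threshold of 5.0 are assumptions for illustration.

```python
import numpy as np

def clip_gradient_norm(grad, max_norm=5.0):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

clipped = clip_gradient_norm(np.array([30.0, 40.0]))   # norm 50 is rescaled to norm 5
unchanged = clip_gradient_norm(np.array([0.3, 0.4]))   # norm 0.5 is left alone
```

Rescaling keeps the direction of the gradient and only limits the step size.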
- LSTM stands for Long Short Term Memory. It was designed to help with the vanishing gradient problem in RNNs.
- It consists of:
- f: Forget gate, whether to erase cell
- i: Input gate, whether to write to cell
- g: Gate gate (?), how much to write to cell
- o: Output gate, how much to reveal cell
- The LSTM gradients flow easily, like in ResNet.
- The LSTM keeps information in long or short term memory as it trains, which means it can remember things not just from the last time step but from many time steps back.
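The four gates listed above can be sketched as a single LSTM step in NumPy. The stacked weight layout (one matrix `W` holding the i, f, o, g blocks), the sizes and the random weights are assumptions chosen for a compact example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W, b):
    """One LSTM step. W has shape (4 * H, H + D) and stacks the i, f, o, g weights."""
    H = h_prev.shape[0]
    a = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(a[0:H])          # input gate: whether to write to cell
    f = sigmoid(a[H:2 * H])      # forget gate: whether to erase cell
    o = sigmoid(a[2 * H:3 * H])  # output gate: how much to reveal cell
    g = np.tanh(a[3 * H:4 * H])  # candidate values: how much to write to cell
    c = f * c_prev + i * g       # additive cell update
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3  # hidden and input sizes, chosen arbitrarily
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = lstm_step(np.zeros(H), np.zeros(H), rng.normal(size=D), W, b)
```

The additive update of `c` is the part that lets gradients flow back through many time steps, similar to the skip connection in a residual block.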
- Highway networks are something between ResNet and LSTM, and are still being researched.
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed.
- RNNs are mostly used for problems with sequences of related inputs, like NLP and speech recognition.
11. Detection and Segmentation
- So far we have been talking about the image classification problem. In this section we will talk about Segmentation, Localization, and Detection.
- Semantic Segmentation
- We want to label each pixel in the image with a category label.
- For example, in an image of several cows, semantic segmentation doesn’t differentiate instances; it only cares about pixels.
- The first idea is to use a sliding window. We take a small window and slide it all over the picture. For each window we want to label the center pixel.
- It will work, but it’s not a good idea because it is computationally expensive!
- Very inefficient! Not reusing shared features between overlapping patches.
- In practice nobody uses this.
- The second idea is designing a network as a bunch of convolutional layers to make predictions for all pixels at once!
- Input is the whole image. Output is the image with each pixel labeled.
- We need a lot of labeled data, and it’s very expensive data.
- It needs deep conv layers.
- The loss is the cross entropy computed at each pixel.
- Data augmentation is good here.
- The problem with this implementation is that convolutions at the original image resolution will be very expensive.
- So in practice we don’t see something like this right now.
- The third idea is based on the last idea. The difference is that we downsample and upsample inside the network.
- We downsample because using the whole image as it is would be very expensive. So we go through multiple downsampling layers and then upsample at the end.
- Downsampling is done by operations like pooling and strided convolution.
- Upsampling is done by operations like “Nearest Neighbor”, “Bed of Nails”, or “Max unpooling”
- **Nearest Neighbor** example:

      Input: 1 2    Output: 1 1 2 2
             3 4            1 1 2 2
                            3 3 4 4
                            3 3 4 4

- **Bed of Nails** example:

      Input: 1 2    Output: 1 0 2 0
             3 4            0 0 0 0
                            3 0 4 0
                            0 0 0 0

* **Max unpooling** depends on the earlier max pooling steps. You put each value back at the position where the corresponding max pooling took its maximum and fill the other pixels with zeros.
* Max unpooling seems to be the best idea for upsampling.
* There is also an idea of learnable upsampling called "**Transpose Convolution**"
* Rather than doing a convolution, we do the reverse.
* Also called:
* Upconvolution.
* Fractionally strided convolution
* Backward strided convolution
* To learn the arithmetic of upsampling, please refer to chapter 4 in this [paper](https://arxiv.org/abs/1603.07285).
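The Nearest Neighbor and Bed of Nails examples above can be reproduced with a few lines of NumPy. The function names are assumptions for illustration, and both helpers handle only 2D inputs.

```python
import numpy as np

def nearest_neighbor_upsample(x, factor=2):
    """Repeat every value factor times along both axes."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bed_of_nails_upsample(x, factor=2):
    """Place every value in the top-left corner of its block and fill the rest with zeros."""
    out = np.zeros((x.shape[0] * factor, x.shape[1] * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out

x = np.array([[1, 2], [3, 4]])
nn_out = nearest_neighbor_upsample(x)
nails_out = bed_of_nails_upsample(x)
```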
- Classification + Localization:
- In this problem we want to classify the main object in the image and give its location as a rectangle.
- We assume there is one object.
- We will create a multi-task NN. The architecture is as follows:
- Convolutional network layers connected to:
- FC layers that classify the object (the plain classification problem we know).
- FC layers that connect to four numbers `(x,y,w,h)`
- We treat localization as a regression problem.
- This problem will have two losses:
- Softmax loss for classification
- Regression loss (L2 loss) for the localization
- Loss = SoftmaxLoss + L2 loss
- Often the first conv layers come from pretrained NNs like AlexNet!
- This technique can be used in many other problems, like human pose estimation.
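The combined loss `Loss = SoftmaxLoss + L2 loss` can be sketched for a single example as follows. The function names, the `box_weight` knob and the numbers are assumptions for illustration, not part of the lecture.

```python
import numpy as np

def softmax_loss(scores, label):
    """Cross-entropy loss of one example from unnormalized class scores."""
    shifted = scores - np.max(scores)            # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

def l2_loss(pred_box, true_box):
    """Squared error between predicted and true (x, y, w, h)."""
    return np.sum((pred_box - true_box) ** 2)

def multitask_loss(scores, label, pred_box, true_box, box_weight=1.0):
    # box_weight is a hypothetical knob for balancing the two terms
    return softmax_loss(scores, label) + box_weight * l2_loss(pred_box, true_box)

scores = np.array([2.0, 0.5, -1.0])
true_box = np.array([11.0, 12.0, 48.0, 40.0])
pred_box = np.array([10.0, 12.0, 50.0, 40.0])
loss = multitask_loss(scores, 0, pred_box, true_box)
```

Both heads share the convolutional features, so the gradient of this single scalar trains the classifier and the box regressor together.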
- Object Detection
- A core problem of computer vision. We will discuss this problem in detail.
- The difference between “Classification + Localization” and this problem is that here we want to detect one or more different objects and their locations!
- The first idea is to use a sliding window
- Worked well for a long time.
- The steps are:
- Apply a CNN to many different crops of the image; the CNN classifies each crop as object or background.
- The problem is that we need to apply the CNN to a huge number of locations and scales, which is very computationally expensive!
- The brute force sliding window requires thousands upon thousands of CNN evaluations.
- Region proposals help us decide which regions we should run our NN on:
- Find blobby image regions that are likely to contain objects.
- Relatively fast to run; e.g. Selective Search gives 1000 region proposals in a few seconds on CPU
- So now we can apply one of the region proposal methods and then apply the first idea.
- There is another idea, which is called R-CNN
- Its weakness is that it takes parts of the image of different sizes (found with region proposals) and feeds them to the CNN after scaling them all to one size. Scaling is bad.
- Also it’s very slow.
- Fast R-CNN is another idea that builds on R-CNN
- It uses one CNN to do everything.
- Faster R-CNN makes its own region proposals by inserting a Region Proposal Network (RPN) that predicts proposals from features.
- The fastest of the R-CNNs.
- Another idea is Detection without Proposals: YOLO / SSD
- YOLO stands for you only look once.
- YOLO and SSD are two separate algorithms.
- Faster but not as accurate.
- Takeaways
- Faster R-CNN is slower but more accurate.
- SSD/YOLO is much faster but not as accurate.
- Dense Captioning
- Dense Captioning is “Object Detection + Captioning”
- Paper that covers this idea can be found here.
- Instance Segmentation
- This is like the full problem.
- Rather than predicting only a bounding box, we want to label each pixel and also distinguish between instances.
- There are a lot of ideas.
- There is a new idea, “Mask R-CNN”
- Like R-CNN, but inside it we apply semantic segmentation
- There are a lot of good results out of this paper.
- It combines all the things that we have discussed in this lecture.
- Performance of this seems good.
12. Visualizing and Understanding
- We want to know what’s going on inside ConvNets?
- People want to trust the black box (CNN) and know how it exactly works and give and good decisions.
- A first approach is to visualize filters of the first layer.
- Maybe the shape of the first layer filter is 5 x 5 x 3, and the number of filters are 16. Then we will have 16 different “colored” filter images.
- It turns out that these filters learns primitive shapes and oriented edges like the human brain does.
- These filters really looks the same on each Conv net you will train, Ex if you tried to get it out of AlexNet, VGG, GoogleNet, or ResNet.
- This will tell you what is the first convolution layer is looking for in the image.
- We can visualize filters from the next layers but they won’t tell us anything.
- Maybe the shape of the first layer filter is 5 x 5 x 20, and the number of filters are 16. Then we will have 16*20 different “gray” filter images.
- In AlexNet, there was some FC layers in the end. If we took the 4096-dimensional feature vector for an image, and collecting these feature vectors.
- If we made a nearest neighbors between these feature vectors and get the real images of these features we will get something very good compared with running the KNN on the images directly!
- This similarity tells us that these CNNs are really getting the semantic meaning of these images instead of on the pixels level!
- We can make a dimensionality reduction on the 4096 dimensional feature and compress it to 2 dimensions.
- This can be made by PCA, or t-SNE.
- t-SNE is used more often with deep learning to visualize the data. An example can be found here.
- We can visualize the activation maps.
- For example, if the CONV5 feature map is 128 x 13 x 13, we can visualize it as 128 gray-scale images of size 13 x 13.
- When one of these maps activates on part of the input, we know that this particular map is looking for something.
- This was done by Yosinski et al. More info is here.
- Something called Maximally Activating Patches can help us visualize the intermediate features in ConvNets
- The steps are as follows:
- We choose a layer then a neuron
- Ex. We choose Conv5 in AlexNet which is 128 x 13 x 13 then pick channel (Neuron) 17/128
- Run many images through the network, record values of chosen channel.
- Visualize image patches that correspond to maximal activations.
- We will find that each neuron is looking into a specific part of the image.
- The image patches are extracted using the neuron’s receptive field.
- Another idea is Occlusion Experiments
- We mask part of the image before feeding it to the CNN, and draw a heat-map of the probability of the correct class at each mask location
- It shows which parts of the image matter most for the ConvNet’s prediction.
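The occlusion experiment can be sketched in a few lines. This is a minimal illustration, not the lecture's code: `occlusion_heatmap` and `toy_predict_prob` are hypothetical names, and the toy predictor only stands in for a real CNN that would return the probability of the correct class.

```python
import numpy as np

def occlusion_heatmap(image, predict_prob, patch=4, stride=4, fill=0.0):
    """Slide a square mask over `image` (H, W, C) and record the class
    probability returned by `predict_prob` at each mask location."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch, :] = fill  # mask this region
            heatmap[i, j] = predict_prob(occluded)
    return heatmap

# Hypothetical stand-in for a CNN: the "probability" depends only on the
# brightness of the top-left 4x4 corner of the image.
def toy_predict_prob(img):
    return float(img[:4, :4, :].mean())

image = np.ones((8, 8, 3))
heatmap = occlusion_heatmap(image, toy_predict_prob)
```

Locations where the recorded probability drops the most are the regions the model relies on; here that is the top-left cell of the heat-map.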
- Saliency Maps tell us which pixels matter for classification
- Like Occlusion Experiments but with a completely different approach
- We compute the gradient of the (unnormalized) class score with respect to the image pixels, then take the absolute value and the max over the RGB channels. This gives a gray image that represents the most important areas in the image.
- This can be used for Semantic Segmentation sometimes.
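The post-processing step of a saliency map (absolute value, then max over RGB) is small enough to show directly. A minimal sketch, assuming the gradient of the class score with respect to the image has already been obtained by backprop; `saliency_map` is a hypothetical helper name and the gradient values are made up.

```python
import numpy as np

def saliency_map(grad):
    """grad: gradient of the class score w.r.t. the image, shape (3, H, W).
    Returns an (H, W) map: absolute value, then max over the RGB channels."""
    return np.abs(grad).max(axis=0)

# Made-up gradient for a 1 x 2 RGB image, shape (3, 1, 2).
grad = np.array([[[0.1, -0.5]], [[-0.9, 0.2]], [[0.3, 0.4]]])
sal = saliency_map(grad)  # shape (1, 2)
```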
- (Guided) backprop does something like Maximally Activating Patches, but it also shows which pixels the neuron cares about.
- In this technique we choose a channel, as in Maximally Activating Patches, and then compute the gradient of the neuron value with respect to the image pixels
- Images come out nicer if you only backprop positive gradients through each RELU (guided backprop)
- Gradient Ascent
-
Generate a synthetic image that maximally activates a neuron.
-
Reverse of gradient descent. Instead of taking the minimum it takes the maximum.
-
We want to maximize the neuron value with respect to the input image. So instead of learning weights, we learn the image that maximizes the activation:
-
```
# R(I) is a natural image regularizer, f(I) is the neuron value.
I* = arg max[I] (f(I) + R(I))
```
- Steps of gradient ascent
- Initialize image to zeros.
- Forward image to compute current scores.
- Backprop to get gradient of neuron value with respect to image pixels.
- Make a small update to the image
- `R(I)` may be the L2 norm of the generated image.
- To get better results we use a better regularizer:
- penalize L2 norm of image; also during optimization periodically:
- Gaussian blur image
- Clip pixels with small values to 0
- Clip pixels with small gradients to 0
- A better regularizer makes our images cleaner!
- The results in the later layers seem to mean something more than the other layers.
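The gradient ascent steps above can be sketched on a toy problem. The assumptions here are loud ones: the "neuron" is a hypothetical linear function `f(I) = sum(w * I)` rather than a CNN unit, and the regularizer is a plain L2 penalty `R(I) = -lam * ||I||^2`, so the maximizer is known in closed form (`w / (2 * lam)`).

```python
import numpy as np

def gradient_ascent(w, lam=0.5, lr=0.1, steps=200):
    image = np.zeros_like(w)              # 1. initialize image to zeros
    for _ in range(steps):
        # 2-3. forward/backward for the toy neuron f(I) = sum(w * I):
        # its gradient with respect to the image is simply w.
        grad_f = w
        grad_r = -2.0 * lam * image       # gradient of R(I) = -lam * ||I||^2
        image += lr * (grad_f + grad_r)   # 4. small ascent step on f + R
    return image

w = np.array([[1.0, -2.0], [0.5, 3.0]])
image = gradient_ascent(w)                # converges towards w / (2 * lam)
```

With a real network only the two gradient lines change: `grad_f` would come from backprop through the network to the image pixels.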
- We can fool a CNN by using this procedure:
- Start from an arbitrary image. # Random picture based on nothing.
- Pick an arbitrary class. # Random class
- Modify the image to maximize the class.
- Repeat until network is fooled.
- Results on fooling the network are pretty surprising!
- To human eyes the images look the same, but the network is fooled by adding just some noise!
- DeepDream: Amplify existing features
- Google released DeepDream on their website.
- It uses the same procedure as fooling the NN that we discussed, but rather than synthesizing an image to maximize a specific neuron, it tries to amplify the neuron activations at some layer in the network.
- Steps:
- Forward: compute activations at chosen layer. # from an input image (any image)
- Set gradient of chosen layer equal to its activation.
- Equivalent to `I* = arg max[I] sum(f(I)^2)`
- Backward: Compute gradient on image.
- Update image.
- The DeepDream code is online; you can download it and check it yourself.
- Feature Inversion
- Lets us know what types of elements of the image are captured at different layers in the network.
- Given a CNN feature vector for an image, find a new image that:
- Matches the given feature vector.
- Looks natural (image prior regularization)
- Texture Synthesis
- Old problem in computer graphics.
- Given a sample patch of some texture, can we generate a bigger image of the same texture?
- There are algorithms which don’t depend on NNs:
- Wei and Levoy, Fast Texture Synthesis using Tree-structured Vector Quantization, SIGGRAPH 2000
- It’s a really simple algorithm
- The idea here is that this is an old problem and many algorithms have already addressed it, but simple algorithms don’t work well on complex textures!
- An approach using NNs was proposed in 2015, based on gradient ascent, and called “Neural Texture Synthesis”
- It depends on something called the Gram matrix.
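For reference, the Gram matrix of a layer's activations holds the inner products between every pair of channels. A minimal sketch, assuming features of shape (C, H, W); `gram_matrix` is a hypothetical helper name, and normalization (which some implementations apply) is left out.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activations from one layer -> (C, C) Gram matrix."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # one row per channel
    return flat @ flat.T                # channel-by-channel inner products

features = np.arange(12, dtype=np.float64).reshape(3, 2, 2)
gram = gram_matrix(features)
```

Because the spatial positions are summed out, the Gram matrix keeps which features co-occur but discards where they occur, which is what makes it a texture descriptor.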
- Neural Style Transfer = Feature + Gram Reconstruction
- Gatys, Ecker, and Bethge, Image style transfer using Convolutional neural networks, CVPR 2016
- A PyTorch implementation is here.
- Style transfer requires many forward / backward passes through VGG; very slow!
- Train another neural network to perform style transfer for us!
- Fast Style Transfer is the solution.
- Johnson, Alahi, and Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, ECCV 2016
- https://github.com/jcjohnson/fast-neural-style
- There is a lot of work on style transfer and it continues to this day!
- Summary:
- Activations: Nearest neighbors, Dimensionality reduction, maximal patches, occlusion
- Gradients: Saliency maps, class visualization, fooling images, feature inversion
- Fun: DeepDream, Style Transfer
13. Generative models
-
Generative models are a type of unsupervised learning.
-
Supervised vs Unsupervised Learning:
-
| | Supervised Learning | Unsupervised Learning |
| --- | --- | --- |
| Data structure | Data: (x, y), where x is data and y is label | Data: x, just data, no labels! |
| Data price | Training data is expensive in a lot of cases. | Training data is cheap! |
| Goal | Learn a function to map x -> y | Learn some underlying hidden structure of the data |
| Examples | Classification, regression, object detection, semantic segmentation, image captioning | Clustering, dimensionality reduction, feature learning, density estimation |
-
Autoencoders are a feature learning technique.
- It contains an encoder and a decoder. The encoder downsamples the image while the decoder upsamples the features.
- The loss is an L2 loss.
-
Density estimation is where we want to learn/estimate the underlying distribution of the data!
-
There are a lot more open research problems in unsupervised learning than in supervised learning!
-
Generative Models
- Given training data, generate new samples from same distribution.
- Addresses density estimation, a core problem in unsupervised learning.
- We have different ways to do this:
- Explicit density estimation: explicitly define and solve for the model distribution `p_model(x)`.
- Implicit density estimation: learn a model that can sample from `p_model(x)` without explicitly defining it.
- Why Generative Models?
- Realistic samples for artwork, super-resolution, colorization, etc
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features
- Taxonomy of Generative Models:
- In this lecture we will discuss: PixelRNN/CNN, Variational Autoencoder, and GANs as they are the popular models in research now.
-
PixelRNN and PixelCNN
- In a fully visible belief network we use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions
`p(x) = prod(p(x[i] | x[1], x[2], ..., x[i-1]))`
- Where p(x) is the likelihood of image x and `p(x[i] | x[1], ..., x[i-1])` is the probability of the i’th pixel value given all previous pixels.
- To solve the problem we need to maximize the likelihood of the training data, but the distribution over pixel values is very complex.
- Also we need to define an ordering of the previous pixels.
- PixelRNN
- Proposed by [van den Oord et al. 2016]
- Dependency on previous pixels modeled using an RNN (LSTM)
- Generate image pixels starting from corner
- Drawback: sequential generation is slow, because you have to generate pixel by pixel!
- PixelCNN
- Also proposed by [van den Oord et al. 2016]
- Still generates image pixels starting from the corner.
- Dependency on previous pixels now modeled using a CNN over context region
- Training is faster than PixelRNN (can parallelize convolutions since context region values known from training images)
- Generation must still proceed sequentially, so it is still slow.
- There are some tricks to improve PixelRNN & PixelCNN.
- PixelRNN and PixelCNN can generate good samples and are still an active area of research.
-
Autoencoders
- Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.
- Consists of an encoder and a decoder.
- The encoder:
- Converts the input x to the features z. z should be smaller than x to keep only the important values from the input. We can call this dimensionality reduction.
- The encoder can be made with:
- Linear or nonlinear layers (in the early days)
- Deep fully connected NN (then)
- RELU CNN (currently we use this on images)
- The decoder:
- We want the decoder to map the features we have produced to an output that is similar to x, or the same as x.
- The decoder can be made with the same techniques as the encoder, and currently it uses a RELU CNN.
- The encoder is a conv layer while the decoder is a deconv layer! This means decreasing and then increasing the spatial size.
- The loss function is the L2 loss function:
L[i] = |y[i] - y'[i]|^2
- After training we throw away the decoder. # Now we have the features we need
- We can use this encoder to build a supervised model.
- The value of this is that it can learn a good feature representation of the input you have.
- A lot of times we will have only a small amount of data to solve a problem. One way to tackle this is to use an Autoencoder that learns how to get features from images, and then train on your small dataset on top of that model.
- The question is can we generate data (Images) from this Autoencoder?
-
Variational Autoencoders (VAE)
- Probabilistic spin on Autoencoders - will let us sample from the model to generate data!
- We have z as the feature vector produced by the encoder.
- We then choose prior p(z) to be simple, e.g. Gaussian.
- Reasonable for hidden attributes: e.g. pose, how much smile.
- Conditional p(x|z) is complex (generates image) => represent with neural network
- But the likelihood p(x), the integral of p(z)p(x|z)dz over all z, is intractable because we can’t evaluate p(x|z) for every z.
- Working through the derivation gives a tractable lower bound on the log-likelihood (the variational lower bound), which we maximize during training.
- Variational Autoencoders are an approach to generative models, but their samples are blurrier and lower quality compared to the state of the art (GANs)
- Active areas of research:
- More flexible approximations, e.g. richer approximate posterior instead of diagonal Gaussian
- Incorporating structure in latent variables
-
Generative Adversarial Networks (GANs)
- GANs don’t work with any explicit density function!
- Instead, take game-theoretic approach: learn to generate from training distribution through 2-player game.
- Yann LeCun, who oversees AI research at Facebook, has called GANs “the coolest idea in deep learning in the last 20 years”
- Problem: Want to sample from complex, high-dimensional training distribution. No direct way to do this as we have discussed!
- Solution: Sample from a simple distribution, e.g. random noise. Learn transformation to training distribution.
- So we draw random noise from a simple distribution and feed it to an NN that we call the generator network, which should learn to transform it into the distribution we want.
- Training GANs: Two-player game:
- Generator network: try to fool the discriminator by generating real-looking images.
- Discriminator network: try to distinguish between real and fake images.
- If we are able to train the Discriminator well then we can train the generator to generate the right images.
- The loss function of GANs as a minimax game is here:
- The discriminator’s target label is 0 for generated images and 1 for real images.
- To train the network we will do:
- Gradient ascent on discriminator.
- Gradient ascent on generator but with different loss.
- You can read the full algorithm with the equations here:
- Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training, and is an active area of research.
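The two objectives used in the alternating training can be written down as plain functions of the discriminator's outputs. This is a sketch under assumptions: `d_real` and `d_fake` are hypothetical arrays holding D(x) and D(G(z)) as probabilities in (0, 1), and the generator uses the commonly used alternative objective of maximizing log D(G(z)), which is how "a different loss" is read here.

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """Value the discriminator ascends: log D(x) + log(1 - D(G(z)))."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def generator_objective(d_fake):
    """Value the generator ascends (the different loss): log D(G(z))."""
    return np.mean(np.log(d_fake))

# Made-up discriminator outputs on a batch of two real and two fake images.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.2])
d_value = discriminator_objective(d_real, d_fake)
g_value = generator_objective(d_fake)
```

A confident, correct discriminator gets a value near 0, while a generator that is being caught (small `d_fake`) gets a very negative value and therefore a strong gradient signal.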
- Convolutional Architectures:
- The generator is an upsampling network with fractionally-strided convolutions. The discriminator is a convolutional network.
- Guidelines for stable deep Conv GANs:
- Replace any pooling layers with strided convs (discriminator) and fractional-strided convs (generator).
- Use batch norm for both networks.
- Remove fully connected hidden layers for deeper architectures.
- Use RELU activation in generator for all layers except the output which uses Tanh
- Use leaky RELU in discriminator for all the layers.
- 2017 is the year of the GANs! The field has exploded and there are some really good results.
- GANs for all kinds of applications are also an active area of research.
- The GAN zoo can be found here: https://github.com/hindupuravinash/the-gan-zoo
- Tips and tricks for using GANs: https://github.com/soumith/ganhacks
- NIPS 2016 Tutorial GANs: https://www.youtube.com/watch?v=AJVyzd0rqdc
14. Deep reinforcement learning
- This section contains a lot of math.
- Reinforcement learning problems involve an agent interacting with an environment, which provides numeric reward signals.
- Steps are:
- Environment --> State `s[t]` --> Agent --> Action `a[t]` --> Environment --> Reward `r[t]` + Next state `s[t+1]` --> Agent --> and so on.
- Our goal is to learn how to take actions in order to maximize reward.
- An example is Robot Locomotion:
- Objective: Make the robot move forward
- State: Angle and position of the joints
- Action: Torques applied on joints
- Reward: 1 at each time step upright + forward movement
- Another example is Atari Games:
- Deep learning achieves state-of-the-art results on this problem.
- Objective: Complete the game with the highest score.
- State: Raw pixel inputs of the game state.
- Action: Game controls e.g. Left, Right, Up, Down
- Reward: Score increase/decrease at each time step
- The game of Go is another example. AlphaGo’s win last year (2016) was a big achievement for AI and deep learning because the problem is so hard.
- We can mathematically formulate RL (reinforcement learning) using a Markov Decision Process
- Markov Decision Process
- Defined by (`S`, `A`, `R`, `P`, `Y`) where:
- `S`: set of possible states.
- `A`: set of possible actions.
- `R`: distribution of reward given (state, action) pair.
- `P`: transition probability, i.e. distribution over next state given (state, action) pair.
- `Y`: discount factor. # How much we value rewards coming up soon versus later on.
- Algorithm:
- At time step `t=0`, environment samples initial state `s[0]`
- Then, for t=0 until done:
- Agent selects action `a[t]`
- Environment samples reward from `R` with (`s[t]`, `a[t]`)
- Environment samples next state from `P` with (`s[t]`, `a[t]`)
- Agent receives reward `r[t]` and next state `s[t+1]`
- A policy `pi` is a function from S to A that specifies what action to take in each state.
- Objective: find policy `pi*` that maximizes cumulative discounted reward: `Sum(Y^t * r[t], t>=0)`
- For example:
- Solution would be:
- The value function at state `s` is the expected cumulative reward from following the policy from state `s`: `V[pi](s) = E[Sum(Y^t * r[t], t>=0) | s0 = s, pi]`
- The Q-value function at state `s` and action `a` is the expected cumulative reward from taking action `a` in state `s` and then following the policy: `Q[pi](s,a) = E[Sum(Y^t * r[t], t>=0) | s0 = s, a0 = a, pi]`
- The optimal Q-value function `Q*` is the maximum expected cumulative reward achievable from a given (state, action) pair: `Q*(s,a) = max over pi of E[Sum(Y^t * r[t], t>=0) | s0 = s, a0 = a, pi]`
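The cumulative discounted reward that appears inside these definitions can be computed for a single recorded episode as below. A minimal sketch: `discounted_return` is a hypothetical helper, `gamma` plays the role of `Y`, and the sum is taken from t = 0. The value functions are expectations of this quantity over many episodes.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] for t = 0, 1, 2, ..."""
    total = 0.0
    for r in reversed(rewards):      # work backwards: G[t] = r[t] + gamma * G[t+1]
        total = r + gamma * total
    return total

episode_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```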
- Bellman equation
- An important concept in RL.
- Given any state-action pair (s,a), the value of this pair is the reward r that you get plus the discounted value of the state that you end up in.
`Q*(s,a) = E[r + Y * max over a' of Q*(s',a') | s, a]` # Hint: there is no policy in the equation
- The optimal policy `pi*` corresponds to taking the best action in any state as specified by `Q*`
- We can get the optimal policy using the value iteration algorithm, which uses the Bellman equation as an iterative update
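Value iteration with the Bellman update can be sketched on a tiny tabular problem. Everything concrete here is an assumption made for illustration: a hypothetical MDP with two states and two actions, deterministic transitions (so the expectation disappears), and a discount of 0.5.

```python
import numpy as np

def q_value_iteration(next_state, reward, gamma, iters=100):
    """Tabular Q-value iteration for a deterministic MDP.
    next_state[s, a] is the state reached, reward[s, a] the reward received."""
    q = np.zeros_like(reward, dtype=float)
    for _ in range(iters):
        v = q.max(axis=1)                    # best achievable value of each state
        q = reward + gamma * v[next_state]   # Bellman equation as an update
    return q

# Action 0 stays in the current state, action 1 moves to the other state.
next_state = np.array([[0, 1], [1, 0]])
reward = np.array([[0.0, 1.0], [2.0, 0.0]])
q_star = q_value_iteration(next_state, reward, gamma=0.5)
policy = q_star.argmax(axis=1)               # best action in each state
```

The resulting policy moves from state 0 to state 1 and then stays there to keep collecting the reward of 2. The table has one entry per (state, action) pair, which is exactly what stops scaling for large state spaces such as raw game pixels.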
- Due to the huge state spaces in real-world applications we use a function approximator to estimate `Q(s,a)`, e.g. a neural network! This is called Q-learning.
- Any time we have a complex function that we cannot represent exactly, we can use neural networks!
- Q-learning
- The first deep learning approach to RL that we discuss.
- Use a function approximator to estimate the action-value function
- If the function approximator is a deep neural network => deep q-learning
- The loss function:
- Now let’s consider the “Playing Atari Games” problem:
- Our total reward is usually the score we see at the top of the screen.
- Q-network Architecture:
- Learning from batches of consecutive samples is a problem: consecutive samples are strongly correlated, which makes learning inefficient, and the current Q-network parameters determine the next training samples, which can lead to bad feedback loops. So we use “experience replay” instead of consecutive samples.
- Continually update a replay memory table of transitions (`s[t]`, `a[t]`, `r[t]`, `s[t+1]`) as game (experience) episodes are played.
- Train Q-network on random minibatches of transitions from the replay memory, instead of consecutive samples.
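A replay memory is essentially a bounded buffer with random sampling. The sketch below is one simple way to build it with the standard library; the class name `ReplayMemory`, its methods, and the capacity are illustrative choices, not taken from the lecture.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size table of (s[t], a[t], r[t], s[t+1]) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random minibatch instead of consecutive samples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=100)
for t in range(150):                          # fake episode: states are integers
    memory.push(t, 0, 1.0, t + 1)
batch = memory.sample(32)
```

Each transition can also be drawn in many different minibatches, so the same experience contributes to several weight updates.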
- The full algorithm:
- A video that demonstrates the algorithm on an Atari game can be found here: https://www.youtube.com/watch?v=V1eYniJ0Rnk
- Policy Gradients
- The second deep learning approach to RL that we discuss.
- The problem with Q-learning is that the Q-function can be very complicated.
- Example: a robot grasping an object has a very high-dimensional state.
- But the policy can be much simpler: just close your hand.
- Can we learn a policy directly, e.g. finding the best policy from a collection of policies?
- Policy Gradients equations:
- Converges to a local optimum of `J(theta)`, often good enough!
- REINFORCE is the algorithm we use to learn the policy
- Equation and intuition of the REINFORCE algorithm:
- The problem with this equation is high variance. Can we solve this?
- Variance reduction is an active research area!
- Recurrent Attention Model (RAM) is an algorithm based on REINFORCE and is used for image classification problems:
- Take a sequence of “glimpses” selectively focusing on regions of the image, to predict class
- Inspiration from human perception and eye movements.
- Saves computational resources => scalability
- If the image has a high resolution you can save a lot of computation
- Able to ignore clutter / irrelevant parts of image
- RAM is used now in a lot of tasks: including fine-grained image recognition, image captioning, and visual question-answering
- AlphaGo uses a mix of supervised learning and reinforcement learning. It also uses policy gradients.
- A good course from Stanford on deep reinforcement learning
- A good course on deep reinforcement learning (2017)
- A good article
15. Efficient Methods and Hardware for Deep Learning
- The original lecture was given by Song Han, a PhD candidate at Stanford.
- Deep Conv nets, Recurrent nets, and deep reinforcement learning are shaping a lot of applications and changing a lot of our lives.
- Like self driving cars, machine translation, AlphaGo and so on.
- But the trend now says that if we want high accuracy we need larger (deeper) models.
- The model size in the ImageNet competition increased 16x from 2012 to 2015 to achieve high accuracy.
- Deep Speech 2 has 10x the training operations of Deep Speech 1, and that’s in only one year! # At Baidu
- There are three challenges we got from this
- Model Size
- It’s hard to deploy larger models on our PCs, mobiles, or cars.
- Speed
- ResNet152 took 1.5 weeks to train to reach a 6.16% error rate!
- Long training time limits ML researchers’ productivity
- Energy Efficiency
- AlphaGo: 1920 CPUs and 280 GPUs. $3000 electric bill per game
- If we use this on our mobile it will drain the battery.
- Google mentioned in their blog that if all users used Google speech for 3 minutes, they would have to double their data centers!
- Where is the Energy Consumed?
- larger model => more memory reference => more energy
- We can improve the Efficiency of Deep Learning by Algorithm-Hardware Co-Design.
- From both the hardware and the algorithm perspectives.
- Hardware 101: the Family
- General Purpose # Used for any application
- CPU # Latency oriented, a single strong thread, like a single elephant
- GPU # Throughput oriented, many small threads, like a lot of ants
- GPGPU
- Specialized HW # Tuned for a domain of applications
- FPGA # Programmable logic, cheaper but less efficient
- ASIC # Fixed logic, designed for a certain application (can be designed for deep learning applications)
- Hardware 101: Number Representation
- Numbers in a computer are represented with discrete, finite-precision memory.
- Going from 32-bit to 16-bit floating point operations is very good for the hardware and saves energy.
- Part 1: Algorithms for Efficient Inference
-
Pruning neural networks
-
Idea: can we remove some of the weights/neurons while the NN still behaves the same?
-
In 2015 Han reduced AlexNet from 60 million parameters to 6 million by using pruning.
-
Pruning can be applied to CNNs and RNNs; applied iteratively, it reaches the same accuracy as the original.
-
Pruning actually happens in humans:
- Newborn(50 Trillion Synapses) ==> 1 year old(1000 Trillion Synapses) ==> Adolescent(500 Trillion Synapses)
-
Algorithm:
- Get Trained network.
- Evaluate importance of neurons.
- Remove the least important neuron.
- Fine tune the network.
- If we need to continue Pruning we go to step 2 again else we stop.
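One pass of the remove step in this loop can be sketched as follows. Note the assumption: importance is measured here by weight magnitude, which is one common criterion and is swapped in for illustration; the notes do not specify how importance is evaluated. The fine-tuning step is not shown, and `prune_by_magnitude` is a hypothetical helper.

```python
import numpy as np

def prune_by_magnitude(weights, fraction):
    """Zero out the `fraction` of weights with the smallest magnitude.
    Returns the pruned weights and the boolean mask of kept weights."""
    k = int(fraction * weights.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold      # keep only the larger weights
    return weights * mask, mask

weights = np.array([[0.05, -0.8], [0.3, -0.01]])
pruned, mask = prune_by_magnitude(weights, fraction=0.5)
```

During fine-tuning the mask would be reapplied after every update so that removed weights stay at zero.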
-
-
Weight Sharing
- The idea is that we want to reduce the number of distinct values in our models.
- Trained Quantization:
- Example: all weight values that are 2.09, 2.12, 1.92, 1.87 will be replaced by 2
- To do that we can run k-means clustering on a filter, for example, and reduce the number of distinct values in it. By using this we can also reduce the number of operations used for calculating the gradients.
- After Trained Quantization the weights are discrete.
- Trained Quantization can reduce the number of bits we need for a number in each layer significantly.
- Pruning + Trained Quantization can Work Together to reduce the size of the model.
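The clustering step of weight sharing can be sketched with a small 1-D k-means. This is an illustration under assumptions: `kmeans_quantize` is a hypothetical helper, the centroids are initialized linearly between the smallest and largest weight, and the retraining of the shared centroids is left out. The example reuses the weight values from the notes plus two made-up negative weights.

```python
import numpy as np

def kmeans_quantize(weights, k, iters=20):
    """Cluster the weights into k shared values.
    Returns the centroids and, for every weight, the index of its centroid."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)   # linear initialization
    for _ in range(iters):
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = flat[assign == c].mean()
    assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign.reshape(weights.shape)

weights = np.array([[2.09, 2.12, -0.98], [1.92, 1.87, -1.03]])
centroids, index = kmeans_quantize(weights, k=2)
shared = centroids[index]          # every weight replaced by its centroid
```

Only the small `index` array (1 bit per weight for k = 2) and the `centroids` table need to be stored, which is where the size reduction comes from.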
- Huffman Coding
- We can use Huffman Coding to reduce/compress the number of bits used for the weights.
- Infrequent weights: use more bits to represent.
- Frequent weights: use fewer bits to represent.
- Using Pruning + Trained Quantization + Huffman Coding is called deep compression.
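The effect of Huffman coding on quantized weights can be seen by computing the code length assigned to each value. A minimal sketch using the standard library; `huffman_code_lengths` is a hypothetical helper and the index frequencies are made up.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (frequency, tie-breaker, {symbol: depth in the tree}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)     # merge the two rarest subtrees
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

# Quantized weight indices: index 0 is frequent, indices 2 and 3 are rare.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(indices)
total_bits = sum(lengths[i] for i in indices)
```

With a fixed 2-bit code these 16 indices need 32 bits; with the Huffman code lengths they need 28.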
- SqueezeNet
- All the methods we have talked about so far start from a pretrained model. Can we design a new architecture that saves memory and computation?
- SqueezeNet gets AlexNet-level accuracy with 50x fewer parameters and a model size under 0.5 MB.
- SqueezeNet can be compressed even further by applying deep compression to it.
- Models are now more energy efficient and have sped up a lot.
- Deep compression was applied in industry by Facebook and Baidu.
-
Quantization
- Algorithm (Quantizing the Weight and Activation):
- Train with float.
- Quantizing the weight and activation:
- Gather the statistics for weight and activation.
- Choose proper radix point position.
- Fine-tune in float format.
- Convert to fixed-point format.
-
Low Rank Approximation
- Another size reduction algorithm that is used for CNNs.
- The idea is to decompose the conv layer into smaller layers and then use the decomposed layers in its place.
-
Binary / Ternary Net
- Can we use only three values to represent the weights in an NN?
- The size will be much smaller with only -1, 0, 1.
- This is a new idea that was published in 2017: Zhu, Han, Mao, Dally. Trained Ternary Quantization, ICLR’17
- Works after training.
- They tried it on AlexNet and it reached almost the same error as AlexNet.
- The number of operations per register will increase: https://xnor.ai/
-
Winograd Transformation
- Based on 3x3 Winograd convolutions, which need fewer operations than ordinary convolution
- cuDNN 5 uses Winograd convolutions, which improved the speed.
-
- Part 2: Hardware for Efficient Inference
- There are a lot of ASICs that were developed for deep learning. All of them have the same goal: minimize memory access.
- Eyeriss (MIT)
- DaDianNao
- TPU (Google), Tensor Processing Unit
- It can be put in place of the disk in the server.
- Up to 4 cards per server.
- Power consumed by this hardware is a lot less than a GPU and the size of the chip is smaller.
- EIE (Stanford)
- By Han et al. [ISCA’16]
- We don’t store zero weights, and the quantization of the numbers is handled in hardware.
- He says that EIE has better throughput and energy efficiency.
- Part 3: Algorithms for Efficient Training
- Parallelization
- Data Parallel: run multiple inputs in parallel
- Ex. run two images at the same time!
- Run multiple training examples in parallel.
- Limited by batch size.
- Gradients have to be applied by a master node.
- Model Parallel
- Split up the model, i.e. the network
- Split the model over multiple processors by layer.
- Hyper-Parameter Parallel
- Try many alternative networks in parallel.
- Easy to get 16-64 GPUs training one model in parallel.
- Mixed Precision with FP16 and FP32
- We have discussed that if we use 16-bit real numbers all over the model, the energy cost will be about 4x lower.
- Can we use a model entirely with 16-bit numbers? We can partially do this with mixed FP16 and FP32. We use 16 bit everywhere, but at some points we need FP32.
- For example, when multiplying FP16 by FP16 we need FP32 for the result.
- After training with mixed precision, the accuracy of well-known models like AlexNet and ResNet is close to their usual accuracy.
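Why the FP16 products need an FP32 accumulator can be demonstrated with a dot product in NumPy. A sketch for illustration only: both function names are hypothetical, and real mixed-precision training does this inside hardware units rather than in a Python loop.

```python
import numpy as np

def dot_fp16_accumulate(a, b):
    acc = np.float16(0.0)
    for x, y in zip(a, b):
        acc = np.float16(acc + x * y)           # product and sum both kept in FP16
    return acc

def dot_mixed_precision(a, b):
    acc = np.float32(0.0)
    for x, y in zip(a, b):
        acc += np.float32(x) * np.float32(y)    # FP16 inputs, FP32 accumulate
    return acc

a = np.full(10000, 0.1, dtype=np.float16)
b = np.full(10000, 0.1, dtype=np.float16)
fp16_result = dot_fp16_accumulate(a, b)
mixed_result = dot_mixed_precision(a, b)
```

The exact answer is close to 100. The pure FP16 accumulator stops growing once each new product is smaller than half the spacing between representable FP16 values, so it ends far below that, while the FP32 accumulator stays close.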
- Model Distillation
- The question is can we use a senior (Good) trained neural network(s) and make them guide a student (New) neural network?
- For more information look at Hinton et al. Dark knowledge / Distilling the Knowledge in a Neural Network
- DSD: Dense-Sparse-Dense Training
- Han et al. “DSD: Dense-Sparse-Dense Training for Deep Neural Networks”, ICLR 2017
- Has a better regularization.
- The idea is to train the model (call this Dense), and then apply pruning to it (call this Sparse).
- After the above two steps we restore the pruned connections and train them again (back to Dense).
- DSD produces the same model architecture but can find a better optimization solution, arrive at a better local minimum, and achieve higher prediction accuracy.
- This improves the performance a lot in many deep learning models.
- Part 4: Hardware for Efficient Training
- GPUs for training:
- Nvidia PASCAL GP100 (2016)
- Nvidia Volta GV100 (2017)
- Can make mixed precision operations!
- So powerful.
- The new nuclear bomb!
- Google announced “Google Cloud TPU” in May 2017!
- Cloud TPU delivers up to 180 teraflops to train and run machine learning models.
- One of our new large-scale translation models used to take a full day to train on 32 of the best commercially-available GPUs—now it trains to the same accuracy in an afternoon using just one eighth of a TPU pod.
- We have moved from PC Era ==> Mobile-First Era ==> AI-First Era
16. Adversarial Examples and Adversarial Training
- What are adversarial examples?
- Since 2013, deep neural networks have matched human performance at..
- Face recognition
- Object recognition
- Captcha recognition
- Because its accuracy was higher than humans’, websites tried to find another solution than Captcha.
- And other tasks...
- Before 2013 nobody was surprised when a computer made a mistake! But now that deep learning exists, it’s important to know its problems and their causes.
- Adversarial examples are the unusual mistakes that deep learning models make.
- This topic wasn’t hot until deep learning started to do as well as or better than humans!
- An adversarial example is an example that has been carefully computed to be misclassified.
- In a lot of cases the adversarial image isn’t changed much compared to the original image from the human perspective.
- History of recent papers:
- The first story was in 2013, when Szegedy had a CNN that could classify images very well.
- He wanted to understand more about how the CNN works in order to improve it.
- He gave it an image of an object and, using gradient ascent, tried to update the image so that it would be classified as another object.
- Strangely, he found that the resulting image hadn’t changed much from the human perspective!
- If you try it you won’t notice any change and you will think that this is a bug! But it isn’t: if you look at the difference between the two images you will see that they really are different!
- These mistakes can be found in almost any deep learning algorithm we have studied!
- It turns out that RBF (Radial Basis Function) networks can resist this.
- Deep models for density estimation can resist this.
- Not just neural nets can be fooled:
- Linear models
- Logistic regression
- Softmax regression
- SVMs
- Decision trees
- Nearest neighbors
- Why do adversarial happen?
- In the process of trying to understand what is happening, in 2016 they thought it was from overfitting models in the high dimensional data case.
- Because in such high dimensions we could have some random errors which can be found.
- So if we trained a model with different parameters, it should not make the same mistake?
- They found that this is not right. Different models make the same mistakes, so it isn’t overfitting.
- In the previously mentioned experiment they found that the problem is caused by something systematic, not something random.
- If they add a particular vector to an example, it is misclassified by any model.
- Maybe they are coming from underfitting, not overfitting.
- Modern deep nets are very piecewise linear
- Rectified linear unit
- Carefully tuned sigmoid # Most of the time we are inside the linear part of the curve
- Maxout
- LSTM
- The relation between the parameters and the output is nonlinear because the parameters are multiplied together, which is what makes training NNs difficult, while the mapping from input to output is linear and much easier.
- How can adversarial be used to compromise machine learning systems?
-
If we are experimenting how easy a NN to fool, We want to make sure we are actually fooling it not just changing the output class, and if we are attackers we want to make this behavior to the NN (Get hole).
-
When we build adversarial examples, we put a max-norm constraint on the perturbation.
-
The fast gradient sign method:
- This method comes from the fact that almost all NNs use piecewise-linear activations (like ReLU), the assumption stated earlier.
- No pixel can be changed by more than some amount epsilon.
- A fast way is to take the gradient of the cost you used to train the network with respect to the input, take the sign of that gradient, and multiply it by epsilon.
- Equation:
$$ x' = x + \epsilon \cdot \operatorname{sign}(\nabla_x J(\theta, x, y)) $$
- Where $x'$ is the adversarial example, $x$ is the normal example, and $J$ is the training cost.
- So an adversarial example can be built using just the sign (direction) of the gradient and some epsilon.
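As a minimal sketch of the fast gradient sign method, the example below assumes a binary logistic regression model in place of a deep network, so the gradient of the cross-entropy cost with respect to the input has a simple closed form. The weights `w`, bias `b`, input `x`, and `epsilon` are made-up illustrative values, not taken from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """Fast gradient sign method for a logistic regression model (assumed stand-in)."""
    p = sigmoid(w @ x + b)
    # Gradient of the cross-entropy cost with respect to the input x.
    grad_x = (p - y) * w
    # Move every input component by exactly epsilon in the direction that raises the cost.
    return x + epsilon * np.sign(grad_x)

# Illustrative values only (assumptions).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, -0.5, 1.0])
y = 1.0
epsilon = 0.25

x_adv = fgsm(x, y, w, b, epsilon)
p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
print("max change per component:", np.max(np.abs(x_adv - x)))
print("P(y=1) before:", p_clean, "after:", p_adv)
```

The perturbation respects the max-norm constraint (no component moves by more than `epsilon`), yet the model's confidence in the true class drops.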
-
Some attacks are based on the Adam optimizer.
-
Adversarial examples are not random noise!
-
NNs are trained on some distribution and behave well on that distribution. But if you shift this distribution, the NN won't give the right answers and will be very easy to fool.
-
Deep RL can also be fooled.
-
Attack of the weights:
- In linear models, we can view the learned weights as an image, take the sign of that image, and add it to any example to force the class of those weights to be predicted. Andrej Karpathy, “Breaking Linear Classifiers on ImageNet”
-
It turns out that some of the linear models perform well (we can't easily get adversarial examples from them)
- In particular, shallow RBF networks resist adversarial perturbations made by the fast gradient sign method
- The problem is that RBFs don't reach high accuracy on the datasets because they are just shallow models, and if you try to make the model deeper the gradients become zero in almost all the layers.
- RBFs are very difficult to train, even with the batch norm algorithm.
- Ian thinks that with better hyperparameters or a better optimization algorithm than gradient descent we will be able to train RBFs and solve the adversarial problem!
-
We can also use another model to fool the current model, e.g. use an SVM to fool a deep NN.
- For more details see the paper: “Papernot 2016”
-
Transferability Attack
- The target model has unknown weights, machine learning algorithm, and training set; it may be non-differentiable
- Build your own training set from this model: send it inputs of your choosing and record the outputs it returns
- Train your own model. “Following some table from Papernot 2016”
- Create adversarial examples on your model.
- Use these examples against the model you are targeting.
- You are very likely to get good results and fool the target!
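The steps above can be sketched as follows. This is only a toy illustration under stated assumptions: the black-box target is a hypothetical linear classifier (`target_predict`), the substitute is a logistic regression trained with plain gradient descent, and the adversarial examples are made with the fast gradient sign method. All names, data, and constants are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical black-box target: the attacker only calls target_predict
# and never reads the hidden weights.
_hidden_weights = np.array([2.0, -1.0])

def target_predict(X):
    return (X @ _hidden_weights > 0).astype(float)

# 1. Build a training set by querying the target with our own inputs.
X = rng.normal(size=(200, 2))
y = target_predict(X)

# 2. Train a substitute model (logistic regression, gradient descent).
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# 3. Create adversarial examples on the substitute (fast gradient sign method).
epsilon = 0.5
grad_X = (sigmoid(X @ w + b) - y)[:, None] * w  # cost gradient w.r.t. each input
X_adv = X + epsilon * np.sign(grad_X)

# 4. Use these examples against the target.
clean_acc = np.mean(target_predict(X) == y)  # 1.0 by construction
adv_acc = np.mean(target_predict(X_adv) == y)
print(f"target agreement: clean {clean_acc:.2f}, adversarial {adv_acc:.2f}")
```

The attacker never touches the target's parameters; the drop in agreement on `X_adv` comes only from examples crafted on the substitute.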
-
In a transferability attack, to raise your probability of fooling a network to nearly 100%, you can build more than one model, maybe five, and then apply them together. “(Liu et al, 2016)”
-
Adversarial examples also work on the human brain! For example, images that trick your eyes. There are a lot of them on the Internet.
-
In practice some researchers have fooled real models from (MetaMind, Amazon, Google)
-
Someone uploaded some perturbations to Facebook and Facebook was fooled :D
- What are the defenses?
- A lot of the defenses Ian tried failed really badly! Including:
- Ensembles
- Weight decay
- Dropout
- Adding noise at train time or at test time
- Removing perturbation with an autoencoder
- Generative modeling
- Universal approximator theorem
- Whatever shape we would like our classification function to have, a big enough NN can represent it.
- We could train a NN that detects adversarial examples!
- Linear models & KNN can be fooled more easily than NNs. Neural nets can actually become more secure than other models. Adversarially trained neural nets have the best empirical success rate on adversarial examples of any machine learning model.
- Deep NNs can be trained with nonlinear functions, but we would need a good optimization technique, or we can sidestep the problem by using a linear activation like “RELU”
- How to use adversarial examples to improve machine learning, even when there is no adversary?
- Universal engineering machine (model-based optimization), as Ian calls it
- For example:
- Imagine that we want to design a car that is fast.
- We train a NN to look at the blueprint of a car and tell us whether that blueprint will give us a fast car or not.
- The idea here is to optimize the input to the network so that the output is maximized. This could give us the best blueprint for a car!
- Make new inventions by finding the input that maximizes the model’s predicted performance.
- Right now, by using adversarial examples we just get results we don’t like, but if we solve this problem we can have the fastest car, the best GPU, the best chair, new drugs…..
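A minimal sketch of model-based optimization, assuming a hypothetical differentiable scoring function in place of a trained network. `predicted_score`, `score_gradient`, and `optimum` are invented for illustration; a real model would be learned from data and would not have a known best input.

```python
import numpy as np

# Hypothetical stand-in for a trained model that scores a design vector
# (e.g. a car blueprint). Its best design sits at `optimum` (assumption).
optimum = np.array([1.0, -2.0, 0.5])

def predicted_score(x):
    return -np.sum((x - optimum) ** 2)

def score_gradient(x):
    # Gradient of the predicted score with respect to the input design.
    return -2.0 * (x - optimum)

x_start = np.zeros(3)  # initial design
x = x_start.copy()
for _ in range(100):
    # Gradient ascent on the input; the model itself stays fixed.
    x = x + 0.1 * score_gradient(x)

print("optimized design:", x)
print("score before:", predicted_score(x_start), "after:", predicted_score(x))
```

This toy scoring function is well behaved everywhere. With a real network, the same loop tends to drift to out-of-domain inputs that the model scores highly but that are not good designs, which is the adversarial problem described above.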
- The whole adversarial topic is an active area of research, especially defending the network!
- Conclusion
- Attacking is easy
- Defending is difficult
- Adversarial training provides regularization and semi-supervised learning
- The out-of-domain input problem is a bottleneck for model-based optimization generally
- There is GitHub code that lets you learn about adversarial examples through code (built on top of TensorFlow):
- An adversarial example library for constructing attacks, building defenses, and benchmarking both: https://github.com/tensorflow/cleverhans
These notes were made by Mahmoud Badry @2017
3 - MK Internet of Things
MK Internet of Things
- Code: TKE194945 Internet of Things
- Credits: 3 SKS
- Schedule
- Class A: Room E-205, Thursday 13.00, 12 students, 1 MBKM student
4 - MK Pengolahan Sinyal Digital
MK Pengolahan Sinyal Digital
- Code: TKE192227 Pengolahan Sinyal Digital (Digital Signal Processing)
- Credits: 3 SKS
- Schedule:
- Class B: Room E-101, Wednesday 07.00, 35 students
- Class A: Room E-101, Wednesday 09.45, 50 students
Identity
- Course code: TKE192227
- Course credits: 3 SKS
- Course semester: 4
- Course type: Core Electrical Engineering (Teknik Elektro Inti, TEI)
Material
- Meeting 1
- Meeting 2
- Meeting 3
- Meeting 4
- Meeting 5
- Meeting 6
- Meeting 7
References
- Li Tan, Digital Signal Processing
Link
5 - MK Sistem Kendali Cerdas
MK Sistem Kendali Cerdas
Identity
- Code: TKE194941 Sistem Kendali Cerdas (Intelligent Control Systems)
- Credits: 3 SKS
- Schedule:
- Class A: Room E-201, Friday 13.55, 3 students
- Method: Case-based and Project-based Learning
- Course semester: 6
- Course type: Electrical Engineering Specialization (Teknik Elektro Pendalaman, TED)
Material
- Introduction
- Fundamentals of Fuzzy Logic
- Fuzzy Inference Systems
- Fuzzy Inference Systems for Control Systems
- Project: Fuzzy Inference System for Control Systems
- Project: Fuzzy Inference System for Control Systems
- Introduction to Neural Networks
- Neural Networks in Control Systems
- Neural Networks in Control Systems
- Neural Networks in Control Systems
- Neuro-Fuzzy Systems
- Neuro-Fuzzy Systems for Control Systems
- Project: Neuro-Fuzzy System for Control Systems
- Project: Neuro-Fuzzy System for Control Systems
Main References
- Liu Jinkun, Intelligent Control Design and MATLAB Simulation [website] [m-file download]
- Fuzzy and Neural Control by Babuska
- Intelligent Control - A Hybrid Approach Based on Fuzzy Logic, Neural Networks and Genetic Algorithms - Nazmul Siddique - Springer
- Himanshu Singh & Yunis Ahmad Lone, Deep Neuro-Fuzzy Systems With Python: With Case Studies and Applications From the Industry [website][python download]
- Hung T. Nguyen & Nadipuram R. Prasad & Carol L. Walker & Elbert A. Walker, A First Course in Fuzzy and Neural Control [website]
Additional References
- Roland S Burns, Advanced Control Engineering (Chapter 10) [website]
- Ali Zilouchian & Mo Jamshidi, Intelligent Control Systems Using Soft Computing Methodologies, [website][ebook download]
- Adrian A. Hopgood, Intelligent Systems for Engineers and Scientists, websites
- Adedeji Bodunde Badiru, Fuzzy Engineering Expert Systems With Neural Network Applications
- Ahmad M. Ibrahim, Fuzzy Logic for Embedded Systems Applications [website]
- Erdal Kayacan & Mojtaba Ahmadieh, Fuzzy Neural Networks for Real Time Control Applications: Concepts, Modeling and Algorithms for Fast Learning [website]
- James M. Keller & Derong Liu & David B Fogel, Fundamentals of Computational Intelligence: Neural Networks, Fuzzy Systems, and Evolutionary Computation [website]
- Steven L Brunton & J Nathan Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control [website][ebook download][MATLAB Code and Data][Python Code and Data]
- Intelligent Control: Fuzzy Logic Applications - 1st Edition - Clarence
- Fuzzy Logic in Control - René Jager - Google Books
Software Links
- GNU Octave
- Octave Online
- MATLAB and Simulink
- Anaconda
- Google Colab
- Fuzzylite : The FuzzyLite Libraries for Fuzzy Logic Control
Video Links
- Neural Network - Online Course - MATLAB Helper - YouTube
- Artificial Intelligence Tutorial - YouTube
- Data-Driven Control with Machine Learning - YouTube
E-learning Link
Neuro-fuzzy in Python
Libraries
- numpy: conda install -c conda-forge numpy or pip install numpy
- scipy: conda install -c conda-forge scipy or pip install scipy
- scikit fuzzy: conda install -c conda-forge scikit-fuzzy or pip install scikit-fuzzy
- scikit learn: conda install -c conda-forge scikit-learn or pip install scikit-learn
- fuzzylite: pip install pyfuzzylite
- pandas: conda install -c conda-forge pandas or pip install pandas
- statsmodels: conda install -c conda-forge statsmodels or pip install statsmodels
- keras: conda install -c conda-forge keras or pip install keras
- anfis: pip install anfis
- bokeh: conda install -c conda-forge bokeh or pip install bokeh
- fuzzycmeans: pip install fuzzycmeans
Downgrade Python for installing keras and tensorflow
- python --version: check the installed version of Python
- conda search python: list the Python versions available through conda
- conda install python=3.6.0: downgrade to your preferred Python
6 - MK Sistem Kendali
MK Sistem Kendali
Course Identity
- Course code: TKE192221
- Course credits: 2 SKS
- Course semester: 4
- Course type: Core Electrical Engineering (Teknik Elektro Inti, TEI)
- Schedule:
- Class A: Room C-201, Tuesday 07.55, 45 students
- Class B: Room C-201, Tuesday 10.40, 47 students
References
- Norman S. Nise, Control Systems Engineering [website]
- Katsuhiko Ogata, Modern Control Engineering
- Richard C. Dorf and Robert H. Bishop, Modern Control Systems [website]
- Farid Golnaraghi and Benjamin C. Kuo, Automatic Control Systems [website]
- Brian Douglas, The Fundamentals of Control Theory [website][ebook]
- Pao C. Chau, Process Control: A First Course With MATLAB [website]
- Karl J. Åström and Richard M. Murray, Feedback Systems: An Introduction for Scientists and Engineers [website]
- R.V. Dukkipati, Analysis and Design of Control Systems using MATLAB
- Book: Introduction to Control Systems (Iqbal) - Engineering LibreTexts License: CC-BY-NC
- Book: Chemical Process Dynamics and Controls (Woolf) - Engineering LibreTexts License: CC-BY
Software
- GNU Octave
- Octave Online
- MATLAB - MathWorks - MATLAB & Simulink
- Python and Jupyter Notebook in Anaconda.org or in Google Colab
- Visual Model Q
Interactive Learning
Video
Lectures
01-Introduction to Control Systems
Interactive Course for Control Theory
- Go to the Interactive Course for Control Theory site
- Create an ICCT account, then check your email for your username and password
- Log in to Interactive Course for Control Theory
- You will then work with Jupyter Notebook in ICCT
- Click the ICCT folder in Jupyter Notebook, then click
Table-of-Contents-ICCT.ipynb
- Click one of the links, for example
1.1.1 Complex Numbers in Cartesian Form
in the folder 1.1 Complex Numbers
- That link takes you to the Jupyter Notebook
M-01_Complex_numbers_Cartesian_form.ipynb
- Do not be alarmed by the Python code that appears.
- Select menu then
- Read the Notebook and make sure you understand its explanations or assignments.
- Then interactively change the various menus inside the Notebook.
- You can also download or screenshot the images.
- When you are done, select menu then to shut down Jupyter Notebook. Make a habit of doing this every time you finish working with Jupyter Notebook.
Meeting 2
Meeting 3
Meeting 4
Meeting 5
Meeting 6
Meeting 7
Midterm Exam (UTS)
Meeting 8
Meeting 9
Meeting 10
Meeting 11
Meeting 12
Meeting 13
Meeting 14
Final Exam (UAS)
7 - Rangkaian Listrik
Rangkaian Listrik
AC Circuits (steady state)
$$ X_L=\omega L $$
$$ Z_L=jX_L=j\omega L=\omega L\angle 90^{\circ} $$
$$ X_C= \frac{1}{\omega C} $$
$$ Z_C=-jX_C=\frac{-j}{\omega C}=\frac{1}{j\omega C}=\frac{1}{\omega C}\angle -90^{\circ} $$
DC Circuits (steady state)
At DC, L is a short circuit and C is an open circuit. This follows from $\omega = 2 \pi f$ with $f=0$, which gives $Z_L = 0$ and $|Z_C| \to \infty$.
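As a quick numerical check of the impedance formulas, the sketch below evaluates $Z_L$ and $Z_C$ with Python's built-in complex numbers. The component values (`f`, `L`, `C`) are arbitrary assumptions chosen only for illustration.

```python
import cmath
import math

# Illustrative component values (assumptions).
f = 50.0    # frequency in Hz
L = 0.1     # inductance in henry
C = 100e-6  # capacitance in farad

omega = 2 * math.pi * f

Z_L = 1j * omega * L        # inductor impedance, j*omega*L
Z_C = 1 / (1j * omega * C)  # capacitor impedance, 1/(j*omega*C)

# Convert to polar form: magnitude and phase angle.
mag_L, phase_L = cmath.polar(Z_L)
mag_C, phase_C = cmath.polar(Z_C)

print(f"Z_L = {mag_L:.2f} ohm, angle {math.degrees(phase_L):.1f} deg")
print(f"Z_C = {mag_C:.2f} ohm, angle {math.degrees(phase_C):.1f} deg")
```

The magnitudes equal $\omega L$ and $1/(\omega C)$, and the angles come out as $+90^{\circ}$ and $-90^{\circ}$, matching the polar forms above.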
9 - Machine Learning Andrew Ng Quizzes
Machine Learning Andrew Ng Quizzes
Week 1
Introduction
- A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?
- 🗹 The probability of it correctly predicting a future date’s weather.
- ☐ The weather prediction task.
- ☐ The process of the algorithm examining a large amount of historical weather data.
- ☐ None of these.
- A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T?
- 🗹 The weather prediction task.
- ☐ None of these.
- ☐ The probability of it correctly predicting a future date’s weather.
- ☐ The process of the algorithm examining a large amount of historical weather data.
- Suppose you are working on weather prediction, and use a learning algorithm to predict tomorrow’s temperature (in degrees Centigrade/Fahrenheit).
Would you treat this as a classification or a regression problem?
- 🗹 Regression
- ☐ Classification
- Suppose you are working on weather prediction, and your weather station makes one of three predictions for each day’s weather: Sunny, Cloudy or Rainy. You’d like to use a learning algorithm to predict tomorrow’s weather.
Would you treat this as a classification or a regression problem?
- ☐ Regression
- 🗹 Classification
- Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this.
Would you treat this as a classification or a regression problem?
- 🗹 Regression
- ☐ Classification
- Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy).
Would you treat this as a classification or a regression problem?
- ☐ Regression
- 🗹 Classification
- Suppose you are working on stock market prediction, Typically tens of millions of shares of Microsoft stock are traded (i.e., bought/sold) each day. You would like to predict the number of Microsoft shares that will be traded tomorrow.
Would you treat this as a classification or a regression problem?
- 🗹 Regression
- ☐ Classification
- Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
- 🗹 Given historical data of children’s ages and heights, predict children’s height as a function of their age.
- 🗹 Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict the gender of a new manuscript’s author (when the identity of this author is unknown).
- ☐ Take a collection of 1000 essays written on the US Economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow “similar” or “related”.
- ☐ Examine a large collection of emails that are known to be spam email, to discover if there are sub-types of spam mail.
- Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
- ☐ Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or “types” of patients in terms of how they respond to the drug, and if so what these categories are.
- ☐ Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments.
- 🗹 Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
- 🗹 Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years.
- Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
- ☐ Take a collection of 1000 essays written on the US Economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow “similar” or “related”.
- 🗹 Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years.
- ☐ Examine a large collection of emails that are known to be spam email, to discover if there are sub-types of spam mail.
- 🗹 Examine the statistics of two football teams, and predict which team will win tomorrow’s match (given historical data of teams’ wins/losses to learn from).
- Which of these is a reasonable definition of machine learning?
- ☐ Machine learning is the science of programming computers.
- ☐ Machine learning learns from labeled data.
- ☐ Machine learning is the field of allowing robots to act intelligently.
- 🗹 Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
Linear Regression with One Variable :
-
Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of “A” grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is $h_\theta(x)=\theta_0+\theta_1x$, and we use $m$ to denote the number of training examples.
For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of $m$? In the box below, please enter your answer (which should be a number between 0 and 10).
4
-
Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released.
You would like to use linear regression $h_\theta(x) = \theta_0 + \theta_1x$ to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for $\theta_0$ and $\theta_1$ ? You should be able to select the right answer without actually implementing linear regression.
- ☐ $\theta_0$ = −569.6, $\theta_1$ = 530.9
- ☐ $\theta_0$ = −1780.0, $\theta_1$ = −530.9
- 🗹 $\theta_0$ = −569.6, $\theta_1$ = −530.9
- ☐ $\theta_0$ = −1780.0, $\theta_1$ = 530.9
-
For this question, assume that we are using the training set from Q1.
Recall our definition of the cost function was $J(\theta_0, \theta_1 ) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
What is $J(0,1)$? In the box below, please enter your answer (Simplify fractions to decimals when entering answer, and ‘.’ as the decimal delimiter e.g., 1.5).
0.5
-
Suppose we set $\theta_0 = 0, \theta_1 = 1.5$ in the linear regression hypothesis from Q1. What is $h_\theta(2)$ ?
3
-
Suppose we set $\theta_0 = -2, \theta_1 = 0.5$ in the linear regression hypothesis from Q1. What is $h_\theta(6)$?
1
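The two hypothesis evaluations above can be checked with a few lines of Python. The function name `h` is just an illustrative choice for $h_\theta(x)=\theta_0+\theta_1x$.

```python
def h(theta0, theta1, x):
    """Linear regression hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(h(0, 1.5, 2))   # 0 + 1.5 * 2 = 3.0
print(h(-2, 0.5, 6))  # -2 + 0.5 * 6 = 1.0
```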
-
Let $f$ be some function so that $f(\theta_0 , \theta_1 )$ outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima).
Suppose we use gradient descent to try to minimize $f(\theta_0 , \theta_1 )$ as a function of $\theta_0$ and $\theta_1$.
Which of the following statements are true? (Check all that apply.)
- 🗹 If $\theta_0$ and $\theta_1$ are initialized at the global minimum, then one iteration will not change their values.
- ☐ Setting the learning rate $\alpha$ to be very small is not harmful, and can only speed up the convergence of gradient descent.
- ☐ No matter how $\theta_0$ and $\theta_1$ are initialized, so long as $\alpha$ is sufficiently small, we can safely expect gradient descent to converge to the same solution.
- 🗹 If the first few iterations of gradient descent cause $f(\theta_0 , \theta_1)$ to increase rather than decrease, then the most likely cause is that we have set the learning rate $\alpha$ to too large a value.
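The two checked statements can be illustrated with a small sketch. It assumes a one-parameter toy function $f(\theta)=\theta^2$ rather than the unknown two-parameter $f$ of the question, and the learning rates are arbitrary illustrative values.

```python
def f(theta):
    return theta ** 2

def grad_f(theta):
    return 2 * theta

def gradient_descent(theta, alpha, steps):
    """Run gradient descent and record f after every step."""
    history = [f(theta)]
    for _ in range(steps):
        theta = theta - alpha * grad_f(theta)
        history.append(f(theta))
    return theta, history

# A small learning rate: f decreases at every step.
_, small = gradient_descent(1.0, alpha=0.1, steps=3)
# A learning rate that is too large: f increases at every step.
_, large = gradient_descent(1.0, alpha=1.1, steps=3)
# Starting at the minimum: the gradient is zero, so theta does not move.
at_min, _ = gradient_descent(0.0, alpha=0.1, steps=1)

print("alpha = 0.1:", small)
print("alpha = 1.1:", large)
print("start at minimum:", at_min)
```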
-
In the given figure, the cost function $J(\theta_0, \theta_1)$ has been plotted against $\theta_0$ and $\theta_1$, as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply).
- ☐ If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of cost function $J(\theta_0, \theta_1)$ is maximum at point A.
- ☐ If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point C, as the value of cost function $J(\theta_0, \theta_1)$ is minimum at point C.
- 🗹 Point P (the global minimum of plot 2) corresponds to point A of Plot 1.
- 🗹 If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of cost function $J(\theta_0, \theta_1)$ is minimum at A.
- ☐ Point P (The global minimum of plot 2) corresponds to point C of Plot 1.
-
Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some $\theta_0, \theta_1$, such that $J(\theta_0 , \theta_1) = 0$.
Which of the statements below must then be true? (Check all that apply.)
- ☐ Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
- ☐ For this to be true, we must have $\theta_0 = 0$ and $\theta_1 = 0$ so that $h_{\theta}(x) = 0$
- ☐ For this to be true, we must have $y^{(i)} = 0$ for every value of $i = 1, 2,…,m$.
- 🗹 Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
Week 4
Logistic Regression :
- Suppose that you have trained a logistic regression classifier, and it outputs on a new example a prediction $h_\theta(x) = 0.2$. This means (check all that apply):
- ☐ Our estimate for P(y = 1|x; θ) is 0.8.
- 🗹 Our estimate for P(y = 0|x; θ) is 0.8.
- 🗹 Our estimate for P(y = 1|x; θ) is 0.2.
- ☐ Our estimate for P(y = 0|x; θ) is 0.2.
- Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2)$.
Which of the following are true? Check all that apply.
- 🗹 Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2 ))$ could increase how well we can fit the training data.
- 🗹 At the optimal value of θ (e.g., found by fminunc), we will have $J(θ) ≥ 0$.
- ☐ Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2 ))$ would increase $J(θ)$ because we are now summing over more terms.
- ☐ If we train gradient descent for enough iterations, for some examples $x^{(i)}$ in the training set it is possible to obtain $h_\theta(x^{(i)} ) > 1$.
- For logistic regression, the gradient is given by $\frac{\partial }{\partial \theta_j } J(\theta) = \frac{1}{m} \sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j$. Which of these is a correct gradient descent update for logistic regression with a learning rate of $\alpha$ ? Check all that apply.
- 🗹 $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)}) x^{(i)}_j$ (simultaneously update for all j).
- ☐ $\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^m (\theta^Tx-y^{(i)}) x^{(i)}$.
- 🗹 $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left(\frac{1}{1+e^{-\theta^Tx^{(i)}}}-y^{(i)}\right) x^{(i)}_j$ (simultaneously update for all j).
- ☐ $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)}) x^{(i)}$ (simultaneously update for all j).
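A minimal NumPy sketch of the correct update, written in vectorized form so that every $\theta_j$ is updated simultaneously. The tiny dataset, the learning rate, and the iteration count are made-up values for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost J(theta)."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of all theta_j."""
    m = len(y)
    h = sigmoid(X @ theta)    # h_theta(x^(i)) for every example
    grad = X.T @ (h - y) / m  # (1/m) * sum_i (h - y) * x_j for every j
    return theta - alpha * grad

# Illustrative data (assumption): the first column is the bias feature x_0 = 1.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
cost_before = cost(theta, X, y)
for _ in range(100):
    theta = gradient_step(theta, X, y, alpha=0.1)
cost_after = cost(theta, X, y)

print("J before:", cost_before, "J after:", cost_after)
```

Note that the hypothesis inside the sum is the sigmoid of $\theta^Tx$, not $\theta^Tx$ itself, which is what distinguishes the correct options from the linear regression update.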
- Which of the following statements are true? Check all that apply.
- 🗹 The one-vs-all technique allows you to use logistic regression for problems in which each $y^{(i)}$ comes from a fixed, discrete set of values.
- ☐ For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).
- 🗹 The cost function $J(\theta)$ for logistic regression trained with $m \geq 1$ examples is always greater than or equal to zero.
- ☐ Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).
- Suppose you train a logistic classifier $h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2)$. Suppose $\theta_0 = 6$, $\theta_1 = -1$, $\theta_2 = 0$. Which of the following figures represents the decision boundary found by your classifier?
- 🗹 Figure:
- ☐ Figure:
- ☐ Figure:
- ☐ Figure:
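The decision boundary for these parameters can be worked out directly: the classifier predicts $y = 1$ when $\theta_0 + \theta_1x_1 + \theta_2x_2 \geq 0$, that is when $6 - x_1 \geq 0$, so the boundary is the vertical line $x_1 = 6$ with $y = 1$ to its left. The short check below evaluates the hypothesis on both sides of that line; the test points are arbitrary.

```python
import math

def h(x1, x2, theta=(6.0, -1.0, 0.0)):
    """Logistic hypothesis g(theta0 + theta1*x1 + theta2*x2)."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1.0 / (1.0 + math.exp(-z))

print(h(5, 0))    # left of the line x1 = 6: above 0.5, predict y = 1
print(h(7, 0))    # right of the line x1 = 6: below 0.5, predict y = 0
print(h(6, 100))  # on the line: x2 has no effect because theta2 = 0
```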
Regularization
- You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
- ☐ Introducing regularization to the model always results in equal or better performance on the training set.
- ☐ Introducing regularization to the model always results in equal or better performance on examples not in the training set.
- 🗹 Adding a new feature to the model always results in equal or better performance on the training set.
- ☐ Adding many new features to the model helps prevent overfitting on the training set.
- Suppose you ran logistic regression twice, once with $\lambda = 0$, and once with $\lambda = 1$. One of the times, you got parameters $\theta = \begin{bmatrix} 74.81 \\ 45.05 \end{bmatrix}$, and the other time you got $\theta = \begin{bmatrix} 1.37 \\ 0.51 \end{bmatrix}$. However, you forgot which value of $\lambda$ corresponds to which value of $\theta$. Which one do you think corresponds to $\lambda = 1$?
- 🗹 $\theta = \begin{bmatrix} 1.37 \\ 0.51 \end{bmatrix}$
- ☐ $\theta = \begin{bmatrix} 74.81 \\ 45.05 \end{bmatrix}$
- Which of the following statements about regularization are true? Check all that apply.
- ☐ Using a very large value of $\lambda$ cannot hurt the performance of your hypothesis; the only reason we do not set $\lambda$ to be too large is to avoid numerical problems.
- ☐ Because logistic regression outputs values $0 \leq h_\theta(x) \leq 1$, its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
- 🗹 Consider a classification problem. Adding regularization may cause your classifier to incorrectly classify some training examples (which it had correctly classified when not using regularization, i.e. when $\lambda = 0$).
- ☐ Using too large a value of $\lambda$ can cause your hypothesis to overfit the data; this can be avoided by reducing $\lambda$.
- Which of the following statements about regularization are true? Check all that apply.
- ☐ Using a very large value of $\lambda$ cannot hurt the performance of your hypothesis; the only reason we do not set $\lambda$ to be too large is to avoid numerical problems.
- ☐ Because logistic regression outputs values $0 \leq h_\theta(x) \leq 1$, its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
- ☐ Because regularization causes $J(\theta)$ to no longer be convex, gradient descent may not always converge to the global minimum (when $\lambda > 0$, and when using an appropriate learning rate $\alpha$).
- 🗹 Using too large a value of $\lambda$ can cause your hypothesis to underfit the data; this can be avoided by reducing $\lambda$.
- In which one of the following figures do you think the hypothesis has overfit the training set?
- 🗹 Figure:
- ☐ Figure:
- ☐ Figure:
- ☐ Figure:
- In which one of the following figures do you think the hypothesis has underfit the training set?
- 🗹 Figure:
- ☐ Figure:
- ☐ Figure:
- ☐ Figure:
Week 5
Neural Networks - Representation :
- Which of the following statements are true? Check all that apply.
- 🗹 Any logical function over binary-valued (0 or 1) inputs $x_1$ and $x_2$ can be (approximately) represented using some neural network.
- ☐ Suppose you have a multi-class classification problem with three classes, trained with a 3 layer network. Let $a^{(3)}_1 = (h_\theta(x))_1$ be the activation of the first output unit, and similarly $a^{(3)}_2 = (h_\theta(x))_2$ and $a^{(3)}_3 = (h_\theta(x))_3$. Then for any input x, it must be the case that $a^{(3)}_1 + a^{(3)}_2 + a^{(3)}_3 = 1$.
- ☐ A two layer (one input layer, one output layer; no hidden layer) neural network can represent the XOR function.
- 🗹 The activation values of the hidden units in a neural network, with the sigmoid activation function applied at every layer, are always in the range (0, 1).
- Consider the following neural network which takes two binary-valued inputs
$x_1, x_2 \in \{0,1\}$ and outputs $h_\theta(x)$. Which of the following logical functions does it (approximately) compute?
- 🗹 AND
- ☐ NAND (meaning “NOT AND”)
- ☐ OR
- ☐ XOR (exclusive OR)
- Consider the following neural network which takes two binary-valued inputs
$x_1, x_2 \in \{0,1\}$ and outputs $h_\theta(x)$. Which of the following logical functions does it (approximately) compute?
- ☐ AND
- ☐ NAND (meaning “NOT AND”)
- 🗹 OR
- ☐ XOR (exclusive OR)
- Consider the neural network given below. Which of the following equations correctly computes the activation $a_1^{(3)}$? Note: $g(z)$ is the sigmoid activation function.
- 🗹 $a_1^{(3)} = g(\theta_{1,0}^{(2)}a_0^{(2)}+\theta_{1,1}^{(2)}a_1^{(2)}+\theta_{1,2}^{(2)}a_2^{(2)})$
- ☐ $a_1^{(3)} = g(\theta_{1,0}^{(2)}a_0^{(1)}+\theta_{1,1}^{(2)}a_1^{(1)}+\theta_{1,2}^{(2)}a_2^{(1)})$
- ☐ $a_1^{(3)} = g(\theta_{1,0}^{(1)}a_0^{(2)}+\theta_{1,1}^{(1)}a_1^{(2)}+\theta_{1,2}^{(1)}a_2^{(2)})$
- ☐ $a_1^{(3)} = g(\theta_{2,0}^{(2)}a_0^{(2)}+\theta_{2,1}^{(2)}a_1^{(2)}+\theta_{2,2}^{(2)}a_2^{(2)})$
- You have the following neural network:
You’d like to compute the activations of the hidden layer $a^{(2)} \in \mathbb{R}^3$. One way to do so is the following Octave code:
You want to have a vectorized implementation of this (i.e., one that does not use for loops). Which of the following implementations correctly compute $a^{(2)}$? Check all that apply.
- 🗹
z = Theta1 * x; a2 = sigmoid (z);
- ☐
a2 = sigmoid (x * Theta1);
- ☐
a2 = sigmoid (Theta2 * x);
- ☐
z = sigmoid(x); a2 = sigmoid (Theta1 * z);
- You are using the neural network pictured below and have learned the parameters $\theta^{(1)} = \begin{bmatrix} 1 & 1 & 2.4 \\ 1 & 1.7 & 3.2 \end{bmatrix}$ (used to compute $a^{(2)}$) and $\theta^{(2)} = \begin{bmatrix} 1 & 0.3 & -1.2 \end{bmatrix}$ (used to compute $a^{(3)}$ as a function of $a^{(2)}$). Suppose you swap the parameters for the first hidden layer between its two units so $\theta^{(1)} = \begin{bmatrix} 1 & 1.7 & 3.2 \\ 1 & 1 & 2.4 \end{bmatrix}$ and also swap the output layer so $\theta^{(2)} = \begin{bmatrix} 1 & -1.2 & 0.3 \end{bmatrix}$. How will this change the value of the output $h_\theta(x)$?
- 🗹 It will stay the same.
- ☐ It will increase.
- ☐ It will decrease.
- ☐ Insufficient information to tell: it may increase or decrease.
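A quick way to check the "stay the same" answer is to run a forward pass with both parameter sets. The sketch below uses the parameter values from the question; the input `x` is a hypothetical value chosen only for illustration, and the network shape (two inputs, two hidden units, one output, with bias units) is inferred from the matrix sizes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(theta1, theta2, x):
    """Forward pass: 2 inputs -> 2 hidden units -> 1 output, with bias units."""
    a1 = [1.0] + list(x)                      # add bias unit to the input
    hidden = [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]
    a2 = [1.0] + hidden                       # add bias unit to the hidden layer
    return sigmoid(sum(w * a for w, a in zip(theta2, a2)))


theta1 = [[1.0, 1.0, 2.4], [1.0, 1.7, 3.2]]
theta2 = [1.0, 0.3, -1.2]

# Swap the two hidden units: swap the rows of theta1 and the
# matching (non-bias) entries of theta2.
theta1_swapped = [theta1[1], theta1[0]]
theta2_swapped = [theta2[0], theta2[2], theta2[1]]

x = (0.5, -1.0)  # hypothetical input, not from the quiz
h_original = forward(theta1, theta2, x)
h_swapped = forward(theta1_swapped, theta2_swapped, x)
print(h_original, h_swapped)
```

Each hidden unit keeps its own incoming and outgoing weights and only the unit labels change, so the weighted sum at the output unit contains the same terms in a different order.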
Neural Networks: Learning :
- You are training a three layer neural network and would like to use backpropagation to compute the gradient of the cost function. In the backpropagation algorithm, one of the steps is to update $\Delta_{ij}^{(2)} := \Delta_{ij}^{(2)} + \delta_i^{(3)} * (a^{(2)})_j$ for every $i, j$. Which of the following is a correct vectorization of this step?
- ☐ $\Delta^{(2)} := \Delta^{(2)} + \delta^{(2)} * (a^{(3)})^T$
- ☐ $\Delta^{(2)} := \Delta^{(2)} + (a^{(2)})^T * \delta^{(3)}$
- ☐ $\Delta^{(2)} := \Delta^{(2)} + (a^{(2)})^T * \delta^{(2)}$
- 🗹 $\Delta^{(2)} := \Delta^{(2)} + \delta^{(3)} * (a^{(2)})^T$
- Suppose Theta1 is a 5x3 matrix, and Theta2 is a 4x6 matrix. You set thetaVec = [Theta1( : ); Theta2( : )]. Which of the following correctly recovers Theta2?
- 🗹 reshape(thetaVec(16 : 39), 4, 6)
- ☐ reshape(thetaVec(15 : 38), 4, 6)
- ☐ reshape(thetaVec(16 : 24), 4, 6)
- ☐ reshape(thetaVec(15 : 39), 4, 6)
- ☐ reshape(thetaVec(16 : 39), 6, 4)
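The index arithmetic behind the checked answer can be traced with a small sketch. Octave unrolls and reshapes in column-major order and uses 1-based, inclusive ranges; the helper functions `unroll` and `reshape` below are hypothetical stand-ins that imitate that behaviour in plain Python, and the matrix entries are made-up placeholder values.

```python
def unroll(matrix):
    """Column-major unroll, imitating Octave's A(:)."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def reshape(vec, rows, cols):
    """Column-major reshape, imitating Octave's reshape(vec, rows, cols)."""
    assert len(vec) == rows * cols
    return [[vec[c * rows + r] for c in range(cols)] for r in range(rows)]


# Placeholder matrices with distinct entries so mistakes are visible.
theta1 = [[10 * r + c for c in range(3)] for r in range(5)]          # 5x3
theta2 = [[1000 + 10 * r + c for c in range(6)] for r in range(4)]   # 4x6

theta_vec = unroll(theta1) + unroll(theta2)   # 15 + 24 = 39 elements

# Octave's thetaVec(16:39) (1-based, inclusive) is Python's slice [15:39].
recovered_theta2 = reshape(theta_vec[15:39], 4, 6)
recovered_theta1 = reshape(theta_vec[0:15], 5, 3)
print(recovered_theta2 == theta2, recovered_theta1 == theta1)
```

Theta1 occupies the first 5 x 3 = 15 entries, so Theta2 starts at entry 16 and its 4 x 6 = 24 entries end at entry 39.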
- Let $J(\theta) = 2\theta^3 + 2$. Let $\theta = 1$, and $\epsilon = 0.01$. Use the formula $\frac{J{(\theta + \epsilon)}-J{(\theta - \epsilon)}}{2\epsilon}$ to numerically compute an approximation to the derivative at $\theta = 1$. What value do you get? (When $\theta = 1$, the true/exact derivative is $\frac{\mathrm{d} J(\theta)}{\mathrm{d} \theta} = 6$.)
- ☐ 8
- 🗹 6.0002
- ☐ 6
- ☐ 5.9998
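The arithmetic for this question can be reproduced directly. The sketch below applies the two-sided difference formula from the question to $J(\theta) = 2\theta^3 + 2$; the function names are illustrative only.

```python
def J(theta):
    """Cost function from the question: J(theta) = 2 * theta^3 + 2."""
    return 2 * theta ** 3 + 2


def numerical_derivative(f, theta, eps):
    """Two-sided difference: (f(theta + eps) - f(theta - eps)) / (2 * eps)."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


approx = numerical_derivative(J, 1.0, 0.01)
exact = 6 * 1.0 ** 2   # dJ/dtheta = 6 * theta^2
print(round(approx, 4), exact)
```

For this cubic the two-sided estimate works out to $6\theta^2 + 2\epsilon^2$, so at $\theta = 1$ and $\epsilon = 0.01$ it exceeds the exact derivative by $2 \cdot 0.01^2 = 0.0002$.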
- Which of the following statements are true? Check all that apply.
- 🗹 For computational efficiency, after we have performed gradient checking to verify that our backpropagation code is correct, we usually disable gradient checking before using backpropagation to train the network.
- ☐ Computing the gradient of the cost function in a neural network has the same efficiency when we use backpropagation or when we numerically compute it using the method of gradient checking.
- 🗹 Using gradient checking can help verify if one’s implementation of backpropagation is bug-free.
- ☐ Gradient checking is useful if we are using one of the advanced optimization methods (such as in fminunc) as our optimization algorithm. However, it serves little purpose if we are using gradient descent.
- Which of the following statements are true? Check all that apply.
- 🗹 If we are training a neural network using gradient descent, one reasonable “debugging” step to make sure it is working is to plot $J(\theta)$ as a function of the number of iterations, and make sure it is decreasing (or at least non-increasing) after each iteration.
- ☐ Suppose you have a three layer network with parameters $\theta^{(1)}$ (controlling the function mapping from the inputs to the hidden units) and $\theta^{(2)}$ (controlling the mapping from the hidden units to the outputs). If we set all the elements of $\theta^{(1)}$ to be 0, and all the elements of $\theta^{(2)}$ to be 1, then this suffices for symmetry breaking, since the neurons are no longer all computing the same function of the input.
- 🗹 Suppose you are training a neural network using gradient descent. Depending on your random initialization, your algorithm may converge to different local optima (i.e., if you run the algorithm twice with different random initializations, gradient descent may converge to two different solutions).
- ☐ If we initialize all the parameters of a neural network to ones instead of zeros, this will suffice for the purpose of “symmetry breaking” because the parameters are no longer symmetrically equal to zero.
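The symmetry problem in the last two unchecked statements can be seen in a forward pass alone. In the sketch below, the layer sizes, the input `x`, and the range `INIT_EPSILON` are all assumptions made for illustration. With every weight set to one, all hidden units produce the same activation; a small random initialization makes them differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(theta1, x):
    """Activations of the hidden layer for input x (bias unit added here)."""
    a1 = [1.0] + list(x)
    return [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]


x = (0.3, -0.7)  # hypothetical input

# All-ones initialization: every hidden unit has identical weights.
ones_init = [[1.0] * 3 for _ in range(3)]
same = hidden_activations(ones_init, x)

# Random initialization in [-INIT_EPSILON, INIT_EPSILON] (assumed range).
INIT_EPSILON = 0.12
rng = random.Random(0)
random_init = [[rng.uniform(-INIT_EPSILON, INIT_EPSILON) for _ in range(3)]
               for _ in range(3)]
different = hidden_activations(random_init, x)

print(same)
print(different)
```

Units that start with identical weights also receive identical gradient updates, so they keep computing the same function; this is why a constant initialization, whether zeros or ones, does not break symmetry.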
Week 6
Advice for Applying Machine Learning :
- You train a learning algorithm, and find that it has unacceptably high error on the test set. You plot the learning curve, and obtain the figure below. Is the algorithm suffering from high bias, high variance, or neither?
- ☐ High variance
- ☐ Neither
- 🗹 High bias
- You train a learning algorithm, and find that it has unacceptably high error on the test set. You plot the learning curve, and obtain the figure below. Is the algorithm suffering from high bias, high variance, or neither?
- 🗹 High variance
- ☐ Neither
- ☐ High bias
- Suppose you have implemented regularized logistic regression to classify what object is in an image (i.e., to do object recognition). However, when you test your hypothesis on a new set of images, you find that it makes unacceptably large errors with its predictions on the new images. However, your hypothesis performs well (has low error) on the training set. Which of the following are promising steps to take? Check all that apply.
NOTE: Since the hypothesis performs well (has low error) on the training set, it is suffering from high variance (overfitting).
- ☐ Try adding polynomial features.
- ☐ Use fewer training examples.
- 🗹 Try using a smaller set of features.
- 🗹 Get more training examples.
- ☐ Try evaluating the hypothesis on a cross validation set rather than the test set.
- ☐ Try decreasing the regularization parameter λ.
- 🗹 Try increasing the regularization parameter λ.
- Suppose you have implemented regularized logistic regression to predict what items customers will purchase on a web shopping site. However, when you test your hypothesis on a new set of customers, you find that it makes unacceptably large errors in its predictions. Furthermore, the hypothesis performs poorly on the training set. Which of the following might be promising steps to take? Check all that apply.
NOTE: Since the hypothesis performs poorly on the training set, it is suffering from high bias (underfitting).
- ☐ Try increasing the regularization parameter λ.
- 🗹 Try decreasing the regularization parameter λ.
- ☐ Try evaluating the hypothesis on a cross validation set rather than the test set.
- ☐ Use fewer training examples.
- 🗹 Try adding polynomial features.
- ☐ Try using a smaller set of features.
- 🗹 Try to obtain and use additional features.
- Which of the following statements are true? Check all that apply.
- ☐ Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest test set error.
- ☐ Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest training set error.
- 🗹 The performance of a learning algorithm on the training set will typically be better than its performance on the test set.
- 🗹 Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest cross validation error.
- 🗹 A typical split of a dataset into training, validation and test sets might be 60% training set, 20% validation set, and 20% test set.
- ☐ Suppose you are training a logistic regression classifier using polynomial features and want to select what degree polynomial (denoted $d$ in the lecture videos) to use. After training the classifier on the entire training set, you decide to use a subset of the training examples as a validation set. This will work just as well as having a validation set that is separate (disjoint) from the training set.
- ☐ It is okay to use data from the test set to choose the regularization parameter λ, but not the model parameters (θ).
- 🗹 Suppose you are using linear regression to predict housing prices, and your dataset comes sorted in order of increasing sizes of houses. It is then important to randomly shuffle the dataset before splitting it into training, validation and test sets, so that we don’t have all the smallest houses going into the training set, and all the largest houses going into the test set.
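The shuffle-then-split advice and the 60% / 20% / 20% proportions from the checked statements can be sketched as follows. The function name `shuffle_and_split`, the fixed seed, and the synthetic list of house sizes are assumptions made for illustration.

```python
import random


def shuffle_and_split(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the data, then cut it into train / validation / test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # whatever remains
    return train, val, test


# Hypothetical dataset: 100 house sizes, sorted in increasing order.
house_sizes = list(range(500, 1500, 10))
train, val, test = shuffle_and_split(house_sizes)
print(len(train), len(val), len(test))
```

Without the shuffle, slicing the sorted list would put the 60 smallest houses in the training set and the 20 largest in the test set, so the test error would measure extrapolation rather than generalization.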
- Which of the following statements are true? Check all that apply.
- 🗹 A model with more parameters is more prone to overfitting and typically has higher variance.
- ☐ If the training and test errors are about the same, adding more features will not help improve the results.
- 🗹 If a learning algorithm is suffering from high bias, only adding more training examples may not improve the test error significantly.
- 🗹 If a learning algorithm is suffering from high variance, adding more training examples is likely to improve the test error.
- 🗹 When debugging learning algorithms, it is useful to plot a learning curve to understand if there is a high bias or high variance problem.
- ☐ If a neural network has much lower training error than test error, then adding more layers will help bring the test error down because we can fit the test set better.
Links
- Coursera-Machine-Learning/Week1Quiz.md at master · LiMengyang990726/Coursera-Machine-Learning
- atinesh-s/Coursera-Machine-Learning-Stanford: Machine learning-Stanford University
- Coursera-Machine Learning - Andrew NG - All weeks solutions of assignments and quiz - codemummy online technical computer science platform.
- datasciencecoursera/Stanford_Machine_Learning at master · mGalarnyk/datasciencecoursera
- APDaga DumpBox : The Thirst for Learning…: Machine Learning
10 - Statistic and Probability
Statistic and Probability
- Statistics 110: Probability
- Book 0: “Machine Learning: A Probabilistic Perspective” (2012) - pml-book
- Introduction to Machine Learning, Fourth Edition - The MIT Press
- An Introduction to Statistical Learning with Applications in R
- Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.
- Probabilistic Machine Learning Series
11 - MK Machine Learning
MK Machine Learning
- Code: TKE194918
- Credits (SKS): 3
- Schedule
- TKE194918 Machine Learning A, Wednesday 15:00 - 17:30, Gedung Teknik E 201 - 12 students
Reference Sources
- Course materials from Andrew Ng
- Python Data Science
- Python ML Course
- TensorFlow, Keras and deep learning, without a PhD, Github
- CS231n Github, CS231n Github Source
- Homemade Machine Learning in Python License: MIT
- Machine Learning Octave in Octave License: MIT
- Machine Learning Experiments License: MIT
- COMS W4995 Applied Machine Learning Spring 2019 - Schedule - Andreas C. Müller - Associate Research Scientist, amueller/COMS4995-s19: COMS W4995 Applied Machine Learning - Spring 19 License: CC0
Tools
Lectures
Weeks 7-9
- Logistic regression - pdf - ppt
- Regularization - pdf - ppt
- Programming Exercise 2: Logistic Regression - pdf - Problem - Solution
- Lecture Notes
- Errata
- 06: Logistic Regression by Holehouse
- 07: Regularization by Holehouse
Weeks 10-12
- Neural Networks: Representation - pdf - ppt
- Programming Exercise 3: Multi-class Classification and Neural Networks - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- 08: Neural Networks - Representation by Holehouse
Week 13
- Neural Networks: Learning - pdf - ppt
- Programming Exercise 4: Neural Networks Learning - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- 09: Neural Networks - Learning by Holehouse
Visualizing Backpropagation
Week 14: Deep Learning Introduction
12 - Electronics
Electronics
Electronic Blog
- Evil Mad Scientist Laboratories - Making the world a better place, one Evil Mad Scientist at a time.
Electronic Simulator
- Everycircuit
- Online circuit simulator & schematic editor - CircuitLab
- Circuit Diagram Web Editor
- Partsim
- library.io
- Tejotron
- hneemann/Digital: A digital logic designer and circuit simulator.
- List of Electronic Simulator
- SimulIDE : Real Time Electronic Circuit Simulator. With PIC, AVR and Arduino simulation.
- Home - QucsStudio
- Digital Logic Sim by Sebastian Lague
Electronic Diagram
EDA Simulator
- EDA Playground : Verilog, VHDL
Electronic Forum
- Electronic Lab : Electronic Forum
- Forum for Electronics
Electronic Book
- Ultimate Electronics Book
- Socratic Electronics License: CC-BY
Electronic Tutorial
- Electronic Tutorials
- Learn Sparkfun
- All About Circuits
- Talking Electronics
- Learning the Art of Electronics: A Hands-on Approach - by Thomas C. Hayes
- Electronics with Jim Fiore at MVCC
- Electromagnetics, Volume 1
- Electronics I and II: Analog Devices Wiki
- Software-Defined Radio for Engineers, 2018 - Education - Analog Devices
13 - Free Online Course
Free Online Course
Online Course Platform
List of Free Online Course
- Resume Worded
- Class Central
- Free Udemy Courses
- Course List by Abakcus
- Course List by Brilliant
- Course List by Tombasche
Hacking Satellite Course
Machine Learning Course
- Machine Learning University from Amazon. Note: the material is mostly designed around Amazon products and does not teach much actual ML.
- DEEP LEARNING Course Yann LeCun & Alfredo Canziani MATERIAL Google Drive, Notebooks NYU Site
- AI Course: Elements of AI
- Complete ML Coursework
Programming
Deep Learning Course
- DeepCourse
- Deep Learning by deeplearning.ai - Coursera
- Yann LeCun’s Deep Learning Course at CDS–NYU Center for Data Science
- briandalessandro/DataScienceCourse: This holds iPython notebooks and lecture slides for the Intro to Data Science Master’s course I teach at NYU.
Time Series Course
Machine Learning Course
- DeepCourse License: Apache
Linear Algebra
Electrical Circuit Course Notes
Machine Learning Course Notes
- ML 2021 Spring ML 2020 Spring Introduction—Learning Machine rentruewang/learning-machine: A handbook for machine learning License: GPL
- Best tutorials, courses, and blog posts
- Learn - AI Campus
Course
Machine Learning
Course
- Full stack open 2021 License: CC-BY-NC-SA
HTML Learn
- Don’t Panic–It’s Only HTML (Crash Course For Beginners) - YouTube
- HTML Crash Course For Absolute Beginners - YouTube
- CSS Crash Course For Absolute Beginners - YouTube
Course
- MIT Open Learning Library - Open Learning License: CC-BY-NC-SA
Course
Programming Course
Machine Learning Course
- Teaching - CS 229
- SEE Stanford CS 229, License: CC-BY-NC
- aidysft
- Designing, Visualizing and Understanding Deep Neural Networks : Public Domain
- Deep learning courses at UC Berkeley - berkeley-deep-learning.github.io
- CS 189/289A: Introduction to Machine Learning
Machine Learning
- Google’s ML crash course
- TensorFlow tutorials
- PyTorch tutorials
- FastAI course: Practical Deep Learning for Coders
- Full Stack Deep Learning course
- A Software Engineer’s trek into Machine Learning - Towards Data Science
Machine Learning Course
- microsoft/ML-For-Beginners: 12 weeks, 24 lessons, classic Machine Learning for all License: MIT
- Machine Learning Crash Course - Google Developers
- Machine Learning University
Open Course
- The Missing Semester of Your CS Education · the missing semester of your cs education License: CC-NC
NLP Course
Course
- Lesson Directory - Programming Historian Digital Tools for Humanity, License: CC-BY
Course
Distributed Systems
Course
- lijqhs/deeplearning-notes: Notes for Deep Learning Specialization Courses led by Andrew Ng. License: MIT
- https://web.stanford.edu/~jurafsky/NLPCourseraSlides.html Lecture Slides from the 2012 Stanford Coursera course
- ISLR Textbook Slides, Videos and Resources Introduction to Statistical Learning: With Applications in R Lecture Slides and Videos
- Entire Computer Science Curriculum in 1000 YouTube Videos - Laconicml
Fast.ai Online Course
- Practical Deep Learning for Coders
- Part 2: Deep Learning from the Foundations
- Practical Data Ethics
- Computational Linear Algebra
- Code-First Introduction to Natural Language Processing
Course
Computer Science
Full Stack Deep Learning
Course Self Taught
- Teach Yourself Computer Science
- ossu/data-science: Path to a free self-taught education in Data Science!
- ossu/computer-science: Path to a free self-taught education in Computer Science!
Course
Introduction to Reinforcement Learning with David Silver (deepmind.com)
14 - Machine Learning by Andrew Ng Resources
Machine Learning by Andrew Ng Resources
Main Course
- Coursera : Machine Learning by Andrew Ng
- Youtube Playlists
- Video lectures Index https://class.coursera.org/ml/lecture/preview
- Programming Exercise Tutorials https://www.coursera.org/learn/machine-learning/discussions/all/threads/m0ZdvjSrEeWddiIAC9pDDA
- Programming Exercise Test Cases https://www.coursera.org/learn/machine-learning/discussions/all/threads/0SxufTSrEeWPACIACw4G5w
- Useful Resources https://www.coursera.org/learn/machine-learning/resources/NrY2G
More Machine Learning Courses
Supplementary Notes
- Holehouse Notes : review by holehouse
- Kaggle Notes
- Vkosuri Notes : ppt, pdf, course, errata notes, Github Repo
- Danlu Zhang : review by Danlu Zhang
- CSEAV
- Stanford : quiz discussion
Supplementary Code
- Fengdu78 : ppt, code in python (ipynb)
- dibgerge : assignment code in python (ipynb)
- Kaleko : assignment code in python (ipynb)
- nsoojin : code in python
- lucasshenv : code in python (ipynb) using Tensorflow
- AvaisP : assignment code in Octave
- Benlau93 : assignment code in Python
- worldveil: code, pdf
- dibgerge/ml-coursera-python-assignments: Python assignments for the machine learning class by andrew ng on coursera with complete submission for grading capability and re-written instructions.
Week 1:
- Welcome - pdf - ppt
- Linear regression with one variable - pdf - ppt
- Linear Algebra review (Optional) - pdf - ppt
- Lecture Notes
- Errata
- Week 1 by danluzhang
- 01 and 02: Introduction, Regression Analysis and Gradient Descent by Holehouse
- 03: Linear Algebra - review by Holehouse
- adit.io: Linear Regression
Week 2:
- Linear regression with multiple variables - pdf - ppt
- Octave tutorial pdf
- Programming Exercise 1: Linear Regression - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 2 by danluzhang
- 04: Linear Regression with Multiple Variables by Holehouse
- 05: Octave by Holehouse
Week 3:
- Logistic regression - pdf - ppt
- Regularization - pdf - ppt
- Programming Exercise 2: Logistic Regression - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- adit.io: Logistic Regression
- Week 3 by danluzhang
- 06: Logistic Regression by Holehouse
- 07: Regularization by Holehouse
Week 4:
- Neural Networks: Representation - pdf - ppt
- Programming Exercise 3: Multi-class Classification and Neural Networks - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 4 by danluzhang
- 08: Neural Networks - Representation by Holehouse
Week 5:
- Neural Networks: Learning - pdf - ppt
- Programming Exercise 4: Neural Networks Learning - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 5 by danluzhang
- 09: Neural Networks - Learning by Holehouse
Week 6:
- Advice for applying machine learning - pdf - ppt
- Machine learning system design - pdf - ppt
- Programming Exercise 5: Regularized Linear Regression and Bias v.s. Variance - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 6 by danluzhang
- 10: Advice for applying machine learning techniques by Holehouse
- 11: Machine Learning System Design by Holehouse
Week 7:
- Support vector machines - pdf - ppt
- Programming Exercise 6: Support Vector Machines - pdf - Problem - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 7 by danluzhang
- 12: Support Vector Machines by Holehouse
Week 8:
- Clustering - pdf - ppt
- Dimensionality reduction - pdf - ppt
- Programming Exercise 7: K-means Clustering and Principal Component Analysis - pdf - Problems - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 8 by danluzhang
- 13: Clustering by Holehouse
- 14: Dimensionality Reduction by Holehouse
Week 9:
- Anomaly Detection - pdf - ppt
- Recommender Systems - pdf - ppt
- Programming Exercise 8: Anomaly Detection and Recommender Systems - pdf - Problems - Solution
- Lecture Notes
- Errata
- Program Exercise Notes
- Week 9 by danluzhang
- 15: Anomaly Detection by Holehouse
- 16: Recommender Systems by Holehouse
Week 10:
- Large scale machine learning - pdf - ppt
- Lecture Notes
- Week 10 by danluzhang
- 17: Large Scale Machine Learning by Holehouse
Week 11:
- Application example: Photo OCR - pdf - ppt
- Week 11 by danluzhang
- 18: Application Example - Photo OCR by Holehouse
- 19: Course Summary by Holehouse
Extra Information
- Linear Algebra Review and Reference Zico Kolter
- CS229 Lecture notes
- CS229 Problems
- Financial time series forecasting with machine learning techniques
- Octave Examples
Machine Learning Online E Books
- Introduction to Machine Learning by Nils J. Nilsson free
- Introduction to Machine Learning by Alex Smola and S.V.N. Vishwanathan free
- Introduction to Data Science by Jeffrey Stanton free
- Bayesian Reasoning and Machine Learning by David Barber free
- Understanding Machine Learning, © 2014 by Shai Shalev-Shwartz and Shai Ben-David free
- Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman free
- Pattern Recognition and Machine Learning, by Christopher M. Bishop free, used
- Master Machine Learning Algorithms: Discover How They Work and Implement Them From Scratch, by Jason Brownlee, proprietary, used
- Course in Machine Learning free, used
Machine Learning Tutorial
- Trekhleb Machine Learning with Octave, free, used
- Trekhleb Machine Learning with Python, free, used
- Trekhleb Deep Learning with Python, free, used
- Tutorials Point: Machine Learning with Python, used
- ML Cheatsheet free, used
Machine Learning Youtube
16 - Digital Signal Processing
Digital Signal Processing
Signal Processing Jupyter Notebooks
- Sound Analysis with the Fourier Transform. A set of IPython Notebooks by Caleb Madrigal to explain what the Fourier Transform is and how to use it for basic audio processing applications.
- An introduction to Compressed Sensing, part of Python for Signal Processing: an entire book (and blog) on the subject by Jose Unpingco.
- Kalman and Bayesian Filters in Python. A textbook and accompanying filtering library on the topic of Kalman filtering and other related Bayesian filtering techniques.
- Classify human movements using Dynamic Time Warping & K Nearest Neighbors: Signals from a smartphone gyroscope and accelerometer are used to classify whether the person is running, walking, sitting, standing, etc. This IPython notebook contains a Python implementation of the DTW and KNN algorithms along with explanations and a practical application.
- Digital Signal Processing A collection of notebooks that accompanies a masters course on the topic.
- An introduction to openCV An introduction course into using openCV for computer vision in python
- Signal: Filtering, STFT, and Laplace Transform Filtering signal with a butterworth low-pass filter and plotting the STFT of it with a Hanning window and then plotting the Laplace transform.
Tools
- noise.sh Music as Excel
- AudioMass - Audio Editor
- dsp.audio code editor
- Audio DSP Playground
- Harmonics
Filter Design Tools
- Filter Design Tool web based
- RF Tools - LC Filter Design Tool web based
- Filter Design and Analysis web based
- TFilter - Free online FIR filter design web based
- FIIIR! web based
- FIR Filter Designer web based
- List of FIR Filter tools
Tutorial
- DSPRelated.com - All About Digital Signal Processing
- Kalman and Bayesian Filter in Python License: CC-BY
- Digital Signal Processing Lecture License: CC-BY
- A Compact Primer on Digital Signal Processing Web License: Eclipse Public License
- Voice recording and processing for talks, streaming and conferencing. The Reference.
- How to Record Great Sounding / High Quality Audio at Home—Nick Janetakis
- Filter playground - Boris Smus
- Micromodeler - Launch Applications
- Music Information Retrieval in Python
- Fundamental of Music Processing - Slides
- MUMT 307 Week #1
- GNU Octave: Audio Processing
- Digital Sound & Music–Linking Science, Art, and Practice Through Digital Sound
- Practical FIR Filter Design: Part 1 - Design with Octave or Matlab - Technical Articles
- DSP Course JF Engin 100-300
- gnebbia/OctaveMultimediaProcessing: Octave Multimedia content processing examples
- E4896 Music Signal Processing - outline
- DSP First
- Interactive DSP Laboratory
- EE445S Real-Time DSP Laboratory - Lectures and Labs
- willfehlmusic/Python_Sketchpads: Tutorial Python projects covering a number of topics. These are tutorials to make certain concepts of interest absolutely clear to the user.
- Virtual Labs
- Lab 0 - Introduction to Module Set - Purdue Digital Signal Processing Labs (ECE 438) - OpenStax CNX
- Making sounds using SDL and visualizing them on a simulated oscilloscope. - NICK TASIOS
Audio Programming
- Pure Data—Pd Community Site Pure Data (or just Pd) is an open source visual programming language for multimedia.
- elk.audio Audio Operating Systems
- VCV Rack - The Eurorack Simulator for Windows/Mac/Linux
- Sassy by sol_hsa Sassy is an audio spreadsheet. Or, as it stands, it’s THE audio spreadsheet.
- JUCE - JUCE The leading framework for multi-platform audio applications
- Tone.js
DSP Notes
- Introduction to Filters: FIR versus IIR
- Highres spectrograms with the DFT Shift Theorem - GLSL & Sound
- Difference between IIR and FIR filters: a practical design guide - ASN Home
- A Narrow Bandpass Filter in Octave or Matlab - Paul Lovell
- An Efficient Lowpass Filter in Octave - Paul Lovell
- Signal Analysis I: What is a Wave? An Introduction to Fourier's Theorem
- Digital Filter Design: Why is Linear Phase Important?
- Étude in C minor
- Digital Audio Basics: Audio Sample Rate and Bit Depth
- WASM SYNTH, or, how music taught me the beauty of math
DSP Tools
- olilarkin/awesome-musicdsp: A curated list of my favourite music DSP and audio programming resources
- Fragment - Real-time audiovisual live coding environment
- CCWT
DSP Books
- DSP Illustration
- The Scientist and Engineer’s Guide to Digital Signal Processing by Steven W. Smith, Ph.D.
- SP4Comm: Signal Processing for Communication
- Free DSP Books
- Wireless Communications: Signal Processing Perspectives-Poor and Wornell
- Think DSP License: CC-BY-NC
- SPECTRAL AUDIO SIGNAL PROCESSING
- INTRODUCTION TO DIGITAL FILTERS
- Preface for Digital Signal Processing: A User’s Guide - DSPA - OpenStax CNX
- Preface for Digital Signal Processing: A User’s Guide - Introduction to DSP - OpenStax CNX
- Book Series Overview
- Digital Filter Design
- Circles Sines and Signals - Introduction License : Eclipse Public
DSP Lectures
- Digital Signal Processing Lecture License: CC-BY
DSP Interactive
- Fourier Transform
- Primer on Digital Signal Processing, Github, License: Eclipse Public License
Software Defined Radio
- PySDR: A Guide to SDR and DSP using Python by Dr. Marc Lichtman GitHub License: -
- Software-Defined Radio for Engineers Material Supports GitHub
Music Retrieval Course
- Music Information Retrieval License: MIT
Speech Recognition
- Libre ASR: An On-Premises, Streaming Speech Recognition System
Signal Processing Notes
- Exploring Sound : Why does an A note on a piano sound different from an A note on a violin?
- Everything you need to know about surround sound in headphones - SoundGuys
- HeSuVi download - SourceForge.net
- Headphone 7.1 Surround Comparison (GSX vs SBX vs Atmos vs CMSS vs DH vs DTSH:X vs Sonic vs HRTF) - YouTube
Signal Processing
- Free Online Audio Tests, Test Tones and Tone Generators
- Spectro
- img-encode - Convert an image to sound spectrum (image to sound)
Free Books on Signal Processing
DSP: THEORY
- The Scientist and Engineer’s Guide to Digital Signal Processing- Steven W. Smith
- Introduction to Signal Processing -Sophocles J. Orfanidis
- Astronomical Image and Data Analysis -JL Starck and F Murtagh
- The theory of linear prediction- Vaidyanathan, P. P.
- Introduction to Statistical Signal Processing - R.M. Gray
- Mixed Signal and DSP Design Techniques - edited by Walt Kester
- Modern Signal Processing - Edited by Daniel N. Rockmore and Dennis M. Healy
- Advances in Signal Transforms: Theory and Applications - Edited by: J. Astola, and L. Yaroslavsky
- Advances in Nonlinear Signal and Image Processing -Edited by: Stephen Marshall and Giovanni L. Sicuranza
- The Data Conversion Handbook - Walt Kester
- Mathematics Of The Discrete Fourier Transform (DFT) - Julius O. Smith III
- Principles of Sigma-Delta Modulation for A/D Converters - Sangil Park
- Using the ADSP-2100 Family Vol. 1 & Vol. 2 -Analog Devices Inc.
- A Technical Tutorial on Digital Signal Synthesis-Analog Devices Inc.
DSP: COMMUNICATIONS
- Signal Processing for Communications -Paolo Prandoni and Martin Vetterli
- Signals, Samples and Stuff: A DSP Tutorial: Part 1, Part 2, Part 3, Part 4 - Doug Smith
- FAQs on Digital Signal Processing
- Wireless Communications: Signal Processing Perspectives-Poor and Wornell
- Signal Processing with Fractals: A Wavelet-Based Approach-G. W. Wornell
- Stochastic Processes, Detection and Estimation-A. S. Willsky and G. W. Wornell
DSP: IMAGE PROCESSING
- Fundamentals of Image Processing - Young, Gerbrands and Vliet
- Advances in Nonlinear Signal and Image Processing -Edited by: Stephen Marshall and Giovanni L. Sicuranza
- Image Processing and Data Analysis: The Multiscale Approach -JL Starck, F Murtagh and A Bijaoui
- Principles of Computerized Tomographic Imaging - Kak and Slaney
- IMAGE ESTIMATION BY EXAMPLE: Geophysical Soundings Image Construction - Jon Claerbout and Sergey Fomel
- BASIC EARTH IMAGING- Jon Claerbout
- EARTH SOUNDINGS ANALYSIS: Processing versus Inversion - Jon Claerbout
- IMAGING THE EARTH’S INTERIOR- Jon Claerbout
- FUNDAMENTALS OF GEOPHYSICAL DATA PROCESSING - Jon Claerbout
- Genetic and Evolutionary Computation for Image Processing and Analysis -Stefano Cagnoni, Evelyne Lutton, and Gustavo Olague
- Image Processing in C: Analyzing and Enhancing Digital Images - Dwayne Phillips
DSP: AUDIO
- Introduction to Sound Processing -Davide Rocchesso
- Introduction To Digital Filters, With Audio Applications -Julius Smith
- Mathematics of the Discrete Fourier Transform (DFT), With Audio Applications -Julius Smith
- Physical Audio Signal Processing For Virtual Musical Instruments and Audio Effects -Julius Smith
- High-Fidelity Multichannel Audio Coding - Dai Tracy Yang, Chris Kyriakakis, and C.-C. Jay Kuo
- Physical Audio Signal Processing-Julius O. Smith III
- Spectral Audio Signal Processing -Julius O. Smith III
DSP: SPECTRAL ANALYSIS
- Bayesian Spectrum Analysis and Parameter Estimation -G. Larry Bretthorst
- Chebyshev and Fourier Spectral Methods - John Boyd
- The Temporal and Spectral Characteristics of Ultrawideband Signals -William Kissick
DSP: MISCELLANEOUS TOPICS
- Biomedical Digital Signal Processing -Willis J. Tompkins
- Stochastic Optimal Control: The Discrete-Time Case -Bertsekas
- Signal Processing with Fractals: A Wavelet-Based Approach - Gregory Wornell
- Nonlinear Systems Theory: The Volterra/Wiener Approach -Wilson Rugh
- Detection of Abrupt Changes - Theory and Application -Basseville and Nikiforov
- An Introduction to Signal Processing in Chemical Analysis - T. O'Haver
- Multimedia Fingerprinting Forensics for Traitor Tracing -K. J. Ray Liu, Wade Trappe, Z. Jane Wang, Min Wu, and Hong Zhao
- Genomic Signal Processing and Statistics -Edited by:Dougherty, Shmulevich, Chen, and Wang
DSP: IMPLEMENTATION
- Computer Aids for VLSI Design -Steven Rubin
- Application-Specific Integrated Circuits - Michael Smith
- The VHDL Cookbook -Peter Ashenden
- Controlling Noise and Radiation in Mixed-Signal and Digital Systems - Nicholas Gray
Free Books on Signal Processing II
- Introduction to Digital Signal Processing - Paolo Prandoni
- Efficient Digital Filters -Matthew Donadio
- Discrete-Time Signal Processing - MIT
- Modern Signal Processing- Edited by Daniel N. Rockmore and Dennis M. Healy, Jr.
- Signals and Systems - MIT
Signal Processing
17 - MK Sistem Kendali Lanjut
MK Sistem Kendali Lanjut
- Code: TKE193154
- Credits (SKS): 3
- Schedule 2020
- TKE193154 Sistem Kendali Lanjut A, Friday 13:20 - 15:50, Gedung Teknik E 204 - 15 students
References
- Norman S. Nise, Control Systems Engineering [website]
- Katsuhiko Ogata, Modern Control Engineering
- Richard C. Dorf and Robert H. Bishop, Modern Control Systems [website]
- Farid Golnaraghi and Benjamin C. Kuo, Automatic Control Systems [website]
- Brian Douglas, The Fundamentals of Control Theory [website][ebook]
- Pao C. Chau, Process Control: A First Course With MATLAB [website]
- Karl J. Åström and Richard M. Murray, Feedback Systems: An Introduction for Scientists and Engineers [website]
- R.V. Dukkipati, Analysis and Design of Control Systems using MATLAB
- Ricone Website
Software
Online Course
Online Video Course
Lectures
Week 1
- Introduction
- Steady State Error
- Supporting Videos
- Final Value Theorem and Steady State Error Brian Douglas
- Recap of Steady-State Error The Ryder Project
- Steady-State Error #1, using Error Constants The Ryder Project
- Steady-State Error #1, using Final Value Theorem The Ryder Project
- Steady-State Error #2, using Error Constants The Ryder Project
- Finding Requirements for SSE The Ryder Project
Week 2
- Routh-Hurwitz Stability Analysis
- Supporting Videos
- Introduction to System Stability and Control Brian Douglas
- Stability of Closed Loop Control Systems Brian Douglas
- Routh-Hurwitz Criterion, An Introduction Brian Douglas
- Routh-Hurwitz Criterion, Special Cases Brian Douglas
- Routh-Hurwitz Criterion, Beyond Stability Brian Douglas
- Recap of Stability The Ryder Project
- Stability Example #1 The Ryder Project
- Stability Example #2 The Ryder Project
- Stability Example #3 The Ryder Project
- Octave
equ = [1 2 3]   % coefficients of the characteristic polynomial s^2 + 2s + 3
roots(equ)      % the system is stable if every root has a negative real part
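As a complement to the `roots` check above, the Routh-Hurwitz criterion from this week's topic can be sketched in Python (the language of the ICCT notebooks used in this course). The function names `routh_first_column` and `count_sign_changes` are illustrative and do not come from any library. The sketch assumes that no zero appears in the first column, so the special cases are not handled.

```python
def routh_first_column(coeffs):
    """Return the first column of the Routh array.

    coeffs: polynomial coefficients in descending powers of s,
    e.g. [1, 2, 3] for s^2 + 2s + 3.
    Assumption: no zero shows up in the first column (special cases
    such as an all-zero row are not handled in this sketch).
    """
    n = len(coeffs)
    width = (n + 1) // 2
    # The first two rows hold alternating coefficients, padded with zeros.
    row_a = [float(c) for c in coeffs[0::2]]
    row_b = [float(c) for c in coeffs[1::2]]
    row_a += [0.0] * (width - len(row_a))
    row_b += [0.0] * (width - len(row_b))
    rows = [row_a, row_b]
    # Each further row is built from the two rows above it.
    for _ in range(n - 2):
        prev2, prev1 = rows[-2], rows[-1]
        if prev1[0] == 0:
            raise ValueError("zero in first column: special case not handled")
        new = [
            (prev1[0] * prev2[j + 1] - prev2[0] * prev1[j + 1]) / prev1[0]
            for j in range(width - 1)
        ]
        new.append(0.0)
        rows.append(new)
    return [row[0] for row in rows[:n]]


def count_sign_changes(column):
    """Number of sign changes, i.e. the number of right-half-plane roots."""
    signs = [value > 0 for value in column]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


column = routh_first_column([1, 2, 3])   # s^2 + 2s + 3
print(column, count_sign_changes(column))
```

No sign change in the first column means that all roots lie in the left half-plane, which agrees with what `roots(equ)` reports for the same polynomial.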
Week 3
- Root Locus (Tempat Kedudukan Akar)
- Supporting Videos:
- Plotting the root locus in Octave or MATLAB
pkg load control     % Octave only; omit this line in MATLAB
num = [1]            % numerator
den = [1 2 3]        % denominator
sys = tf(num, den)   % transfer function
rlocus(sys)
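The root locus traces how the closed-loop poles move as a gain K varies. As a rough numerical illustration in Python, assume unity negative feedback around the same plant G(s) = 1/(s^2 + 2s + 3); the feedback structure, the gain values, and the helper name `closed_loop_poles` are assumptions made for this sketch. The closed-loop poles are then the roots of s^2 + 2s + (3 + K):

```python
import cmath


def closed_loop_poles(k):
    """Roots of s^2 + 2s + (3 + k), the closed-loop characteristic equation
    of G(s) = 1/(s^2 + 2s + 3) under unity negative feedback with gain k."""
    a, b, c = 1.0, 2.0, 3.0 + k
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))


# Tabulate the pole locations for a few illustrative gain values.
for k in [0, 1, 2, 5, 10]:
    p1, p2 = closed_loop_poles(k)
    print(f"K={k:>2}: {p1:.3f}, {p2:.3f}")
```

For this plant the real part stays at -1 for every K >= 0 while the imaginary part grows, which matches the vertical branches that `rlocus` draws.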
Week 4
- Control System Design with Root Locus
- Supporting Videos:
Assignment
- Preparation
- Please record your attendance in Eldiru first, on 26 December
- Open the Interactive Course for Control Theory site
- Create an ICCT account, then check your email for the username and password
- Log in to Interactive Course for Control Theory
- For guidance, please watch the following video
- Jupyter Notebook practice in ICCT
- You will work with Jupyter Notebook in ICCT
- Click the ICCT folder in Jupyter Notebook, then click Table-of-Contents-ICCT.ipynb
- Right-click the link 1.1.1 Complex Numbers in Cartesian Form in the folder 1.1 Complex Numbers and open it in a new tab
- You are now in the Jupyter Notebook M-01_Complex_numbers_Cartesian_form.ipynb
- Choose the menu, then
- Read the notebook, including its explanations and tasks.
- Then change the values of the complex numbers and press or
- Then vary the operation, such as , etc.
- You can download the image or take a screenshot of it.
- Choose the menu, then to shut down the Jupyter Notebook.
- Assignment (two weeks allowed)
- Following the distribution (attached in Eldiru), do the following:
- Run the Jupyter Notebook files that were assigned to you.
- For each Jupyter Notebook file, write a mini report as a .docx or .odt file consisting of:
- Title, with an explanation (translated into Indonesian) of the Jupyter Notebook file. (The Python code in the Jupyter Notebook does not need to be included.)
- Discussion. A brief discussion of the activities you carried out; where needed, add downloaded images (screenshots).
- Save each file under the name NIM-TugasXXX.docx, for example H1A018091-Tugas385.odt. Combine the three assignment files into a .zip file, then upload it to the Assignment page in Eldiru.
Control Systems Terminology
- Bandwidth and 3 dB. The bandwidth of a band-pass filter is the frequency range that is allowed to pass through with minimal attenuation. The frequency at which the signal power drops 3 dB below its maximum value is the 3 dB (half-power) point, and the range between these points is the 3 dB bandwidth. A 3 dB decrease in power means the signal power becomes half of its maximum value. This occurs when the output voltage has dropped to $1/{\sqrt{2}}$ (~0.707) of the maximum output voltage (since $P=V^2/R$). More precisely: $20\log _{10}\left({\tfrac {1}{\sqrt {2}}}\right)\approx -3.0103\ \mathrm {dB}$
- Half-power point - Wikipedia
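The half-power arithmetic can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `voltage_ratio_to_db` and `power_ratio_to_db` are illustrative.

```python
import math


def voltage_ratio_to_db(ratio):
    """Decibels for a voltage (amplitude) ratio: 20*log10(ratio)."""
    return 20 * math.log10(ratio)


def power_ratio_to_db(ratio):
    """Decibels for a power ratio: 10*log10(ratio)."""
    return 10 * math.log10(ratio)


half_power_voltage = 1 / math.sqrt(2)   # about 0.707 of the maximum voltage

# Both expressions describe the same half-power point.
print(voltage_ratio_to_db(half_power_voltage))
print(power_ratio_to_db(0.5))
```

Both calls give the same value of about -3.01 dB, because halving the power corresponds to scaling the voltage by $1/\sqrt{2}$.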
18 - Machine Learning CS299
Machine Learning CS299
- Brandon McKinzie for CS299 etc.
- PythonAndr for CS299 by Andrew Ng
Deep Learning Specialization by Andrew Ng
Machine Learning Course
- Machine Learning at CUNI NZ
- Elements of AI
- TensorFlow, Keras and deep learning, without a PhD, Github, License: Apache
- CS231n Stanford, CS231n GitHub, CS231n GitHub Source License: MIT
- Schedule - EECS 498-007 / 598-005: Deep Learning for Computer Vision
- AI-Sys Sp19 ucbrise/cs294-ai-sys-sp19: CS294; AI For Systems and Systems For AI
- CS182/282A Designing, Visualizing and Understanding Deep Neural Networks Spring 2020: Designing, Visualizing and Understanding Deep Neural Networks (Spring 2020)
- MIT Deep Learning 6.S191
Practical Deep Learning
- rajatkb/Practical-Deep-Learning
- sjchoi86/dl_tutorials: Deep Learning Presentation and Tutorial License: MIT
Visualizing Backpropagation
Machine Learning Course
19 - MK Dasar Teknik Elektro
MK Dasar Teknik Elektro
- Code: TKE191121
- Credits (SKS): 2
- Schedule
- TKE191121 Dasar Teknik Elektro B, Wednesday 10:20 - 12:00, Gedung Teknik E 104 - 46 students
- TKE191121 Dasar Teknik Elektro A, Wednesday 12:30 - 14:10, Gedung Teknik E 101 - 44 students
Program Graduate Learning Outcomes (CPL)
- Knowledge-PU03: master engineering knowledge and computing science in order to analyze and design complex electrical and electronic devices, software, and systems consisting of hardware and software components;
- Knowledge-PU04: master the core knowledge of electrical engineering, including electric circuits, signals and systems, digital systems, electromagnetics, and electronics, together with their applications;
- Specific Skills-KK02: able to apply knowledge of mathematics, basic science, and engineering topics in electrical engineering;
Course Learning Outcomes (CPMK)
- Understand the mathematics, basic science, and engineering topics of electrical engineering;
- Understand the scope of the engineering and computing fundamentals needed to analyze and design
- electrical devices,
- electronic devices,
- software, and
- systems (software and hardware);
- Understand the scope of the core knowledge of electrical engineering, including electric circuits, signals and systems, digital systems, electromagnetics, and electronics, together with their applications;
Study Materials
- Overview of mathematics and basic science for electrical engineering
- Overview of engineering knowledge for electrical engineering
- Overview of (current) engineering topics in electrical engineering
- Overview of computing science for electrical engineering
- Overview of analysis and design methods for electrical and electronic devices
- Overview of analysis and design methods for software
- Introduction to electric circuits and their applications in electrical engineering
- Introduction to signals and systems and their applications in electrical engineering
- Introduction to digital systems and their applications in electrical engineering
- Introduction to electronics and its applications in electrical engineering
References
Free and Open References
- Lesson of Electrical Circuit by Tony R. Kuphaldt or Lesson of Electrical Circuit in allaboutcircuits.com
- All about Circuits Worksheets
- Modular Electronics Learning Project by Tony R. Kuphaldt
- Fundamentals of Electrical Engineering I (PDF) by Don H. Johnson or Fundamentals of Electrical Engineering in OpenStax
- DOE Fundamentals Handbook Electrical Science Volume (4 volumes)
Paid References
- Electrical Engineering: Know It All by Clive Maxfield et al.: used in the lectures
- Electrical and Electronic Principles and Technology by John Bird
- Principles and Applications of Electrical Engineering by Giorgio Rizzoni
- Fundamentals of Electrical Engineering by Giorgio Rizzoni
- Comprehensive Dictionary of Electrical Engineering by Philip A. Laplante
- Fundamental Electrical and Electronic Principles by Christopher R. Robertson
- Electrical Engineering Principles and Applications by Allan R. Hambley
- The Electrical Engineering Handbook by Richard C. Dorf
- The Electrical Engineering Handbook by Wai Kai Chen
- The Resource Handbook of Electronics by Jerry C. Whitaker
- Practical Electrical Engineering by Sergey N. Makarov
Online Course References
- EdX Electrical Engineering Online Course
- Khan Academy on Electrical Engineering
- Electrical Engineering Playlist by Reach
- Electrical Engineering Playlist by Zach Star
Lectures
Week 1
- Topics
- Introduction
- Overview of Electrical Engineering
- Supporting Videos
- The Story of Electricity - BBC Documentary
- Is Electrical Engineering a good career? - REACH
- How hard is Electrical Engineering? - REACH
- What can you do with an Electrical Engineering degree - REACH
- Map of the Electrical Engineering Curriculum - REACH
- What Is Electrical Engineering? - Zach Star
- What Can You Really Do As An Electrical Engineer? - Zach Star
Week 2
- Topics:
- Overview of Engineering Fundamentals for Electrical Engineering
- Assignment:
- Translation: Electrical Engineering: Know It All by Clive Maxfield et al.
Week 3
- Topics:
- Introduction to Signals and Systems
20 - Kuliah
Kuliah
202020212
- TKE192221 Sistem Kendali A [FAR; IMR; ],[2019]; C 201 Tuesday 07.55, 2
- TKE192221 Sistem Kendali B [FAR; IMR; ],[2019]; C 201 Tuesday 10.40, 2
- TKE192227 Pengolahan Sinyal Digital B [AZS; IMR; ],[2019]; E 101 Wednesday 07.00, 3
- TKE192227 Pengolahan Sinyal Digital A [AZS; IMR; ],[2019]; E 101 Wednesday 09.45, 3
- TKE194945 Internet of Things A [AZS; IMR; ],[2018]; E 205 Thursday 13.00, 3
- TKE194941 Sistem Kendali Cerdas A [IMR; AGU; ],[2018]; E 201 Friday 13.55, 3
202020211
- TKE194917 Sistem Adaptif A, Tuesday 09:30 - 12:00, Gedung Teknik E 202
- TKE191121 Dasar Teknik Elektro B, Wednesday 10:20 - 12:00, Gedung Teknik E 104
- TKE191121 Dasar Teknik Elektro A, Wednesday 12:30 - 14:10, Gedung Teknik E 101
- TKE194918 Machine Learning A, Wednesday 15:00 - 17:30, Gedung Teknik E 201 - 12 students
- TKE191113 Matematika Teknik A, Thursday 07:00 - 09:30, Gedung Teknik C 101
- TKE191113 Matematika Teknik B, Thursday 09:30 - 12:00, Gedung Teknik C 101
- TKE194021 Proyek Keteknikan A, Friday 07:50 - 09:30, Gedung Teknik C 103 - 1 student
- TKE193154 Sistem Kendali Lanjut A, Friday 13:20 - 15:50, Gedung Teknik E 204
201920202
- TKE192227 Pengolahan Sinyal Digital
- TKE192221 Sistem Kendali
- TKE194941 Sistem Kendali Cerdas
201920201
- TKE191121 Dasar Teknik Elektro
- TKE193153 Sistem Kendali Digital
201820192
- TKE132207 Pengolahan Sinyal Digital
- TKE134103 Proyek Keteknikan
- TKE132201 Sistem Kontrol
201820191
- TKE131104 Dasar Teknik Elektro
- TKE134026 Jaringan Sensor
- TKE134103 Proyek Keteknikan
- TKE134033 Sistem Adaptif
201720182
- TKE133201 Instrumentasi
- TKE131201 Metode Transformasi
- TKE132207 Pengolahan Sinyal Digital
- TKE132201 Sistem Kontrol
201720181
- TKE131104 Dasar Teknik Elektro
- TKE134026 Jaringan Sensor
- TKE132102 Matematika Teknik
- TKE134103 Proyek Keteknikan
- TKE134033 Sistem Adaptif
21 - Linear Algebra
Linear Algebra
Software
PC
- SpeedCrunch–open source software, fast, simple
- GNU Octave–open source software
- Online Octave - online GNU Octave
- Matlab–proprietary
- Anaconda for Python Data Science Programming
Android
MOOC
- edX - Linear Algebra - Foundations to Frontiers : good interactive HW exercises, very clear instruction and time-efficient
- MIT OCW - Linear Algebra by Gilbert Strang Youtube: Gilbert Strang is good.
- Coursera - Mathematics for Machine Learning: Linear Algebra
- Algebra 1 - Khan Academy and Algebra 2 - Khan Academy
Youtube
List of Books
Proprietary Books
- Handbook of Linear Algebra by Leslie Hogben
- Introduction to Applied Linear Algebra Vectors, Matrices, and Least Squares by Stephen Boyd
- Linear Algebra and Its Applications by David C. Lay
- Elementary Linear Algebra: Application Version by Howard Anton
- Elementary Linear Algebra by Ron Larson
- Linear Algebra Ideas and Applications by Richard C. Penney
- Linear Algebra Done Right by Sheldon Axler
- A Concise Text on Advanced Linear Algebra by Yisong Yang
Free Books
- An Intuitive Overview of Linear Algebra Fundamentals
- Introduction to Linear Algebra by Thomas L. Scofield - PDF
- Algebra by Paul Dawkins PDF
- Linear Algebra Abridged by Sheldon Axler
- Intuitive Overview of Linear Algebra Fundamentals
- Linear Algebra A Course for Physicists and Engineers by Arak M. Mathai License: CC-BY-NC-ND
- Linear Algebra License: CC-BY-NC
- Math 1410 Elementary Linear Algebra by Sean Fitzpatrick License: CC-BY-NC
- Lecture Notes for Math 3410, with Computational Examples by Sean Fitzpatrick License: CC-BY-NC
- Linear Algebra with Applications by Keith Nicholson PDF License: CC-BY-NC
- Immersive Linear Algebra
- Introduction to Applied Linear Algebra–Vectors, Matrices, and Least Squares
Open Books
- Linear Algebra, Theory And Applications by Kenneth Kuttler PDF License: CC-BY
- Linear Algebra by Jim Hefferon License : GFDL and CC-SA
- Interactive Linear Algebra by Dan Margalit License: GPL/GFDL Pretext Book
- Discover Linear Algebra by Jeremy Sylvestre License: GFDL Pretext Book
- A First Course in Linear Algebra by Robert A. Beezer PDF or its public beta version of A First Course in Linear Algebra License: GFDL
- Understanding Linear Algebra by David Austin License: CC-BY Pretext Book
- Linear Algebra for ML License: MIT
- Open Resources for Community College Algebra (ORCCA) or the book License: CC-BY Pretext Book
- MATH 1220 Linear Algebra 1 by Michael Doob License: CC (?) Pretext Book
- Elements of Linear and Multilinear Algebra by John M. Erdman PDF License: CC-BY
Book Recommendation
- Amazon - Linear Algebra with Applications: Williams, Gareth: 9781284120097: Books
- Matrix Analysis & Applied Linear Algebra
- Linear Algebra for Everyone, Gilbert Strang
- Introduction to Linear Algebra (Gilbert Strang): Strang, Gilbert: 9780980232776: Amazon.com: Books
- Linear Algebra: A Modern Introduction: Poole, David: 8601421990653: Books - Amazon
- Linear Algebra and Its Applications: Lay, David, Lay, Steven, McDonald, Judi: 9780321982384: Amazon.com: Books
- Numerical Linear Algebra: Lloyd N. Trefethen, David Bau III: 8581000033141: Books: Amazon.com
- Linear Algebra: Step by Step: Singh, Kuldeep: 8601300149776: Books: Amazon.com
- No Bullshit Guide to Linear Algebra
- Linear Algebra - Mathematics - MIT OpenCourseWare
- Free Linear Algebra textbook, from Jim Hefferon CC-BY
- Linear Algebra Done Wrong CC-BY
- Introduction to Applied Linear Algebra–Vectors, Matrices, and Least Squares
- Practical Linear Algebra: A Geometry Toolbox , Fourth Edition
- Linear and Geometric Algebra
- Contents - 3D Math Primer for Graphics and Game Development
Video Recommendation
- 3Blue1Brown: Linear Alg - YouTube
- Essence of linear algebra - YouTube
- Part 1 Linear Algebra: An In-Depth Introduction with a Focus on Applications - YouTube
Tables of Contents of Some of the Open Books
Interactive Linear Algebra by Dan Margalit
- Systems of Linear Equations: Algebra (pp 1-27)
- Systems of Linear Equations: Geometry (pp 29-112)
- Linear Transformations and Matrix Algebra (pp 113-185)
- Determinants (pp 187-235)
- Eigenvalues and Eigenvectors (pp 237-337)
- Orthogonality (pp 339-407)
Discover Linear Algebra by Jeremy Sylvestre
- Systems of Equations and Matrices (pp 7-169)
- Systems of linear equations
- Solving systems using matrices
- Using systems of equations
- Matrices and matrix operations
- Matrix inverses
- Elementary matrices
- Special forms of square matrices
- Determinants
- Determinants versus row operations
- Determinants, the adjoint, and inverses
- Vector Spaces (pp 170-374)
- Introduction to vectors
- Geometry of vectors
- Orthogonal vectors
- Geometry of linear systems
- Abstract vector spaces
- Subspaces
- Linear independence
- Basis and Coordinates
- Dimension
- Column, row, and null spaces
- Introduction to Matrix Forms (pp 375-413)
- Eigenvalues and eigenvectors
- Diagonalization
Linear Algebra by Jim Hefferon
- Linear Systems
- Solving Linear Systems
- Linear Geometry
- Reduced Echelon Form
- Vector Spaces
- Definition of Vector Space
- Linear Independence
- Basis and Dimension
- Maps Between Spaces
- Isomorphisms
- Homomorphisms
- Computing Linear Maps
- Matrix Operations
- Change of Basis
- Projection
- Determinants
- Definition
- Geometry of Determinants
- Laplace’s Formula
- Similarity
- Complex Vector Spaces
- Similarity
- Nilpotence
- Jordan Form
A First Course in Linear Algebra by Robert A. Beezer
- Systems of Linear Equations
- Vectors
- Matrices
- Vector Spaces
- Determinants
- Eigenvalues
- Linear Transformations
- Representations
Linear Algebra, Theory And Applications by Kenneth Kuttler
- Preliminaries
- Matrices and Linear Transformations
- Determinants
- Row Operations
- Some Factorizations
- Linear Programming
- Spectral Theory
- Vector Spaces and Fields
- Linear Transformations
- Linear Transformations Canonical Forms
- Markov Chains and Migration Processes
- Inner Product Spaces
- Self Adjoint Operators
- Norms for Finite Dimensional Vector Spaces
- Numerical Methods for Finding Eigenvalues
MATH 1220 Linear Algebra 1 by Michael Doob
- Systems of Linear Equations
- Matrix theory
- The Determinant
- Vectors in Euclidean n-space
- Eigenvalues and eigenvectors
- Linear transformations
Markov Chains
22 - MK Matematika Teknik
MK Matematika Teknik
- Code: TKE191113
- Credits (SKS): 3
- Schedule:
- TKE191113 Matematika Teknik A, Thursday 07:00 - 09:30, Gedung Teknik C 101 - 65 students
- TKE191113 Matematika Teknik B, Thursday 09:30 - 12:00, Gedung Teknik C 101 - 44 students
Program Graduate Learning Outcomes (CPL)
- Knowledge-PU01: master advanced mathematics, including integral and differential calculus, differential equations, linear algebra, complex variables, probability and statistics, and discrete mathematics, together with their applications in electrical engineering;
Course Learning Outcomes
- Master methods for solving linear equations analytically and numerically
- Master matrix operations and their applications
- Master the concepts of eigenvalues and eigenvectors and their applications
- Master the concepts of vectors and vector spaces and their applications
- Master the concept of linear transformations and its applications
Study Materials
- Systems of Linear Equations
- Matrices and Matrix Operations
- Eigenvalues and Eigenvectors
- LU Decomposition
- Diagonalization and Quadratic Forms (additional)
- Euclidean Vector Spaces
- General Vector Spaces
- Linear Transformations
- Applications of Linear Algebra in Electrical Engineering
References
Paid References
- Elementary Linear Algebra: Application Version by Howard Anton
- Linear Algebra and Its Applications by David C. Lay
Free and Open References
- Interactive Linear Algebra by Dan Margalit
- Discover Linear Algebra by Jeremy Sylvestre
- Linear Algebra by Jim Hefferon
- Linear Algebra, Theory And Applications by Kenneth Kuttler
- A First Course in Linear Algebra by Robert A. Beezer or its public beta version of A First Course in Linear Algebra
- MATH 1220 Linear Algebra 1 by Michael Doob
- Systems of equations by David Austin
Online Courses
- edX - Linear Algebra - Foundations to Frontiers
- MIT OCW - Linear Algebra by Gilbert Strang Youtube.
- Algebra 1 - Khan Academy and Algebra 2 - Khan Academy
Youtube
Software
PC
- SpeedCrunch–open source software, fast, simple
- GNU Octave–open source software
- Online Octave - online GNU Octave
- Matlab–proprietary
- Anaconda for Python Data Science Programming
Android
Lectures
Week 1
- Topics:
- Introduction
- Systems of Linear Equations
- Supporting Books:
Week 2
- Topics:
- Matrices: Operations
Week 3
23 - Control Design with Frequency Method
Control Design with Frequency Method
Comparison of the Root Locus (RL) and Frequency Response (FR) Methods
- Designing transient response and stability via gain adjustment
- FR is easier: the gain can be read from the Bode plot
- Designing transient response via cascade compensation
- FR is not as intuitive as RL
- in RL, a given point is known to have specific transient response characteristics
- in FR:
- phase margin is related to percent overshoot
- bandwidth is related to damping ratio, settling time, and peak time
- Designing steady-state error via cascade compensation
- in FR, a compensator can be designed that improves transient response and steady-state error at the same time.
- in RL, there are many possible solutions for building a compensator (each of which raises its own steady-state error issue).
Frequency Response Design
- A system that is stable in open loop is stable in closed loop if the open-loop frequency response magnitude is less than 0 dB at the frequency where the phase is 180 degrees
- Percent overshoot is reduced by increasing the phase margin
- The response is sped up by increasing the bandwidth
- Steady-state error is improved by increasing the magnitude response at low frequencies
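The link between bandwidth and response speed can be made concrete for the standard second-order closed-loop system $\omega_n^2/(s^2+2\zeta\omega_n s+\omega_n^2)$. The Python sketch below assumes that model and the 2% settling-time approximation $T_s = 4/(\zeta\omega_n)$; the function names are illustrative, and the result is only an approximation for higher-order systems.

```python
import math


def bandwidth_ratio(zeta):
    """Closed-loop bandwidth divided by the natural frequency, omega_BW / omega_n,
    for the standard second-order system with damping ratio zeta."""
    return math.sqrt((1 - 2 * zeta**2) + math.sqrt(4 * zeta**4 - 4 * zeta**2 + 2))


def bandwidth_from_settling_time(zeta, ts):
    """Bandwidth in rad/s for a desired settling time ts in seconds.
    Assumes the 2% approximation Ts = 4 / (zeta * omega_n)."""
    omega_n = 4 / (zeta * ts)
    return omega_n * bandwidth_ratio(zeta)


# Same damping ratio, shorter settling time: the required bandwidth grows.
for ts in [4.0, 2.0, 1.0]:
    print(ts, bandwidth_from_settling_time(0.5, ts))
```

Halving the settling time at a fixed damping ratio doubles the required bandwidth, which is the quantitative form of "a faster response needs more bandwidth".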
Improving Transient Response via Gain Adjustment
- Damping ratio ($\zeta$) (and percent overshoot) and PM (phase margin)
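For the standard second-order model with open-loop transfer function $\omega_n^2/(s(s+2\zeta\omega_n))$, the relationships between damping ratio, percent overshoot, and phase margin have closed forms. The Python sketch below assumes that model and $0 < \zeta < 1$; the function names are illustrative, and real higher-order systems follow these curves only approximately.

```python
import math


def percent_overshoot(zeta):
    """Percent overshoot of the second-order step response, for 0 < zeta < 1."""
    return 100 * math.exp(-zeta * math.pi / math.sqrt(1 - zeta**2))


def phase_margin_deg(zeta):
    """Phase margin in degrees for the open loop omega_n^2 / (s (s + 2 zeta omega_n))."""
    inner = math.sqrt(-2 * zeta**2 + math.sqrt(1 + 4 * zeta**4))
    return math.degrees(math.atan(2 * zeta / inner))


# A larger damping ratio gives more phase margin and less overshoot.
for zeta in [0.2, 0.4, 0.6, 0.8]:
    print(zeta, round(phase_margin_deg(zeta), 1), round(percent_overshoot(zeta), 1))
```

A design that asks for a given percent overshoot can therefore be turned into a required phase margin, and the gain is then adjusted on the Bode plot until that margin is reached.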
24 - Control Systems
Control Systems
Reference
- Norman S. Nise, Control Systems Engineering [website]
- Katsuhiko Ogata, Modern Control Engineering
- Richard C. Dorf and Robert H. Bishop, Modern Control Systems [website]
- Farid Golnaraghi and Benjamin C. Kuo, Automatic Control Systems [website]
- Brian Douglas, The Fundamentals of Control Theory [website][ebook]
- Pao C. Chau, Process Control: A First Course With MATLAB [website]
- Karl J. Åström and Richard M. Murray, Feedback Systems: An Introduction for Scientists and Engineers [website]
- R.V. Dukkipati, Analysis and Design of Control Systems using MATLAB
Online Book
- CSA - Your Controls Resource
- Book: Introduction to Control Systems (Iqbal) - Engineering LibreTexts License: CC-BY-NC
- Book: Chemical Process Dynamics and Controls (Woolf) - Engineering LibreTexts License: CC-BY
- Linear Physical Systems Analysis
Interactive Learning
Specific Topics
Control Theory Map
Software
Interactive Control Systems Learning
- ICCT: Interactive course for control theory, ICCT Interactive Course in Jupyter
- Umich Control Tutorials
Online Video Course
- Brian Douglas Youtube Control System Lectures
- Steve Brunton Control System Bootcamp
- MATLAB Control System
- MATLAB Channel: Control System in Practice
- MATLAB Channel: Understanding Control System
- MATLAB Channel: Understanding PID Control
Control Learning Videos
Control Theory Interactive
- Control Systems Academy - https://www.controlsystemsacademy.com/
- CBE30338: https://jckantor.github.io/CBE30338/
- Linear Physical Systems Analysis: https://lpsa.swarthmore.edu/
- Python in Education (Institute of Control Theory): https://tu-dresden.de/ing/elektrotechnik/rst/studium/python-in-der-lehre?set_language=en
Computational Methods for Control of Infinite-dimensional Systems - Institute for Mathematics and its Applications
Python Control
- Python Control Systems Library—Python Control Systems Library dev documentation
- mpc.pytorch: A fast and differentiable MPC solver for PyTorch
Intelligent Control
- About the Book - DATA DRIVEN SCIENCE & ENGINEERING
- dynamicslab/databook_matlab: Matlab files with demo code intended as a companion to the book “Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control” by J. Nathan Kutz and Steven L. Brunton https://www.databookuw.com/
- dylewsky/Data_Driven_Science_Python_Demos: IPython notebooks with demo code intended as a companion to the book “Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control” by J. Nathan Kutz and Steven L. Brunton
Control Systems Online Curriculum
Level 1:
- Math basics:
- Physics Basics:
- General Physics: https://www.khanacademy.org/science/physics
- More “advanced” general physics: https://www.khanacademy.org/science/ap-physics-1 and https://www.khanacademy.org/science/ap-physics-2
- MATLAB Basics:
Level 2:
- Intermediate Math:
- Linear Algebra: https://www.khanacademy.org/math/linear-algebra
- Differential Equations: https://www.khanacademy.org/math/differential-equations
- Intermediate Physics:
- Calculus based Mechanics at the college level: https://ocw.mit.edu/courses/physics/8-012-physics-i-classical-mechanics-fall-2008/index.htm
- E&M: https://ocw.mit.edu/courses/physics/8-02-physics-ii-electricity-and-magnetism-spring-2007/index.htm
- Waves and vibrations: https://ocw.mit.edu/courses/physics/8-03-physics-iii-spring-2003/index.htm
- Intro to Simulink: https://ctms.engin.umich.edu/CTMS/index.php?example=Introduction&section=SimulinkModeling
Level 3:
- More rigorous math courses:
- Multivariable Calculus: https://www.khanacademy.org/math/multivariable-calculus
- Higher level linear algebra: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm
- Higher level differential equations: https://ocw.mit.edu/courses/mathematics/18-03-differential-equations-spring-2010/
- More rigorous physics:
- Beginning Engineering:
- Electrical:
- Mechanical:
- Beginning dynamics: https://ocw.mit.edu/courses/mechanical-engineering/2-003sc-engineering-dynamics-fall-2011/syllabus/
- More Dynamics and intro to control: https://ocw.mit.edu/courses/mechanical-engineering/2-003j-dynamics-and-control-i-spring-2007/index.htm
Level 4:
- Helpful Math:
- Signal Processing:
- Control:
- Dynamics and control 2: https://ocw.mit.edu/courses/mechanical-engineering/2-004-dynamics-and-control-ii-spring-2008/index.htm
- More systems and control: https://ocw.mit.edu/courses/mechanical-engineering/2-04a-systems-and-controls-spring-2013/index.htm
- Feedback Control: https://ocw.mit.edu/courses/aeronautics-and-astronautics/16-30-feedback-control-systems-fall-2010/index.htm
- More intro control: https://www.edx.org/course/introduction-control-system-design-first-mitx-6-302-0x?utm_source=OCW&utm_medium=CHP&utm_campaign=OCW
- More state space intro: https://www.edx.org/course/introduction-state-space-control-mitx-6-302-1x?utm_source=OCW&utm_medium=CHP&utm_campaign=OCW
- Recommended resources for this level, in addition to or as help with the courses above; these will also help with some of the “higher” level material:
- katkimshow Intro to control: https://www.youtube.com/playlist?list=PLmK1EnKxphikZ4mmCz2NccSnHZb7v1wV-
- Brian Douglas Control System Lectures: https://www.youtube.com/playlist?list=PLUMWjy5jgHK3j74Z5Tq6Tso1fSfVWZC8L
- Steve Brunton Control Bootcamp: https://www.youtube.com/playlist?list=PLMrJAkhIeNNR20Mz-VpzgfQs5zrYi085m
Level 5:
- Optional Math:
- Complex Variable: https://ocw.mit.edu/courses/mathematics/18-04-complex-variables-with-applications-fall-2003/
- A course designed to help intuition: https://ocw.mit.edu/courses/mathematics/18-098-street-fighting-mathematics-january-iap-2008/index.htm
- More rigorous practice in signals and systems:
- Control:
- Higher level dynamics and control: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-241j-dynamic-systems-and-control-spring-2011/index.htm
- Higher level feedback control: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-302-feedback-systems-spring-2007/calendar/
- Slightly higher level control: https://ocw.mit.edu/courses/mechanical-engineering/2-14-analysis-and-design-of-feedback-control-systems-spring-2014/index.htm
- Multi-variable control systems: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-245-multivariable-control-systems-spring-2004/index.htm
Level 6:
- Optional Nonlinear Dynamics:
- Non-Linear control:
- More theory based: https://web.mit.edu/nsl/www/videos/lectures.html
- More practice based: https://www.youtube.com/watch?v=9xDZy5mE-3I&list=PLrxYXaxBXgRoqgaBlitaAA_sgVZ8V6Teg (note: the videos are in English except for the introduction)
- Resources for these videos: https://sites.google.com/a/g2.nctu.edu.tw/nonlinear-control-systems-2017-fall/course-materials
Level 7:
- More advanced, but optional, non-linear dynamics:
- Control:
- Sliding mode: https://www.youtube.com/watch?v=x9WxwM6Ebvo (Note: these are the only videos or online materials I could find that cover sliding mode in course form; please suggest more if you find them)
- Optimal and Robust control: https://www.youtube.com/watch?v=z64cXTZKw4I&list=PLMLojHoA_QPmRiPotD_TnfdUkglTexuqm
Control eBook
25 - Fundamentals of Electrical Engineering
Fundamentals of Electrical Engineering
Free and Open References
- Lesson of Electrical Circuit, All About Circuit Version License: Design Science License
- All about Circuits Worksheets, original worksheet License: CC-BY
- Modular Electronics Learning Project License: CC-BY
- Fundamentals of Electrical Engineering I (PDF) by Don H. Johnson or Fundamentals of Electrical Engineering in OpenStax License: CC-BY
Free References
- Navy Electricity and Electronics Training
- DOE Fundamentals Handbook Electrical Science Volume (4 volumes)
Proprietary References
- Electrical and Electronic Principles and Technology by John Bird
- Principles and Applications of Electrical Engineering by Giorgio Rizzoni
- Fundamentals of Electrical Engineering by Giorgio Rizzoni
- Comprehensive Dictionary of Electrical Engineering by Philip A. Laplante
- Electrical Engineering: Know It All by Clive Maxfield et.al.
- Fundamental Electrical and Electronic Principles by Christopher R. Robertson
- Electrical Engineering Principles and Applications by Allan R. Hambley
- The Electrical Engineering Handbook by Richard C. Dorf
- The Electrical Engineering Handbook by Wai Kai Chen
- The Resource Handbook of Electronics by Jerry C. Whitaker
MOOC Course
Related Videos
26 - Course
Course
Unsoed Electrical Engineering Courses
- Kuliah
- MK Dasar Teknik Elektro
- MK Internet of Things
- MK Machine Learning
- MK Matematika Teknik
- MK Pengolahan Sinyal Digital
- MK Sistem Kendali
- MK Sistem Kendali Cerdas
- MK Sistem Kendali Lanjut
Online Courses
Course Resources
27 - Computer Science
Computer Science
Computer Science
Computer Science
28 - Control Systems Resources
Control Systems Resources
Control
Control System
Control
29 - Electronic Resources
Electronic Resources
Electronics
Electronics
30 - Math Resources
Math Resources
Math
Math Puzzle
- Think of a Number. How Do Math Magicians Know What It Is? - Hacker News
- Cheryl’s Birthday - Wikipedia
- Sum and Product Puzzle - Wikipedia
- “I don’t know the numbers”: a math puzzle · Caffeinspiration