These are short scrap notes from the Machine Learning course taught by Andrew Ng, Associate Professor at Stanford University, on Coursera.

Hypothesis: hθ(x) = θ0 + θ1x

Parameters: θ0, θ1
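
A minimal sketch of this hypothesis in Python (the course itself uses Octave); the parameter values below are made up for illustration:

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x: a straight line in x
    return theta0 + theta1 * x

# With the made-up values theta0 = 1.0 and theta1 = 2.0, h(3) = 7
print(hypothesis(1.0, 2.0, 3.0))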

Cost function: J(θ0, θ1) = (1/2m) Σi (hθ(x(i)) − y(i))², the average squared error of the hypothesis over the m training examples

Our goal: choose θ0, θ1 that minimise the cost function J(θ0, θ1)
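
A rough sketch of this squared-error cost in Python, using the definition above; the toy data is made up:

def cost(theta0, theta1, xs, ys):
    # J(theta0, theta1) = (1 / (2m)) * sum_i (h_theta(x_i) - y_i)^2
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Toy data lying on the line y = 2x: the cost is exactly zero at theta0 = 0, theta1 = 2
xs, ys = [1, 2, 3], [2, 4, 6]
print(cost(0.0, 2.0, xs, ys))  # 0.0
print(cost(0.0, 1.5, xs, ys))  # larger than 0: a worse fit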

Gradient descent: an algorithm which minimises the cost function J(θ0, θ1)

Gradient descent is also a general algorithm that can be used to minimise other functions, not just this cost function

Gradient descent algorithm: repeat until convergence { θj := θj − α · ∂/∂θj J(θ0, θ1) }, updating θ0 and θ1 simultaneously

α, the learning rate in gradient descent, controls how big a step we take when updating our parameters θj

At a local minimum the derivative term is equal to zero, so the update θj := θj − α · 0 leaves θj unchanged

As we approach a local minimum, gradient descent automatically takes smaller steps because the derivative term shrinks, so there is no need to decrease α over time

In machine learning, this version of gradient descent is usually called “batch gradient descent”, meaning that each step of gradient descent uses all
the training examples
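
A minimal batch gradient descent sketch in Python (the course uses Octave); the learning rate, iteration count, and toy data here are assumptions for illustration. Each step computes the derivative terms over all m training examples and updates θ0 and θ1 simultaneously:

def batch_gradient_descent(xs, ys, alpha=0.1, iters=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        # Derivative terms are summed over every training example (hence "batch")
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both thetas use the gradients computed above
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data on the line y = 1 + 2x; the result should come out close to theta0 = 1, theta1 = 2
print(batch_gradient_descent([1, 2, 3, 4], [3, 5, 7, 9]))

Because both gradients are computed before either parameter is changed, the update is simultaneous, which is what the algorithm above requires.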

 
