Hello Everyone! This is my second post about my journey through the Deep Learning Nanodegree. Today was mostly spent learning how neural networks work.
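Neural networks are essentially multi-layer perceptrons: layers of perceptron-like units stacked so that the outputs of one layer feed into the next.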
Next up is the Softmax Function. It does the same job as the Sigmoid Function, turning a model's raw scores into probabilities, which makes them much easier to use later in the model. The drawback of the Sigmoid is that it only handles the case of two possible outcomes, whereas Softmax works when we have three or more. The obvious way to turn scores into probabilities would be to divide each score by the sum of all the scores. That breaks down when some of the scores are negative: you can end up with negative "probabilities", or even a division by zero if the scores cancel out. To solve this, Softmax first applies the exponential function to every score, since e^x is positive for any x, and only then divides each result by the sum of all the exponentials.
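To make the difference concrete, here is a minimal plain-Python sketch comparing the naive "divide by the sum" approach with Softmax on a list of scores that includes a negative number. The function names and scores are my own, not from the course. Subtracting the largest score before exponentiating is a standard numerical-stability trick I added; it does not change the result.

```python
import math

def naive_normalize(scores):
    # Divide each score by the sum of all scores (the "obvious" approach).
    total = sum(scores)
    return [s / total for s in scores]

def softmax(scores):
    # Subtract the max score to avoid overflow in exp(); this leaves the result unchanged.
    m = max(scores)
    # Exponentiate so every value becomes strictly positive...
    exps = [math.exp(s - m) for s in scores]
    # ...then normalize so the values sum to 1.
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, -1.0]
print(naive_normalize(scores))  # [1.0, 0.5, -0.5]: a negative "probability"
print(softmax(scores))          # roughly [0.705, 0.259, 0.035]
```

Softmax keeps the ordering of the scores (the highest score still gets the highest probability) while guaranteeing every output is positive and that they sum to 1.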
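Once the model outputs probabilities, we need a way to measure how good they are. For this, Cross-Entropy is used. Cross-Entropy is the sum of the negative logarithms of the probabilities the model assigned to the outcomes that actually happened. Why take logarithms, you might ask?
Probabilities are numbers between 0 and 1, and the logarithm of such a number is always zero or negative, so its negative is always zero or positive. Logarithms also turn a product of probabilities into a sum, which is much easier to work with.
We add up these negative logarithms, and the higher the result, the worse our model is performing. This is because the negative logarithm of a small probability is large: if the model gave a low probability to something that actually happened, that point contributes a lot of error.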
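Here is a small sketch of Cross-Entropy for events that either happen (label 1) or don't (label 0), assuming the model outputs the probability that each event happens. The function name and the example numbers are made up for illustration.

```python
import math

def cross_entropy(labels, probs):
    # labels: 1 if the event happened, 0 if it didn't.
    # probs: the model's predicted probability that the event happens.
    total = 0.0
    for y, p in zip(labels, probs):
        # Probability the model gave to what actually happened.
        p_actual = p if y == 1 else 1 - p
        # Add its negative logarithm; small probabilities add a lot of error.
        total += -math.log(p_actual)
    return total

labels = [1, 1, 0]
good = [0.9, 0.8, 0.1]   # confident and right
bad = [0.2, 0.3, 0.7]    # confident and wrong
print(cross_entropy(labels, good))  # roughly 0.434
print(cross_entropy(labels, bad))   # roughly 4.017
```

The model that assigns high probabilities to what really happened gets the much lower Cross-Entropy.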
That's it for error functions for the time being. Next up is Gradient Descent.
To understand it, consider this example:
Let's suppose you are on a mountain and your goal is to get down to the ground, but you're not sure which direction to move in. To find the best direction, the one that takes you downhill the fastest, you look at all the possible directions around you and pick the steepest one going down. After taking a step, you repeat the process until you reach your goal, the ground in this case. In Gradient Descent, the height of the mountain is the error, and each step changes the weights in the direction that reduces the error the most.
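The mountain analogy maps onto a few lines of code. In this toy sketch (my own example, not course code), the "mountain" is the made-up function (x - 3)^2, whose lowest point is at x = 3, and each step moves x against the slope. The starting point, learning rate and number of steps are arbitrary choices.

```python
def height(x):
    # A made-up "mountain" profile whose lowest point is at x = 3.
    return (x - 3) ** 2

def slope(x):
    # Derivative of height(x): positive means uphill to the right.
    return 2 * (x - 3)

x = 10.0             # starting position on the mountain
learning_rate = 0.1  # size of each step
for step in range(50):
    # Step in the direction opposite to the slope, i.e. downhill.
    x -= learning_rate * slope(x)

print(x, height(x))  # x ends up very close to 3, height close to 0
```

A learning rate that is too large would overshoot the bottom on each step, while one that is too small would need many more steps.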
As for the data points: if a point is misclassified, the gradient it contributes is large, while a correctly classified point contributes a much smaller gradient. In effect, Gradient Descent listens to every point and moves the model's boundary line according to what each point needs, with the misclassified points pulling the hardest.
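To see why misclassified points pull harder, here is a sketch assuming a logistic-regression-style model (a sigmoid on top of a weighted sum) trained with Cross-Entropy, for which the gradient of the error with respect to the weights for a single point is -(y - y_hat) * x. The weights, bias and points are invented for illustration.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def weight_gradient(x, y, w, b):
    # Prediction of the (assumed) logistic model for this point.
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Cross-entropy gradient w.r.t. each weight for this single point.
    return [-(y - y_hat) * xi for xi in x]

w, b = [1.0, 1.0], 0.0
# Label 1, score 4 -> y_hat about 0.98: classified correctly.
g_correct = weight_gradient([2.0, 2.0], 1, w, b)
# Label 1, score -4 -> y_hat about 0.02: misclassified.
g_wrong = weight_gradient([-2.0, -2.0], 1, w, b)
print(g_correct)  # small entries, about -0.036 each
print(g_wrong)    # large entries, about 1.96 each
```

The size of the gradient is driven by (y - y_hat), which is close to 0 for a well-classified point and close to 1 for a badly misclassified one.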
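The gradient is the multi-variable counterpart of the derivative. A derivative tells us how a function of a single variable changes, while the gradient is the vector of partial derivatives with respect to every input, so it tells us how the error changes with respect to all the weights at once.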
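One way to see the gradient as "one derivative per variable" is to estimate each partial derivative numerically. This sketch uses a made-up two-variable function standing in for an error that depends on two weights; the exact gradient of w1^2 + 3*w2^2 is [2*w1, 6*w2].

```python
def f(w1, w2):
    # A made-up error that depends on two weights.
    return w1 ** 2 + 3 * w2 ** 2

def numerical_gradient(func, w1, w2, eps=1e-6):
    # Partial derivative w.r.t. each variable, holding the other one fixed
    # (central-difference approximation).
    d_w1 = (func(w1 + eps, w2) - func(w1 - eps, w2)) / (2 * eps)
    d_w2 = (func(w1, w2 + eps) - func(w1, w2 - eps)) / (2 * eps)
    return [d_w1, d_w2]

grad = numerical_gradient(f, 1.0, 2.0)
print(grad)  # close to the exact gradient [2.0, 12.0]
```

Each entry is an ordinary derivative taken along one direction; together they form the gradient vector that Gradient Descent steps against.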
Mean Squared Error (MSE)
The next Error Function is the Mean Squared Error (MSE). As the name implies, it is the mean (average) of the squared errors of the predictions, that is, of the squared differences between the target values and the predicted values.
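A minimal MSE sketch in plain Python (the function name and numbers are illustrative). Some texts use half the mean, 1/(2m) * sum((y - y_hat)^2), so that the 2 cancels when differentiating; that only rescales the error and doesn't change where its minimum is.

```python
def mean_squared_error(y_true, y_pred):
    # Square each prediction error, then take the average.
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # (0.25 + 0 + 4) / 3, about 1.4167
```

Squaring makes every error positive and punishes large misses much more than small ones.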
Next, let's go through the computations a neural network performs when it learns with gradient descent. Here they are for a single sigmoid unit:
- First, we compute the dot product of the weights and the input vector. Let's say we store it in a variable “h”.
- The output of the network is computed by passing “h” into the activation function (here, the Sigmoid) and saving the result in “output”.
- The “error” is the difference between the target “y” and the predicted “output”.
- Next, we need the derivative of the activation function, which tells us how much to change the weights. For the Sigmoid it is output_grad = sigmoid(h) * (1 - sigmoid(h)).
- The next step is to compute the “error_step” (often called the error term), which is the product of error and output_grad.
- The weights are then updated as W = W + learning_rate * error_step * X, so each weight is nudged by a small amount in the direction that reduces the error.
- This whole process is repeated until we have a set of weights that gives a low error.
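Putting the steps above together, here is a minimal sketch of gradient descent for a single sigmoid unit on one training example. It is my own plain-Python version, not the course's solution code; the input, target, initial weights and learning rate are arbitrary values chosen for illustration, and there is no bias term.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_step(weights, x, y, learning_rate):
    # 1. h = dot product of the weights and the input vector.
    h = sum(w * xi for w, xi in zip(weights, x))
    # 2. Output of the unit: pass h through the sigmoid.
    output = sigmoid(h)
    # 3. Error = target minus prediction.
    error = y - output
    # 4. Derivative of the sigmoid at h: sigmoid(h) * (1 - sigmoid(h)).
    output_grad = output * (1 - output)
    # 5. Error step (error term).
    error_step = error * output_grad
    # 6. Update: W = W + learning_rate * error_step * X.
    return [w + learning_rate * error_step * xi for w, xi in zip(weights, x)]

x = [1.0, 2.0, 3.0]         # illustrative input
y = 0.5                     # illustrative target
weights = [0.5, -0.5, 0.3]  # illustrative starting weights
learning_rate = 0.5

# 7. Repeat the update until the error is small.
for epoch in range(100):
    weights = train_step(weights, x, y, learning_rate)

final_output = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
print(final_output)  # close to the target 0.5
```

With many training examples, the same update is usually averaged over the points (or applied to one point or mini-batch at a time).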
So, that's all I could cover today. I've completed 44% of the second module, and there is still some material left before I attempt the project of building a neural network from scratch. The resource that helped me most in forming a solid understanding of these concepts is 3Blue1Brown's "Deep learning, chapter 1" video. Go through it; it will surely help you. Anyway, see you in the next post!