Discussing Feed Forward & Back Propagation in Neural Networks

Hello Everyone! This is my third post in my journey of completing the Deep Learning NanoDegree in under a month.
Day 3
I didn’t really cover anything new today; rather, today was for revising the module ‘Implementing Gradient Descent’. So I dusted off all my notebooks, found the pens under all the sofas and started to note down each and every thing, and I must say, today was hard!

Feed Forward
The feed forward process is used to work out how much the weights, being the only parameters we can manipulate, have to be changed. The main steps of implementing Gradient Descent are:
- Initializing the weights, which we set as follows. The ‘**0.5’ raises the value to the 0.5th power; in simpler terms, it takes its square root.
weights = np.random.normal(0, constant**0.5, size=(matrix_rows, matrix_cols))
- Then, we loop through all the records, i.e. the number of rows of the input. In each iteration, we calculate the output of the layer by passing the data into the activation function. Then, we find the error of the output. Next, we find the error_term. Please note that ‘output_grad’ is the derivative of the activation function, and for the sigmoid the trick is to multiply the output by (1 - itself). And after these steps, we update the weight_step.
output = activation_function(np.dot(weights, input))  # feed forward
error = target - output                               # how far off the prediction is
error_term = error * output_grad                       # output_grad = derivative of the activation (output * (1 - output) for sigmoid)
weight_step += error_term * input                      # accumulate the step over the records
- After all that looping and getting the final weight step, we update the weights as follows. Please note that ‘n_records’ is the number of rows of the input. A full sketch of the whole loop is given right after this list.
weights += learning_rate * (weight_step / n_records)
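To tie these steps together, here is a minimal sketch of the whole loop, assuming a sigmoid activation; the toy features, targets and learnrate values are made up purely for illustration and aren’t from the course code:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy data, purely for illustration: 4 records with 3 input features each
features = np.array([[0.1, 0.2, 0.3],
                     [0.4, 0.5, 0.6],
                     [0.7, 0.8, 0.9],
                     [1.0, 0.1, 0.2]])
targets = np.array([0, 0, 1, 1])
learnrate = 0.5

n_records, n_features = features.shape
weights = np.random.normal(0, n_features**-0.5, size=n_features)  # scale = 1/sqrt(n_features), one common choice

for _ in range(1000):                                  # epochs
    weight_step = np.zeros(weights.shape)              # reset the accumulated step
    for x, y in zip(features, targets):
        output = sigmoid(np.dot(x, weights))           # feed forward
        error = y - output                             # error of the output
        error_term = error * output * (1 - output)     # error * derivative of sigmoid
        weight_step += error_term * x                  # accumulate the step
    weights += learnrate * weight_step / n_records     # update with the averaged step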
And this is the implementation part of Gradient Descent for feeding forward in the model. Feeding forward is essentially how the neural network performs half of its cycle. Let’s move on.
Back Propagation
This is the part which mostly made me feel bad for not knowing the prerequisite Calculus. What back propagation is, is that after the forward pass we run the process of calculating errors in the opposite direction; it is essentially reversing how feed forward works. We calculate the errors of the model with the current weights, update the weights, and then repeat these steps until we have a model that gives good results. The process of Back Propagation includes:
- For Back Propagation, we treat the hidden layer and the output layer as separate entities, and their inputs and outputs are considered separately too. Let’s deal with the hidden layer first. Please note that ‘weights_ih’ is the set of weights initialized for use between the input layer and the hidden layer.
hidden_layer_input = np.dot(inputs, weights_ih)
hidden_layer_output = activation_function(hidden_layer_input)
- Next, let’s see how to deal with the output layer. We take the output of the previous (hidden) layer and pass it as the input to this layer; its implementation is as follows. Please note that ‘weights_ho’ here is the set of weights initialized for use between the hidden layer and the output layer. We’ll talk about this initialization a bit later.
The trick is that the output of the previous layer is taken as the ‘input’ of the next layer.
output_layer_input = np.dot(hidden_layer_output, weights_ho)
final_output = activation_function(output_layer_input)  # the output unit is passed through the activation as well
- We calculate the error the same as before. The new thing here is calculating the error terms, and we find these for the output layer as well as the hidden layer as:
output_error_term = error * output * (1 - output)
# hidden_layer_output is already the sigmoid of its input, so its derivative is simply h * (1 - h)
hidden_error_term = np.dot(output_error_term, weights_ho) * hidden_layer_output * (1 - hidden_layer_output)
- The next thing is to update the weight steps, and we do that as follows. Note that ‘[:, None]’ is just an expression used to turn a 1-D vector into a column matrix; transposing a 1-D NumPy array does nothing, which is why we need this instead. A complete sketch of one backward pass is given right after this list.
delta_wih = learning_rate * hidden_error_term * inputs[:, None]
delta_who = learning_rate * output_error_term * hidden_layer_output
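To see the whole backward pass in one place, here is a minimal sketch for a single record, assuming sigmoid activations on both the hidden and the output layer; the toy inputs, target and learning_rate values are made up purely for illustration:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy example: 3 input nodes, 2 hidden nodes, 1 output node
inputs = np.array([0.5, 0.1, -0.2])
target = 0.6
learning_rate = 0.5

weights_ih = np.random.normal(0, scale=0.1, size=(3, 2))  # input  -> hidden
weights_ho = np.random.normal(0, scale=0.1, size=(2,))    # hidden -> output

# Forward pass
hidden_layer_input = np.dot(inputs, weights_ih)
hidden_layer_output = sigmoid(hidden_layer_input)
output_layer_input = np.dot(hidden_layer_output, weights_ho)
output = sigmoid(output_layer_input)

# Backward pass
error = target - output
output_error_term = error * output * (1 - output)
hidden_error_term = (np.dot(output_error_term, weights_ho)
                     * hidden_layer_output * (1 - hidden_layer_output))

# Weight steps and update
delta_who = learning_rate * output_error_term * hidden_layer_output
delta_wih = learning_rate * hidden_error_term * inputs[:, None]
weights_ho += delta_who
weights_ih += delta_wih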
And this is the implementation of the Back Propagation step. Back Propagation completes the other half of the cycle of what the neural network does in one epoch.
The Back Propagation algorithm is probably the most fundamental building block of a neural network.
Side Notes
The weights to be used between different layers have a general formula:
weight_x_to_y = np.random.normal(0, scale=0.1, size=(x,y))
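For instance (the layer sizes here are made up for illustration), a network with three input nodes, two hidden nodes and one output node could be initialized as:

weights_ih = np.random.normal(0, scale=0.1, size=(3, 2))
weights_ho = np.random.normal(0, scale=0.1, size=(2, 1))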
- Let’s suppose our network has three input nodes, two hidden nodes and one output node. When we dive deeper into how Neural Networks fundamentally work, we can’t forget that nodes are linked to the next layer just to transfer some information. The nodes simply represent the various factors on which the model bases its predictions, and the links are a measure of how much the respective node influences the model. For example, if a NN models a body, has input nodes for hair and hands, and the prediction is whether the body is male or female, then the hair node plays a greater role in the decision than the hand node, and hence the link between the hair node and the next layer is drawn denser.

The lines that are bold signify that they carry more influence on the prediction of the model than the rest.
- And lastly, let’s discuss how to find the input to a hidden layer. First, write the nodes and their influence lines in the form of a matrix: the influence of node ‘x1’ on the hidden unit ‘h1’ is ‘w11’, on hidden unit ‘h2’ it is ‘w12’, and so on. The input vector is just a vector containing the input nodes, [x1, x2, x3] in this case. We find the input of ‘h1’ by multiplying the input vector by the first column of the matrix, and so on. Why this works is that you’re essentially multiplying each input by the weight of how much it influences that unit, and that weighted sum is exactly the value that has to be passed into it. A small numerical sketch is shown below.
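As a quick illustration (the numbers here are made up), here is that matrix multiplication in NumPy:

import numpy as np

# 3 input nodes feeding 2 hidden units; w11 is the influence of x1 on h1, etc.
inputs = np.array([0.2, 0.4, 0.6])      # [x1, x2, x3]
weights = np.array([[0.1, 0.5],         # [w11, w12]
                    [0.2, 0.6],         # [w21, w22]
                    [0.3, 0.7]])        # [w31, w32]

hidden_in = np.dot(inputs, weights)     # [h1_input, h2_input]
# h1_input = 0.2*0.1 + 0.4*0.2 + 0.6*0.3 = 0.28
# h2_input = 0.2*0.5 + 0.4*0.6 + 0.6*0.7 = 0.76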

And this is how much I could cover today. The main helping material of today was What is backpropagation really doing? & Backpropagation calculus from 3Blue1Brown. I’ll also link a useful blog written on this topic, and that was it for today’s writing. See you in the next one!