How Dropout helps regularize deep neural networks

Sanskar Sharma
3 min read · Mar 6, 2022


Photo by Rod Long on Unsplash

Advances in deep learning have led to deeper and wider neural network architectures with ever more trainable parameters. Models with such complex architectures, however, are prone to over-fitting the training data.

Regularization techniques such as L1 regularization, L2 regularization, and early stopping have been developed to combat over-fitting. These techniques, however, do not address the issue of co-adaptation. Co-adaptation refers to certain neurons becoming highly dependent on others: if the neurons they rely on receive poor inputs, the dependent neurons suffer as well, reducing the model’s performance.

WHAT IS DROPOUT

Dropout, which was proposed in the paper “Improving neural networks by preventing co-adaptation of feature detectors” published in 2012, solves this problem of co-adaptation.

Dropout is a computationally cheap and highly effective regularization strategy for reducing over-fitting and generalization error in all types of deep neural networks.

In dropout, neural network units (nodes) are randomly dropped along with all of their incoming and outgoing connections, resulting in a thinned network. With N nodes, 2^N possible thinned networks can be sampled.

In dropout, each node is retained with a fixed probability p, commonly p = 0.5 for hidden nodes and p = 0.8 for input nodes.

As a result, dropout provides an efficient, approximate way of combining an exponentially large number of different thinned networks within a single model.

Dropping out nodes along with their connections [Source]

By sharing weights across all thinned networks and sampling a different network for each training instance, dropout ensures that none of the model’s parameters are left untrained or inadequately trained.
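To make the training-time mechanics concrete, here is a minimal NumPy sketch of dropout applied to one layer's activations, assuming a retention probability p (the layer size and function name are illustrative, not from the paper):

import numpy as np

def dropout_forward(h, p=0.5, training=True):
    # h: activations of one layer for a single training example
    # p: probability of retaining each node
    if training:
        # Sample a Bernoulli mask: each node is kept with probability p
        # and dropped (set to 0) with probability 1 - p, giving a thinned network.
        mask = (np.random.rand(*h.shape) < p).astype(h.dtype)
        return h * mask
    # At test time the full network is used (see the next section for scaling).
    return h

# Example: a hidden layer with 8 nodes, roughly half of which get dropped
h = np.random.randn(8)
h_thinned = dropout_forward(h, p=0.5, training=True)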

BUT WHAT HAPPENS AT TESTING TIME?

It is infeasible to explicitly combine the outputs of all thinned networks at test time. Instead, the entire (un-thinned) network is used, and each node’s output is scaled by its retention probability p, the fraction of the time it was active during training.

Scaling the output of a node during testing
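As a rough sketch of this test-time rule, assuming the same retention probability p that was used for the layer during training:

import numpy as np

p = 0.5                   # retention probability used for this layer during training
h = np.random.randn(8)    # the layer's activations at test time (no nodes dropped)

# Every node is active at test time, so each output is scaled by p,
# the expected fraction of the time the node was active during training.
h_test = p * h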

EXAMPLE OF DROPOUT IN KERAS

Now that the theory behind dropout is clear, implementing dropout in Keras is actually very simple. Just add a Dropout layer between any two layers and specify the fraction of the input units to drop as a float between 0 and 1.

# Dropout between fully connected layers
from keras.layers import Dense
from keras.layers import Dropout
...
model.add(Dense(32))
model.add(Dropout(0.5))
model.add(Dense(1))
...
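Note that Keras applies the rescaling on the training side (so-called inverted dropout): the retained activations are scaled up by 1/(1 − rate) during training, so no extra scaling step is needed when the model is used for inference.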

CONCLUSION

Dropout essentially inhibits the hidden units from co-adapting by masking the hidden units with noise. This prevents a hidden unit from becoming overly reliant on other units, since they could be dropped at any point. This makes the model more robust and prevents over-fitting.
