How Dropout helps regularize deep neural networks
Advances in deep learning have led to deeper and wider neural network architectures with ever more trainable parameters. Models with such complex architectures, however, are prone to over-fitting the training data.
Regularization techniques such as L1 regularization, L2 regularization, and early stopping have been developed to combat over-fitting. These techniques, however, do not address the issue of co-adaptation. Co-adaptation refers to the situation where certain neurons become highly dependent on others. If those relied-upon neurons receive poor inputs, the dependent neurons suffer as well, degrading the model’s performance.
WHAT IS DROPOUT
Dropout, which was proposed in the paper “Improving neural networks by preventing co-adaptation of feature detectors” published in 2012, solves this problem of co-adaptation.
Dropout is a computationally cheap and highly effective regularization strategy for reducing over-fitting and generalization error in all types of deep neural networks.
In dropout, units (nodes) of the neural network are randomly dropped along with all of their incoming and outgoing connections, resulting in a thinned network. A network with N nodes can therefore give rise to 2^N possible thinned networks.
Each node is retained with a fixed probability p, commonly p = 0.5 for hidden nodes and p = 0.8 for input nodes.
As a result, dropout provides an efficient approximate way of combining an exponentially large number of different neural networks within a single model.
Because the thinned networks share weights and a different thinned network is sampled for each training instance, dropout ensures that no parameters are left untrained or inadequately trained.
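To make the idea of sampling thinned networks concrete, here is a minimal NumPy sketch of dropout's forward pass during training; the function name, the example values, and the retention probability p_keep are illustrative assumptions rather than part of any framework's API.

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p_keep=0.5):
    # Keep each unit with probability p_keep; dropped units (and hence all of
    # their outgoing contributions) are zeroed for this training instance.
    mask = rng.random(activations.shape) < p_keep  # True = keep, False = drop
    return activations * mask

# A fresh mask is sampled for every training instance, which is what selects
# a different thinned network each time.
hidden = np.array([0.3, -1.2, 0.8, 2.1, -0.5])
print(dropout_forward(hidden))  # roughly half of the units are zeroed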
BUT WHAT HAPPENS AT TESTING TIME?
It is not feasible to explicitly average the outputs of all the thinned networks at test time. Instead, the entire neural network is used, and each node’s output is scaled by the probability with which it was retained during training, so that its expected output matches what it was during training.
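A rough sketch of this test-time rule, using the same illustrative NumPy setup as above: no units are dropped, and each output is simply scaled by its retention probability. (As an aside, many libraries, Keras included, implement the equivalent "inverted dropout", which scales activations up by 1/p during training so that nothing needs to change at test time.)

import numpy as np

def dropout_inference(activations, p_keep=0.5):
    # No units are dropped at test time; each output is scaled by the
    # probability with which the unit was retained during training.
    return activations * p_keep

hidden = np.array([0.3, -1.2, 0.8, 2.1, -0.5])
print(dropout_inference(hidden))  # expected outputs match the training-time average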
EXAMPLE OF DROPOUT IN KERAS
Now that the theory behind dropout is clear, implementing dropout in Keras is actually very simple. Just add a Dropout layer between any two layers and specify the fraction of the input units to drop as a float between 0 and 1.
# Dropout between fully connected layers
from keras.layers import Dense
from keras.layers import Dropout
...
model.add(Dense(32))
model.add(Dropout(0.5))
model.add(Dense(1))
...
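For a slightly fuller picture, the sketch below builds a small binary classifier with dropout after each hidden layer; the layer sizes, input dimension, and compile settings are arbitrary choices for illustration, not a recommended configuration.

# Minimal sketch: dropout between fully connected layers in a small classifier
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(20,)))  # 20 input features (illustrative)
model.add(Dropout(0.5))                  # drop 50% of the units feeding the next layer
model.add(Dense(32, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, validation_split=0.2)  # dropout is only active during training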
CONCLUSION
Dropout essentially inhibits hidden units from co-adapting by masking them with noise. Because any unit may be dropped at any time, no hidden unit can become overly reliant on the others. This makes the model more robust and prevents over-fitting.