In deep learning, overfitting is a common challenge where a model learns patterns that work well on training data but fails to generalize to unseen data.
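One effective technique to mitigate overfitting is Dropout, which randomly sets a fraction of a layer's inputs to zero at each training step and scales the remaining inputs by 1 / (1 - rate) so that their expected sum is unchanged. In TensorFlow, this is implemented by tf.keras.layers.Dropout.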
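To make the mechanism concrete, here is a minimal pure-Python sketch of "inverted" dropout: each value is zeroed with probability rate, and the survivors are scaled by 1 / (1 - rate). The function name inverted_dropout and the sample data are illustrative assumptions, not part of TensorFlow, and the sketch is not how the library implements the layer internally.

```python
import random

def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate).

    Hypothetical helper for illustration only (not a TensorFlow function).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    # A value survives when the random draw is >= rate, i.e. with probability keep_prob.
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10_000
dropped = inverted_dropout(activations, rate=0.3, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"fraction zeroed: {zero_fraction:.3f}")  # close to 0.3
print(f"mean after dropout: {mean_after:.3f}")  # close to the original mean of 1.0
```

Because the survivors are scaled up, the average activation stays roughly the same with and without dropout, so no rescaling is needed at inference time.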
Syntax of tf.keras.layers.Dropout:
tf.keras.layers.Dropout(rate, noise_shape=None, seed=None, **kwargs)
Parameters:
- rate (float, required): The fraction of input units to drop (between 0 and 1).
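- noise_shape (tuple of ints, optional): The shape of the binary dropout mask that is multiplied with the input. The default, None, means each unit is dropped independently; a dimension of size 1 shares the same mask along that axis.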
- seed (int, optional): Random seed to ensure reproducibility.
- **kwargs: Standard arguments of the base Layer class, such as name and dtype.
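The parameters above can be explored on a small tensor. The sketch below calls the layer directly with training=True so that dropout is active; the tensor shape, the rate of 0.5, and the seed value 42 are arbitrary choices made for illustration.

```python
import tensorflow as tf

x = tf.ones((4, 6))  # batch of 4 samples with 6 features each

# rate=0.5: each element is zeroed independently;
# survivors are scaled to 1 / (1 - 0.5) = 2.0
independent = tf.keras.layers.Dropout(rate=0.5, seed=42)
y_independent = independent(x, training=True)

# noise_shape=(1, 6): a single mask of 6 entries is broadcast over the batch,
# so every sample has the same features dropped
shared = tf.keras.layers.Dropout(rate=0.5, noise_shape=(1, 6), seed=42)
y_shared = shared(x, training=True)

print(y_independent.numpy())
print(y_shared.numpy())
```

In the first output the zeros are scattered independently across the matrix, while in the second every row shows the same pattern of zeros, which is the effect of sharing the mask along the batch axis.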
Applying Dropout in a Neural Network
Let's build a simple neural network using tf.keras with Dropout applied to reduce overfitting.
The model is a fully connected network, with a Dropout layer placed after each hidden layer:
- layers.Dropout(0.3): Drops 30% of the first hidden layer's outputs during training.
- layers.Dropout(0.2): Drops 20% of the second hidden layer's outputs during training.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
# Load the MNIST dataset and scale pixel values to the range [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Define the model with Dropout
model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),  # Drop 30% of the neurons
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),  # Drop 20% of the neurons
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
Output:
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 14s 7ms/step - accuracy: 0.8072 - loss: 0.6074 - val_accuracy: 0.9584 - val_loss: 0.1393
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 17s 5ms/step - accuracy: 0.9444 - loss: 0.1890 - val_accuracy: 0.9667 - val_loss: 0.1116
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9566 - loss: 0.1458 - val_accuracy: 0.9705 - val_loss: 0.0962
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 4ms/step - accuracy: 0.9606 - loss: 0.1285 - val_accuracy: 0.9696 - val_loss: 0.0972
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 5ms/step - accuracy: 0.9651 - loss: 0.1115 - val_accuracy: 0.9752 - val_loss: 0.0902
<keras.src.callbacks.history.History at 0x7968dc3ca790>
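By using tf.keras.layers.Dropout, we randomly deactivate neurons during training, which discourages the network from relying on any single unit and helps reduce overfitting. Dropout is active only during training; at inference time the layer passes its input through unchanged.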
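Dropout only takes effect during training, which can be checked by calling the layer directly with the training flag. This is a minimal sketch; the tensor shape and seed are arbitrary illustrative choices.

```python
import tensorflow as tf

layer = tf.keras.layers.Dropout(rate=0.3, seed=0)
x = tf.ones((2, 1000))

y_infer = layer(x, training=False)  # inference: input passes through unchanged
y_train = layer(x, training=True)   # training: about 30% zeroed, rest scaled by 1 / 0.7

unchanged = bool(tf.reduce_all(y_infer == x))
zero_fraction = float(tf.reduce_mean(tf.cast(y_train == 0.0, tf.float32)))

print(unchanged)       # True
print(zero_fraction)   # roughly 0.3
```

model.fit runs layers in training mode, while model.evaluate and model.predict run them in inference mode, so the flag rarely needs to be set by hand. This is also one likely reason the training accuracy in the log above sits below the validation accuracy: training metrics are computed with dropout active, validation metrics without it.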