Defensive distillation

Defensive distillation is a technique for improving the robustness of deep neural networks against adversarial attacks. It trains a more robust network by transferring knowledge from one model, known as the teacher model, to a second model, known as the student model, which is often smaller and simpler.

How defensive distillation works

Defensive distillation involves two main stages:

Training the teacher model
Training the student model

1. Training the Teacher model

The teacher model is trained on the original dataset and is then used to make predictions that serve as soft labels.
The first step is to train a teacher neural network on the original dataset using standard training procedures.
After training, the teacher's predicted class probabilities are used as soft targets for the next step. They carry more information than hard labels because they also show how much weight the teacher gives to every other class.
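To make the difference between hard labels and soft targets concrete, here is a minimal sketch in plain Python. The logits and the helper name `softmax_with_temperature` are illustrative assumptions, not part of any library; the temperature parameter is the same setting discussed under Disadvantages.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into class probabilities, softened by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one input over three classes.
teacher_logits = [6.0, 2.0, 1.0]

# Hard label: only the index of the winning class survives.
hard_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])

# Soft targets: the full distribution, at two different temperatures.
soft_t1 = softmax_with_temperature(teacher_logits, temperature=1.0)
soft_t20 = softmax_with_temperature(teacher_logits, temperature=20.0)

print("hard label:", hard_label)
print("soft targets (T=1): ", [round(p, 3) for p in soft_t1])
print("soft targets (T=20):", [round(p, 3) for p in soft_t20])
```

At the higher temperature the winning class keeps the largest probability, but the other classes receive noticeably more mass, which is the extra information the student is trained on.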

2. Training the Student model

The student model can have the same architecture as the teacher model or a different one.
The student model is then trained using the soft targets obtained from the teacher model.
By mimicking the teacher model's output distribution, which includes the confidence level for each class, the student learns smoother outputs than it would from hard labels alone.
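The student's training objective can be sketched as a cross-entropy between the teacher's soft targets and the student's own predictions. This is a simplified, framework-free illustration: the logits, the temperature `T = 20.0`, and the helper names are assumptions chosen for the example, and a real setup would minimise this loss with gradient descent over a whole dataset.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(soft_targets, student_logits, temperature=1.0):
    """Cross-entropy between teacher soft targets and student predictions."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, student_probs))

T = 20.0  # assumed distillation temperature, shared by teacher and student
teacher_logits = [6.0, 2.0, 1.0]  # hypothetical teacher output for one input
soft_targets = softmax(teacher_logits, T)

# A student that reproduces the teacher's distribution gets a lower loss
# than one that ranks the classes in the opposite order.
loss_matching = soft_cross_entropy(soft_targets, [6.0, 2.0, 1.0], T)
loss_mismatch = soft_cross_entropy(soft_targets, [1.0, 2.0, 6.0], T)

print(f"loss when student matches teacher: {loss_matching:.4f}")
print(f"loss when student disagrees:       {loss_mismatch:.4f}")
```

The loss is smallest when the student's distribution equals the teacher's, so minimising it pushes the student to copy the teacher's confidence levels and not only its top prediction.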

Advantages

Improved robustness: Learning from soft outputs gives the student a smoother view of the patterns in the data. This makes it less sensitive to tiny changes in the input, which helps protect it from being fooled by adversarial attacks.
Transfer of knowledge: Defensive distillation passes what a large, complex model has learned on to a smaller and simpler one, which is useful when models must run on devices such as phones.
Model compression: The student model is usually smaller and faster than the teacher model. It can still give good accuracy with lower memory use and faster predictions, which suits real-time applications.

Disadvantages

Extra computation: Defensive distillation involves training two models, first the teacher and then the student, which takes more time and computing power.
Needs careful tuning: Settings such as the temperature value control how soft or smooth the outputs are during training. Getting these values right is necessary, but it can take time and a lot of trial and error.
Not foolproof: Defensive distillation can help protect against adversarial attacks, but it’s not a complete solution. Attackers who know how the defence works can create new attacks that still fool the model.
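As a rough illustration of the tuning issue mentioned above, the sketch below measures how the softness of the outputs changes with temperature, using entropy as the measure. The logits and the list of temperatures are hypothetical values picked for the example.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [6.0, 2.0, 1.0]                # hypothetical model output
temperatures = [1.0, 5.0, 20.0, 100.0]  # assumed candidate settings
entropies = [entropy(softmax(logits, t)) for t in temperatures]

for t, h in zip(temperatures, entropies):
    print(f"T={t:>5}: entropy={h:.3f}")
```

Low temperatures leave the outputs close to hard labels, while very high temperatures push them toward a uniform distribution that carries little class information, which is why the value usually has to be found by experiment.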
