A Brief Look at Several Loss Functions in Deep Learning

This content is licensed under CC 4.0 BY-SA.

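Loss functions are central to deep learning: at every training step, the loss is computed over one batch of samples and serves as the basis for updating the model parameters. This article introduces several common loss functions. The Keras source (with TensorFlow as the backend) at "[Anaconda install path]\envs\[TensorFlow environment]\Lib\site-packages\tensorflow\python\keras\losses.py" shows exactly how each loss is implemented; where the theoretical formula alone leaves room for doubt, the code settles it. The sections below cover a few of these losses, and the source contains many more.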

1. MSE and MAE

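MSE stands for Mean Squared Error and MAE for Mean Absolute Error. Treat the true value and the prediction as two N-dimensional vectors. MSE sums the squared differences over all N dimensions and divides by N to get a mean; MAE does the same with the absolute differences instead of the squared ones. The loss for a batch of samples is the average of the per-sample losses.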
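The definitions above can be sketched in plain Python. This is a minimal illustration and not the Keras implementation; the helper names `mse`, `mae`, and `batch_mean` and the sample values are made up for this example.

```python
def mse(y_true, y_pred):
    """Mean squared error of one sample: mean of squared differences over N dimensions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def mae(y_true, y_pred):
    """Mean absolute error of one sample: mean of absolute differences over N dimensions."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def batch_mean(loss_fn, batch_true, batch_pred):
    """Batch loss: compute the loss per sample, then average over the batch."""
    losses = [loss_fn(t, p) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)


# Hypothetical batch of two samples, each a 2-dimensional vector.
batch_true = [[0.0, 1.0], [0.0, 2.0]]
batch_pred = [[1.0, 1.0], [2.0, 2.0]]

mse_loss = batch_mean(mse, batch_true, batch_pred)  # per-sample: 0.5 and 2.0
mae_loss = batch_mean(mae, batch_true, batch_pred)  # per-sample: 0.5 and 1.0
print(mse_loss, mae_loss)
```

The second sample has a larger error in one dimension, which MSE penalizes more heavily than MAE because the difference is squared.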

2. Categorical crossentropy and sparse categorical crossentropy

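Both are cross-entropy losses. The difference is that one represents the true value as a one-hot vector, while the other represents it as a class index. With 1000 classes, for example, the one-hot representation has 1000 dimensions, exactly one of which is 1 while all the others are 0.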

For example:

>>> import tensorflow as tf
>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
>>> assert loss.shape == (2,)
>>> loss.numpy()
array([0.0513, 2.303], dtype=float32)

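The above computes the categorical cross-entropy loss. The calculation is -1 * log(0.95) = 0.0513 and -1 * log(0.1) = 2.303: only the dimension whose true value is 1 contributes to the loss. Here log is the natural logarithm (ln in mathematics), not log10.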

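For sparse categorical cross-entropy, call tf.keras.losses.sparse_categorical_crossentropy instead and replace the y_true line above with class indices: >>> y_true = [1, 2]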
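To make the arithmetic concrete, here is a minimal pure-Python sketch of both variants. It is not the Keras code: the function bodies and the `EPS` constant are assumptions for illustration, loosely mirroring the idea of normalizing the predictions and clipping them away from 0 before taking the logarithm.

```python
import math

EPS = 1e-7  # assumed small constant that keeps log() away from 0


def categorical_crossentropy(y_true, y_pred):
    """Cross entropy for one sample with a one-hot y_true."""
    total = sum(y_pred)  # rescale predictions so they sum to 1
    loss = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p / total, EPS), 1.0 - EPS)  # clip to avoid log(0)
        loss -= t * math.log(p)  # natural logarithm
    return loss


def sparse_categorical_crossentropy(class_index, y_pred):
    """Same loss, with y_true given as a class index instead of a one-hot vector."""
    one_hot = [1.0 if i == class_index else 0.0 for i in range(len(y_pred))]
    return categorical_crossentropy(one_hot, y_pred)


y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]
dense = [categorical_crossentropy(t, p)
         for t, p in zip([[0, 1, 0], [0, 0, 1]], y_pred)]
sparse = [sparse_categorical_crossentropy(i, p)
          for i, p in zip([1, 2], y_pred)]
print(dense, sparse)
```

Both lists hold the same two values, about 0.0513 and 2.303, which shows that the two losses differ only in how the true label is encoded.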

3. Hinge

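Hinge loss measures the loss of a classification with binary labels, and it accounts for the loss from both positive and negative samples. The docstring in the source makes this clear: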

"""Computes the hinge loss between `y_true` and `y_pred`.

`loss = maximum(1 - y_true * y_pred, 0)`

`y_true` values are expected to be -1 or 1. If binary (0 or 1) labels are
provided we will convert them to -1 or 1.

Standalone usage:

>>> y_true = [[0., 1.], [0., 0.]]
>>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> h = tf.keras.losses.Hinge()
>>> h(y_true, y_pred).numpy()
1.3

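Where does 1.3 come from? The 0 labels are first converted to -1, so y_true becomes [[-1, 1], [-1, -1]]. The hinge loss of the first sample is (1.6 + 0.6) / 2 = 1.1, and that of the second sample is (1.4 + 1.6) / 2 = 1.5. The result 1.3 is the mean of these two values.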
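The same calculation can be reproduced step by step in plain Python. This is a sketch under the assumption that 0/1 labels are mapped to -1/1 as the docstring describes; `to_signed` and `hinge` are hypothetical helper names, not Keras functions.

```python
def to_signed(labels):
    """Map binary 0/1 labels to -1/1; leave labels that are already -1/1 unchanged."""
    if all(t in (0, 1) for t in labels):
        return [2.0 * t - 1.0 for t in labels]
    return [float(t) for t in labels]


def hinge(y_true, y_pred):
    """Hinge loss of one sample: mean over dimensions of max(1 - y_true * y_pred, 0)."""
    signed = to_signed(y_true)
    terms = [max(1.0 - t * p, 0.0) for t, p in zip(signed, y_pred)]
    return sum(terms) / len(terms)


y_true = [[0.0, 1.0], [0.0, 0.0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

per_sample = [hinge(t, p) for t, p in zip(y_true, y_pred)]  # about 1.1 and 1.5
batch_loss = sum(per_sample) / len(per_sample)              # about 1.3
print(per_sample, batch_loss)
```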

4. Cosine similarity

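Cosine similarity loss. When the angle between two normalized N-dimensional vectors is 0 degrees, their inner product is 1: the vectors are most similar, the prediction points in the same direction as the true value, and the loss is at its minimum. When the angle is 180 degrees, the inner product is -1: the prediction is least similar to the true value, and the loss is at its maximum. The loss is therefore defined as the inner product multiplied by -1, and it lies in the range [-1, 1].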

For example, from the docstring in the source:

"""Computes the cosine similarity between labels and predictions.

Note that it is a number between -1 and 1. When it is a negative number
between -1 and 0, 0 indicates orthogonality and values closer to -1
indicate greater similarity. The values closer to 1 indicate greater
dissimilarity. This makes it usable as a loss function in a setting
where you try to maximize the proximity between predictions and
targets. If either `y_true` or `y_pred` is a zero vector, cosine
similarity will be 0 regardless of the proximity between predictions
and targets.

`loss = -sum(l2_norm(y_true) * l2_norm(y_pred))`

Standalone usage:

>>> y_true = [[0., 1.], [1., 1.], [1., 1.]]
>>> y_pred = [[1., 0.], [1., 1.], [-1., -1.]]
>>> loss = tf.keras.losses.cosine_similarity(y_true, y_pred, axis=1)
>>> loss.numpy()
array([-0., -0.999, 0.999], dtype=float32)

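The l2_norm operation above L2-normalizes a vector, that is, it divides the vector by its Euclidean (2-)norm so that the result has unit length.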
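A minimal pure-Python sketch of this formula follows. It is an illustration, not the TensorFlow code: `l2_normalize` and `cosine_similarity_loss` are hypothetical helpers, and mapping a zero vector to zeros is an assumption chosen to match the docstring's statement that the similarity is then 0.

```python
import math


def l2_normalize(v):
    """Divide a vector by its Euclidean norm; a zero vector stays all zeros (assumption)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine_similarity_loss(y_true, y_pred):
    """Negative inner product of the two L2-normalized vectors."""
    t = l2_normalize(y_true)
    p = l2_normalize(y_pred)
    return -sum(a * b for a, b in zip(t, p))


y_true = [[0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y_pred = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]

losses = [cosine_similarity_loss(t, p) for t, p in zip(y_true, y_pred)]
print(losses)
```

The three values are approximately 0, -1, and 1: orthogonal vectors, identical directions, and opposite directions. The docstring prints -0.999 and 0.999 instead of exactly -1 and 1, which is presumably a float32 numerical effect in the library's normalization.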