How an RNN (Recurrent Neural Network) Computes: Plain-Language Explanation, Formulas, and a Worked Example

Original article; last modified 2026-05-14 17:08:43.

This content is released under the CC 4.0 BY-SA license.

First published 2026-04-14 10:43:12.

RNN computation (plain-language and formula version)

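The core idea of an RNN is to process sequence data one time step at a time: every step reuses the same set of weights and passes the previous step's state on to the next step.

1. Symbol definitions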

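- $\Large x_t$ : input at time step $\Large t$
- $\Large h_{t-1}$ : hidden state at the previous time step
- $\Large h_t$ : hidden state at the current time step
- $\Large W_{xh}$ : input-to-hidden weights
- $\Large W_{hh}$ : hidden-to-hidden (recurrent) weights
- $\Large b_h$ : hidden-layer bias
- $\Large W_{hy}$ : hidden-to-output weights
- $\Large b_y$ : output bias
- $\Large \tanh$ : activation function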

2. Single-step computation

For each time step $\Large t$ :

① Compute the hidden state

$\Large h_t = \tanh\big(W_{xh} x_t + W_{hh} h_{t-1} + b_h\big)$

Initial state: $\Large h_0 = \mathbf{0}$ (the all-zeros vector)

② Compute the output

$\Large y_t = W_{hy} h_t + b_y$
(For classification, apply softmax to $\Large y_t$ ; for regression, use $\Large y_t$ directly.)
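The two formulas above can be sketched as a single function. This is a minimal illustration in plain Python (lists instead of a matrix library, to stay dependency-free); the helper `matvec`, the function name `rnn_step`, and the toy 1-dimensional weights in the usage line are assumptions made for the example, not part of the original article.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v (a list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    """One RNN time step: returns (h_t, y_t)."""
    # Pre-activation: W_xh x_t + W_hh h_{t-1} + b_h
    z = [a + b + c for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    # Hidden state: element-wise tanh
    h_t = [math.tanh(v) for v in z]
    # Linear readout: W_hy h_t + b_y (softmax, if needed, would be applied afterwards)
    y_t = [a + b for a, b in zip(matvec(W_hy, h_t), b_y)]
    return h_t, y_t

# Toy 1-dimensional weights, chosen only for illustration.
h1, y1 = rnn_step([0.0], [0.0], [[1.0]], [[0.5]], [0.0], [[2.0]], [0.5])
print(h1, y1)  # prints [0.0] [0.5]
```

With a zero input and a zero previous state, the hidden state stays at zero and the output reduces to the bias.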

3. Full forward pass (unrolled over the sequence)

Given a sequence $\Large x_1,x_2,\dots,x_T$ :

1. $\Large t=1$
$\Large h_1 = \tanh(W_{xh}x_1 + W_{hh}h_0 + b_h),\quad y_1 = W_{hy}h_1 + b_y$

2. $\Large t=2$
$\Large h_2 = \tanh(W_{xh}x_2 + W_{hh}h_1 + b_h),\quad y_2 = W_{hy}h_2 + b_y$

3. Repeat in the same way up to $\Large t=T$
$\Large h_T = \tanh(W_{xh}x_T + W_{hh}h_{T-1} + b_h),\quad y_T = W_{hy}h_T + b_y$

Key point: the weights $\Large W_{xh},W_{hh},W_{hy}$ (and the biases $\Large b_h, b_y$ ) are shared across all time steps.
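The unrolled computation is just a loop that carries the hidden state forward. The sketch below assumes plain Python lists and a hypothetical function name `rnn_forward`; the scalar weights in the usage lines are toy values picked for illustration only.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v (a list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, h0, W_xh, W_hh, b_h, W_hy, b_y):
    """Run the recurrence over the sequence xs; return all hidden states and outputs."""
    h = h0
    hs, ys = [], []
    for x_t in xs:  # the same weights are reused at every step
        z = [a + b + c for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h), b_h)]
        h = [math.tanh(v) for v in z]          # h_t depends on h_{t-1}
        y = [a + b for a, b in zip(matvec(W_hy, h), b_y)]
        hs.append(h)
        ys.append(y)
    return hs, ys

# Toy scalar RNN: only the first input is non-zero, yet later states are non-zero too,
# because the state from step 1 is fed back in at steps 2 and 3.
hs, ys = rnn_forward([[0.5], [0.0], [0.0]], [0.0], [[1.0]], [[1.0]], [0.0], [[1.0]], [0.0])
print(hs)
```

In this toy run each later hidden state is the tanh of the previous one, which makes the "memory" carried by $h_{t-1}$ directly visible.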

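4. Backpropagation through time (BPTT): key points

- Loss: $\Large L = \sum_{t=1}^T L_t(y_t, \hat{y}_t)$
- Gradients flow backward through time; the derivatives with respect to $\Large W_{xh},W_{hh},W_{hy}$ are computed at every time step and summed, because the same weights are used at every step.
- Over long sequences the gradients tend to vanish or explode, which is what later motivated LSTM and GRU.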
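To see where vanishing gradients come from, note that for $h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)$ the step-to-step Jacobian is $\partial h_t / \partial h_{t-1} = \mathrm{diag}(1 - h_t^2)\, W_{hh}$, and the gradient reaching an early step is a product of such factors. The sketch below uses a scalar RNN without bias so the product is easy to follow; the function name `grad_hT_wrt_h0` and the values `w_xh = 1.0`, `w_hh = 0.5` are assumptions chosen for illustration.

```python
import math

def grad_hT_wrt_h0(w_xh, w_hh, xs, h0=0.0):
    """Scalar RNN h_t = tanh(w_xh * x_t + w_hh * h_{t-1}); returns d h_T / d h_0."""
    h, grad = h0, 1.0
    for x_t in xs:
        h = math.tanh(w_xh * x_t + w_hh * h)
        # Chain rule: d h_t / d h_{t-1} = (1 - h_t^2) * w_hh
        grad *= (1.0 - h * h) * w_hh
    return grad

# The longer the sequence, the smaller the influence of h_0 on h_T.
for T in (1, 5, 20, 50):
    print(T, grad_hT_wrt_h0(1.0, 0.5, [1.0] * T))
```

Here every factor has magnitude at most 0.5, so the product shrinks at least geometrically with the sequence length. If the factors stayed above 1 in magnitude instead, the same product would grow, which is the exploding case.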

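5. Summary in brief

An RNN boils down to:
input + previous state → activation → current state → output
It walks along the sequence one step at a time, using the same weights at every step.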

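Now let's work through a standard matrix-form RNN example by hand. The dimensions are small and every step is written out, so each matrix operation is easy to follow.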

1. Parameter setup

Input dimension: $d_x = 2$
Hidden dimension: $d_h = 2$
Output dimension: $d_y = 1$
Sequence length $T = 2$ , with inputs:
$\Large x_1=\begin{bmatrix}1\\0\end{bmatrix},\quad x_2=\begin{bmatrix}0\\1\end{bmatrix}$
Initial hidden state:
$h_0 = \begin{bmatrix}0\\0\end{bmatrix}$

Weight matrices (chosen by hand)

$\Large W_{xh} = \begin{bmatrix} 0.1 & 0.2\\ 0.3 & 0.4 \end{bmatrix},\quad W_{hh} = \begin{bmatrix} 0.5 & 0.6\\ 0.7 & 0.8 \end{bmatrix},\quad b_h = \begin{bmatrix}0.1\\0.2\end{bmatrix}$

$\Large W_{hy} = \begin{bmatrix}0.9 & 1.0\end{bmatrix},\quad b_y = 0.1$
Activation function: $\tanh$
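The setup above can be written down as plain Python lists, which also makes the shape conventions explicit: $W_{xh}$ is $d_h \times d_x$, $W_{hh}$ is $d_h \times d_h$, and $W_{hy}$ is $d_y \times d_h$. The variable names and the `shape` helper are assumptions for this sketch; the numbers are the ones given in the text.

```python
# Dimensions from the worked example.
d_x, d_h, d_y = 2, 2, 1

# Inputs and initial state.
x1, x2 = [1.0, 0.0], [0.0, 1.0]
h0 = [0.0, 0.0]

# Weights and biases, stored as lists of rows.
W_xh = [[0.1, 0.2], [0.3, 0.4]]
W_hh = [[0.5, 0.6], [0.7, 0.8]]
b_h = [0.1, 0.2]
W_hy = [[0.9, 1.0]]
b_y = [0.1]

def shape(M):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return (len(M), len(M[0]))

# Shape checks: each product in the recurrence must be well defined.
assert shape(W_xh) == (d_h, d_x)   # maps an input to the hidden space
assert shape(W_hh) == (d_h, d_h)   # maps the hidden state to itself
assert shape(W_hy) == (d_y, d_h)   # maps the hidden state to the output
assert len(b_h) == d_h and len(b_y) == d_y
print("shapes are consistent")
```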

2. Formulas

$\Large h_t = \tanh\left(W_{xh}x_t + W_{hh}h_{t-1} + b_h\right)$

$\Large y_t = W_{hy}h_t + b_y$

3. Computing t=1

1. Linear part
$\Large W_{xh}x_1 = \begin{bmatrix}0.1&0.2\\0.3&0.4\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}0.1\\0.3\end{bmatrix}$

$\Large W_{hh}h_0 = \mathbf{0}$

$\Large z_1 = W_{xh}x_1 + W_{hh}h_0 + b_h = \begin{bmatrix}0.1\\0.3\end{bmatrix} + \begin{bmatrix}0.1\\0.2\end{bmatrix} = \begin{bmatrix}0.2\\0.5\end{bmatrix}$
2. Activation
$\Large h_1 = \tanh\begin{bmatrix}0.2\\0.5\end{bmatrix} \approx \begin{bmatrix}0.197\\0.462\end{bmatrix}$
3. Output
$\Large y_1 = \begin{bmatrix}0.9&1.0\end{bmatrix}\begin{bmatrix}0.197\\0.462\end{bmatrix} + 0.1 \approx 0.177 + 0.462 + 0.1 = 0.739$
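The hand calculation for t=1 can be checked numerically. This sketch uses only the standard library and the parameter values from the text; the variable names are assumptions for the example, and $W_{hy}$ is stored as a single row because $d_y = 1$.

```python
import math

W_xh = [[0.1, 0.2], [0.3, 0.4]]
b_h = [0.1, 0.2]
W_hy = [0.9, 1.0]   # the single output row of W_hy
b_y = 0.1
x1 = [1.0, 0.0]

# h_0 is the zero vector, so the W_hh h_0 term contributes nothing at t=1.
z1 = [sum(w * x for w, x in zip(row, x1)) + b for row, b in zip(W_xh, b_h)]
h1 = [math.tanh(z) for z in z1]
y1 = sum(w * h for w, h in zip(W_hy, h1)) + b_y

print([round(z, 3) for z in z1])   # linear part z_1
print([round(h, 3) for h in h1])   # hidden state h_1
print(round(y1, 4))                # output y_1
```

Carrying full precision gives $y_1 \approx 0.7398$; the 0.739 in the hand calculation comes from rounding $h_1$ to three decimals before multiplying.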

4. Computing t=2

1. Linear part
$\Large W_{xh}x_2 = \begin{bmatrix}0.1&0.2\\0.3&0.4\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}0.2\\0.4\end{bmatrix}$

$\Large W_{hh}h_1 =\begin{bmatrix}0.5&0.6\\0.7&0.8\end{bmatrix}\begin{bmatrix}0.197\\0.462\end{bmatrix} \approx\begin{bmatrix}0.376\\0.507\end{bmatrix}$
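The two matrix-vector products for t=2 can be verified the same way. The sketch below is a stdlib-only check with assumed variable names; it recomputes $h_1$ at full precision rather than using the rounded values 0.197 and 0.462.

```python
import math

W_xh = [[0.1, 0.2], [0.3, 0.4]]
W_hh = [[0.5, 0.6], [0.7, 0.8]]
x2 = [0.0, 1.0]
h1 = [math.tanh(0.2), math.tanh(0.5)]  # unrounded h_1 from step t=1

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v (a list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

Wx = matvec(W_xh, x2)   # picks out the second column of W_xh
Wh = matvec(W_hh, h1)   # contribution of the previous hidden state

print(Wx)
print([round(v, 4) for v in Wh])
```

At full precision the second component of $W_{hh}h_1$ is about 0.5079, so it sits right between the 0.507 shown above and 0.508; the difference is only intermediate rounding and does not change the method.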