CppCon 2022 Study Notes: C++ in the World of Embedded Systems

Original post · last modified 2025-10-07 16:13:48

This content is licensed under CC 4.0 BY-SA.

First published 2025-10-07 16:11:12

1⃣ New C++20 attributes `[[likely]]` and `[[unlikely]]`

Basic usage example

if (n > 5) [[unlikely]] {
    g(0);
    return n * 2 + 1;
}
switch (n) {
case 1:
    g(1);
    [[fallthrough]];
[[likely]] case 2:
    g(2);
    break;
}

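[[likely]] marks a path of execution as more likely than any alternative path.
[[unlikely]] marks a path of execution as less likely than any alternative path.
In the example:
n > 5 is marked as unlikely.
n == 2 is marked as likely.
Notes:
"Arbitrarily likely/unlikely": the standard does not quantify the probability, so the compiler may assume the marked path is almost always (or almost never) taken.
Overuse can hurt performance, because the optimizer ends up working from wrong assumptions.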

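2⃣ Historical background

The earlier approach was a pair of macros built on the GCC builtin __builtin_expect:

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

The !! normalizes any nonzero value to 1, so the expected value also matches for pointers and non-boolean integers.
The macros serve the same purpose: telling the compiler whether a condition is more likely to be true or false.
The Linux kernel uses them widely.
With the C++20 attributes, the hint is written directly on a statement or label instead of being wrapped in a macro, which reads more naturally and is part of the standard.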
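
To make the comparison concrete, here is a minimal sketch of the same hint written both ways. The function names parse_digit_macro and parse_digit_attr and the upper-case macro names are hypothetical, chosen only for illustration; __builtin_expect is a GCC/Clang builtin, and the attribute form assumes a C++20 compiler.

```cpp
// GCC/Clang builtin; !! normalizes any truthy value to exactly 1.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Pre-C++20 style: the hint wraps the condition itself.
int parse_digit_macro(char c) {
    if (UNLIKELY(c < '0' || c > '9')) {
        return -1;  // cold path: not a digit
    }
    return c - '0';
}

// C++20 style: the hint is attached to the statement that is rarely executed.
int parse_digit_attr(char c) {
    if (c < '0' || c > '9') [[unlikely]] {
        return -1;  // cold path: not a digit
    }
    return c - '0';
}
```

Both functions return the same values for every input; only the layout hint given to the compiler differs in spelling.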

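3⃣ Restrictions on usage in the standard (from P0479)

likely and unlikely may be applied only to labels or statements.
likely and unlikely must not both appear on the same statement or label.
A path of execution includes a label if and only if it contains a jump to that label.
Recommended practice:
- Use [[likely]] for a path that is clearly more likely.
- Use [[unlikely]] for a path that is clearly less likely.
Note:
- Misuse can degrade performance.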

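4⃣ What can the compiler do with this information?

The optimizer receives a hint about execution likelihood, which it can use in the following ways:

Better code layout (instruction cache, I-Cache)
- Frequently executed paths are placed together, reducing cache misses.
Better branch behavior (branch prediction)
- The likely path can be made the fall-through path, which reduces mispredictions and the pipeline flushes they cause.
- For simple branches, the compiler may choose between a conditional jump and cmov (a branch-free conditional move).
Microarchitecture dependence
- CPUs differ in branch prediction, cache layout, and pipeline design, so the effect depends on the hardware.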
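
As an illustration of the hot/cold split described above, the following is a small sketch, not taken from the talk. The Stats type and the functions record_drop and sum_valid are hypothetical, as is the convention that the byte 0xFF marks an invalid sample. The rare error branch is marked [[unlikely]] so the compiler is free to move it out of the straight-line loop body.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical counters for the example.
struct Stats {
    std::uint64_t ok = 0;
    std::uint64_t dropped = 0;
};

// Cold-path helper: called only for invalid bytes.
void record_drop(Stats& s) { ++s.dropped; }

// Sums all bytes except the (assumed rare) marker value 0xFF.
std::uint32_t sum_valid(const std::uint8_t* data, std::size_t len, Stats& s) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == 0xFF) [[unlikely]] {
            record_drop(s);  // cold path, may be laid out away from the loop
            continue;
        }
        sum += data[i];      // hot path, stays on the fall-through side
        ++s.ok;
    }
    return sum;
}
```

Whether the generated code actually changes depends on the compiler, the optimization level, and the target, so the assembly is the only reliable way to check.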

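Summary

[[likely]] / [[unlikely]] are compiler hints that help optimize the hot path and the cold path.
They are purely performance hints and do not change program semantics.
Misuse backfires, because it can lead to a worse branch layout or cache layout.
GCC's __builtin_expect has long provided the same facility; C++20 adds a standard syntax that is more portable and easier to read.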

                             ┌─────────────────┐            ┌──────────────────────┐                                         
                         ┌───┤      BPU        ├────────────▶  32K L1 Instruction  ◀─────────┐                               
                         │   └─────────────────┘            │       Cache          │         │                               
                         │                                  └─────────────────┬────┘         │                               
                         │                                                    │              │                               
                         │                                                    │              │                               
                      ┌──▼─────────────┐          ┌──────────────────────┐    │              │                               
┌──────────┐          │ Decoded Icache │          │  Legacy Decode       ◀────┘              │                               
│  MSROM   │          │      (DSB)     ◀──────────┤     Pipeline         │                   │                               
└────┬─────┘          └────────┬───────┘          └────────────┬─────────┘                   │                               
     │                         │                               │                             │                               
     │4 uops/cycle             │6 uops/cycle       5 uops/cycle│                             │                               
     │                         │                               │                             │                               
     │                         │                               │                             │                               
┌────▼─────────────────────────▼───────────────────────────────▼────────────────────┐        │                               
│              Instruction Decode Queue (IDQ, or micro-op queue)                    │        │                               
└──────────────────────────────┬────────────────────────────────────────────────────┘        │                               
                               │                                                             │                               
┌──────────────────────────────▼────────────────────────────────────────────────────┐        │                               
│                    Allocate/Rename/Retire/MoveElimination/ZeroIdiom               │        │                               
└──────────────────────────────┬────────────────────────────────────────────────────┘        │                               
                               │                                                             │                               
┌──────────────────────────────▼────────────────────────────────────────────────────┐        │                               
│                             Scheduler                                             │        │                               
└────▲──────────▲──────────▲───────────▲───────────┬────────────────────────────────┘        │                               
     │          │          │           │           │                                         │                               
     │Port0     │Port1     │Port5      │Port6      │         ┌──────────┐                    │                               
     │          │          │           │           │         │  Port 2  │                    │                               
┌────▼────┐ ┌───▼────┐ ┌───▼─────┐ ┌───▼────┐      ├─────────▶  LD/STA  ◀────┐               │                               
│Int ALU, │ │Int ALU,│ │Int ALU, │ │Int ALU,│      │         └──────────┘    │               │                               
│Vec FMA, │ │Fast LEA│ │Fast LEA,│ │Int Shft,      │         ┌──────────┐    │       ┌───────┴────────┐          ┌──────────┐
│Vec MUL, │ │Vec FMA,│ │Vec SHUF,│ │Branch1,│      │         │  Port3   ◀───┐│       │  256K L2 Cache │          │ Main     │
│Vec Add, │ │Vec MUL,│ │Vec ALU, │ └────────┘      ├─────────▶  LD/STA  │   ││       │  (Unified)     ◀──────────▶ Memory   │
│Vec ALU, │ │Vec Add,│ │CVT      │                 │         └──────────┘   ││       └────────────▲───┘          └──────────┘
│Vec Shft,│ │Vec ALU,│ └─────────┘                 │         ┌──────────┐   ││          ┌─────────┘                          
│Divide,  │ │Vec Shft│                             │         │  Port 4  │   ││    ┌─────▼─────────────┐                      
│Branch2  │ │Int MUL,│                             ├─────────▶  STD     ├──┐│└────▶                   │                      
└─────────┘ │Slow LEA│                             │         └──────────┘  ││     │                   │                      
            └────────┘                             │         ┌──────────┐  │└─────▶                   │                      
                                                   │         │  Port 7  ├─┐│      │ 32K L1 Data Cache │                      
                                                   └─────────▶  STA     │ │└──────▶                   │                      
                                                             └──────────┘ │       │                   │                      
                                                                          └───────▶                   │                      
                                                                                  └───────────────────┘

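The diagram above shows the pipeline of the Intel Core i9-9900K. The sections below go through each block and the flow of instructions inside the CPU.

1⃣ Overall instruction execution flow

From top to bottom, instruction execution in the CPU passes through these main stages:

Instruction fetch → branch prediction → instruction cache → decode → micro-op queue → allocate/rename → scheduling → execution ports → data cache / main memory

The diagram is a visualization of this flow.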

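2⃣ Instruction fetch and prediction

BPU (Branch Prediction Unit)
- Predicts which way conditional branches, loops, and other branch instructions will go.
- It never sees [[likely]] / [[unlikely]]; it benefits only indirectly, through the code layout the compiler produces from those hints.
- Feeds its prediction to the next stage, reducing pipeline flushes.
32K L1 instruction cache
- Holds recently fetched instruction bytes that have not yet been decoded.
- The more of the hot path that stays resident here, the higher the hit rate.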

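3⃣ Instruction decode

Decoded Icache (DSB, micro-op cache)
- Caches micro-ops (uops) that have already been decoded, so hot code can bypass the legacy decoders.
Legacy Decode Pipeline
- Decodes variable-length x86 instructions into uops.
MSROM (microcode sequencer ROM)
- Supplies the uop sequences for complex instructions that need many uops.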

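4⃣ Micro-op queue

Instruction Decode Queue (IDQ, uop queue)
- Buffers decoded uops in program order until the back end can accept them.

5⃣ Allocate / rename / optimization stage

Allocate / Rename / Retire / Move Elimination / Zero Idiom
- Allocate: reserves back-end resources for each uop (reorder-buffer entries, physical registers, load/store buffer entries).
- Rename: register renaming, which removes false dependencies (write-after-read and write-after-write).
- Retire: commits completed instructions in program order and frees their resources.
- Move Elimination: handles register-to-register moves at rename time, without using an execution port.
- Zero Idiom: recognizes zeroing idioms such as xor eax, eax and handles them at rename time.

6⃣ Scheduler

Dispatches uops to execution ports based on port availability and data dependencies.
It is responsible for exploiting parallelism and balancing load across the ports.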

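7⃣ Execution ports and functional units

The CPU has several ports, each serving different kinds of operations:

Port	Functional units (examples)
Port0	Int ALU, Vec FMA/MUL/Add/ALU/Shift, Divide, Branch2
Port1	Int ALU, Fast/Slow LEA, Int MUL, Vec FMA/MUL/Add/ALU/Shift
Port2	Load / store address (LD/STA)
Port3	Load / store address (LD/STA)
Port4	Store data (STD)
Port5	Int ALU, Fast LEA, Vec Shuffle/ALU, CVT
Port6	Int ALU, Int Shift, Branch1
Port7	Store address (STA)

uops are dispatched to different ports according to their type and execute in parallel.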

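8⃣ Data caches and memory access

32K L1 Data Cache
- Load/store instructions access the L1 data cache first.
256K L2 Cache (unified)
- Accessed on an L1 miss; higher latency, but still faster than main memory.
Main memory
- Accessed (DRAM) when the caches miss; highest latency.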

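9⃣ Relation to `[[likely]]` / `[[unlikely]]`

BPU branch prediction: [[likely]] tells the compiler that a path is hot and [[unlikely]] that it is cold; the CPU sees only the resulting code layout.
Instruction cache (I-Cache): hot-path instructions packed together are more likely to stay in the L1 I-Cache, reducing access latency.
Pipeline flush: a misprediction flushes the pipeline and wastes cycles; a correct prediction keeps the pipeline running smoothly.
Execution port scheduling: the attributes do not affect the scheduler directly; any gain is indirect, because a front end that stalls less keeps the ports supplied with uops.

Summary

The CPU core is deeply pipelined, superscalar, and has multiple execution ports.
Each instruction passes through:
- branch prediction → instruction cache → decode → micro-op queue → allocate/rename → scheduling → execution ports → data cache → main memory
[[likely]] / [[unlikely]] are optimization hints: they let the compiler lay out the hot path so that the CPU's fetch and prediction hardware works well, which improves performance.
Misusing the hints can undo the optimization and reduce performance.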

int foo0();
int foo1();
int foo2();
int foo3();
int foo4();
int foo5();
int foo6();
int foo7();
int foo8();
int foo9();
int bar(int);
int foo(int x)
{
    switch (x)
    {
        case 0:
            return foo0();
        case 1:
            return foo1();
        case 2:
            return foo2();
        case 3:
            return foo3();
        case 4:
            return foo4();
        case 5:
            return foo5();
        case 6:
            return foo6();
        case 7:
            return foo7();
        case 8:
            return foo8();
        case 9:
            return foo9();
        default:
            return bar(x);
    }
}

foo(int):
        cmp     edi, 9
        ja      .L2
        mov     edi, edi
        jmp     [QWORD PTR .L4[0+rdi*8]]
.L4:
        .quad   .L13
        .quad   .L12
        .quad   .L11
        .quad   .L10
        .quad   .L9
        .quad   .L8
        .quad   .L7
        .quad   .L6
        .quad   .L5
        .quad   .L3
.L5:
        jmp     foo8()
.L3:
        jmp     foo9()
.L13:
        jmp     foo0()
.L12:
        jmp     foo1()
.L11:
        jmp     foo2()
.L10:
        jmp     foo3()
.L9:
        jmp     foo4()
.L8:
        jmp     foo5()
.L7:
        jmp     foo6()
.L6:
        jmp     foo7()
.L2:
        jmp     bar(int)

int foo0();
int foo1();
int foo2();
int foo3();
int foo4();
int foo5();
int foo6();
int foo7();
int foo8();
int foo9();
int bar(int);
int foo(int x)
{
    switch (x)
    {
        case 0:
            return foo0();
        case 1:
            return foo1();
        case 2:
            return foo2();
        case 3:
            return foo3();
        case 4:
            return foo4();
        [[unlikely]] case 5:
            return foo5();
        case 6:
            return foo6();
        [[likely]] case 7:
            return foo7();
        case 8:
            return foo8();
        case 9:
            return foo9();
        default:
            return bar(x);
    }
}

foo(int):
        cmp     edi, 9
        ja      .L2
        mov     edi, edi
        jmp     [QWORD PTR .L4[0+rdi*8]]
.L4:
        .quad   .L13
        .quad   .L12
        .quad   .L11
        .quad   .L10
        .quad   .L9
        .quad   .L8
        .quad   .L7
        .quad   .L6
        .quad   .L5
        .quad   .L3
.L6:
        jmp     foo7()
.L5:
        jmp     foo8()
.L7:
        jmp     foo6()
.L9:
        jmp     foo4()
.L10:
        jmp     foo3()
.L11:
        jmp     foo2()
.L12:
        jmp     foo1()
.L13:
        jmp     foo0()
.L3:
        jmp     foo9()
.L8:
        jmp     foo5()
.L2:
        jmp     bar(int)

int foo0();
int foo1();
int foo2();
int foo3();
int foo4();
int foo5();
int foo6();
int foo7();
int foo8();
int foo9();
int bar(int);
int foo(int x)
{
    switch (x)
    {
        case 0:
            return foo0();
        case 1:
            return foo1();
        case 2:
            return foo2();
        case 3:
            return foo3();
        case 4:
            return foo4();
        [[unlikely]] case 5:
            return foo5();
        case 6:
            return foo6();
        case 7:
            return foo7();
        case 8:
            return foo8();
        case 9:
            return foo9();
        [[likely]]  default:
            return bar(x);
    }
}

foo(int):
        cmp     edi, 9
        ja      .L2
        mov     edi, edi
        jmp     [QWORD PTR .L4[0+rdi*8]]
.L4:
        .quad   .L13
        .quad   .L12
        .quad   .L11
        .quad   .L10
        .quad   .L9
        .quad   .L8
        .quad   .L7
        .quad   .L6
        .quad   .L5
        .quad   .L3
.L2:
        jmp     bar(int)
.L5:
        jmp     foo8()
.L6:
        jmp     foo7()
.L7:
        jmp     foo6()
.L9:
        jmp     foo4()
.L10:
        jmp     foo3()
.L11:
        jmp     foo2()
.L12:
        jmp     foo1()
.L13:
        jmp     foo0()
.L3:
        jmp     foo9()
.L8:
        jmp     foo5()

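The following organizes the `switch` + `[[likely]]` / `[[unlikely]]` examples above and the code-layout optimization visible in their assembly.

1⃣ Background: code layout optimization

"Hot path"
- Code executed with high probability, marked [[likely]].
- Should be placed close to other hot code, ideally contiguous in the L1 instruction cache.
"Cold path"
- Rarely executed code, marked [[unlikely]].
- Can be placed farther away, which relieves I-Cache pressure and leaves room for the hot path.

2⃣ Example analysis

Original switch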

switch (x) {
    case 0: return foo0();
    ...
    [[unlikely]] case 5: return foo5();
    [[likely]] case 7: return foo7();
    default: return bar(x);
}

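The default case is written last, and without a hint the compiler treats default as uncommon.
case 5 is marked [[unlikely]] → its code is emitted far from the dispatch.
case 7 is marked [[likely]] → its code is emitted near the start, next to other hot code.
In the assembly, .L6 (foo7) comes right after the jump table, while .L8 (foo5) is moved to the end of the case blocks.

Characteristics of the generated assembly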

foo(int):
    cmp     edi, 9          ; is x > 9 (unsigned compare)?
    ja      .L2             ; if so, jump to default
    mov     edi, edi        ; zero-extend x into rdi
    jmp     [QWORD PTR .L4[0+rdi*8]] ; indirect jump through the case table

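The jump table (.L4) stores the address of each case's code.
Layout order:
- Cold path (unlikely) placed far away → smaller cache footprint in the hot region.
- Hot path (likely) placed close → higher I-Cache hit rate.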
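
The jump-table dispatch in the assembly can be sketched in C++ as an array of function pointers. This is only an analogy for what the compiler emits, not what it literally does; the function dispatch, the four-entry table, and the return values of the stand-in foo0..foo3 and bar are assumptions made for the example.

```cpp
#include <array>

// Hypothetical stand-ins for the handlers declared in the listings above.
int foo0() { return 100; }
int foo1() { return 101; }
int foo2() { return 102; }
int foo3() { return 103; }
int bar(int x) { return -x; }

int dispatch(int x) {
    using Handler = int (*)();
    // Plays the role of the .L4 table: one code address per case value.
    static constexpr std::array<Handler, 4> table = {foo0, foo1, foo2, foo3};

    // One unsigned compare rejects both negative and too-large values,
    // mirroring the cmp/ja pair that jumps to the default case.
    const unsigned index = static_cast<unsigned>(x);
    if (index >= table.size()) {
        return bar(x);
    }
    return table[index]();  // indirect call, like jmp [.L4 + rdi*8]
}
```

The table entries stay in case order; what the attributes influence is where the code for each entry is placed.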

Changing default to `[[likely]]`

[[likely]] default:
    return bar(x);

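In the assembly, .L2 (the default case) is now emitted first, right after the jump table, next to the hot path.
Note:
- If case 7 were still marked [[likely]], GCC might not accept it, so that attribute was removed.
Rationale:
- Hot paths are laid out together first, improving instruction cache efficiency.
- Cold paths are laid out later, reducing I-Cache contention.

3⃣ Summary of principles

Code layout = hot path first + cold path last
- Reduces L1 I-Cache pressure and keeps the instruction pipeline fed.
[[likely]] / [[unlikely]] provide the hints
- The compiler uses them to decide the order of the case blocks.
The default case is usually assumed to be uncommon
- Write it last, or mark it [[unlikely]] / [[likely]] so the compiler can optimize the layout.
Jump table + case layout
- The assembly shows the jump table .L4 plus one .Lx label per case.
- The compiler reorders the .Lx blocks according to the hot/cold information; the table entries stay in case order.

Practical notes

Marking hot/cold paths can improve performance, but do not overuse it.
Inspecting the assembly verifies whether the layout matches expectations.
In a switch, hot cases are kept close and cold cases moved away; the default case can be placed according to its actual probability.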

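The following organizes the material on the two branch predictors, the compiler's and the CPU's, and how they relate.

1⃣ What can the compiler do with the `[[likely]]` / `[[unlikely]]` information?

Optimize code layout
- Place the hot path (likely) contiguously and the cold path (unlikely) farther away.
- Effect: higher L1 instruction cache (I-Cache) hit rate and less cache pressure.
Optimize branch instructions
- Make conditional branches easier for the CPU to predict correctly.
- Choose between kinds of branch code:
  - a conditional jump vs. branch-free conditional execution (the cmov instruction)
- The effect depends on the CPU microarchitecture.

2⃣ Two branch predictors ("A Tale of Two Branch Predictors")

2.1 The CPU's branch predictor

Input: the instruction stream at run time (machine code bytes).
Characteristics:
- Predicts branch direction from earlier execution history.
- The prediction steers instruction fetch and pipeline scheduling.
- A misprediction is expensive: the pipeline is flushed, wasting about 16 to 20 cycles (x86 microarchitectures).
- It has to exist; otherwise the pipeline would stall at every branch.

2.2 The compiler's branch predictor

Input: the complete high-level code (C++), analyzed at compile time.
Characteristics:
- Statically analyzes the code to estimate which branches are hot and which are cold.
- Generates better assembly from [[likely]] / [[unlikely]] or from PGO (Profile-Guided Optimization) data.
- Affects the final machine code layout, the order of jump targets, and how conditional branches are emitted.
- Source-level information can be attached to the IR (intermediate representation) to guide optimization.

3⃣ How the two relate

Aspect	CPU Predictor	Compiler Predictor
Basis for prediction	Run-time history	Compile-time static analysis, PGO
Visibility	Instruction by instruction	Whole code structure
Effect	Steers the pipeline at run time	Shapes machine code generation, code layout, jump order
Source of information	Actual execution stream	Source hints (`[[likely]]` / `[[unlikely]]`)

Summary of the relationship:

Compiler optimization → shapes machine code layout and branch generation → assists CPU prediction
- For example, putting the hot path first and the cold path last indirectly makes the CPU's branch prediction more accurate.
CPU execution → dynamic prediction from execution history
- Even after the compiler's optimization, the CPU still corrects itself based on run-time behavior.
PGO (Profile-Guided Optimization) tells the compiler which paths are really hot or cold at run time, making the static prediction more accurate.

In short: compiler prediction is static and lays out hot/cold paths ahead of time; CPU prediction is dynamic and adapts during execution. The two work together to improve performance.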
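
The dynamic side can be observed with a small experiment, sketched below under stated assumptions: the same data-dependent branch is timed on shuffled and then on sorted input. The names sum_large, DemoResult, and run_demo, the threshold 128, and the fixed seed are all hypothetical choices for this sketch. No timing result is claimed here, because an optimizing compiler may replace the branch with cmov or vectorized code, in which case the two timings converge.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sums the elements >= 128. The branch outcome depends entirely on the data.
std::int64_t sum_large(const std::vector<int>& v) {
    std::int64_t sum = 0;
    for (int x : v) {
        if (x >= 128) {
            sum += x;
        }
    }
    return sum;
}

struct DemoResult {
    std::int64_t sum_shuffled;
    std::int64_t sum_sorted;
    long long us_shuffled;  // elapsed microseconds on random-order input
    long long us_sorted;    // elapsed microseconds on sorted input
};

DemoResult run_demo(std::size_t n) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    std::mt19937 rng(42);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(n);
    for (int& x : data) {
        x = dist(rng);
    }

    DemoResult r{};
    const auto t0 = Clock::now();
    r.sum_shuffled = sum_large(data);  // branch outcome looks random
    const auto t1 = Clock::now();

    std::sort(data.begin(), data.end());

    const auto t2 = Clock::now();
    r.sum_sorted = sum_large(data);    // long runs of identical outcomes
    const auto t3 = Clock::now();

    r.us_shuffled = duration_cast<microseconds>(t1 - t0).count();
    r.us_sorted = duration_cast<microseconds>(t3 - t2).count();
    return r;
}
```

Calling run_demo with a large n and comparing us_shuffled with us_sorted shows how much the hardware predictor contributes on a given machine and build; the two sums are always equal, since sorting does not change which elements are added.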

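1⃣ The CPU's branch predictor

While executing, the CPU knows nothing about the source code structure; it can only predict from the run-time instruction stream.
For branch instructions such as loops and conditionals, the CPU guesses the most likely path to avoid stalling the pipeline.
Correct prediction → the pipeline runs smoothly.
Misprediction → the pipeline is flushed, costing 16 to 20 cycles (a typical figure for Intel microarchitectures).

Example: the `log2` function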

int log2(int n) {
    int res = -1;
    while (n > 0) {
        res++;
        n /= 2;
    }
    return res;
}

As a mermaid flowchart:

graph TD
    A[res = -1] --> B{n > 0}
    B -.->|FALSE| C[return res]
    subgraph Process
    B -->|"TRUE (likely)"| D[res++; <br> n /= 2]
    D -.-> B
    end

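The CPU does not know how many times the while loop will run, but it tries to predict whether n > 0 is TRUE or FALSE.
Correct prediction → the loop iterations execute back to back without interrupting the pipeline.

2⃣ Do the C++20 `[[likely]]` / `[[unlikely]]` attributes affect the CPU?

[[likely]] / [[unlikely]] are compiler hints that say which branch is hotter or colder.
What actually happens at the CPU level:
- x86-64 (Intel, AMD):
  - Historically, the Pentium 4 had branch hint prefixes (0x2E = not taken, 0x3E = taken).
  - These prefixes are now reserved or repurposed for CET (Control-flow Enforcement Technology) and are no longer used as hints on modern CPUs.
  - So modern x86-64 CPUs do not consume the [[likely]] hint directly.
- ARM (ARMv7, AArch32, AArch64):
  - No comparable branch hints.
- POWER / PowerPC:
  - Branch hints exist, but compilers targeting 64-bit POWER rarely emit them.
  - Recommended only when the static prediction is very reliable.

Conclusion

[[likely]] / [[unlikely]] mainly affect compiler optimization:
1. Code layout → hot path close together, cold path farther away
2. A better order of jumps
The dynamic branch predictor in a modern CPU does not read the hint, but it benefits indirectly:
- Hot path first → the common case runs as straight-line fall-through code, which keeps the pipeline efficient
- Cold path moved away → the impact of branch mispredictions is kept small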

int log2(int n) {
    int res = -1;
    while (n > 0) {
        res++;
        n /= 2;
    }
    return res;
}
int main() {  //
    log2(10);
}

xiaqiu@xz:~/test/CppCon/day394/code$ mkdir build
xiaqiu@xz:~/test/CppCon/day394/code$ cd build
xiaqiu@xz:~/test/CppCon/day394/code/build$ g++ -fdump-tree-all-graph main.cpp -o snippet -O3
xiaqiu@xz:~/test/CppCon/day394/code/build$ ls | grep profile
snippet-main.cpp.049t.profile_estimate
snippet-main.cpp.049t.profile_estimate.dot
xiaqiu@xz:~/test/CppCon/day394/code/build$ dot ./snippet-main.cpp.049t.profile_estimate.dot -Tsvg > estimate.svg
The graph can be saved as either SVG or PNG:
xiaqiu@xz:~/test/CppCon/day394/code/build$ dot ./snippet-main.cpp.049t.profile_estimate.dot -Tpng > estimate.png
The generated SVG:

<!-- Generated by graphviz version 2.43.0 (0)
 -->
<!-- Title: snippet&#45;main.cpp.049t.profile_estimate Pages: 1 -->
<svg width="826pt" height="615pt" version="1.1" viewBox="0 0 826 615" xmlns="http://www.w3.org/2000/svg">
 <g class="graph" transform="scale(1) translate(4 611)">
  <title>snippet-main.cpp.049t.profile_estimate</title>
  <polygon points="-4 4 -4 -611 822 -611 822 4" fill="#fff" stroke="transparent"/>
  <g class="cluster">
   <title>cluster_log2</title>
   <polygon points="8 -8 8 -599 540 -599 540 -8" fill="none" stroke="#000" stroke-dasharray="5,2"/>
   <text x="274" y="-583.8" font-family="Times,serif" font-size="14" text-anchor="middle">log2 ()</text>
  </g>
  <g class="cluster">
   <title>cluster_0_1</title>
   <polygon points="262 -95 262 -393 532 -393 532 -95" fill="#e0e0e0" stroke="#006400" stroke-width="2"/>
   <text x="292" y="-377.8" font-family="Times,serif" font-size="14" text-anchor="middle">loop 1</text>
  </g>
  <g class="cluster">
   <title>cluster_main</title>
   <polygon points="548 -267 548 -599 810 -599 810 -267" fill="none" stroke="#000" stroke-dasharray="5,2"/>
   <text x="679" y="-583.8" font-family="Times,serif" font-size="14" text-anchor="middle">main ()</text>
  </g>
  <!-- fn_0_basic_block_4 -->
  <g class="node">
   <title>fn_0_basic_block_4</title>
   <polygon points="270 -224.5 270 -361.5 524 -361.5 524 -224.5" fill="#d3d3d3" stroke="#000"/>
   <text x="278" y="-346.3" font-family="Times,serif" font-size="14">COUNT:1073741824&lt;bb 4&gt;:</text>
   <polyline points="270 -338.5 524 -338.5" fill="none" stroke="#000"/>
   <text x="278" y="-323.3" font-family="Times,serif" font-size="14"># n_1 = PHI &lt;n_4(D)(2), n_8(3)&gt;</text>
   <polyline points="270 -315.5 524 -315.5" fill="none" stroke="#000"/>
   <text x="278" y="-300.3" font-family="Times,serif" font-size="14"># res_2 = PHI &lt;-1(2), res_7(3)&gt;</text>
   <polyline points="270 -292.5 524 -292.5" fill="none" stroke="#000"/>
   <g font-family="Times,serif" font-size="14">
    <text x="278" y="-277.3">if (n_1 &gt; 0)</text>
    <text x="278" y="-262.3">goto &lt;bb 3&gt;; [89.00%]</text>
    <text x="278" y="-247.3">else</text>
    <text x="278" y="-232.3">goto &lt;bb 5&gt;; [11.00%]</text>
   </g>
  </g>
  <!-- fn_0_basic_block_3 -->
  <g class="node">
   <title>fn_0_basic_block_3</title>
   <polygon points="288.5 -103.5 288.5 -172.5 505.5 -172.5 505.5 -103.5" fill="#d3d3d3" stroke="#000"/>
   <text x="296.5" y="-157.3" font-family="Times,serif" font-size="14">COUNT:955630225&lt;bb 3&gt;:</text>
   <polyline points="288.5 -149.5 505.5 -149.5" fill="none" stroke="#000"/>
   <text x="296.5" y="-134.3" font-family="Times,serif" font-size="14">res_7 = res_2 + 1;</text>
   <polyline points="288.5 -126.5 505.5 -126.5" fill="none" stroke="#000"/>
   <text x="296.5" y="-111.3" font-family="Times,serif" font-size="14">n_8 = n_1 &gt;&gt; 1;</text>
  </g>
  <!-- fn_0_basic_block_4&#45;&gt;fn_0_basic_block_3 -->
  <g class="edge">
   <title>fn_0_basic_block_4:s-&gt;fn_0_basic_block_3:n</title>
   <path d="m397-224v40.65" fill="none" stroke="#228b22" stroke-width="2"/>
   <polygon points="400.5 -183 397 -173 393.5 -183" fill="#228b22" stroke="#228b22" stroke-width="2"/>
   <text x="418.5" y="-194.8" font-family="Times,serif" font-size="14" text-anchor="middle">[89%]</text>
  </g>
  <!-- fn_0_basic_block_5 -->
  <g class="node">
   <title>fn_0_basic_block_5</title>
   <polygon points="36.5 -103.5 36.5 -172.5 253.5 -172.5 253.5 -103.5" fill="#d3d3d3" stroke="#000"/>
   <text x="44.5" y="-157.3" font-family="Times,serif" font-size="14">COUNT:118111600&lt;bb 5&gt;:</text>
   <polyline points="36.5 -149.5 253.5 -149.5" fill="none" stroke="#000"/>
   <text x="44.5" y="-134.3" font-family="Times,serif" font-size="14"># res_3 = PHI &lt;res_2(4)&gt;</text>
   <polyline points="36.5 -126.5 253.5 -126.5" fill="none" stroke="#000"/>
   <text x="44.5" y="-111.3" font-family="Times,serif" font-size="14">return res_3;</text>
  </g>
  <!-- fn_0_basic_block_4&#45;&gt;fn_0_basic_block_5 -->
  <g class="edge">
   <title>fn_0_basic_block_4:s-&gt;fn_0_basic_block_5:n</title>
   <path d="m397-224c0 25.78-205.18 23.3-245.26 43.56" fill="none" stroke="#ff8c00" stroke-width="2"/>
   <polygon points="154.31 -178.06 145 -173 149.12 -182.76" fill="#ff8c00" stroke="#ff8c00" stroke-width="2"/>
   <text x="363.5" y="-194.8" font-family="Times,serif" font-size="14" text-anchor="middle">[11%]</text>
  </g>
  <!-- fn_0_basic_block_3&#45;&gt;fn_0_basic_block_4 -->
  <g class="edge">
   <title>fn_0_basic_block_3:s-&gt;fn_0_basic_block_4:n</title>
   <path d="m397-102c0 12.06 99.39 6.9 108.5-1 21.8-18.9 38.82-238.52 18.5-259-4.1-4.13-87.72-6.92-116.99-3.83" fill="none" stroke="blue" stroke-dasharray="1,5" stroke-width="2"/>
   <polygon points="407.57 -362.35 397 -363 405.67 -369.09" fill="blue" stroke="blue" stroke-width="2"/>
   <text x="553" y="-194.8" font-family="Times,serif" font-size="14" text-anchor="middle">[100%]</text>
  </g>
  <!-- fn_0_basic_block_0 -->
  <g class="node">
   <title>fn_0_basic_block_0</title>
   <polygon points="397 -568 335.33 -550 397 -532 458.67 -550" fill="#fff" stroke="#000"/>
   <g fill="none" stroke="#000">
    <polyline points="346.85 -553.36 346.85 -546.64"/>
    <polyline points="385.48 -535.36 408.52 -535.36"/>
    <polyline points="447.15 -546.64 447.15 -553.36"/>
    <polyline points="408.52 -564.64 385.48 -564.64"/>
   </g>
   <text x="397" y="-546.3" font-family="Times,serif" font-size="14" text-anchor="middle">ENTRY</text>
  </g>
  <!-- fn_0_basic_block_1 -->
  <g class="node">
   <title>fn_0_basic_block_1</title>
   <polygon points="145 -52 98.25 -34 145 -16 191.75 -34" fill="#fff" stroke="#000"/>
   <g fill="none" stroke="#000">
    <polyline points="109.44 -38.31 109.44 -29.69"/>
    <polyline points="133.8 -20.31 156.2 -20.31"/>
    <polyline points="180.56 -29.69 180.56 -38.31"/>
    <polyline points="156.2 -47.69 133.8 -47.69"/>
   </g>
   <text x="145" y="-30.3" font-family="Times,serif" font-size="14" text-anchor="middle">EXIT</text>
  </g>
  <!-- fn_0_basic_block_0&#45;&gt;fn_0_basic_block_1 -->
  <!-- fn_0_basic_block_2 -->
  <g class="node">
   <title>fn_0_basic_block_2</title>
   <polygon points="288.5 -438.5 288.5 -476.5 505.5 -476.5 505.5 -438.5" fill="#d3d3d3" stroke="#000"/>
   <text x="296.5" y="-461.3" font-family="Times,serif" font-size="14">COUNT:118111600&lt;bb 2&gt;:</text>
   <text x="296.5" y="-446.3" font-family="Times,serif" font-size="14">goto &lt;bb 4&gt;; [100.00%]</text>
  </g>
  <!-- fn_0_basic_block_0&#45;&gt;fn_0_basic_block_2 -->
  <g class="edge">
   <title>fn_0_basic_block_0:s-&gt;fn_0_basic_block_2:n</title>
   <path d="m397-532v44.34" fill="none" stroke="#000" stroke-width="2"/>
   <polygon points="400.5 -487.5 397 -477.5 393.5 -487.5" stroke="#000" stroke-width="2"/>
   <text x="423" y="-502.8" font-family="Times,serif" font-size="14" text-anchor="middle">[100%]</text>
  </g>
  <!-- fn_0_basic_block_2&#45;&gt;fn_0_basic_block_4 -->
  <g class="edge">
   <title>fn_0_basic_block_2:s-&gt;fn_0_basic_block_4:n</title>
   <path d="m397-437.5v64.46" fill="none" stroke="#000" stroke-width="2"/>
   <polygon points="400.5 -373 397 -363 393.5 -373" stroke="#000" stroke-width="2"/>
   <text x="423" y="-404.8" font-family="Times,serif" font-size="14" text-anchor="middle">[100%]</text>
  </g>
  <!-- fn_0_basic_block_5&#45;&gt;fn_0_basic_block_1 -->
  <g class="edge">
   <title>fn_0_basic_block_5:s-&gt;fn_0_basic_block_1:n</title>
   <path d="m145-102v39.85" fill="none" stroke="#000" stroke-width="2"/>
   <polygon points="148.5 -62 145 -52 141.5 -62" stroke="#000" stroke-width="2"/>
   <text x="171" y="-73.8" font-family="Times,serif" font-size="14" text-anchor="middle">[100%]</text>
  </g>
  <!-- fn_1_basic_block_0 -->
  <g class="node">
   <title>fn_1_basic_block_0</title>
   <polygon points="689 -568 627.33 -550 689 -532 750.67 -550" fill="#fff" stroke="#000"/>
   <g fill="none" stroke="#000">
    <polyline points="638.85 -553.36 638.85 -546.64"/>
    <polyline points="677.48 -535.36 700.52 -535.36"/>
    <polyline points="739.15 -546.64 739.15 -553.36"/>
    <polyline points="700.52 -564.64 677.48 -564.64"/>
   </g>
   <text x="689" y="-546.3" font-family="Times,serif" font-size="14" text-anchor="middle">ENTRY</text>
  </g>
  <!-- fn_1_basic_block_1 -->
  <g class="node">
   <title>fn_1_basic_block_1</title>
   <polygon points="689 -311 642.25 -293 689 -275 735.75 -293" fill="#fff" stroke="#000"/>
   <g fill="none" stroke="#000">
    <polyline points="653.44 -297.31 653.44 -288.69"/>
    <polyline points="677.8 -279.31 700.2 -279.31"/>
    <polyline points="724.56 -288.69 724.56 -297.31"/>
    <polyline points="700.2 -306.69 677.8 -306.69"/>
   </g>
   <text x="689" y="-289.3" font-family="Times,serif" font-size="14" text-anchor="middle">EXIT</text>
  </g>
  <!-- fn_1_basic_block_0&#45;&gt;fn_1_basic_block_1 -->
  <!-- fn_1_basic_block_2 -->
  <g class="node">
   <title>fn_1_basic_block_2</title>
   <polygon points="576 -434.5 576 -480.5 802 -480.5 802 -434.5" fill="#d3d3d3" stroke="#000"/>
   <text x="584" y="-465.3" font-family="Times,serif" font-size="14">COUNT:1073741824&lt;bb 2&gt;:</text>
   <polyline points="576 -457.5 802 -457.5" fill="none" stroke="#000"/>
   <text x="584" y="-442.3" font-family="Times,serif" font-size="14">return 0;</text>
  </g>
  <!-- fn_1_basic_block_0&#45;&gt;fn_1_basic_block_2 -->
  <g class="edge">
   <title>fn_1_basic_block_0:s-&gt;fn_1_basic_block_2:n</title>
   <path d="m689-532v40.65" fill="none" stroke="#000" stroke-width="2"/>
   <polygon points="692.5 -491 689 -481 685.5 -491" stroke="#000" stroke-width="2"/>
   <text x="715" y="-502.8" font-family="Times,serif" font-size="14" text-anchor="middle">[100%]</text>
  </g>
  <!-- fn_1_basic_block_2&#45;&gt;fn_1_basic_block_1 -->
  <g class="edge">
   <title>fn_1_basic_block_2:s-&gt;fn_1_basic_block_1:n</title>
   <path d="m689-434v111.71" fill="none" stroke="#000" stroke-width="2"/>
   <polygon points="692.5 -322 689 -312 685.5 -322" stroke="#000" stroke-width="2"/>
   <text x="715" y="-404.8" font-family="Times,serif" font-size="14" text-anchor="middle">[100%]</text>
  </g>
 </g>
</svg>

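Combining the CPU and compiler branch predictors with the generated .svg file, the picture is as follows:

1. CPU branch prediction (dynamic)

Purpose: when executing a branch instruction, the CPU must guess which path is most likely, to avoid stalling the pipeline.
Characteristics:
- Sees only the machine code stream, with no knowledge of the program's high-level logic.
- Predicts from run-time history (how earlier branches behaved).
- Mispredictions are expensive: the pipeline may be discarded, wasting 16 to 20 cycles.
- Has a default prediction for "cold" branches (branches seen for the first time):
  - Intel:
    - Forward conditional branches default to not taken
    - Backward conditional branches default to taken
  - AMD:
    - Always predicted not taken
- Indirect branches (switch/case jump tables, calls through function pointers) are harder to predict.

2. Compiler branch prediction (static)

Purpose: assign static probabilities to branches through compile-time analysis and generate optimized machine code.
Characteristics:
- Sees the whole source code (at the C++ level).
- Can be given hints with [[likely]] or [[unlikely]]:
  - [[likely]] means the branch is very likely to be executed.
  - [[unlikely]] means the branch is not likely to be executed.
- Mainly affects code layout (hot/cold path), indirectly reducing I-Cache pressure and helping prediction.
- On modern x86-64, [[likely]] does not change the hardware predictor's behavior; it is only an optimization hint to the compiler.

3. Example: loop analysis of `log2(int n)`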

int log2(int n) {
    int res = -1;
    while (n > 0) {
        res++;
        n /= 2;
    }
    return res;
}

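Estimated probabilities:
- The loop condition while (n > 0) is estimated to hold most of the time, so the loop body is the hot path.
- The exit condition n <= 0 is the cold path (rare).
In the .svg file:
- [89%]: GCC's static estimate for the edge into the loop body.
- [11%]: GCC's static estimate for the cold edge that exits the loop.
Conclusion:
- The CPU's dynamic predictor quickly learns from history that the loop iterates many times.
- The compiler, seeing that the loop body is hot, lays it out first, improving I-Cache utilization.

4. Summary

CPU branch prediction relies on run-time history and adapts dynamically.
The compiler's static prediction relies on source analysis and hints ([[likely]]/[[unlikely]]).
[[likely]] mainly affects compiler optimization (code layout); its direct effect on the hardware predictor is negligible.
The .svg graph shows directly that the loop body is the hot path and the loop exit is the cold path.
The next example examines the effect of adding [[likely]] to the loop:

1. The original function `log2(int n)`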

int log2(int n) {
    int res = -1;
    while (n > 0) {
        res++;
        n /= 2;
    }
    return res;
}

Corresponding assembly:

mov eax, -1         ; res = -1
test edi, edi       ; set flags from n
jle .L4             ; if n <= 0, skip the loop
.p2align ...
.L3:
add eax, 1          ; res++
sar edi             ; n /= 2 (n is positive here, so a shift suffices)
jne .L3             ; if n != 0, repeat the loop
ret
.p2align ...
.L4:
ret

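The loop condition while (n > 0) is the hot path (true most of the time).
The CPU's branch predictor learns from history to predict jne .L3 as taken, back into the loop body.

2. `log2_likely(int n)`, the function with `[[likely]]` added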

int log2_likely(int n) {
    int res = -1;
    while (n > 0) { [[likely]]
        res++;
        n /= 2;
    }
    return res;
}

Corresponding assembly:

mov eax, -1
test edi, edi
jle .L4
.p2align ...
.L3:
add eax, 1
sar edi
jne .L3
ret
.p2align ...
.L4:
ret

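3. Comparison

Observation:
- The two assembly listings are essentially identical; the loop's jne .L3 is not changed by [[likely]].
Reasons:
- [[likely]] is a hint to the compiler that "this branch is very likely taken", used mainly for:
  - code layout optimization (hot path vs. cold path);
  - branch weight information (ordering of switch case blocks and of basic blocks within a function).
- It does not modify the dynamic branch predictor of a modern x86/x64 CPU.
- For an obviously hot path such as a loop, the compiler already places the loop body on the fall-through path, so [[likely]] changes nothing in the assembly.

4. Conclusion

[[likely]] mainly affects the compiler's static prediction and code layout.
It has almost no direct effect on the CPU's dynamic branch predictor, especially for an obviously hot loop like while (n > 0).
In this example [[likely]] did not change the assembly, because the compiler had already chosen this layout on its own.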
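
Since the attributes are hints only, the annotated and plain versions must compute the same values. The sketch below checks that for a range of inputs; the names ilog2_plain, ilog2_likely, and same_results are hypothetical, chosen to avoid clashing with the standard library's log2.

```cpp
// Plain version, identical in logic to log2(int) above.
int ilog2_plain(int n) {
    int res = -1;
    while (n > 0) {
        res++;
        n /= 2;
    }
    return res;
}

// Same loop with the hint placed on the first statement of the body,
// as in log2_likely above.
int ilog2_likely(int n) {
    int res = -1;
    while (n > 0) {
        [[likely]] res++;
        n /= 2;
    }
    return res;
}

// Returns true if both versions agree for every n in [-4, limit].
bool same_results(int limit) {
    for (int n = -4; n <= limit; ++n) {
        if (ilog2_plain(n) != ilog2_likely(n)) {
            return false;
        }
    }
    return true;
}
```

A check like this confirms the semantics; whether the hint changed the generated code still has to be read from the assembly or from the compiler's profile-estimate dump.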

int log2(int n) {
    int res = -1;
    while (n > 0) {
        [[unlikely]] res++;
        n /= 2;
    }
    return res;
}
int main() {  //
    log2(10);
}

xiaqiu@xz:~/test/CppCon/day394/code/build$ rm -rf snippet*
xiaqiu@xz:~/test/CppCon/day394/code/build$ g++ -fdump-tree-all-graph …/main.cpp -o snippet -O3
xiaqiu@xz:~/test/CppCon/day394/code/build$ ls | grep profile
snippet-main.cpp.049t.profile_estimate
snippet-main.cpp.049t.profile_estimate.dot
xiaqiu@xz:~/test/CppCon/day394/code/build$ dot ./snippet-main.cpp.049t.profile_estimate.dot -Tsvg > estimate1.svg
xiaqiu@xz:~/test/CppCon/day394/code/build$

Figure (estimate1.svg, rendered by graphviz from snippet-main.cpp.049t.profile_estimate): control-flow graph of log2 () and main () with GCC's estimated block counts and edge probabilities.

log2 ():
  ENTRY -> <bb 2> [100%]
  <bb 2> COUNT:966367640
    goto <bb 4>; [100.00%]
  <bb 4> COUNT:1073741824 (inside loop 1)
    # n_1 = PHI <n_4(D)(2), n_8(3)>
    # res_2 = PHI <-1(2), res_7(3)>
    if (n_1 > 0)
      goto <bb 3>; [10.00%]
    else
      goto <bb 5>; [90.00%]
  <bb 3> COUNT:107374184 (inside loop 1)
    // predicted unlikely by cold label predictor.
    res_7 = res_2 + 1;
    n_8 = n_1 >> 1;
    back edge to <bb 4> [100%]
  <bb 5> COUNT:966367640
    # res_3 = PHI <res_2(4)>
    return res_3;
    -> EXIT [100%]

main ():
  ENTRY -> <bb 2> [100%]
  <bb 2> COUNT:1073741824
    return 0;
    -> EXIT [100%]

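1. What [[likely]] and [[unlikely]] do

Target: the compiler's static branch predictor, not the CPU's hardware predictor.
Function:
- They hint to the compiler that a branch or loop body is "very likely" or "not very likely" to execute.
- The compiler uses the hint to optimize:
  - Code layout (hot path vs cold path)
  - Branch-weight information (affects jump-table order and basic-block order)
Limitations:
- Almost no direct effect on the dynamic branch predictor of a modern CPU.
- For a simple hot loop, adding [[likely]] often does not change the assembly at all.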
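
To make the placement concrete, here is a minimal sketch of where the hints attach: after the condition of an `if`, and in front of a `case` label. The functions `clamp_small` and `weight` are made-up names used only for illustration, and the `static_assert` lines only confirm that the hints leave the results untouched. Whether the emitted code changes depends on the compiler and target.

```cpp
// Hypothetical helpers; only the attribute placement matters here.
constexpr int clamp_small(int n) {
    if (n > 5) [[unlikely]] {   // hinted as the cold path
        return 5;
    }
    return n;                   // expected hot path: plain fall-through
}

constexpr int weight(int kind) {
    switch (kind) {
    [[likely]] case 2:          // hint attached to a case label
        return 20;
    case 1:
        return 10;
    default:
        return 0;
    }
}

// The attributes are hints only: results are the same with or without them.
static_assert(clamp_small(3) == 3 && clamp_small(9) == 5, "same results");
static_assert(weight(2) == 20 && weight(1) == 10 && weight(7) == 0, "same results");
```

Compiling this with and without the attributes and comparing the output of `g++ -O2 -S` is one way to see whether a given toolchain actually reorders the blocks.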

2. Example: [[likely]] / [[unlikely]] on a loop

Loop condition `while (n > 0)`

int log2(int n) {
    int res = -1;
    while (n > 0) {
        res++;
        n /= 2;
    }
    return res;
}

Original assembly:

test edi, edi
jle .L4
.L3:
add eax, 1
sar edi
jne .L3

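With [[likely]]:

while (n > 0) [[likely]] { res++; n /= 2; }

The assembly is essentially unchanged.
With [[unlikely]]:

while (n > 0) [[unlikely]] { res++; n /= 2; }

The block order is slightly reversed, but the hot loop still runs along a contiguous sequence of instructions.
The effect on CPU performance is very small.

Conclusion: for an obviously hot path such as a loop body, [[likely]] / [[unlikely]] are of little use.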

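3. Example: early return / conditional branch

Primality test `is_prime()`

bool is_prime(int n) {
    if (n < 2) return false;  // 0, 1 and negatives are not prime
    // i * i <= n (not <), otherwise perfect squares such as 4 or 9 pass as prime
    for (int i = 2; i * i <= n; ++i) {
        if (n % i == 0) return false;
    }
    return true;
}

If we expect most inputs to be prime:
- Mark if (n % i == 0) as [[unlikely]].
- The compiler then moves the return false path to the cold path and optimizes the hot path (return true).
If we expect most inputs not to be prime:
- Mark if (n % i == 0) as [[likely]].

[[likely]] / [[unlikely]] are more effective on early returns, exception handling, and cold branches than on the loop itself.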

4. Exceptional cases / exception branches

void validate(std::string_view sv) {
    if (sv.size() < 8) throw std::string("password too short");
}

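Throwing an exception is normally a cold branch.
Marking it [[unlikely]] lets the compiler move the throwing path to the end of the function and keep the hot path (normal execution) compact.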
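
A minimal sketch of that hint, reusing the check from validate() above. The names `validate_hinted` and `is_rejected` are hypothetical and exist only for this example. Whether the throwing block really moves depends on the compiler, and some compilers may already treat a `throw` as cold without any hint.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Same check as validate(); the throwing branch is hinted as the cold path.
void validate_hinted(std::string_view sv) {
    if (sv.size() < 8) [[unlikely]] {
        throw std::string("password too short");
    }
}

// Hypothetical helper for the usage example: true if the password is rejected.
bool is_rejected(std::string_view sv) {
    try {
        validate_hinted(sv);
        return false;
    } catch (const std::string&) {
        return true;
    }
}

// Usage: behaviour is identical to the unhinted version.
void check_validate_hinted() {
    assert(is_rejected("12345"));       // 5 characters: too short
    assert(!is_rejected("12345678"));   // 8 characters: accepted
}
```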

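5. Practical takeaways

Hot loops:
- [[likely]] / [[unlikely]] rarely change the assembly.
- The CPU's own dynamic prediction is good enough.
Early returns / exceptions / rare cases:
- [[unlikely]] can noticeably improve code layout and I-cache utilization.
GCC vs Clang:
- GCC can dump its estimated branch frequencies as a graph (the profile_estimate dump produced by -fdump-tree-all-graph).
- Clang in many cases emits identical assembly regardless of [[likely]] / [[unlikely]].
PGO (Profile-Guided Optimization):
- If you really want to optimize for branch behaviour, PGO is more reliable.
- [[likely]] / [[unlikely]] are only coarse static hints.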

Each function below illustrates one use of `[[likely]]` / `[[unlikely]]`:

#include <iostream>
#include <string_view>
#include <string>
// Floor of log2(n) for n > 0 (bit width minus 1); returns -1 for n <= 0
int log2(int n)
{
    int res = -1;
    while (n > 0) { // hot path
        res++;
        n /= 2;
    }
    return res;
}
// log2 with [[likely]]
int log2_likely(int n)
{
    int res = -1;
    while (n > 0) [[likely]] { // the loop body is likely to run
        res++;
        n /= 2;
    }
    return res;
}
// Primality test by trial division
bool is_prime(int n)
{
    if (n < 2) {
        return false; // 0, 1 and negatives are not prime
    }
    for (int i = 2; i * i <= n; ++i) { // <= so that perfect squares (4, 9, ...) are rejected
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}
// Primality test with [[likely]]
bool is_prime_likely(int n)
{
    if (n < 2) {
        return false;
    }
    for (int i = 2; i * i <= n; ++i) {
        if (n % i == 0) [[likely]] { // a divisor is expected to be found
            return false;
        }
    }
    return true;
}
// Primality test with [[unlikely]]
bool is_prime_unlikely(int n)
{
    if (n < 2) {
        return false;
    }
    for (int i = 2; i * i <= n; ++i) {
        if (n % i == 0) [[unlikely]] { // a divisor is rarely found
            return false;
        }
    }
    return true;
}
// Password length check
void validate(std::string_view sv)
{
    if (sv.size() < 8) { // cold path
        throw std::string("password too short");
    }
}
// Exception path marked [[likely]] (usually not recommended)
void validate_likely(std::string_view sv)
{
    if (sv.size() < 8) [[likely]] {
        throw std::string("password too short");
    }
}
// Example driver
int main()
{
    std::cout << "log2(16) = " << log2(16) << "\n";
    std::cout << "log2_likely(16) = " << log2_likely(16) << "\n";
    int nums[] = {2, 3, 4, 5, 16, 17, 19};
    for (int n : nums) {
        std::cout << n << " is_prime: " << is_prime(n) << "\n";
        std::cout << n << " is_prime_likely: " << is_prime_likely(n) << "\n";
        std::cout << n << " is_prime_unlikely: " << is_prime_unlikely(n) << "\n";
    }
    try {
        std::string short_pw = "12345";
        validate(short_pw);
    } catch (const std::string &e) {
        std::cout << "validate exception: " << e << "\n";
    }
    try {
        std::string short_pw = "12345";
        validate_likely(short_pw);
    } catch (const std::string &e) {
        std::cout << "validate_likely exception: " << e << "\n";
    }
    return 0;
}

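Summary

Hot loops:
- The body of the log2 loop is the hot path; [[likely]] has very little effect on what GCC or Clang generate.
Conditional branches:
- For if (n % i == 0) in is_prime, adding [[likely]] or [[unlikely]] can improve the layout of the hot path.
- [[likely]]: the branch is taken often, so it is placed on the hot path.
- [[unlikely]]: the branch is taken rarely, so it is placed on the cold path.
Exceptions / rare cases:
- Throwing an exception is normally a cold path, so [[unlikely]] is the better fit.
Notes:
- [[likely]] / [[unlikely]] only influence the compiler's static prediction.
- Their effect on the dynamic branch predictor of a modern CPU is negligible.
- For real performance work, prefer PGO (Profile-Guided Optimization).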
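
Embedded toolchains do not always support C++20, so one possible way to keep such hints portable is to hide them behind macros that fall back to the older `__builtin_expect`. The sketch below is only an illustration: the macro names `IF_LIKELY` / `IF_UNLIKELY` and the function `saturating_inc` are invented here and do not come from the talk.

```cpp
// Hypothetical macro names; adapt them to your code base.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(likely) && __has_cpp_attribute(unlikely)
#    define IF_LIKELY(cond)   if (cond) [[likely]]
#    define IF_UNLIKELY(cond) if (cond) [[unlikely]]
#  endif
#endif

#if !defined(IF_LIKELY) && (defined(__GNUC__) || defined(__clang__))
// Older GCC/Clang without the attributes: fall back to the builtin.
#  define IF_LIKELY(cond)   if (__builtin_expect(!!(cond), 1))
#  define IF_UNLIKELY(cond) if (__builtin_expect(!!(cond), 0))
#endif

#if !defined(IF_LIKELY)
// Unknown compiler: a plain if with no hint.
#  define IF_LIKELY(cond)   if (cond)
#  define IF_UNLIKELY(cond) if (cond)
#endif

// Example use: increment a counter that rarely reaches its limit.
constexpr int saturating_inc(int x, int limit) {
    IF_UNLIKELY(x >= limit) {
        return limit;   // hinted as the cold path
    }
    return x + 1;       // expected hot path
}

static_assert(saturating_inc(3, 10) == 4, "normal increment");
static_assert(saturating_inc(10, 10) == 10, "saturates at the limit");
```

Wrapping the whole `if` rather than only the condition is a deliberate choice here: the attribute follows the condition while the builtin wraps it, so a condition-only macro cannot cover both forms.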