KCC Negative Innovation Deadlock: Why N=3 Guarantees Convergence
0. Parameters and Notation
All calculations below are based on KCC default parameters, cross-verified in the source code (tcp_kcc.c):
| Parameter | Symbol | Default | Physical Meaning |
|---|---|---|---|
| kcc_kalman_q | $Q$ | 100 | Process noise covariance |
| kcc_kalman_r | $R$ | 400 | Measurement noise covariance |
| kcc_kalman_scale | $S$ | 1024 ($2^{10}$) | Fixed-point scaling factor |
| kcc_kalman_p_est_floor | - | 10 | $p\_est$ lower bound |
| kcc_kalman_converged_k_ppm | - | 250,000 | Convergence criterion ($K \leq 0.25$) |
| KCC_KALMAN_CONVERGED_MIN | - | 1 | Convergence threshold lower bound |
| kcc_kalman_outlier_ms | - | 5ms | Outlier gate base threshold |
| KCC_NEG_PERSIST_THRESH | $N$ | 3 | Persistence test threshold |
| KCC_FORCED_DROP_FLOOR_SHIFT | - | 3 | Floor gate shift ($\div 8$) |
| kcc_kalman_max_consec_reject | - | 25 | Forced acceptance gate upper limit |
Derived constants:
$$
\begin{aligned}
converged\_val &= \frac{250000 \cdot 400}{1000000 - 250000} - 100 = \frac{10^8}{7.5 \times 10^5} - 100 = 33 \\[4pt]
dyn\_thresh &= 5\text{ms} \cdot 1000\text{(μs/ms)} \cdot 1024 = 5.12 \times 10^6 \quad (\text{approx. 5ms × scale}) \\[4pt]
q\_boost\_thresh &= 4 \cdot 1\text{ms} \cdot 1000 \cdot 1024 = 4.096 \times 10^6 \quad (\text{approx. 4ms × scale})
\end{aligned}
$$
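The derived constants can be reproduced with plain integer arithmetic. The following is a minimal sketch, not code from tcp_kcc.c: the enum and helper names are illustrative assumptions, and only the default values come from the parameter table. Integer division truncates, which is why $10^8 / (7.5 \times 10^5)$ yields 133 and the result is 33.

```c
#include <stdint.h>

/* Defaults copied from the parameter table; all names here are illustrative. */
enum {
    KCC_Q          = 100,     /* kcc_kalman_q */
    KCC_R          = 400,     /* kcc_kalman_r */
    KCC_SCALE      = 1024,    /* kcc_kalman_scale, 2^10 */
    KCC_K_PPM      = 250000,  /* kcc_kalman_converged_k_ppm */
    KCC_OUTLIER_MS = 5        /* kcc_kalman_outlier_ms */
};

/* converged_val = k_ppm * R / (1e6 - k_ppm) - Q, with truncating division. */
static uint64_t kcc_converged_val(uint64_t k_ppm, uint64_t r, uint64_t q)
{
    return k_ppm * r / (1000000u - k_ppm) - q;
}

/* A millisecond threshold expressed in scaled microseconds (us * scale). */
static uint64_t kcc_ms_to_scaled_us(uint64_t ms)
{
    return ms * 1000u * KCC_SCALE;
}
```

Calling `kcc_converged_val(KCC_K_PPM, KCC_R, KCC_Q)` gives the convergence bound, and `kcc_ms_to_scaled_us(5)` and `kcc_ms_to_scaled_us(4)` give the two thresholds.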
Scenario: Chengdu → Hong Kong, normally routed through Shanghai (long path, 65ms), occasionally flapping to a route through Guangdong (short path, 45ms).
Scaled physical quantities ($\times 1024$), written in ms-scaled units for readability; the thresholds above are in μs-scaled units, which carry an extra factor of 1000:
$$x\_est(\text{65ms}) = 66560, \quad z(\text{45ms}) = 46080, \quad \nu = 46080 - 66560 = -20480$$
Drop ratio $= (65-45)/65 = 30.8\%$, far exceeding the floor gate's upper bound of 12.5%.
1. Old Code: Per-RTT Derivation (Why It Was Unreliable)
The old code (commit 95f28e7) had the following gating conditions:
Outlier gate: |ν| > dyn_thresh && p_est ≤ converged_val
Floor gate: p_est > converged_val || z ≥ x_est × 7/8
Case A: $p\_est = 10$ (filter at its floor value, Q suppressed by a long-term clean path)
RTT 1: $z = 46080$, $\nu = -20480$, $|\nu| = 20480$
① Outlier gate: $|\nu| = 20480$ in ms-scaled units, i.e. $2.048 \times 10^7$ in the μs-scaled units of $dyn\_thresh$, so $|\nu| > 5.12 \times 10^6$ ✓; $p\_est = 10 \leq 33$ ✓ → Reject
On rejection the covariance is still updated, even though the sample itself is discarded: $p\_est$ is set to $p\_pred$.
$$p\_pred = p\_est + Q = 10 + 100 = 110$$
$$p\_est := p\_pred = 110$$
The rejected sample does not enter $x\_est$:
$$x\_est = 66560 \quad (\text{unchanged, still 65ms})$$
② The function returns, never reaching the floor gate.
RTT 2: $z = 46080$, $\nu = -20480$
① Outlier gate: $p\_est = 110 > 33$, so $p\_est \leq 33$ is false → Not triggered
② The sample bypasses the outlier gate and enters the floor gate:
$$p\_est = 110 > 33 \quad \rightarrow \quad \text{short-circuit condition triggers} \quad \rightarrow \quad \text{floor skipped}$$
$$x\_est = z = 46080 \quad (\text{45ms, converged!})$$
Case A conclusion: With $Q=100$, after RTT 1 rejects the sample, $p\_est$ rises to 110, and RTT 2 bypasses the floor. Converged in 2 RTTs, no deadlock.
Case B: $p\_est = 10$, $Q = 0$ (extreme suppression, the lower bound of adaptive Q)
RTT 1: $z = 46080$, $\nu = -20480$
① Outlier gate: $|\nu| = 2.048 \times 10^7$ (μs-scaled) $> 5.12 \times 10^6$ ✓, $10 \leq 33$ ✓ → Reject
With $Q = 0$: $p\_pred = p\_est + Q = 10 + 0 = 10$, $p\_est := 10$ (unchanged!)
RTT 2: $p\_est = 10 \leq 33$ ✓ → Outlier gate triggers again → Reject.
$p\_pred = 10 + 0 = 10$, $p\_est := 10$ (still unchanged!)
RTT 3: same as above → Reject.
RTT N: $p\_est = 10$ never rises → the outlier gate triggers forever → the sample never reaches the floor gate → $x\_est$ stays stuck at 65ms.
Case B conclusion: With $Q = 0$, the filter is permanently deadlocked.
The Fundamental Problem
The behavior of the old code depended on the value of $Q$, and $Q$ is dynamically adaptive (it depends on min_rtt_us / q_rtt_div), not a constant the designer can guarantee in advance. It was therefore impossible to assert that the old code converges under all conditions.
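The dependence on $Q$ in Cases A and B can be replayed with a small model of the old gating. This is a sketch under stated assumptions, not the code from commit 95f28e7: the function name is hypothetical, $|\nu| > dyn\_thresh$ is assumed true on every RTT (the 20ms step), and any forced-acceptance limit is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the 1-based RTT at which the old gating accepts the sample,
 * or 0 if it is still being rejected after max_rtts samples. */
static unsigned old_gate_rtts_to_accept(uint64_t p_est, uint64_t q,
                                        uint64_t converged_val,
                                        unsigned max_rtts)
{
    for (unsigned rtt = 1; rtt <= max_rtts; rtt++) {
        /* Old outlier gate: a large innovation is rejected while
         * p_est <= converged_val. */
        bool outlier_reject = (p_est <= converged_val);

        if (!outlier_reject)
            return rtt;   /* p_est > converged_val also skips the floor gate */

        p_est += q;       /* side effect of rejection: p_est := p_pred */
    }
    return 0;
}
```

With `p_est = 10` and `converged_val = 33`, the model accepts on RTT 2 for `q = 100` and never accepts for `q = 0`, matching the two cases above.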
2. New Code: Per-RTT Derivation (Deterministic Proof)
The new code (commit 88f1b8e) completely replaced the $p\_est$ gating for negative innovations with the neg_skip_cnt persistence counter. The following derivation does not depend on any specific values of $p\_est$, $Q$, or $R$.
Assumptions (worst case):
- $x\_est = 66560$ (65ms, steady-state value of the long path)
- $z = 46080$ (45ms, measurement from the new short path, constant every RTT)
- $neg\_skip\_cnt = 0$ (starting state, no prior negative innovation)
- $last\_neg\_mstamp = 0$ (no negative-innovation timestamp recorded yet)
RTT 1
① neg_skip_cnt increment
    if (innovation < 0) {
        u32 threshold = rtt_us >> 1; // 45000/2 = 22500
        threshold = max_t(u32, threshold, 1U);
        // = 22500
        bool enough_time_elapsed;
        enough_time_elapsed = (0 == 0) || ((now - 0) >= 22500);
        // last_neg_mstamp == 0 → true
        if (enough_time_elapsed) {
            ext->neg_skip_cnt = (0 < 254) ? 0+1 : 254;
            // = 1
        }
    }
neg_skip_cnt: 0 → 1. last_neg_mstamp = now.
② Outlier gate
    if (unlikely(!qboost_fired && abs_innov > dyn_thresh)) {
        // |ν|: 20480 ms-scaled = 2.048e7 μs-scaled > 5.12e6 → true
        if (innovation < 0) {
            outlier_reject = (1 < 3); // true
        }
        if (outlier_reject) {
            // consec_reject_cnt(0) < 25 → true
            // p_est = p_pred, update jitter, return
        }
    }
$outlier\_reject = true$ → Reject.
Side effect of the rejection: $p\_est := p\_pred = p\_est + Q$ ($p\_est$ increases, but this has no effect on the new gating, because the outlier gate no longer inspects $p\_est$ for negative innovations).
③ x_est unchanged: $x\_est = 66560$ (65ms).
RTT 2
① neg_skip_cnt increment
$now - last\_neg\_mstamp \approx 45000\,\mu s$ (one RTT ≈ 45ms)
$$45000 \geq 22500 \quad \rightarrow \quad enough\_time\_elapsed = true$$
$$neg\_skip\_cnt = 1 + 1 = 2$$
② Outlier gate
$$outlier\_reject = (2 < 3) = true \quad \rightarrow \quad \text{Reject}$$
③ x_est unchanged: $x\_est = 66560$ (65ms).
RTT 3
① neg_skip_cnt increment
$$45000 \geq 22500 \quad \rightarrow \quad enough\_time\_elapsed = true$$
$$neg\_skip\_cnt = 2 + 1 = 3$$
② Outlier gate
$$outlier\_reject = (3 < 3) = false \quad \rightarrow \quad \text{Bypass!}$$
The sample passes the outlier gate and enters subsequent processing.
③ Directional gate: $\nu = -20480 < 0$ → $\nu \leq 0$ branch → negative innovation → x_est = z path.
④ Floor gate
    u64 floor = x_est - (x_est >> 3);
    // floor = 66560 - 8320 = 58240
    if (neg_skip_cnt >= 3 || z >= floor) {
        // 3 >= 3 → true! Bypass the floor check!
        x_est = (u32)min_t(u64, z, U32_MAX);
        // = 46080 (45ms!)
    }
⑤ x_est = 46080 (45ms) ← Converged!
⑥ Automatic reset after convergence:
    if (x_updated) {
        ext->pos_skip_cnt = 0;
        ext->neg_skip_cnt = 0; // ← Cleared
        ext->drift_sum = 0;
    }
$neg\_skip\_cnt = 0$, ready for the next potential route change.
RTT 4: Steady-state maintenance
- $x\_est = 46080$ (45ms), $z \approx 46080$, $\nu \approx 0$
- $\nu \approx 0$ → $\nu \geq 0$ branch → neg_skip_cnt = 0 (cleared)
- $|\nu| \approx 0 < 5.12 \times 10^6$ → Outlier gate does not trigger
- Floor gate: $z \approx 46080 \geq 46080 \times 0.875 = 40320$ ✓ → passed normally
- Normal Kalman update maintains $x\_est$ around 45ms
Convergence summary table
| RTT | neg_skip_cnt | Outlier Gate | Floor Gate | x_est | p_est Dependency? |
|---|---|---|---|---|---|
| 1 | 1 | Reject (1<3) | — | 66560 | No |
| 2 | 2 | Reject (2<3) | — | 66560 | No |
| 3 | 3 | Pass (3≥3) | Bypass (3≥3) | 46080 ✅ | No |
| 4 | 0 | Not triggered (ν≈0) | Pass | ≈46080 | No |
Key property: The entire chain never checks $p\_est$, $Q$, or $R$. Convergence depends solely on the independent state neg_skip_cnt.
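The three-RTT walk in the summary table can be condensed into a few lines. The sketch below is a simplified model rather than the code from commit 88f1b8e: the struct and function names are hypothetical, every sample is assumed to pass the temporal gate and to exceed $dyn\_thresh$, and the handling of positive innovations is reduced to the counter reset.

```c
#include <stdbool.h>
#include <stdint.h>

#define NEG_PERSIST_THRESH 3u   /* KCC_NEG_PERSIST_THRESH default */

struct neg_state {
    uint64_t x_est;          /* scaled estimate */
    uint8_t  neg_skip_cnt;   /* persistence counter */
};

/* Feed one sample z; returns true when x_est was updated. */
static bool neg_step(struct neg_state *s, uint64_t z)
{
    if (z >= s->x_est) {                 /* nu >= 0: reset the count */
        s->neg_skip_cnt = 0;
        return false;                    /* positive path not modelled */
    }
    if (s->neg_skip_cnt < 254)
        s->neg_skip_cnt++;               /* step 1: count the negative */
    if (s->neg_skip_cnt < NEG_PERSIST_THRESH)
        return false;                    /* step 2: outlier gate rejects */
    s->x_est = z;                        /* floor gate bypassed: x_est = z */
    s->neg_skip_cnt = 0;                 /* reset after the update */
    return true;
}
```

Note that no argument of `neg_step` carries $p\_est$, $Q$, or $R$, which is the point of the table's last column.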
3. Why N=3: Joint Probability Under Multi-Layer Defense
neg_skip_cnt is just a counter — whether it can correctly increment to 3 depends on whether the preceding samples can penetrate all front-line defense layers.
3.1 Sources of False Negative Innovations
Three known sources of ACK contamination can produce spurious $\nu < 0$:
| Source | Mechanism | Physical Magnitude |
|---|---|---|
| TSO (sender-side TCP Segmentation Offload) | A 64KB large send is split by the NIC into ~44 MTU-sized packets sent as a burst within ~1μs; the receiver's ACKs are compressed on return, biasing RTT samples low. | Error ≈ $64\text{KB}/C$. 500Mbps link ≈ 1ms; 10Mbps link ≈ 51ms |
| GRO (receiver-side Generic Receive Offload) | GRO merges multiple consecutive ACKs into one, losing inter-packet spacing information and further biasing RTT samples low. | Stacked on top of TSO, amplifying the error |
| Timestamp error | Kernel tcp_mstamp precision ≈ 1μs; interrupt latency may introduce additional deviation. | ≤ 1ms (extreme interrupt storms) |
3.2 A Single Layer is Insufficient: A Simple Probability Estimate
Consider only the outlier gate’s base threshold (5ms): on a 500Mbps link, TSO-induced RTT error ≈ 1ms < 5ms → a single TSO batch will not trigger the outlier gate. However, on a 100Mbps link, the TSO error can reach 5.1ms — just crossing the threshold.
If only the outlier gate existed, a single TSO event on a slow link could produce a spurious rejection. But neg_skip_cnt requires 3 consecutive negative samples to break through, which means TSO batches in 3 independent RTTs, each crossing the 5ms threshold.
From a naive binomial perspective, $2^{-3} = 12.5\%$ is already a conservative upper bound. The true probability is far lower, because four pre-defense layers are active simultaneously.
3.3 Joint Filtering by Four Defense Layers
For a spurious negative to successfully enter neg_skip_cnt, it must simultaneously penetrate all of the following layers:
RTT sample → [Temporal gating] → [Outlier gate base threshold] → [Jitter EWMA dynamic threshold] → [Directional gate] → neg_skip_cnt++
First Layer: Temporal Gating ($rtt/2$ interval)
ACK spacing within a TSO batch is about 1μs, while $rtt/2 = 22.5$ms. Only the first ACK of a batch passes; the subsequent ~43 ACKs are all discarded by the temporal gate.
$$P_{time} = \frac{1}{44} \approx 0.023 \quad (\text{a single TSO batch yields only 1 count opportunity})$$
Second Layer: Outlier Gate Base Threshold (5ms)
Triggers only when $|\nu| > 5$ms. TSO-induced RTT error $= 64\text{KB}/C$:
| Link Rate | TSO Error | > 5ms threshold? | $P_{base}$ |
|---|---|---|---|
| 10Mbps | 51.2ms | ✓ | ≈1.0 |
| 100Mbps | 5.12ms | ✓ (marginal) | ≈0.5 |
| 500Mbps | 1.02ms | ✗ | ≈0 |
| 1Gbps | 0.51ms | ✗ | ≈0 |
On the 500Mbps link discussed in this article, the TSO error by itself cannot trigger the outlier gate. The gate fires only for genuinely large steps (such as the 20ms path switch) or for TSO artifacts on extremely slow links.
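The TSO error column is just the serialization time of one burst. A minimal sketch follows, assuming a 64,000-byte burst (the size that reproduces the 51.2 / 5.12 / 1.02 / 0.51 ms figures in the table); the function names are illustrative.

```c
#include <stdint.h>

/* Serialization time of one TSO burst in microseconds: bits / link rate. */
static uint64_t tso_error_us(uint64_t burst_bytes, uint64_t link_bps)
{
    return burst_bytes * 8u * 1000000u / link_bps;
}

/* Non-zero when the error exceeds the outlier gate's base threshold. */
static int crosses_base_threshold(uint64_t err_us, uint64_t outlier_ms)
{
    return err_us > outlier_ms * 1000u;
}
```

For example, `tso_error_us(64000, 100000000)` is 5120 μs, which crosses a 5ms threshold only marginally, while the 500Mbps value stays well below it.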
Third Layer: Jitter EWMA Dynamic Threshold
Even if the outlier gate is triggered by the base threshold, a dynamic threshold applies:
$$dyn\_thresh = \max(outlier\_ms \cdot scale,\; jitter\_ewma \cdot outlier\_jitter\_mult \cdot scale)$$
$jitter\_ewma$ is an EWMA of jitter with $\alpha=0.125$ ($N_{eff} \approx 15$ samples). An isolated RTT spike caused by TSO is diluted by a factor of $1/8$ in the EWMA, so a single spurious pulse hardly raises the dynamic threshold.
When $jitter\_ewma$ is low, the dynamic threshold falls back to the base threshold (5ms). When real sustained jitter exists on the path, the dynamic threshold is raised, making it even harder for subsequent TSO artifacts to trigger the gate.
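The interaction between the base threshold and the jitter EWMA can be sketched as follows. This is an illustration under assumptions, not the kernel implementation: inputs are taken in microseconds, the shift-based EWMA form is one common way to realize $\alpha = 1/8$, and the multiplier value used in the example is hypothetical because the text does not give the default of outlier_jitter_mult.

```c
#include <stdint.h>

#define SCALE 1024u   /* kcc_kalman_scale */

/* dyn_thresh = max(base, jitter_ewma * mult), both in scaled microseconds. */
static uint64_t dyn_thresh(uint64_t outlier_us, uint64_t jitter_ewma_us,
                           uint64_t jitter_mult)
{
    uint64_t base = outlier_us * SCALE;
    uint64_t dyn  = jitter_ewma_us * jitter_mult * SCALE;

    return dyn > base ? dyn : base;
}

/* EWMA with alpha = 1/8: keep 7/8 of the old value, add 1/8 of the sample. */
static uint64_t jitter_ewma_update(uint64_t ewma_us, uint64_t sample_us)
{
    return ewma_us - (ewma_us >> 3) + (sample_us >> 3);
}
```

Starting from a 100μs EWMA, a single 5100μs spike raises it to only 725μs; with a hypothetical multiplier of 4 the dynamic term stays below the 5ms base, so the threshold does not move.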
Fourth Layer: Directional Gate
Samples with $\nu > 0$ are rejected by the directional update rule. However, TSO/GRO artifacts produce $\nu < 0$ (RTT biased low), which bypasses the directional gate.
There is another crucial fact: ACK compression produces not only $\nu < 0$ but also temporal correlation. Negatives from the same TSO batch are clustered in time (μs-scale spacing), and the temporal gate is specifically designed to counter this clustering.
3.4 Joint Probability
On the 500Mbps × 45ms RTT link discussed here:
$$P(\text{single-RTT false negative}) = P_{time} \cdot P_{base} \cdot P_{jitter} \cdot P_{directional}$$
$$= 0.023 \cdot (\approx 0) \cdot 1.0 \cdot 1.0 \approx 0$$
The TSO error is ≈ 1ms on a 500Mbps link, far below the 5ms base threshold, so the second defense layer already pushes the probability to effectively zero.
On a 100Mbps link (TSO error ≈ 5.1ms, marginally passing the base threshold):
$$P(\text{single-RTT false negative}) \approx 0.023 \cdot 0.5 \cdot 1.0 \cdot 1.0 \approx 0.012$$
$$P(\text{3 consecutive RTTs}) \approx 0.012^3 \approx 1.7 \times 10^{-6}$$
This is about 1 in 600,000: three independent filters (temporal gating × base threshold × EWMA smoothing) suppress the penetration rate of a single TSO artifact to ~1.2%, and the joint probability of three consecutive penetrations to the parts-per-million level.
Conversely, a real physical path drop (e.g., a BGP reroute) produces a stable $\nu < 0$ every RTT; both the base and dynamic thresholds are triggered consistently, and the temporal gate is satisfied by the regular intervals. Real signals and TSO artifacts are therefore highly distinguishable under multi-layer joint filtering.
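The joint probability is a product of per-layer pass probabilities raised to the number of consecutive RTTs. A small sketch, assuming the layers and the RTTs are statistically independent as the estimate above does; the function names are illustrative.

```c
/* Probability that one RTT yields a counted spurious negative:
 * the product of the per-layer pass probabilities. */
static double single_rtt_false_negative(double p_time, double p_base,
                                        double p_jitter, double p_directional)
{
    return p_time * p_base * p_jitter * p_directional;
}

/* Probability of n consecutive spurious negatives (independent RTTs). */
static double consecutive_false_negatives(double p_single, unsigned n)
{
    double p = 1.0;

    for (unsigned i = 0; i < n; i++)
        p *= p_single;
    return p;
}
```

With the 100Mbps inputs (1/44, 0.5, 1.0, 1.0) the single-RTT value is about 0.011; the text rounds this to 0.012 before cubing, which gives the quoted $1.7 \times 10^{-6}$.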
3.5 Re-examining N=2 vs N=3
From the joint probability perspective:
| N | Neyman-Pearson Upper Bound | Joint Probability on 100Mbps | Assessment |
|---|---|---|---|
| 2 | ≤ 25% | ≈ $1.4 \times 10^{-4}$ | The upper bound looks weak, but the joint probability is already about 1 in 7000. However, data-center RTTs on the 10μs scale weaken temporal gating (rtt/2 ≈ 5μs, so the μs-spaced ACKs of a TSO batch may slip through), and WiFi links with 40ms+ jitter may raise the jitter EWMA so far that the base threshold is no longer the effective one. N=2 is not robust enough in these extreme environments. |
| 3 | ≤ 12.5% | ≈ $1.7 \times 10^{-6}$ | The upper bound is conservative; the joint probability is at the parts-per-million level. Even in data centers or on high-jitter links, the combined suppression of the three filters pushes three consecutive false negatives to statistically negligible levels. |
N=3 is not determined by a simple $2^{-3}$, but by the actual penetration probability under the joint action of multiple defense layers. $2^{-3}$ is merely a loose theoretical upper bound. In practice, the four pre-filters have already suppressed the single-RTT false negative probability to ~1.2%, and the probability of three consecutive false negatives to the parts-per-million level. Even if that rare event occurs, the persistent upward pull provided by drift detection and Q-boost (see §3.7) corrects $x\_est$ back to the true $T_{prop}$ within tens to hundreds of RTTs. The asymmetric design of “fast downward pull by negatives + slow upward pull by positives” keeps the filter robust across all timescales.
3.6 Marginal Analysis of N=4
If KCC_NEG_PERSIST_THRESH were set to 4, the convergence process would become:
| RTT | neg_skip_cnt | Outlier Gate | Floor Gate | x_est |
|---|---|---|---|---|
| 1 | 1 | Reject (1<4) | — | 66560 |
| 2 | 2 | Reject (2<4) | — | 66560 |
| 3 | 3 | Reject (3<4) | — | 66560 |
| 4 | 4 | Pass (4≥4) | Bypass | 46080 ✅ |
Convergence latency: $4 \times 45\text{ms} = 180\text{ms}$.
Joint probability for N=4 on a 100Mbps link:
$$P(\text{4 consecutive RTT false negatives}) \approx 0.012^4 \approx 2.1 \times 10^{-8}$$
Compared to N=3's $1.7 \times 10^{-6}$, N=4 reduces the false-positive rate by roughly two orders of magnitude (a factor of about 80), from parts per million to about two in a hundred million.
But this comes with a real additional latency cost: N=3 → N=4 adds 1 RTT (45ms) of convergence waiting. In the Neyman-Pearson marginal analysis:
| N | Joint Probability | Convergence Latency | Marginal Probability Gain | Marginal Latency Cost |
|---|---|---|---|---|
| 1→2 | $0.012 \rightarrow 1.4 \times 10^{-4}$ | 45→90ms | $\approx 1.2 \times 10^{-2}$ / RTT | +45ms |
| 2→3 | $1.4 \times 10^{-4} \rightarrow 1.7 \times 10^{-6}$ | 90→135ms | $\approx 1.4 \times 10^{-4}$ / RTT | +45ms |
| 3→4 | $1.7 \times 10^{-6} \rightarrow 2.1 \times 10^{-8}$ | 135→180ms | $\approx 1.7 \times 10^{-6}$ / RTT | +45ms |
From N=3 to N=4, the marginal probability gain is only $1.7 \times 10^{-6}$ per RTT, nearly two orders of magnitude smaller than the N=2→3 gain of $1.4 \times 10^{-4}$ per RTT. Waiting an extra RTT buys a statistically negligible safety improvement, yet every real path switch is delayed by one more RTT. This is classic diminishing marginal returns: N=3 is the last point on the curve with a still-significant marginal value.
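The trade-off can be tabulated directly from the single-RTT probability and the RTT. A sketch under the same independence assumption as §3.4; the struct and function names are illustrative, and the marginal gain is taken as the absolute reduction in false-trigger probability when the threshold goes from n to n+1.

```c
struct persist_point {
    double false_trigger_prob;  /* p_single^n */
    double latency_ms;          /* n * rtt_ms */
};

/* False-trigger probability and convergence latency for threshold n. */
static struct persist_point persist_eval(double p_single, unsigned n,
                                         double rtt_ms)
{
    struct persist_point pt = { 1.0, (double)n * rtt_ms };

    for (unsigned i = 0; i < n; i++)
        pt.false_trigger_prob *= p_single;
    return pt;
}

/* Probability removed by raising the threshold from n to n + 1. */
static double marginal_gain(double p_single, unsigned n)
{
    return persist_eval(p_single, n, 0.0).false_trigger_prob -
           persist_eval(p_single, n + 1, 0.0).false_trigger_prob;
}
```

Evaluating `marginal_gain(0.012, n)` for n = 2 and n = 3 gives roughly $1.4 \times 10^{-4}$ and $1.7 \times 10^{-6}$, while each step costs the same 45ms.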
3.7 The Ultimate Safety Net: Even if the One-in-a-Million Occurs, Drift + Q-boost Will Pull It Back
Assume the extreme scenario in which the parts-per-million event actually happens: three consecutive TSO/GRO artifacts penetrate all four defense layers, and $x\_est$ is incorrectly pulled down to a value lower than the true $T_{prop}$ (e.g., 40ms while the true path is 45ms).
What happens then? The filter has been tricked into “converging” on a false low value. Thereafter, every correct RTT sample (45ms) produces $\nu > 0$ (measurement higher than estimate).
Looking only at the directional gate, $\nu > 0$ samples are rejected by the directional update rule, which appears to lock $x\_est$ permanently at 40ms. But KCC has two upward pulling forces:
Drift Detection (Tier-1 / Tier-2)
When $\nu > 0$ is rejected, pos_skip_cnt increments. This is the cumulative counter for positive innovations, the counterpart of neg_skip_cnt. When pos_skip_cnt reaches the drift threshold (default 16), Tier-1 triggers:
$$corr = \frac{K \cdot |\nu|}{4}$$
i.e., a quarter of the normal Kalman correction is applied (a lagging, controlled upward correction). Tier-2 triggers at threshold $16 \times 8 = 128$, applying an even smaller correction ($K \cdot |\nu| / 8$).
Drift detection is not as aggressive as Q-boost: it is slow and cumulative, modeling the gradual baseline drift of a physical wireless link or fiber. But once triggered, it moves $x\_est$ gradually upward, back toward the true $T_{prop}$.
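The two drift tiers amount to a threshold lookup on pos_skip_cnt. The sketch below is an assumption-laden illustration: the function name is hypothetical, `corr_abs` stands for the already computed $K \cdot |\nu|$, the Tier-1 jitter condition (jitter_ewma < min_rtt/8) is taken from the code verification later in this section, and checking Tier-2 before Tier-1 is a guess about ordering that the text does not confirm.

```c
#include <stdint.h>

#define DRIFT_THRESH 16u   /* default drift threshold quoted in the text */

/* Upward correction applied to a rejected positive innovation,
 * or 0 when the sample is simply skipped. */
static uint64_t drift_correction(uint32_t pos_skip_cnt, uint64_t corr_abs,
                                 uint64_t jitter_ewma_us, uint64_t min_rtt_us)
{
    if (pos_skip_cnt >= DRIFT_THRESH * 8u)
        return corr_abs >> 3;                  /* Tier-2: K*|nu|/8 */
    if (pos_skip_cnt >= DRIFT_THRESH &&
        jitter_ewma_us < min_rtt_us / 8u)
        return corr_abs >> 2;                  /* Tier-1: K*|nu|/4 */
    return 0;                                  /* below both thresholds */
}
```

With `corr_abs = 1000` and a quiet path, the correction is 0 at count 15, 250 at count 16, and 125 at count 128.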
Q-boost (Large Step Detection)
If $x\_est$ is pulled too far down (e.g., 40ms vs. the true 45ms, a 5ms gap) so that $|\nu| > q\_boost\_thresh$ (≈ 4ms), Q-boost triggers:
- $p\_est$ is reset to 1000 (high uncertainty)
- qboost_fired = true, so the outlier gate is skipped
- For $\nu > 0$, the L12149-L12157 path is taken: $x\_est = x\_est + K \cdot |\nu|$, applying the full Kalman correction and pulling it back in one step
Understanding through a fluid dynamics analogy
RTT sampling in a network is inherently asymmetric:
- Most samples carry $T_{queue} > 0$ → $\nu > 0$ (positive innovations are the norm, a direct consequence of the Lindley recurrence $q_{k+1} = \max(0, q_k + \sum\lambda_i - C)$: as long as the arrival rate approaches the bottleneck capacity, queues accumulate and RTT rises)
- Sustained $\nu < 0$ only appears during PROBE_BW's DRAIN phase (pacing_gain=0.75x) and after PROBE_RTT drains the queue
Therefore, in the long-term statistical average of the flow, $\nu \geq 0$ samples dominate. neg_skip_cnt handles transient path drops (completed within a few RTTs), while drift + Q-boost handle the persistent upward pull (scales of tens to hundreds of RTTs).
Here is the concrete code verification:
Drift Tier-1 (L12347–L12360): pos_skip_cnt ≥ drift_thresh(16) && jitter_ewma < min_rtt/8 → corr_abs >> 2 (i.e., $K \cdot |\nu| / 4$, one quarter of the normal correction). This is a slow response to “sustained small positive biases”: not an immediate correction, but a lagging local correction applied only after accumulation reaches statistical significance.
Drift Tier-2 (L12368–L12382): pos_skip_cnt ≥ drift_thresh × 8 = 128 → corr_abs >> 3 ($K \cdot |\nu| / 8$, one eighth). This is a forced correction for “long-term stubborn biases” with an extremely high statistical threshold (Neyman-Pearson upper bound $2^{-128} \approx 2.9 \times 10^{-39}$), triggered virtually only when the filter is certain it has drifted away.
Q-boost (L11904–L11911 + L12149–L12157): $\nu > 0$, $|\nu| > 4$ms and $p\_est \leq converged\_val$ → $p\_est$ reset to 1000, applies the full Kalman correction $x\_est = x\_est + K \cdot |\nu|$ in one step.
These three together constitute a persistent upward pull, ensuring at the code level that even if $x\_est$ is incorrectly pulled down by a spurious negative innovation, it is corrected back toward the true $T_{prop}$ within tens to hundreds of RTTs.
3.8 Practical Consequences of an Incorrectly Lowered x_est: Bandwidth Utilization Drop
If $x\_est$ is pulled below the true $T_{prop}$ by a false negative, it affects not just filter estimation accuracy; it directly depresses link bandwidth utilization.
KCC defaults to FILTER mode (kcc_rtt_mode = 0, L7705), where model_rtt is taken directly from $x\_est\_us$ (L12534: t_prop_scaled = kcc_rtt_mode == 0 ? ext->x_est : ...). The BDP calculation (L10509: target = kcc_bdp(sk, bw, gain, ext)) depends on model_rtt:
$$BDP = bw \times model\_rtt$$
$$target\_cwnd = BDP \times cwnd\_gain \quad (\text{steady-state } cwnd\_gain = 2.0)$$
In PROBE_BW steady state, cwnd is limited by target (L10519: cwnd = min(cwnd + acked, target)).
If $x\_est$ is incorrectly pulled to 40ms (true $T_{prop} = 45$ms):
$$
\begin{aligned}
BDP_{wrong} &= bw \times 40\text{ms} \\
BDP_{true} &= bw \times 45\text{ms} \\
\text{Utilization loss} &= \frac{45 - 40}{45} \approx 11\%
\end{aligned}
$$
For a 500Mbps link, this translates to a loss of approximately 55Mbps, caused by an $x\_est$ error of just 5ms.
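The 11% figure follows from the ratio of the two BDPs. A minimal sketch with illustrative helper names; it mirrors the arithmetic above only and does not model cwnd_gain or pacing.

```c
#include <stdint.h>

/* Bandwidth-delay product in bytes for a rate in bit/s and an RTT in us. */
static uint64_t bdp_bytes(uint64_t bw_bps, uint64_t rtt_us)
{
    return bw_bps / 8u * rtt_us / 1000000u;
}

/* Relative BDP shortfall in permille when the RTT estimate is too low. */
static uint64_t utilization_loss_permille(uint64_t true_rtt_us,
                                          uint64_t est_rtt_us)
{
    if (est_rtt_us >= true_rtt_us)
        return 0;   /* an overestimate costs loss, not utilization */
    return (true_rtt_us - est_rtt_us) * 1000u / true_rtt_us;
}
```

For 500Mbps the BDP is 2,812,500 bytes at 45ms and 2,500,000 bytes at 40ms; the shortfall is 111 permille, i.e. about 55Mbps.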
Conversely, if $x\_est$ is overestimated (e.g., min_rtt inflation in BBR mode), BDP is overestimated → cwnd too large → excess injection → packet loss. KCC's directional gate is designed precisely to prevent this, but it is asymmetric: it blocks $\nu > 0$ updates (preventing $x\_est$ from being inflated by queuing), yet allows $\nu < 0$ updates (fast convergence on path drops). The price of this asymmetric design is that $x\_est$ lacks defense against false negatives, and this is exactly the gap filled by the neg_skip_cnt mechanism.
Summarizing this mechanical balance:
- Moving downward: $\nu < 0$ passes the directional gate → $x\_est$ is pulled down. neg_skip_cnt ensures this is controlled (3-RTT persistence test), so not every isolated noise spike gets through.
- Moving upward: $\nu > 0$ is blocked by the directional gate → $x\_est$ cannot rise. But Drift Tier-1/2 and Q-boost provide a controlled, lagged upward channel: they intervene only when statistically significant, preventing the filter from overreacting to noise.
- Cost of error: $x\_est$ too low → utilization loss (insufficient cwnd); $x\_est$ too high → packet loss (excessive cwnd). KCC chooses the asymmetric strategy of “better too low than too high” (the directional gate blocks upward moves, neg_skip_cnt ensures that downward moves are real), because the cost of packet loss far outweighs the cost of temporarily lower utilization.
3.9 Comparison of Multi-Layer Penetration
| Defense Layer | Real Path Drop (45→65→45) | TSO Artifact (500Mbps) | TSO Artifact (100Mbps) |
|---|---|---|---|
| Temporal gate (rtt/2) | Passes once per RTT ✓ | Only 1 pass per batch (1/44) | Same as left |
| Base threshold (5ms) | 20ms ≫ 5ms ✓ | 1ms < 5ms ✗ Blocked | 5.1ms ≈ 5ms marginal |
| Jitter EWMA | Sustained trigger → adaptive raise → but still passes | — | Passively raised → raises bar |
| Directional gate | ν<0 → passes ✓ | ν<0 → passes ✓ | ν<0 → passes ✓ |
On links of 500Mbps and above, the second defense layer (base threshold) alone reduces TSO artifact penetration to effectively zero. On these links, neg_skip_cnt responds almost exclusively to real path changes.
4. Why Any $\nu \geq 0$ Immediately Resets the Counter
The persistence test rests on one premise: N consecutive negatives constitute “persistent evidence”.
A single sample with $\nu \geq 0$ (even a neutral $\nu = 0$) provides evidence in the opposite direction, sufficient to overturn the hypothesis that “we are currently in a continuous downward trend.” Hence:
    else {
        ext->neg_skip_cnt = 0;
        ext->last_neg_mstamp = 0;
    }
This reset is unconditional: any positive or zero innovation instantly terminates the persistence count. If the path has indeed dropped, subsequent samples re-accumulate the count; if it was just random jitter, the reset prevents a false positive.
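Putting the temporal gate and the unconditional reset together gives a compact counter update. This is a user-space sketch, not the kernel function: the struct and function names are hypothetical, and updating last_neg_mstamp only when a negative is actually counted is an assumption inferred from the RTT 1 walk-through.

```c
#include <stdint.h>

struct neg_counter {
    uint8_t  neg_skip_cnt;      /* saturates at 254 */
    uint64_t last_neg_mstamp;   /* 0 means no negative recorded */
};

static void neg_counter_update(struct neg_counter *c, int64_t innovation,
                               uint64_t now_us, uint32_t rtt_us)
{
    if (innovation < 0) {
        uint32_t threshold = rtt_us >> 1;          /* rtt/2 temporal gate */

        if (threshold < 1)
            threshold = 1;
        if (c->last_neg_mstamp == 0 ||
            now_us - c->last_neg_mstamp >= threshold) {
            if (c->neg_skip_cnt < 254)
                c->neg_skip_cnt++;
            c->last_neg_mstamp = now_us;           /* assumed placement */
        }
    } else {
        c->neg_skip_cnt = 0;                       /* any nu >= 0 resets */
        c->last_neg_mstamp = 0;
    }
}
```

A second negative arriving 1μs after the first is not counted, one arriving a full RTT later is, and a single zero innovation clears the count so that accumulation starts again from 1.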
5. Complete Mathematical Statement
The new code provides the following deterministic guarantee in the negative path-drop scenario:
$$\forall\ p\_est, Q, R, x\_est\_init,\quad \text{if } T_{prop} \text{ drops abruptly from } t_0 \text{ to } t_1 < t_0$$
$$\text{and } |t_1 - t_0| > dyn\_thresh/S \quad (\text{drop magnitude captured by dynamic threshold})$$
$$\text{and 3 consecutive RTT samples are all near } t_1$$
$$\text{then } x\_est \text{ is corrected to } t_1 \text{ on the 3rd RTT sample}$$
where $S = 1024$ is the scaling factor. Convergence time $= 3 \times t_1$. This guarantee is completely decoupled from the filter's covariance state ($p\_est$, $Q$, $R$).
6. Summary
The old code's $p\_est \leq converged\_val$ gating was a direction-agnostic condition applied to both directions. It happened to be valid for positives (“large innovation during convergence = noise”), but was state-dependent for negatives (“large innovation during convergence = noise? Depends on the current value of Q”). When Q was suppressed to 0, $p\_est$ could not rebound after a rejection, forming a permanent interlock between the two gates.
The new code uses neg_skip_cnt to make the gating direction-aware: negatives go through a persistence test (N=3), positives still use $p\_est$. The Neyman-Pearson sequential framework gives a Type-I error bound of $2^{-3}=12.5\%$, TSO temporal gating filters out burst-compression artifacts, and an immediate reset on $\nu \geq 0$ prevents false accumulation.
Conclusion: Whenever the conditions of §5 hold, convergence from an inflated value to the new physical baseline completes within 3 RTTs.

