KCC Negative Innovation Deadlock: Why N=3 Guarantees Convergence

0. Parameters and Notation

All calculations below are based on KCC default parameters, cross-verified in the source code (tcp_kcc.c):

| Parameter | Symbol | Default | Physical Meaning |
|---|---|---|---|
| kcc_kalman_q | $Q$ | 100 | Process noise covariance |
| kcc_kalman_r | $R$ | 400 | Measurement noise covariance |
| kcc_kalman_scale | $S$ | 1024 ($2^{10}$) | Fixed-point scaling factor |
| kcc_kalman_p_est_floor | | 10 | $p\_est$ lower bound |
| kcc_kalman_converged_k_ppm | | 250,000 | Convergence criterion ($K \leq 0.25$) |
| KCC_KALMAN_CONVERGED_MIN | | 1 | Convergence threshold lower bound |
| kcc_kalman_outlier_ms | | 5ms | Outlier gate base threshold |
| KCC_NEG_PERSIST_THRESH | $N$ | 3 | Persistence test threshold |
| KCC_FORCED_DROP_FLOOR_SHIFT | | 3 | Floor gate shift ($\div 8$) |
| kcc_kalman_max_consec_reject | | 25 | Forced acceptance gate upper limit |

Derived constants:

$$
\begin{aligned}
converged\_val &= \frac{250000 \cdot 400}{1000000 - 250000} - 100 = \frac{10^8}{7.5 \times 10^5} - 100 = 33 \\[4pt]
dyn\_thresh &= 5\text{ms} \cdot 1000\,(\mu\text{s/ms}) \cdot 1024 = 5.12 \times 10^6 \quad (\text{approx. 5ms} \times \text{scale}) \\[4pt]
q\_boost\_thresh &= 4 \cdot 1\text{ms} \cdot 1000 \cdot 1024 = 4.096 \times 10^6 \quad (\text{approx. 4ms} \times \text{scale})
\end{aligned}
$$
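The derived constants can be reproduced with integer arithmetic. The sketch below is illustrative only: the helper names `kcc_converged_val` and `kcc_ms_to_scaled` are hypothetical (not taken from tcp_kcc.c), and the clamp to `KCC_KALMAN_CONVERGED_MIN` is an assumption about how the lower bound is applied.

```c
#include <assert.h>
#include <stdint.h>

#define KCC_SCALE         1024u  /* kcc_kalman_scale */
#define KCC_CONVERGED_MIN 1u     /* KCC_KALMAN_CONVERGED_MIN */

/* converged_val = k*R/(1-k) - Q with k given in ppm, using integer division.
 * Assumption: results below the minimum are clamped to KCC_CONVERGED_MIN. */
static uint32_t kcc_converged_val(uint64_t k_ppm, uint64_t r, uint64_t q)
{
    uint64_t p_pred;

    assert(k_ppm < 1000000u);
    p_pred = (k_ppm * r) / (1000000u - k_ppm);
    if (p_pred <= q + KCC_CONVERGED_MIN)
        return KCC_CONVERGED_MIN;
    return (uint32_t)(p_pred - q);
}

/* Convert a millisecond threshold to the microsecond x scale fixed-point domain. */
static uint64_t kcc_ms_to_scaled(uint32_t ms)
{
    return (uint64_t)ms * 1000u * KCC_SCALE;
}
```

With the defaults, `kcc_converged_val(250000, 400, 100)` gives 33, and `kcc_ms_to_scaled(5)` and `kcc_ms_to_scaled(4)` give the two thresholds above.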

Scenario: Chengdu → Hong Kong, normally routed through Shanghai (long path, 65ms), occasionally flapping to Guangdong (short path, 45ms).

Scaled physical quantities (ms $\times 1024$):

$$x\_est(65\text{ms}) = 66560, \quad z(45\text{ms}) = 46080, \quad \nu = 46080 - 66560 = -20480$$

Note that these values are expressed in ms $\times$ 1024, whereas the thresholds above are in μs $\times$ 1024. On the threshold scale, $|\nu| = 20\text{ms}$ corresponds to $2.048 \times 10^7$.

Drop ratio = $(65-45)/65 = 30.8\%$, far exceeding the 12.5% maximum drop that the floor gate allows.


1. Old Code: Per-RTT Derivation (Why It Was Unreliable)

The old code (commit 95f28e7) had the following gating conditions:

Outlier gate: |ν| > dyn_thresh && p_est ≤ converged_val
Floor gate:   p_est > converged_val || z ≥ x_est × 7/8

Case A: $p\_est = 10$ (filter at floor value, Q suppressed by a long-term clean path)

RTT 1: $z = 46080$, $\nu = -20480$, $|\nu| = 20480$

① Outlier gate: $|\nu| = 20\text{ms} > 5\text{ms}$ (on the threshold scale, $2.048 \times 10^7 > 5.12 \times 10^6$) ✓, $p\_est = 10 \leq 33$ ✓ → Reject

Upon rejection, $p\_est$ is set to $p\_pred$, so the covariance is updated even though the sample is rejected:

$$p\_pred = p\_est + Q = 10 + 100 = 110$$

$$p\_est := p\_pred = 110$$

The rejected sample does not enter $x\_est$:

$$x\_est = 66560 \quad (\text{unchanged, still 65ms})$$

② The function returns, never reaching the floor gate.

RTT 2: $z = 46080$, $\nu = -20480$

① Outlier gate: $p\_est = 110 > 33$ → $p\_est \leq 33$ is false → Not triggered

② Sample bypasses the outlier gate and enters the floor gate:

$$p\_est = 110 > 33 \quad \rightarrow \quad \text{short-circuit condition triggers} \quad \rightarrow \quad \text{floor skipped}$$

$$x\_est = z = 46080 \quad (\text{45ms, converged!})$$

Case A conclusion: With $Q=100$, after RTT 1 rejects the sample, $p\_est$ rises to 110, and RTT 2 bypasses the floor. Converged in 2 RTTs, no deadlock.

Case B: $p\_est = 10$, $Q = 0$ (extreme suppression, the lower bound of adaptive Q)

RTT 1: $z = 46080$, $\nu = -20480$

① Outlier gate: $|\nu| = 20\text{ms} > 5\text{ms}$ ✓, $10 \leq 33$ ✓ → Reject

With $Q = 0$: $p\_pred = p\_est + Q = 10 + 0 = 10$, $p\_est := 10$ (unchanged!)

RTT 2: $p\_est = 10 \leq 33$ ✓ → Outlier gate triggers again → Reject.

$p\_pred = 10 + 0 = 10$, $p\_est := 10$ (still unchanged!)

RTT 3: same as above → Reject.

RTT N: $p\_est = 10$ never rises → Outlier gate triggers forever → sample never reaches the floor gate → $x\_est$ stuck at 65ms forever.

Case B conclusion: With $Q = 0$, the filter is permanently deadlocked.

The Fundamental Problem

The behavior of the old code depended on the value of $Q$, and $Q$ is dynamically adaptive (depending on min_rtt_us / q_rtt_div), not a constant the designer can guarantee in advance. It was therefore impossible to assert that “this code will converge under all conditions.”
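Cases A and B can be replayed with a small simulation. This is a minimal sketch of the two old gating conditions quoted above, not the code from commit 95f28e7: the function name is hypothetical, times are plain microseconds rather than scaled values, only a downward step is modeled, and a sample that falls below the floor is assumed to leave `x_est` untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the RTT index at which x_est first moves to z,
 * or 0 if that never happens within max_rtts (deadlock). */
static unsigned old_gate_rtts_to_converge(uint32_t p_est, uint32_t q,
                                          uint32_t converged_val,
                                          uint32_t x_est_us, uint32_t z_us,
                                          uint32_t thresh_us, unsigned max_rtts)
{
    unsigned rtt;

    assert(z_us <= x_est_us);  /* sketch covers negative innovations only */
    for (rtt = 1; rtt <= max_rtts; rtt++) {
        uint32_t abs_innov = x_est_us - z_us;
        uint32_t floor_us = x_est_us - (x_est_us >> 3);

        /* Outlier gate: large innovation while the filter looks converged. */
        if (abs_innov > thresh_us && p_est <= converged_val) {
            p_est += q;   /* p_est := p_pred = p_est + Q */
            continue;     /* sample rejected, x_est unchanged */
        }
        /* Floor gate: skipped once p_est exceeds converged_val. */
        if (p_est > converged_val || z_us >= floor_us)
            return rtt;   /* x_est := z */
    }
    return 0;
}
```

For the 65ms → 45ms step with a 5ms threshold, `Q = 100` returns 2 and `Q = 0` returns 0 no matter how large `max_rtts` is, which is the deadlock described in Case B.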


2. New Code: Per-RTT Derivation (Deterministic Proof)

The new code (commit 88f1b8e) completely replaced the $p\_est$ gating for negative innovations with the neg_skip_cnt persistence counter. The following derivation does not depend on any specific values of $p\_est$, $Q$, or $R$.

Assumptions (worst case):

  • $x\_est = 66560$ (65ms, steady-state value of the long path)
  • $z = 46080$ (45ms, measurement from the new short path, constant every RTT)
  • $neg\_skip\_cnt = 0$ (starting state, no prior negative innovation)
  • $last\_neg\_mstamp = 0$ (no negative-innovation timestamp recorded yet)

RTT 1

① neg_skip_cnt increment

if (innovation < 0) {
    u32 threshold = rtt_us >> 1;    // 45000/2 = 22500
    threshold = max_t(u32, threshold, 1U);
    // = 22500
    bool enough_time_elapsed;
    enough_time_elapsed = (0 == 0) || ((now - 0) >= 22500);
    // last_neg_mstamp == 0 → true
    if (enough_time_elapsed) {
        ext->neg_skip_cnt = (0 < 254) ? 0+1 : 254;
        // = 1
    }
}

neg_skip_cnt: 0 → 1. last_neg_mstamp = now.

② Outlier gate

if (unlikely(!qboost_fired && abs_innov > dyn_thresh)) {
    // |innovation| = 20 ms > 5 ms base threshold → true
    if (innovation < 0) {
        outlier_reject = (1 < 3);  // true
    }
    if (outlier_reject) {
        // consec_reject_cnt(0) < 25 → true
        // p_est = p_pred, update jitter, return
    }
}

$outlier\_reject = true$ → Reject.

Side effects on rejection: $p\_est := p\_pred = p\_est + Q$ ($p\_est$ increases, but this has no effect on the new gating, because for negative innovations the outlier gate no longer inspects $p\_est$).

③ x_est unchanged: $x\_est = 66560$ (65ms).

RTT 2

① neg_skip_cnt increment

$now - last\_neg\_mstamp \approx 45000\,\mu\text{s}$ (one RTT ≈ 45ms)

$$45000 \geq 22500 \quad \rightarrow \quad enough\_time\_elapsed = true$$

$$neg\_skip\_cnt = 1 + 1 = 2$$

② Outlier gate

$$outlier\_reject = (2 < 3) = true \quad \rightarrow \quad \text{Reject}$$

③ x_est unchanged: $x\_est = 66560$ (65ms).

RTT 3

① neg_skip_cnt increment

$$45000 \geq 22500 \quad \rightarrow \quad enough\_time\_elapsed = true$$

$$neg\_skip\_cnt = 2 + 1 = 3$$

② Outlier gate

$$outlier\_reject = (3 < 3) = false \quad \rightarrow \quad \text{Bypass!}$$

The sample passes through the outlier gate and enters subsequent processing.

③ Directional gate: $\nu = -20480 < 0$ → $\nu \leq 0$ branch → negative innovation → x_est = z path.

④ Floor gate

u64 floor = x_est - (x_est >> 3);
// floor = 66560 - 8320 = 58240

if (neg_skip_cnt >= 3 || z >= floor) {
    // 3 >= 3 → true! Bypass the floor check!
    x_est = (u32)min_t(u64, z, U32_MAX);
    // = 46080 (45ms!)
}

⑤ x_est = 46080 (45ms) ← Converged!

⑥ Automatic reset after convergence:

if (x_updated) {
    ext->pos_skip_cnt = 0;
    ext->neg_skip_cnt = 0;  // ← Cleared
    ext->drift_sum = 0;
}

$neg\_skip\_cnt = 0$, ready for the next potential route change.

RTT 4: Steady-state maintenance

  • $x\_est = 46080$ (45ms), $z \approx 46080$, $\nu \approx 0$
  • $\nu \approx 0$ → $\nu \geq 0$ branch → neg_skip_cnt = 0 (cleared)
  • $|\nu| \approx 0 < 5.12 \times 10^6$ → Outlier gate does not trigger
  • Floor gate: $z \approx 46080 \geq 46080 \times 0.875 = 40320$ ✓ → passed normally
  • Normal Kalman update maintains $x\_est$ around 45ms

Convergence summary table

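| RTT | neg_skip_cnt | Outlier Gate | Floor Gate | x_est | p_est Dependency? |
|---|---|---|---|---|---|
| 1 | 1 | Reject (1<3) | not reached | 66560 | No |
| 2 | 2 | Reject (2<3) | not reached | 66560 | No |
| 3 | 3 | Pass (3≥3) | Bypass (3≥3) | 46080 ✅ | No |
| 4 | 0 | Not triggered (ν≈0) | Pass | ≈46080 | No |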

Key property: The entire chain does not check $p\_est$, $Q$, or $R$. Convergence depends solely on the independent state neg_skip_cnt.
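The three-RTT walk-through can be condensed into a short simulation. This is a sketch under stated assumptions rather than the code from commit 88f1b8e: `struct neg_gate` and both function names are hypothetical, times are plain microseconds, `last_neg_mstamp` is assumed to be refreshed only when the counter increments, and the positive-innovation path is reduced to the counter reset. Note that no `p_est`, `Q`, or `R` appears anywhere in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NEG_PERSIST_THRESH 3u  /* KCC_NEG_PERSIST_THRESH */

struct neg_gate {              /* hypothetical container for the relevant state */
    uint8_t  neg_skip_cnt;
    uint64_t last_neg_mstamp;  /* 0 means "never recorded" */
    uint32_t x_est_us;
};

/* Feed one RTT sample; returns true when x_est is updated to z. */
static bool neg_gate_sample(struct neg_gate *g, uint64_t now_us,
                            uint32_t z_us, uint32_t thresh_us)
{
    uint32_t half_rtt, floor_us;

    if (z_us >= g->x_est_us) {           /* innovation >= 0: reset the count */
        g->neg_skip_cnt = 0;
        g->last_neg_mstamp = 0;
        return false;
    }
    half_rtt = z_us >> 1;                /* temporal gate: rtt/2 */
    if (half_rtt < 1)
        half_rtt = 1;
    if (g->last_neg_mstamp == 0 || now_us - g->last_neg_mstamp >= half_rtt) {
        if (g->neg_skip_cnt < 254)
            g->neg_skip_cnt++;
        g->last_neg_mstamp = now_us;
    }
    /* Outlier gate: reject until the persistence threshold is reached. */
    if (g->x_est_us - z_us > thresh_us && g->neg_skip_cnt < NEG_PERSIST_THRESH)
        return false;
    /* Floor gate: bypassed once the persistence threshold is reached. */
    floor_us = g->x_est_us - (g->x_est_us >> 3);
    if (g->neg_skip_cnt >= NEG_PERSIST_THRESH || z_us >= floor_us) {
        g->x_est_us = z_us;
        g->neg_skip_cnt = 0;
        return true;
    }
    return false;
}

/* Count RTTs until a constant short-path sample z_us is adopted (0 = never). */
static unsigned neg_gate_rtts_to_converge(uint32_t x_est_us, uint32_t z_us,
                                          uint32_t thresh_us, unsigned max_rtts)
{
    struct neg_gate g = { 0, 0, x_est_us };
    uint64_t now = 1000;                 /* arbitrary nonzero start time */
    unsigned rtt;

    assert(z_us < x_est_us);
    for (rtt = 1; rtt <= max_rtts; rtt++, now += z_us)
        if (neg_gate_sample(&g, now, z_us, thresh_us))
            return rtt;
    return 0;
}
```

For the 65ms → 45ms step, `neg_gate_rtts_to_converge(65000, 45000, 5000, 10)` returns 3, matching the summary table.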


3. Why N=3: Joint Probability Under Multi-Layer Defense

neg_skip_cnt is just a counter — whether it can correctly increment to 3 depends on whether the preceding samples can penetrate all front-line defense layers.

3.1 Sources of False Negative Innovations

Three known sources of ACK contamination can produce spurious $\nu < 0$:

| Source | Mechanism | Physical Magnitude |
|---|---|---|
| TSO (sender-side TCP Segmentation Offload) | A 64KB large send is split by the NIC into ~44 MTU-sized packets sent as a burst with ~1μs spacing; receiver ACKs are compressed and returned, biasing RTT samples low. | Error ≈ $64\text{KB}/C$. 500Mbps link ≈ 1ms; 10Mbps link ≈ 51ms |
| GRO (receiver-side Generic Receive Offload) | GRO merges multiple consecutive ACKs into one, losing inter-packet spacing information, further biasing RTT samples low. | Stacked on top of TSO, amplifying the error |
| Timestamp error | Kernel tcp_mstamp precision ≈ 1μs; interrupt latency may introduce additional deviation. | ≤ 1ms (extreme interrupt storms) |

3.2 A Single Layer is Insufficient: A Simple Probability Estimate

Consider only the outlier gate’s base threshold (5ms): on a 500Mbps link, TSO-induced RTT error ≈ 1ms < 5ms → a single TSO batch will not trigger the outlier gate. However, on a 100Mbps link, the TSO error can reach 5.1ms — just crossing the threshold.

If only the outlier gate existed, a single TSO event on a slow link could produce a spurious rejection. But neg_skip_cnt requires 3 consecutive times to break through — requiring TSO batches over 3 independent RTTs, each crossing the 5ms threshold.

From a naive binomial perspective, $2^{-3} = 12.5\%$ is already a conservative upper bound. But the true probability is far lower because four pre-defense layers are active simultaneously.

3.3 Joint Filtering by Four Defense Layers

For a spurious negative to successfully enter neg_skip_cnt, it must simultaneously penetrate all of the following layers:

RTT sample → [Temporal gating] → [Outlier gate base threshold] → [Jitter EWMA dynamic threshold] → [Directional gate] → neg_skip_cnt++

First Layer: Temporal Gating ($rtt/2$ interval)

ACK spacing within a TSO batch is about 1μs, while $rtt/2 = 22.5$ms. Only the first ACK of a batch passes; the subsequent ~43 ACKs are all discarded by the temporal gate.

$$P_{time} = \frac{1}{44} \approx 0.023 \quad (\text{a single TSO batch yields only 1 count opportunity})$$
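The effect of the temporal gate on one compressed burst can be counted directly. The following is a minimal sketch: the function name is hypothetical, and it assumes the timestamp is refreshed only when a sample is counted, as in the RTT 1 trace earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many ACKs of one burst are allowed to increment neg_skip_cnt. */
static unsigned burst_increments(unsigned n_acks, uint64_t spacing_us,
                                 uint32_t rtt_us)
{
    uint64_t last_neg_mstamp = 0;     /* 0 means "never recorded" */
    uint64_t now = 1000000;           /* arbitrary nonzero start time */
    uint32_t threshold = rtt_us >> 1; /* rtt/2 */
    unsigned increments = 0;
    unsigned i;

    assert(rtt_us > 0);
    if (threshold < 1)
        threshold = 1;
    for (i = 0; i < n_acks; i++, now += spacing_us) {
        if (last_neg_mstamp == 0 || now - last_neg_mstamp >= threshold) {
            increments++;
            last_neg_mstamp = now;
        }
    }
    return increments;
}
```

With 44 ACKs spaced 1μs apart and a 45ms RTT, only one ACK is counted. With a 10μs RTT the same burst is counted 9 times, which illustrates why very short RTTs weaken this layer.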

Second Layer: Outlier Gate Base Threshold (5ms)

Triggers only when $|\nu| > 5$ms. TSO-induced RTT error = $64\text{KB}/C$:

| Link Rate | TSO Error | > 5ms threshold? | $P_{base}$ |
|---|---|---|---|
| 10Mbps | 51.2ms | ✓ | ≈1.0 |
| 100Mbps | 5.12ms | ✓ (marginal) | ≈0.5 |
| 500Mbps | 1.02ms | ✗ | ≈0 |
| 1Gbps | 0.51ms | ✗ | ≈0 |

On the 500Mbps link discussed in this article, the TSO error by itself cannot trigger the outlier gate. The gate fires only on truly large path switches (such as the 20ms step here) or on TSO artifacts over extremely slow links.
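The TSO error column is simply the serialization time of one burst. A small sketch, assuming a 64,000-byte burst (the value that reproduces the table) and hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Serialization time of one burst in microseconds: bits / (bits per second). */
static uint64_t tso_error_us(uint64_t burst_bytes, uint64_t link_bps)
{
    assert(link_bps > 0);
    return burst_bytes * 8u * 1000000u / link_bps;
}

/* Nonzero when the TSO error alone exceeds the outlier base threshold. */
static int tso_exceeds_base(uint64_t burst_bytes, uint64_t link_bps,
                            uint32_t outlier_ms)
{
    return tso_error_us(burst_bytes, link_bps) > (uint64_t)outlier_ms * 1000u;
}
```

For example, `tso_error_us(64000, 100000000)` yields 5120μs, which marginally exceeds the 5ms threshold, while the 500Mbps case yields 1024μs and stays below it.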

Third Layer: Jitter EWMA Dynamic Threshold

Even if the outlier gate is triggered by the base threshold, a dynamic threshold applies:

$$dyn\_thresh = \max(outlier\_ms \cdot scale,\; jitter\_ewma \cdot outlier\_jitter\_mult \cdot scale)$$

$jitter\_ewma$ is an EWMA of jitter with $\alpha = 0.125$ ($N_{eff} \approx 15$ samples). An isolated RTT spike caused by TSO is diluted by a factor of $1/8$ in the EWMA, so a single spurious pulse hardly raises the dynamic threshold.

When $jitter\_ewma$ is low, the dynamic threshold falls back to the base threshold (5ms). But when real sustained jitter exists on the path, the dynamic threshold is raised, making it even harder for subsequent TSO artifacts to trigger the gate.
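A compact sketch of the two pieces described above, the EWMA update with $\alpha = 1/8$ and the max-based threshold. The function names are hypothetical, `jitter_ewma` is assumed to be in microseconds, the base term follows the ms → μs → scaled conversion used for $dyn\_thresh$ earlier, and the multiplier is passed in as a parameter because its default is not stated in this article.

```c
#include <assert.h>
#include <stdint.h>

#define KCC_SCALE 1024u  /* kcc_kalman_scale */

/* EWMA with alpha = 1/8: new = old + (sample - old) / 8. */
static uint32_t jitter_ewma_update(uint32_t ewma_us, uint32_t sample_us)
{
    int64_t delta = (int64_t)sample_us - (int64_t)ewma_us;

    return (uint32_t)((int64_t)ewma_us + delta / 8);
}

/* Dynamic outlier threshold in the scaled domain (microseconds x 1024). */
static uint64_t kcc_dyn_thresh(uint32_t outlier_ms, uint32_t jitter_ewma_us,
                               uint32_t outlier_jitter_mult)
{
    uint64_t base, dyn;

    assert(outlier_jitter_mult > 0);
    base = (uint64_t)outlier_ms * 1000u * KCC_SCALE;
    dyn = (uint64_t)jitter_ewma_us * outlier_jitter_mult * KCC_SCALE;
    return dyn > base ? dyn : base;
}
```

A single 8ms spike on a quiet path moves the EWMA from 0 to only 1000μs, so with an illustrative multiplier of 4 the threshold stays at the 5ms base.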

Fourth Layer: Directional Gate

Samples with $\nu > 0$ are rejected by the directional update rule. However, TSO/GRO artifacts produce $\nu < 0$ (RTT biased low), which bypasses the directional gate.

But there is another crucial fact: ACK compression produces not only $\nu < 0$ but also temporal correlation. Negatives from the same TSO batch are clustered in time (μs-scale spacing), and the temporal gate is specifically designed to counter this clustering.

3.4 Joint Probability

On the 500Mbps × 45ms RTT link discussed here:

$$P(\text{single-RTT false negative}) = P_{time} \cdot P_{base} \cdot P_{jitter} \cdot P_{directional}$$

$$= 0.023 \cdot (\approx 0) \cdot 1.0 \cdot 1.0 \approx 0$$

TSO error ≈ 1ms on a 500Mbps link, far below the 5ms base threshold, so the second defense layer already pushes the probability to zero.

On a 100Mbps link (TSO error ≈ 5.1ms, marginally passing the base threshold):

$$P(\text{single-RTT false negative}) \approx 0.023 \cdot 0.5 \cdot 1.0 \cdot 1.0 \approx 0.012$$

$$P(\text{3 consecutive RTTs}) \approx 0.012^3 \approx 1.7 \times 10^{-6}$$

This is about 1 in 600,000. Three independent filters (temporal gating × base threshold × EWMA smoothing) suppress the penetration rate of a single TSO artifact to ~1.2%, and the joint probability of three penetrations to the parts-per-million level.
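The arithmetic above can be checked with a few lines of code. This sketch assumes, as the text does, that the layers and the successive RTTs are independent; the function names are hypothetical and the per-layer probabilities are the article's estimates, not measured values.

```c
#include <assert.h>

/* Product of the per-layer pass probabilities for one RTT. */
static double single_rtt_false_negative(double p_time, double p_base,
                                        double p_jitter, double p_directional)
{
    return p_time * p_base * p_jitter * p_directional;
}

/* Probability of n consecutive, independent penetrations. */
static double consecutive_false_negatives(double p_single, unsigned n)
{
    double p = 1.0;

    assert(p_single >= 0.0 && p_single <= 1.0);
    while (n--)
        p *= p_single;
    return p;
}
```

Feeding in the 100Mbps values (0.023, 0.5, 1.0, 1.0) gives about 0.0115, which the text rounds to 0.012 before cubing.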

Conversely, a real physical path drop (e.g., BGP reroute) produces a stable $\nu < 0$ every RTT; both the base and dynamic thresholds are triggered consistently, and the temporal gate is satisfied by the regular intervals. Real signals and TSO artifacts are highly distinguishable under multi-layer joint filtering.

3.5 Re-examining N=2 vs N=3

From the joint probability perspective:

| N | Neyman-Pearson Upper Bound | Joint Probability on 100Mbps | Assessment |
|---|---|---|---|
| 2 | ≤ 25% | $1.4 \times 10^{-4}$ | Upper bound appears weak, but joint probability is already 1 in 7000. However: data-center 10μs-scale RTTs weaken temporal gating (rtt/2 ≈ 5μs, so ACKs in a TSO batch with μs spacing may sneak through); WiFi links with 40ms+ jitter may raise jitter EWMA so much that the base threshold is no longer effective. N=2 is not robust enough in these extreme environments. |
| 3 | ≤ 12.5% | $1.7 \times 10^{-6}$ | Upper bound is conservative; joint probability is at the parts-per-million level. Even in data centers or high-jitter links, the combined suppression of three filters pushes three consecutive false negatives to statistically negligible levels. |

N=3 is not determined by a simple $2^{-3}$, but by the actual penetration probability under the joint action of multiple defense layers. $2^{-3}$ is merely a loose theoretical upper bound. In practice, the four pre-filters have already suppressed the single-RTT false negative probability to ~1.2%, and the joint probability of three consecutive false negatives to roughly 1.7 in a million. Even if this rare event occurs, the persistent upward pull provided by drift detection and Q-boost (see §3.7) will correct $x\_est$ back to the true $T_{prop}$ within tens to hundreds of RTTs. The asymmetric design of “fast downward pull by negatives + slow upward pull by positives” keeps the filter robust across all timescales.

3.6 Marginal Analysis of N=4

If KCC_NEG_PERSIST_THRESH were set to 4, the convergence process would become:

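| RTT | neg_skip_cnt | Outlier Gate | Floor Gate | x_est |
|---|---|---|---|---|
| 1 | 1 | Reject (1<4) | not reached | 66560 |
| 2 | 2 | Reject (2<4) | not reached | 66560 |
| 3 | 3 | Reject (3<4) | not reached | 66560 |
| 4 | 4 | Pass (4≥4) | Bypass | 46080 ✅ |

Convergence latency: $4 \times 45\text{ms} = 180\text{ms}$.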

Joint probability for N=4 on a 100Mbps link:

$$P(\text{4 consecutive RTT false negatives}) \approx 0.012^4 \approx 2.1 \times 10^{-8}$$

Compared to N=3’s $1.7 \times 10^{-6}$, N=4 reduces the false-positive rate by nearly two orders of magnitude (a factor of about 80), from roughly 1.7 in a million to roughly 2 in a hundred million.

But this comes with a real additional latency cost: N=3 → N=4 adds 1 RTT (45ms) of convergence waiting. In the Neyman-Pearson marginal analysis:

| N | Joint Probability | Convergence Latency | Marginal Probability Gain | Marginal Latency Cost |
|---|---|---|---|---|
| 1→2 | $0.012 \rightarrow 1.4 \times 10^{-4}$ | 45→90ms | $\approx 1.2 \times 10^{-2}$ / RTT | +45ms |
| 2→3 | $1.4 \times 10^{-4} \rightarrow 1.7 \times 10^{-6}$ | 90→135ms | $\approx 1.4 \times 10^{-4}$ / RTT | +45ms |
| 3→4 | $1.7 \times 10^{-6} \rightarrow 2.1 \times 10^{-8}$ | 135→180ms | $\approx 1.7 \times 10^{-6}$ / RTT | +45ms |

From N=3 to N=4, the marginal probability gain is only $1.7 \times 10^{-6}$ per RTT, nearly two orders of magnitude smaller than the N=2→3 gain of $1.4 \times 10^{-4}$ per RTT. Waiting an extra RTT buys a negligible safety improvement, yet every real path switch is delayed by one more RTT. This is classic diminishing marginal returns: N=3 is the last point on the curve with a still-significant marginal value.
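The marginal figures follow from the single-RTT estimate of 0.012 and the 45ms RTT. A minimal sketch with hypothetical function names, again assuming independence between RTTs:

```c
#include <assert.h>

/* P(n consecutive false negatives) for a per-RTT probability p_single. */
static double joint_prob(double p_single, unsigned n)
{
    double p = 1.0;

    assert(p_single >= 0.0 && p_single <= 1.0);
    while (n--)
        p *= p_single;
    return p;
}

/* Reduction in false-positive probability when raising the threshold n -> n+1. */
static double marginal_gain(double p_single, unsigned n)
{
    return joint_prob(p_single, n) - joint_prob(p_single, n + 1);
}

/* Latency before a real path drop is accepted: n RTTs. */
static unsigned convergence_latency_ms(unsigned n, unsigned rtt_ms)
{
    return n * rtt_ms;
}
```

Each step costs the same 45ms, while the gain shrinks by roughly a factor of 80 per step, which is the diminishing-returns argument in numeric form.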

3.7 The Ultimate Safety Net: Even if the One-in-a-Million Occurs, Drift + Q-boost Will Pull It Back

Assume the extreme scenario in which the rare event actually happens: three consecutive TSO/GRO artifacts penetrate all four defense layers, and $x\_est$ is incorrectly pulled down to a value lower than the true $T_{prop}$ (e.g., 40ms while the true path is 45ms).

What happens then? The filter has been tricked into “converging” on a false low value. Thereafter, every correct RTT sample (45ms) will produce $\nu > 0$ (measurement higher than estimate).

Looking only at the directional gate, $\nu > 0$ samples are rejected by the directional update rule, which appears to lock $x\_est$ permanently at 40ms. But KCC has two upward pulling forces:

Drift Detection (Tier-1 / Tier-2)

When $\nu > 0$ is rejected, pos_skip_cnt increments. This is the cumulative counter for positive innovations, the counterpart of neg_skip_cnt. When pos_skip_cnt reaches the drift threshold (default 16), Tier-1 triggers:

$$corr = \frac{K \cdot |\nu|}{4}$$

i.e., a quarter of the normal Kalman correction is applied (a lagging, controlled upward correction). Tier-2 triggers at threshold $16 \times 8 = 128$, applying an even smaller correction ($K \cdot |\nu| / 8$).

Drift detection is not as aggressive as Q-boost. It is slow and cumulative, modeling the gradual baseline drift of a physical wireless link or fiber. But once triggered, it moves $x\_est$ gradually upward, back toward the true $T_{prop}$.

Q-boost (Large Step Detection)

If $x\_est$ is pulled too far down (e.g., 40ms vs true 45ms, a 5ms gap), and $|\nu| > q\_boost\_thresh$ (≈ 4ms), Q-boost triggers:

  • $p\_est$ reset to 1000 (high uncertainty)
  • qboost_fired = true, the outlier gate is skipped
  • For $\nu > 0$, the L12149-L12157 path is taken: $x\_est = x\_est + K \cdot |\nu|$, applying the full Kalman correction, pulling it back in one step

Understanding through a fluid dynamics analogy

RTT sampling in a network is inherently asymmetric:

  • Most samples carry $T_{queue} > 0$ → $\nu > 0$ (positive innovations are the norm, a direct consequence of the Lindley recurrence $q_{k+1} = \max(0, q_k + \sum\lambda_i - C)$: as long as the arrival rate approaches the bottleneck capacity, queues accumulate and RTT rises)
  • Sustained $\nu < 0$ only appears during PROBE_BW’s DRAIN phase (pacing_gain=0.75x) and after PROBE_RTT drains the queue

Therefore, in the long-term statistical average of the flow, $\nu \geq 0$ samples dominate. neg_skip_cnt handles transient path drops (completed within a few RTTs), while drift + Q-boost handle the persistent upward pull (scales of tens to hundreds of RTTs).

Here is the concrete code verification:

Drift Tier-1 (L12347–L12360): pos_skip_cnt ≥ drift_thresh(16) && jitter_ewma < min_rtt/8 → corr_abs >> 2 (i.e., $K \cdot |\nu| / 4$, one quarter of the normal correction). This is a slow response to “sustained small positive biases”: not an immediate correction, but a lagging local correction applied only after accumulation reaches statistical significance.

Drift Tier-2 (L12368–L12382): pos_skip_cnt ≥ drift_thresh × 8 = 128 → corr_abs >> 3 ($K \cdot |\nu| / 8$, one eighth). This is a forced correction for “long-term stubborn biases” with an extremely high statistical threshold (Neyman-Pearson upper bound $2^{-128} \approx 2.9 \times 10^{-39}$), triggered virtually only when the filter is certain it has drifted away.

Q-boost (L11904–L11911 + L12149–L12157): $\nu > 0$, $|\nu| > 4$ms and $p\_est \leq converged\_val$ → $p\_est$ reset to 1000, applies the full Kalman correction $x\_est = x\_est + K \cdot |\nu|$ in one step.

These three together constitute a persistent upward pull, guaranteeing at the code level that, even if $x\_est$ is incorrectly pulled down by a spurious negative innovation, it is brought back into balance around $T_{prop}$:

Downward Pull (Negative)

  • neg_skip_cnt: path drop → 3 RTT fast convergence (L11982-L11986)

Upward Pull (Positive, verified from code)

  • Q-boost: large step (|ν|>4ms) → x_est += K·|ν| (L12152-L12156)
  • Drift Tier-1: pos≥16, quiet → corr/4 (L12353, L12347)
  • Drift Tier-2: pos≥128 → corr/8 (L12375, L12368)

Result: x_est balanced around T_prop
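The two drift tiers reduce to a small decision function. The sketch below only encodes the conditions quoted above; the function name is hypothetical, and checking Tier-1 before Tier-2 is an assumption based on the order of the cited line ranges. Counter resets after a correction are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define DRIFT_THRESH 16u  /* drift threshold quoted above */

/* Upward correction granted to a rejected positive innovation.
 * corr_abs is the normal Kalman correction K * |nu|; 0 means no correction. */
static uint32_t drift_correction(uint32_t pos_skip_cnt, uint32_t corr_abs,
                                 uint32_t jitter_ewma_us, uint32_t min_rtt_us)
{
    assert(min_rtt_us > 0);
    /* Tier-1: enough rejected positives on a quiet path -> quarter correction. */
    if (pos_skip_cnt >= DRIFT_THRESH && jitter_ewma_us < min_rtt_us / 8)
        return corr_abs >> 2;
    /* Tier-2: long-term stubborn bias, even on a noisy path -> eighth correction. */
    if (pos_skip_cnt >= DRIFT_THRESH * 8)
        return corr_abs >> 3;
    return 0;
}
```

On a 45ms path, a quiet link reaches Tier-1 after 16 rejected positives, whereas a link whose jitter EWMA exceeds min_rtt/8 (about 5.6ms) has to wait for 128.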

3.8 Practical Consequences of an Incorrectly Lowered x_est: Bandwidth Utilization Drop

If $x\_est$ is pulled below the true $T_{prop}$ by a false negative, it affects not just filter estimation accuracy; it directly depresses link bandwidth utilization.

KCC defaults to FILTER mode (kcc_rtt_mode = 0, L7705), where model_rtt is taken directly from $x\_est\_us$ (L12534: t_prop_scaled = kcc_rtt_mode == 0 ? ext->x_est : ...). BDP calculation (L10509: target = kcc_bdp(sk, bw, gain, ext)) depends on model_rtt:

$$BDP = bw \times model\_rtt$$

$$target\_cwnd = BDP \times cwnd\_gain \quad (\text{steady-state } cwnd\_gain = 2.0)$$

In PROBE_BW steady state, cwnd is limited by target (L10519: cwnd = min(cwnd + acked, target)).

If $x\_est$ is incorrectly pulled to 40ms (true $T_{prop} = 45$ms):

$$
\begin{aligned}
BDP_{wrong} &= bw \times 40\text{ms} \\
BDP_{true} &= bw \times 45\text{ms} \\
\text{Utilization loss} &= \frac{45 - 40}{45} \approx 11\%
\end{aligned}
$$

For a 500Mbps link, this translates to a loss of approximately 55Mbps, just for a 5ms $x\_est$ error.

Conversely, if $x\_est$ is overestimated (e.g., min_rtt inflation in BBR mode), BDP is overestimated → cwnd too large → excess injection → packet loss. KCC’s directional gate is precisely designed to prevent this, but the directional gate is asymmetric: it blocks $\nu > 0$ updates (preventing $x\_est$ from being inflated by queuing), yet allows $\nu < 0$ updates (fast convergence on path drops). The price of this asymmetric design is that $x\_est$ lacks defense against false negatives, and this is exactly the gap filled by the neg_skip_cnt mechanism.
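The utilization figures can be reproduced with integer arithmetic. This is a sketch with hypothetical helper names; it assumes, as the text does, that achievable throughput scales linearly with the target cwnd derived from the BDP.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes for a rate in bit/s and an RTT in microseconds. */
static uint64_t bdp_bytes(uint64_t bw_bps, uint32_t model_rtt_us)
{
    return bw_bps / 8u * model_rtt_us / 1000000u;
}

/* Utilization loss in percent (truncated) when x_est underestimates T_prop. */
static uint32_t utilization_loss_pct(uint32_t true_rtt_us, uint32_t est_rtt_us)
{
    assert(true_rtt_us > 0);
    if (est_rtt_us >= true_rtt_us)
        return 0;
    return (uint32_t)((uint64_t)(true_rtt_us - est_rtt_us) * 100u / true_rtt_us);
}
```

At 500Mbps the BDP drops from 2,812,500 bytes (45ms) to 2,500,000 bytes (40ms), an 11% reduction.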

Summarizing this mechanical balance:

  • Moving downward: $\nu < 0$ passes the directional gate → $x\_est$ is pulled down. neg_skip_cnt ensures this is controlled (3 RTT persistence test), so that not every isolated noise spike gets through.
  • Moving upward: $\nu > 0$ is blocked by the directional gate → $x\_est$ cannot rise. But Drift Tier-1/2 and Q-boost provide a controlled, lagged upward channel. They intervene only when statistically significant, preventing the filter from overreacting to noise.
  • Cost of error: $x\_est$ too low → utilization loss (insufficient cwnd); $x\_est$ too high → packet loss (excessive cwnd). KCC chooses the asymmetric strategy of “better too low than too high” (the directional gate blocks upward moves, neg_skip_cnt ensures downward moves are real), because the cost of packet loss far outweighs the cost of temporarily lower utilization.

3.9 Comparison of Multi-Layer Penetration

| Defense Layer | Real Path Drop (45→65→45) | TSO Artifact (500Mbps) | TSO Artifact (100Mbps) |
|---|---|---|---|
| Temporal gate (rtt/2) | Passes once per RTT ✓ | Only 1 pass per batch (1/44) | Same as 500Mbps |
| Base threshold (5ms) | 20ms ≫ 5ms ✓ | 1ms < 5ms ✗ Blocked | 5.1ms ≈ 5ms marginal |
| Jitter EWMA | Sustained trigger → adaptive raise → but still passes | Passively raised → raises bar | |
| Directional gate | ν<0 → passes ✓ | ν<0 → passes ✓ | ν<0 → passes ✓ |

On the vast majority of links (≥ 500Mbps), the second defense layer (base threshold) directly zeros out TSO artifact penetration. On these links, neg_skip_cnt responds almost exclusively to real path changes.


4. Why Any $\nu \geq 0$ Immediately Resets the Counter

The fundamental premise of the Neyman-Pearson test is that N consecutive negatives constitute “persistent evidence.”

A single sample with $\nu \geq 0$ (even a neutral $\nu = 0$) provides evidence in the opposite direction, which is sufficient to overturn the hypothesis that “we are currently in a continuous downward trend.” Hence:

else {
    ext->neg_skip_cnt = 0;
    ext->last_neg_mstamp = 0;
}

This reset is unconditional — any positive or zero innovation instantly terminates the persistence count. If the path is indeed dropping, subsequent samples will re-accumulate; if it was just random jitter, the reset prevents a false positive.


5. Complete Mathematical Statement

The new code provides the following deterministic guarantee in the negative path-drop scenario:

$$
\begin{aligned}
&\forall\ p\_est, Q, R, x\_est\_init,\quad \text{if } T_{prop} \text{ drops abruptly from } t_0 \text{ to } t_1 < t_0 \\
&\text{and } |t_1 - t_0| > dyn\_thresh/S \quad (\text{drop magnitude captured by dynamic threshold}) \\
&\text{and 3 consecutive RTT samples are all near } t_1 \\
&\text{then } x\_est \text{ is corrected to } t_1 \text{ on the 3rd RTT sample}
\end{aligned}
$$

where $S = 1024$ is the scaling factor. Convergence time = $3 \times t_1$. This promise is completely decoupled from the filter’s internal state.


6. Summary

The old code’s $p\_est \leq converged\_val$ gating was a direction-agnostic condition used in both directions. It was coincidentally valid for positives (“large innovation during convergence = noise”), but state-dependent for negatives (“large innovation during convergence = noise? Depends on the current value of Q”). When Q was suppressed to 0, $p\_est$ could not rebound after a rejection, forming a permanent interlock between the two gates.

The new code uses neg_skip_cnt to provide direction-aware gating for the two directions: negatives go through a persistence test (N=3), positives still use $p\_est$. The Neyman-Pearson sequential framework gives a Type-I error bound of $2^{-3} = 12.5\%$, TSO temporal gating filters out burst-compression artifacts, and an immediate reset on $\nu \geq 0$ prevents false accumulation.

Conclusion: In any scenario that satisfies the conditions in §5, regardless of the filter’s internal state, convergence from an inflated initial value to the new physical baseline is completed in at most 3 RTTs.
