💾 feat: Persist Context Breakdown & Branch/Total Usage Cost by danny-avila · Pull Request #13734 · danny-avila/LibreChat

danny-avila · 2026-06-14T00:45:14Z

Summary

I made the context-usage popover durable and branch-aware by persisting the granular breakdown and per-response usage/cost on the response message, then re-deriving branch + total usage/cost from the existing per-message token index instead of a page-session accumulator. Before this, the breakdown reset to the coarse fallback on reload and the cost row showed $0 on an existing conversation until a new turn was generated; the cost was also "events since page load," neither branch-scoped nor a true conversation total.

This builds on #13670 and persists onto the existing metadata field (Mongoose Mixed / z.record), so there is no migration.

Added aggregateEmittedUsage and buildPersistedContextUsage pure helpers in packages/api/src/agents/usage.ts (the rollup sums the same emitted on_token_usage payloads the client folds, so reload matches the live session; cost is summed additively and gated on interface.contextCost).
Captured the latest visible context snapshot and every emitted usage payload through new contextUsageSink/usageEmitSink wired from callbacks.js → initialize.js → client.js (the subagent emitter folds into the same sink).
Attached metadata.contextUsage (Part A — breakdown) and metadata.usage (Part B — usage/cost rollup) on the agents response message in sendCompletion, alongside the existing thoughtSignatures.
Extended the per-message token index (tokens.ts): entries now carry normalized usage, sumBranch returns branch usage, and added sumTotalUsage (all branches) + setEntryUsage (live flush) + mergeUsage.
Re-architected the live cost path: repurposed the cross-branch session accumulator into a single in-flight pendingUsageFamily holder, added totalUsageFamily, made foldUsage accumulate the current response and finalizeUsage flush it into the index exactly once — so cost is branch-aware live, not just on reload.
Added hydrateSnapshots to restore each branch's persisted breakdown on load (merges, never clobbering a live finalized snapshot).
Rendered the Input/Output/Cache totals from branch usage and the cost row as the branch cost, with a muted "All branches" total shown only when it differs (new com_ui_context_cost* i18n keys).
Verified the read path preserves metadata: GET /:conversationId uses a -_id -__v -user exclusion projection and sanitizeMessageForTransmit only strips fileContext/file text.
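
The branch-versus-total distinction in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in tokens.ts: the `IndexEntry` shape and the `sumBranchSketch` / `sumTotalSketch` names are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real per-message index may differ.
interface UsageTotals {
  input: number;
  output: number;
  cost: number;
}

interface IndexEntry {
  messageId: string;
  parentId: string | null;
  usage?: UsageTotals;
}

const emptyTotals = (): UsageTotals => ({ input: 0, output: 0, cost: 0 });

function addUsage(target: UsageTotals, usage?: UsageTotals): void {
  if (!usage) return; // entries without usage contribute nothing
  target.input += usage.input;
  target.output += usage.output;
  target.cost += usage.cost;
}

/** Walk from the viewed tail up to the root, summing only that branch. */
function sumBranchSketch(index: Map<string, IndexEntry>, tailId: string): UsageTotals {
  const totals = emptyTotals();
  const seen = new Set<string>(); // guards against a malformed parent cycle
  let cursor: string | null = tailId;
  while (cursor !== null && !seen.has(cursor)) {
    seen.add(cursor);
    const entry = index.get(cursor);
    if (!entry) break;
    addUsage(totals, entry.usage);
    cursor = entry.parentId;
  }
  return totals;
}

/** Sum every entry in the index, regardless of branch. */
function sumTotalSketch(index: Map<string, IndexEntry>): UsageTotals {
  const totals = emptyTotals();
  for (const entry of index.values()) addUsage(totals, entry.usage);
  return totals;
}

// One user message with two sibling responses (a regenerate).
const demoIndex = new Map<string, IndexEntry>([
  ['u1', { messageId: 'u1', parentId: null }],
  ['a1', { messageId: 'a1', parentId: 'u1', usage: { input: 100, output: 20, cost: 0.25 } }],
  ['a2', { messageId: 'a2', parentId: 'u1', usage: { input: 100, output: 30, cost: 0.5 } }],
]);

console.log(sumBranchSketch(demoIndex, 'a2'), sumTotalSketch(demoIndex));
```

In this toy tree the viewed branch (`a2`) carries only its own cost, while the total also includes the sibling `a1`, which is the case where an "All branches" line would differ from the branch row.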

Design note

I followed the spec's raw-rollup approach (persist provider counts, normalize once on the client). Cost is therefore always exact (summed additively per event); token rows are exact for the single-provider main path and a minor approximation only when one turn mixes providers (subagents). This can be switched to backend-normalized display units if bit-exact token rows are wanted in that edge case.

Change Type

New feature (non-breaking change which adds functionality)

Testing

All changes are covered by unit + integration + e2e tests; tsc --noEmit and ESLint are clean across the 14 touched files.

packages/api/src/agents/usage.spec.ts — aggregateEmittedUsage (sum, cost gating, subagent/summarization fold, model/provider precedence) and buildPersistedContextUsage (zero-tool-count trimming).
api/server/controllers/agents/__tests__/usageEvents.integration.spec.js — through the real agents pipeline: sinks capture both model calls, the rollup aggregates correctly, and cost folds only when contextCost is on.
client/src/utils/tokens.spec.ts — buildIndex reads metadata.usage, sumBranch returns branch usage/cost, sumTotalUsage spans all branches, the regenerate scenario (branch vs. total), backward-compat (no metadata.usage → 0), setEntryUsage live flush, and rebuild idempotency.
e2e/specs/mock/usage.spec.ts — send → reload shows the granular breakdown and branch cost (Parts A + B persist); send → regenerate shows branch cost with the all-branches total line.

Test Configuration:

npm run test per workspace (packages/api, client, api); mock Playwright suite via npm run e2e:mock (interface.contextCost is enabled in the harness).

Checklist

My code adheres to this project's style guidelines
I have performed a self-review of my own code
I have commented in any complex areas of my code
My changes do not introduce new warnings
I have written tests demonstrating that my changes are effective or that my feature works
Local unit tests pass with my changes

github-actions · 2026-06-14T00:45:39Z

🚨 Unused i18next Keys Detected

The following translation keys are defined in translation.json but are not used in the codebase:

com_ui_session_cost

⚠️ Please remove these unused keys to keep the translation files clean.

danny-avila · 2026-06-14T00:48:08Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ec17c623a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-14T00:59:38Z

+      completedOutputTokens:
+        typeof message.tokenCount === 'number' ? message.tokenCount : undefined,


Persist only the post-snapshot output delta

When rehydrating a tool-using or otherwise multi-call agent turn after reload, metadata.contextUsage is the latest pre-invoke snapshot, which already includes tokens from earlier model/tool steps in that same response. The live finalizer only adds liveAtFinalize (output since that snapshot), but this assigns the response's full tokenCount, so earlier output tokens are counted again and the context gauge can overstate usage after reload. Persist/replay the post-snapshot delta instead of using the whole response count.
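
To make the double count concrete, here is a small sketch with made-up numbers. Only `completedOutputTokens` and `tokenCount` are names from this thread; the `PersistedSnapshot` shape and the `rehydratedContextTokens` helper are hypothetical.

```typescript
// Hypothetical shape: a persisted pre-invoke snapshot plus the output that
// arrived after it was taken.
interface PersistedSnapshot {
  snapshotTokens: number; // context size at the latest pre-invoke snapshot
  completedOutputTokens?: number; // output produced after that snapshot
}

function rehydratedContextTokens(snapshot: PersistedSnapshot): number {
  // Assumption for this sketch: a missing delta contributes nothing.
  return snapshot.snapshotTokens + (snapshot.completedOutputTokens ?? 0);
}

// A tool-using turn: the first model call emitted 120 output tokens, which the
// latest snapshot already contains; the final call emitted 80 more.
const snapshotTokens = 1300;
const responseTokenCount = 120 + 80; // the response's full tokenCount

// Adding the full tokenCount counts the first call's 120 tokens twice.
const overstated = snapshotTokens + responseTokenCount;
// Adding only the post-snapshot delta does not.
const corrected = rehydratedContextTokens({ snapshotTokens, completedOutputTokens: 80 });

console.log({ overstated, corrected });
```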

chatgpt-codex-connector · 2026-06-14T00:59:38Z

+              </span>
+              <span className="font-medium text-text-primary">{formatCost(view.branchCost)}</span>
+            </div>
+            {view.totalCost !== view.branchCost && (


Use a tolerance before showing all-branches cost

In an unbranched conversation with several paid turns, sumBranch and sumTotalUsage add the same floating-point costs in opposite orders (branch walks tail-to-root, total walks insertion order). A strict !== can therefore be true only due to rounding, causing the popover to show a misleading “All branches” row even when there are no alternate branches. Compare rounded display values or use a small epsilon instead.
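
The rounding effect described here is easy to reproduce. The sketch below sums the same three costs in both orders and compares them with a small tolerance; `showAllBranchesRow` and `COST_EPSILON` are illustrative names, not identifiers from the PR.

```typescript
// Tolerance for comparing summed costs; the value is an assumption for this sketch.
const COST_EPSILON = 1e-9;

/** Show the extra row only when the total exceeds the branch by more than rounding noise. */
function showAllBranchesRow(branchCost: number, totalCost: number): boolean {
  return totalCost - branchCost > COST_EPSILON;
}

const turnCosts = [0.1, 0.2, 0.3];

// Insertion order, as a whole-conversation total might walk the entries.
const insertionOrderSum = turnCosts.reduce((sum, cost) => sum + cost, 0);
// Tail-to-root order, as a branch walk might visit the same entries.
const tailToRootSum = [...turnCosts].reverse().reduce((sum, cost) => sum + cost, 0);

console.log(insertionOrderSum === tailToRootSum); // false: the sums differ in the last bit
console.log(showAllBranchesRow(tailToRootSum, insertionOrderSum)); // false: within tolerance
```

A one-sided comparison (total minus branch) is enough here because the total can only legitimately exceed the branch, never fall below it.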

chatgpt-codex-connector · 2026-06-14T00:59:39Z

+    model ??= event.model;
+    provider ??= event.provider;


Preserve per-event providers in usage rollups

When summarization is enabled with a different provider/model, its model-end usage can be the first usage event for the response before the primary agent call. These assignments then stamp the whole persisted rollup with the summarizer's provider, and readPersistedUsage later normalizes every token count using that one provider instead of the per-event providers that live foldUsage used. This makes branch input/cache/output totals change after reload for mixed-provider turns; persist normalized display units or keep per-event provider data.

chatgpt-codex-connector · 2026-06-14T00:59:39Z

+    output: units.output,
+    cacheWrite: units.cacheWrite,
+    cacheRead: units.cacheRead,
+    cost: event.cost ?? 0,


Hide cost when persisted usage has no cost

When a turn was saved while interface.contextCost was off, the server still persists metadata.usage but omits cost; if cost display is later enabled for that deployment, this coercion turns the unknown cost into 0, and the popover renders Cost $0.00 because token usage exists. Preserve an undefined/has-cost state and only render the cost row for usage that actually carried cost.

danny-avila · 2026-06-14T01:08:46Z

Thanks — addressed all four in af066208f7:

Persist post-snapshot output delta (P2, store/usage): the backend now records the final primary call's output as completedOutputTokens on metadata.contextUsage, and rehydration uses that delta instead of the full response tokenCount. A reloaded multi-call/tool turn no longer double-counts earlier steps the snapshot already includes.
Preserve per-event providers in rollups (P2, usage.ts): aggregateEmittedUsage now normalizes each event with splitUsage (per its own provider) and sums display units, persisted as TResponseUsage. The client reads them directly — no single stamped provider — so mixed-provider turns (summarization/subagent on a different provider) match the live session on reload.
Hide cost when persisted usage has no cost (P3, tokens.ts): cost is omitted from the rollup when contextCost was off at save; a costKnown flag now gates the cost row so it never renders $0.00 for unknown cost.
Tolerance for the all-branches row (P3, Breakdown.tsx): the branch-vs-total comparison now uses an epsilon (total − branch > 1e-9) instead of !==, so float summation order can't surface a spurious "All branches" row in an unbranched conversation.

Also fixed the e2e regression: buildIndex now preserves an entry's immutable usage when a rebuilt cache message lacks metadata.usage, so a mid-session rebuild during regenerate keeps the sibling branch's flushed cost. The regenerate e2e asserts the all-branches total after a reload (deterministic via persisted metadata).

danny-avila · 2026-06-14T01:08:47Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af066208f7

chatgpt-codex-connector · 2026-06-14T01:22:25Z

+    const completedOutputTokens =
+      event.completedOutputTokens ??
+      (typeof message.tokenCount === 'number' ? message.tokenCount : undefined);


Avoid treating missing deltas as the whole response

When an agent run emits on_context_usage but the provider/model does not return usage_metadata, buildPersistedContextUsage persists a new metadata.contextUsage blob without completedOutputTokens. On reload this fallback then uses the entire assistant tokenCount as the post-snapshot delta; for tool/multi-call turns the latest snapshot already includes earlier assistant/tool-call output in breakdown.messageTokens, so the gauge double-counts those tokens after reload instead of matching the live view. This needs a way to distinguish legacy blobs from new snapshots with no usage event, or avoid applying full tokenCount to multi-call snapshots.

chatgpt-codex-connector · 2026-06-14T01:22:25Z

+  if (usage.costKnown) {
+    target.costKnown = true;
+  }


Require complete cost coverage before showing totals

When a branch contains a persisted metadata.usage saved while contextCost was off (so it has token usage but no cost) and a later response saved with cost, this OR marks the aggregate as costKnown and the popover renders Cost for the branch, even though target.cost excludes the earlier usage. This under-reports branch/all-branch cost after toggling cost display or after any cost-computation miss; track partial/unknown cost separately or only render the cost total when every usage-bearing entry contributed a cost.

chatgpt-codex-connector · 2026-06-14T01:22:25Z

+      const pendingAtom = pendingUsageFamily(convoKey);
+      const prev = jotai.get(pendingAtom);
+      jotai.set(pendingAtom, {


Clear pending usage on aborted streams

These folded events now live in pendingUsageFamily until finalizeUsage moves them onto a response, but the terminal cancel/error paths only call resetLive, which still clears just the live token estimate. If a user stops a generation (or a terminal SSE error occurs) after an on_token_usage event but before a final event, the pending totals remain and get merged into the next response in the same conversation, making that response's branch/total usage and cost include tokens from the aborted run.

danny-avila · 2026-06-14T01:33:10Z

Round 2 addressed in 6bfd306113 (all three P2):

Pending usage leak on abort/error: resetLive (the terminal cancel/error path) now clears the in-flight pendingUsageFamily, so a stopped generation's folded tokens are discarded instead of merging into the next response in the conversation.
Partial cost coverage: costKnown now means complete coverage — it's ANDed across usage-bearing entries (vacuously true when empty, paired with hasUsage). A branch mixing turns saved with and without cost is flagged incomplete and the cost row is hidden rather than rendering an under-reported total.
Reload delta fallback: removed the ?? tokenCount fallback for completedOutputTokens. Only the persisted post-snapshot delta is applied, so a multi-call turn whose provider emitted no usage_metadata no longer double-counts earlier output on reload (it contributes 0 — matching the snapshot's pre-final-call base — instead of the full response count).
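
The "complete coverage" rule for `costKnown` can be sketched as below. This is a hedged illustration of the idea rather than the PR's code: the `EntryUsage` and `UsageAggregate` shapes and both function names are assumptions.

```typescript
// Hypothetical per-entry usage: cost is absent when it was never recorded.
interface EntryUsage {
  input: number;
  output: number;
  cost?: number;
}

interface UsageAggregate {
  input: number;
  output: number;
  cost: number;
  hasUsage: boolean; // at least one usage-bearing entry was seen
  costKnown: boolean; // every usage-bearing entry carried a cost
}

function aggregateWithCoverage(entries: EntryUsage[]): UsageAggregate {
  // costKnown starts true (vacuously) and is ANDed across entries.
  const aggregate: UsageAggregate = {
    input: 0,
    output: 0,
    cost: 0,
    hasUsage: false,
    costKnown: true,
  };
  for (const entry of entries) {
    aggregate.hasUsage = true;
    aggregate.input += entry.input;
    aggregate.output += entry.output;
    if (typeof entry.cost === 'number') {
      aggregate.cost += entry.cost;
    } else {
      aggregate.costKnown = false; // one uncosted entry makes the sum partial
    }
  }
  return aggregate;
}

/** The cost row is only meaningful when there is usage and all of it is costed. */
function shouldRenderCost(aggregate: UsageAggregate): boolean {
  return aggregate.hasUsage && aggregate.costKnown;
}

const mixedBranch = aggregateWithCoverage([
  { input: 100, output: 20 }, // saved while cost display was off
  { input: 120, output: 30, cost: 0.5 },
]);
console.log(shouldRenderCost(mixedBranch)); // false: the 0.5 would under-report
```

Pairing `costKnown` with `hasUsage` keeps the vacuous case (no entries at all) from rendering a cost row.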

danny-avila · 2026-06-14T01:33:11Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bfd306113

chatgpt-codex-connector · 2026-06-14T01:45:31Z

+    if (this.contextUsageSink?.latest) {
+      metadata.contextUsage = buildPersistedContextUsage(
+        this.contextUsageSink.latest,
+        this.usageEmitSink ?? [],
+      );


Avoid persisting snapshots without final output

When a provider emits on_context_usage but no usage_metadata, ModelEndHandler returns before emitting any token-usage event, so usageEmitSink is empty here. This still persists metadata.contextUsage, and after reload hydrateSnapshots prefers that pre-invoke snapshot over the coarse tokenCount fallback; because completedOutputTokens is absent, the gauge undercounts by the entire assistant reply for those providers. Either include a response-token fallback or skip persisting the snapshot unless the post-snapshot output is known.

chatgpt-codex-connector · 2026-06-14T01:45:31Z

+            {hasBranches && (
+              <div className="flex items-center justify-between text-xs">
+                <span className="text-text-secondary">{localize('com_ui_context_cost_total')}</span>
+                <span className="text-text-secondary">{formatCost(view.totalCost)}</span>
+              </div>


Gate all-branches cost on complete coverage

If the active branch has complete costs but another branch contains a usage-bearing response saved without metadata.usage.cost (for example while interface.contextCost was off) plus any costed sibling, sumTotalUsage marks totalUsage.costKnown false but this row still renders from totalCost. That makes the "All branches" value a partial, under-reported total; hide this row unless view.totalUsage.costKnown is also true.

chatgpt-codex-connector · 2026-06-14T01:45:31Z

+  let cost = 0;
+  let hasCost = false;
+  for (const event of events) {
+    const split = splitUsage(event);


Match live cache fallback for provider-less usage

For usage payloads that include cache details but omit provider, the live client uses normalizeUsageUnits' magnitude fallback to decide whether input_tokens already includes cache, but this persisted rollup calls backend splitUsage, which treats a missing provider as additive. In that provider-less cached case (for example an OpenAI-compatible/custom event without provider), the usage shown immediately after streaming can exclude cache from Input, then after reload metadata.usage is higher because cache was added back. Use the same fallback as the client or persist enough provider information before normalizing.

danny-avila · 2026-06-14T01:53:18Z

Round 3 addressed in 97c010ce9f (all three P2):

No-usage snapshots: buildResponseMetadata now persists metadata.contextUsage only when the response emitted a primary usage event (post-snapshot output is known). For providers that emit no usage_metadata, the snapshot is skipped and reload falls back to the coarse per-message tokenCount estimate (accurate totals) instead of a granular gauge that undercounts the reply.
All-branches coverage: the all-branches cost row (and the "(this branch)" label) is now gated on view.totalUsage.costKnown, so an incomplete total — e.g. a sibling branch saved without cost — is hidden rather than shown as an under-reported figure.
Provider-less cached events: aggregateEmittedUsage/finalCallOutputTokens now normalize per-event via normalizeEventUnits, which mirrors the client's normalizeUsageUnits magnitude fallback (cache ≤ input ⇒ subset) instead of billing splitUsage (additive for missing provider). A provider-less cached event now yields the same Input live and after reload. Added a backend test.
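
A rough sketch of the magnitude fallback for provider-less events follows. The field names on `RawUsageEvent`, the `DisplayUnits` shape, and the reading of "Input" as fresh (non-cached) tokens are all assumptions for illustration; the real `normalizeEventUnits` / `normalizeUsageUnits` may differ in detail.

```typescript
// Hypothetical raw payload; real on_token_usage events may use other field names.
interface RawUsageEvent {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens?: number;
  cache_write_tokens?: number;
  provider?: string;
}

// Assumed display units: `input` is the fresh, non-cached portion.
interface DisplayUnits {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  output: number;
}

/** Fallback for events with no provider: infer the convention from magnitudes. */
function normalizeProviderlessSketch(event: RawUsageEvent): DisplayUnits {
  const cacheRead = event.cache_read_tokens ?? 0;
  const cacheWrite = event.cache_write_tokens ?? 0;
  const cache = cacheRead + cacheWrite;
  // If cache fits inside input_tokens, treat it as a subset already counted
  // there; otherwise input_tokens must exclude it (additive convention).
  const inputIncludesCache = cache > 0 && cache <= event.input_tokens;
  const input = inputIncludesCache ? event.input_tokens - cache : event.input_tokens;
  return { input, cacheRead, cacheWrite, output: event.output_tokens };
}

const subsetCase = normalizeProviderlessSketch({
  input_tokens: 1000,
  output_tokens: 50,
  cache_read_tokens: 400,
});
const additiveCase = normalizeProviderlessSketch({
  input_tokens: 100,
  output_tokens: 10,
  cache_read_tokens: 400,
});
console.log(subsetCase.input, additiveCase.input);
```

The point of sharing one fallback between client and backend is that the same event produces the same Input value whether it is folded live or read back from persisted metadata.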

danny-avila · 2026-06-14T01:53:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97c010ce9f

chatgpt-codex-connector · 2026-06-14T02:08:15Z

+  if (hasCost) {
+    rollup.cost = cost;


Persist cost only with complete event coverage

When a multi-call response has a mix of usage events where one lacks cost (for example, emitTokenUsage catches a pricing error and emits that payload without cost while other calls were priced), this persists the partial sum with a cost field. On reload readPersistedUsage treats any numeric metadata.usage.cost as complete coverage, so the UI renders an under-reported branch/all-branches cost instead of hiding incomplete coverage like the live fold does. Persist cost only if every folded usage event carried a cost.

chatgpt-codex-connector · 2026-06-14T02:08:15Z

+      if (responseId != null) {
+        setEntryUsage(realId, responseId, {


Preserve metadata usage when no events were folded

When the final event includes responseMessage.metadata.usage but this client has not folded any on_token_usage events, such as a late or second resumable subscriber that receives the stored final event without replayed chunks, upsertEntries has just loaded the persisted usage from the response and this unconditional flush replaces it with an empty pending record. The current session then shows no branch/total usage or cost until a later full reload/re-index; only overwrite the entry when pending.eventCount > 0.

chatgpt-codex-connector · 2026-06-14T02:08:15Z

+      /** Terminal path (abort/error): discard the in-flight response's pending
+       *  usage. Without a final event `finalizeUsage` never flushes it, so it
+       *  would otherwise merge into the next response in this conversation. */
+      jotai.set(pendingUsageFamily(convoKey), EMPTY_USAGE_TOTALS);


Flush billed usage before clearing on stop

When a user stops a non-resumable generation after at least one on_token_usage event has arrived, such as a tool-using run after the first model call, the cancel path calls resetLive before abortConversation, and that abort path only invokes finalHandler rather than finalizeUsage. Clearing pendingUsageFamily here drops usage that was already emitted/billed, so the branch and total cost under-report the stopped response instead of attaching that pending usage to the partial response or carrying it through finalization.

danny-avila · 2026-06-14T02:15:40Z

Round 4 addressed in cfebd80d9c (all three P2):

Complete cost coverage on persist: aggregateEmittedUsage now records cost only when every call carried one. A partial pricing failure (one call's computeUsageCostUSD threw and emitted without cost) omits cost, so readPersistedUsage treats coverage as unknown and hides it — matching the live fold — instead of reading an under-reported sum as authoritative.
Preserve metadata when nothing folded: finalizeUsage flushes pending into the response entry only when events were folded this session (eventCount > 0). A late/second resumable subscriber that receives the stored final event (with metadata.usage) but no replayed chunks now keeps the usage upsertEntries loaded, instead of overwriting it with an empty pending record.
Attribute billed usage on stop: the user-stop path now calls a new attributePending handler that flushes the in-flight pending usage onto the partial response (the branch tail) before resetting, so the stopped reply's billed tokens are kept on it (and still can't leak into the next response). resetLive's discard remains for the stream-error path, where no salvageable response exists.
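
The "flush only when something was folded" guard from this round can be illustrated with a small sketch. The `PendingUsage` shape and the `foldSketch` / `finalizeSketch` helpers are hypothetical stand-ins for the real atoms and handlers; only the `eventCount > 0` condition comes from the thread.

```typescript
// Hypothetical in-flight holder for the current response's folded usage.
interface PendingUsage {
  input: number;
  output: number;
  cost: number;
  eventCount: number;
}

interface StoredUsage {
  input: number;
  output: number;
  cost: number;
}

const EMPTY_PENDING: PendingUsage = { input: 0, output: 0, cost: 0, eventCount: 0 };

/** Accumulate one emitted usage event into the pending holder. */
function foldSketch(pending: PendingUsage, event: StoredUsage): PendingUsage {
  return {
    input: pending.input + event.input,
    output: pending.output + event.output,
    cost: pending.cost + event.cost,
    eventCount: pending.eventCount + 1,
  };
}

/** Flush pending onto the response entry, but never overwrite with an empty record. */
function finalizeSketch(
  index: Map<string, StoredUsage>,
  responseId: string,
  pending: PendingUsage,
): PendingUsage {
  if (pending.eventCount > 0) {
    index.set(responseId, { input: pending.input, output: pending.output, cost: pending.cost });
  }
  return EMPTY_PENDING; // the holder is reset either way
}

// A late subscriber: usage was already loaded from persisted metadata and no
// events were folded in this session.
const lateIndex = new Map<string, StoredUsage>([['r1', { input: 100, output: 20, cost: 0.25 }]]);
finalizeSketch(lateIndex, 'r1', EMPTY_PENDING);
console.log(lateIndex.get('r1')); // still the persisted usage
```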

danny-avila · 2026-06-14T02:25:25Z

@codex review

chatgpt-codex-connector · 2026-06-14T02:33:48Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: cfebd80d9c

github-actions · 2026-06-14T02:42:54Z

GitNexus: ❌ deploy failed

The deploy failed — the previous index (if any) continues to be served.
Deploy run

danny-avila · 2026-06-14T12:31:37Z

Fixed a user-reported regression: branch cost disappeared when switching to a sibling branch (until a new turn) — the cost counterpart of the earlier granularity bug.

Root cause: buildIndex rebuilds the token index from the messages cache. A sibling generated this session whose cache message lacks metadata.usage (and which regenerate transiently drops from the cache view) lost its live-flushed usage on rebuild, so sumBranch found none and the cost row hid. The breakdown survived only because it has a durable per-response map (snapshotsByAnchorFamily); usage had no equivalent.

Fix: a sticky per-response usage map (conversationId → messageId → usage) written by setEntryUsage and never rebuilt from the cache — the usage analog of snapshotsByAnchorFamily. buildIndex/upsertEntries restore an entry's usage from it when the message carries none; it's cleared on conversation switch and migrated with the index. Added unit coverage for the drop-then-re-add case and an e2e assertion that branch cost survives a branch switch (live, no reload).
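
A minimal sketch of the sticky-map idea, assuming a plain nested `Map` in place of whatever store the client actually uses. `rememberUsage`, `rebuildIndexSketch`, `clearStickyUsage`, and the `CachedMessage` shape are illustrative names, not the PR's identifiers.

```typescript
interface Usage {
  input: number;
  output: number;
  cost?: number;
}

// Hypothetical slice of a cached message; only metadata.usage matters here.
interface CachedMessage {
  messageId: string;
  metadata?: { usage?: Usage };
}

// conversationId -> messageId -> usage. Written on live flush, never rebuilt
// from the messages cache.
const stickyUsage = new Map<string, Map<string, Usage>>();

function rememberUsage(conversationId: string, messageId: string, usage: Usage): void {
  let perConversation = stickyUsage.get(conversationId);
  if (!perConversation) {
    perConversation = new Map();
    stickyUsage.set(conversationId, perConversation);
  }
  perConversation.set(messageId, usage);
}

/** Rebuild the index from the cache, restoring usage the cache no longer carries. */
function rebuildIndexSketch(
  conversationId: string,
  messages: CachedMessage[],
): Map<string, Usage | undefined> {
  const sticky = stickyUsage.get(conversationId);
  const index = new Map<string, Usage | undefined>();
  for (const message of messages) {
    // Persisted metadata wins; the sticky map only fills the gap.
    index.set(message.messageId, message.metadata?.usage ?? sticky?.get(message.messageId));
  }
  return index;
}

/** Drop the sticky history, for example on conversation switch. */
function clearStickyUsage(conversationId: string): void {
  stickyUsage.delete(conversationId);
}

// A sibling generated this session: flushed live, but its cache message has no metadata.usage.
rememberUsage('convo-1', 'a1', { input: 100, output: 20, cost: 0.25 });
const rebuilt = rebuildIndexSketch('convo-1', [{ messageId: 'a1' }]);
console.log(rebuilt.get('a1'));
```

Because the sticky map is keyed by message and survives rebuilds, a message that drops out of the cache view and later returns gets its usage back.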

danny-avila · 2026-06-14T12:31:37Z

@codex review

chatgpt-codex-connector · 2026-06-14T12:39:45Z

Codex Review: Didn't find any major issues. Nice work!

Reviewed commit: 034f0f93d2

danny-avila · 2026-06-14T12:50:34Z

Follow-up: the branch-switch cost fix needed one more piece. The sticky usage history alone wasn't enough — a branch switch fires no cache updated event, so the index never re-ran, and the post-regenerate rebuild was skipped while isSubmitting was still true, leaving the index stale and missing the viewed branch's entry. Now useTokenUsage re-indexes from the messages cache on every tail change (created/finalize and branch switch), so the viewed branch's response is re-added and its usage restored from metadata.usage or the sticky history. Verified locally: the branch-switch e2e passes and the cost section renders both the branch row and the all-branches total after switching. @codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 532227ef91

chatgpt-codex-connector · 2026-06-14T13:02:47Z

+      const tail = latestMessages?.[latestMessages.length - 1];
+      const partialResponseId =
+        tail != null && tail.isCreatedByUser === false ? tail.messageId : null;
+      attributePending(partialResponseId, { ...submission, userMessage });


Wire pending usage handling into resumable stops

This attribution only runs in the legacy useSSE cancel handler, but useAdaptiveSSE routes the default non-assistant/agents path through useResumableSSE, whose explicit Stop flow clears the submission and closes the stream without calling attributePending or otherwise resetting pendingUsageFamily. When a resumable agents stream is stopped after an on_token_usage event has folded, the per-response pending usage remains live and can be merged into the gauge and flushed into the next response, misattributing cost/tokens; the same attribution/reset needs to be wired into the resumable stop/cleanup path.

chatgpt-codex-connector · 2026-06-14T13:02:47Z

+    const metadata = this.buildResponseMetadata();
+    return metadata ? { completion, metadata } : { completion };


Persist usage metadata for stopped agent responses

This metadata is only attached on the normal sendCompletion path, but stopped resumable agent generations are saved by api/server/middleware/abortMiddleware.js instead; that path spends and clears collectedUsage while saving a responseMessage without metadata.usage or metadata.contextUsage. When a user stops a response after usage events were emitted, the live UI may show the billed cost, but after reload the saved incomplete response has no persisted rollup, so branch/total cost and the granular breakdown for that response disappear. The abort save path needs the same rollup/snapshot persistence.

chatgpt-codex-connector · 2026-06-14T13:02:47Z

+    [pendingUsage],
+  );
+  const branchUsage = useMemo(
+    () => mergeUsage(branchTotals.usage, pendingAsUsage),


Keep pending usage on the streaming branch

When a user switches siblings while another branch is still streaming, tailId/branchTotals follow the viewed branch but pendingUsageFamily is still the conversation-wide holder for the in-flight response. This unconditional merge adds the streaming branch's pending input/output/cost to whichever sibling is being viewed until finalize/reset, so the cost section can temporarily show the wrong branch cost; only merge pending when the viewed tail is the streaming response, or scope pending usage per response.
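
One way to read this suggestion is sketched below. It assumes the client can know which response is currently streaming (the hypothetical `streamingResponseId`), which is tracking that may not exist in the codebase today; `mergeTotals` and `branchUsageForView` are likewise illustrative names.

```typescript
interface UsageTotals {
  input: number;
  output: number;
  cost: number;
}

function mergeTotals(a: UsageTotals, b: UsageTotals): UsageTotals {
  return { input: a.input + b.input, output: a.output + b.output, cost: a.cost + b.cost };
}

/**
 * Merge in-flight usage only when the viewed tail is the response that is
 * streaming. `streamingResponseId` is a hypothetical value for this sketch.
 */
function branchUsageForView(
  branchTotals: UsageTotals,
  pending: UsageTotals,
  viewedTailId: string | null,
  streamingResponseId: string | null,
): UsageTotals {
  if (streamingResponseId === null || viewedTailId !== streamingResponseId) {
    return branchTotals; // viewing a sibling: leave its totals untouched
  }
  return mergeTotals(branchTotals, pending);
}

const siblingTotals: UsageTotals = { input: 100, output: 20, cost: 0.25 };
const inFlight: UsageTotals = { input: 50, output: 10, cost: 0.5 };

// Viewing sibling 'a1' while 'a2' streams: pending is not added.
console.log(branchUsageForView(siblingTotals, inFlight, 'a1', 'a2'));
```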

github-actions · 2026-06-14T13:19:02Z

GitNexus: ❌ deploy failed

The deploy failed — the previous index (if any) continues to be served.
Deploy run

danny-avila · 2026-06-14T13:29:45Z

Addressed the stop/abort findings in 3cde6fe035:

Pending leak on resumable stop (P2): the resumable intentional-close handler (useResumableSSE) reset UI state but never cleared pendingUsageFamily, so usage folded before an explicit stop leaked into the next response. It now discards pending on intentional close; a resume re-folds via backfillUsage, so nothing is lost.
Stopped-response metadata persistence (P2): abortMiddleware saved the stopped response without metadata.usage/contextUsage, so its cost + breakdown vanished on reload. It now rebuilds both from the job's persisted tokenUsage (emitted payloads incl. cost) and contextUsage snapshot — parity with the normal sendCompletion path, with the breakdown gated on a primary usage event like buildResponseMetadata.

Intentionally deferred (tracked as a follow-up): the mid-stream branch-switch case (useTokenUsage merging the conversation-wide pending onto a sibling viewed while another branch streams) — it's transient (until finalize), the gauge's liveTokens already behaves the same way, and a correct fix needs new per-response streaming-id tracking. Better as its own change than bolted onto this PR.

@codex review

chatgpt-codex-connector

💡 Codex Review

LibreChat/client/src/hooks/SSE/useUsageHandler.ts

Lines 272 to 274 in 3cde6fe

    
    for (const entry of entries) {
      foldUsage(entry, submission);
    }

Restore confirmed output when backfilling resumed usage

When a resumable stream reconnects after a primary on_token_usage was already persisted, this backfill marks the runId:seq as folded without updating confirmedRef. The sync path later replays the same usage event through usageHandler, but it is skipped as a duplicate, so if the stream finalizes before another context snapshot, finalizeUsage records completedOutputTokens from only the seeded character estimate (or 0) rather than the provider-reported output. In that resume-at-final/tool-boundary case the in-session context gauge/snapshot history remains inaccurate even though the final response carries exact metadata.contextUsage.

chatgpt-codex-connector · 2026-06-14T13:45:08Z

+  /** Persist the usage/cost rollup + context breakdown for the stopped response
+   *  so its branch/total cost and granular rows survive a reload, matching the
+   *  normal completion path. */
+  const abortMetadata = buildAbortResponseMetadata(jobData);


Persist stopped usage in the agents abort route

This metadata is attached only in abortMiddleware, but the Stop button for agent streams posts to /api/agents/chat/abort via abortStream, and that route builds/saves its own responseMessage in api/server/routes/agents/index.js without calling this helper or setting metadata. As a result, stopped resumable agent responses still lose their metadata.usage and metadata.contextUsage on reload; the helper needs to be reused in that actual abort path too.

chatgpt-codex-connector · 2026-06-14T13:45:08Z

+     *  can't size the final reply, so skip persisting the pre-invoke snapshot —
+     *  reload then falls back to the coarse per-message tokenCount estimate
+     *  (accurate totals) rather than a snapshot that undercounts the reply. */
+    const hasPrimaryUsage = usageEvents.some((event) => event.usage_type == null);


Require latest-call usage before persisting snapshots

In a multi-call agent turn, contextUsageSink.latest is the snapshot for the most recent model call, but this predicate accepts any earlier primary usage event. If the latest/final call emits no usage_metadata (or is interrupted after a prior tool-call LLM did emit usage), buildPersistedContextUsage will use that earlier call's output as completedOutputTokens even though the latest snapshot already includes it, so reload over-reports the context gauge instead of falling back to the coarse tokenCount estimate intended by the comment.


danny-avila · 2026-06-14T13:52:28Z

Addressed in cc8a739b5e — and corrected where my prior fixes hit the wrong path:

Stopped-response metadata, real agents path (was wrong file): the agents Stop posts to /api/agents/chat/abort → routes/agents/index.js, which saves its own responseMessage and never hit the abortMiddleware I'd edited. Moved the builder into packages/api as buildAbortedResponseMetadata (shared, unit-tested) and applied it in both abort save paths (agents route + legacy middleware), rebuilding metadata.usage/contextUsage from the job's persisted tokenUsage/contextUsage. Stopped agent replies now keep cost + breakdown on reload.
Snapshot gate (new finding, real over-report): buildResponseMetadata now persists the breakdown only when the final visible call emitted usage — it tracks a per-response snapshot count and requires primaryUsageCount >= snapshotCount. Previously any earlier primary usage passed, so a multi-call turn whose final call emitted no usage_metadata used an earlier call's output as completedOutputTokens (already in the latest snapshot) → reload over-reported. Now it falls back to the coarse estimate.
Resumable stop pending-reset (re-flagged): this is handled as of 3cde6fe035. The agents Stop flows stopGenerating → /chat/abort → clearAllSubmissions() → the useResumableSSE effect's null-branch sse.close() → the intentional-close abort handler, which now calls resetLive (clears pendingUsageFamily). A resume re-folds via backfillUsage, so nothing is lost.
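
The snapshot gate described in this reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `UsageEvent` shape and the `shouldPersistBreakdown` name are hypothetical, and the `'subagent'` value in the usage example is made up. Only the `usage_type == null` primary check and the `primaryUsageCount >= snapshotCount` comparison come from the discussion.

```typescript
// Hypothetical event shape: a primary (visible) call carries no usage_type.
interface UsageEvent {
  usage_type?: string | null;
  output_tokens?: number;
}

/**
 * Returns true only when at least as many primary usage events were emitted
 * as context snapshots were taken, which is how the thread infers that the
 * final visible call reported usage. (Assumed helper name, not from the PR.)
 */
function shouldPersistBreakdown(snapshotCount: number, events: UsageEvent[]): boolean {
  const primaryUsageCount = events.filter((e) => e.usage_type == null).length;
  return snapshotCount > 0 && primaryUsageCount >= snapshotCount;
}

// Two snapshots, two primary usage events: the breakdown may be persisted.
const completeTurn = shouldPersistBreakdown(2, [{ output_tokens: 40 }, { output_tokens: 12 }]);
// Two snapshots, but the final call emitted no usage: fall back to the coarse estimate.
const interruptedTurn = shouldPersistBreakdown(2, [{ output_tokens: 40 }]);
// Non-primary events (illustrative usage_type value) do not satisfy the gate.
const subagentOnly = shouldPersistBreakdown(1, [{ usage_type: 'subagent', output_tokens: 9 }]);
```

A count comparison is used here instead of "any primary event exists" because an earlier call's usage says nothing about whether the latest snapshot was followed by a usage report.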

Deferred (tracked follow-up): mid-stream branch-switch pending attribution (useTokenUsage) — transient, matches the gauge's existing liveTokens behavior, needs per-response streaming-id tracking.

@codex review

Persist the granular context breakdown and per-response usage/cost on the response message metadata, and re-derive branch + total usage/cost from a per-message index so the popover survives reloads and is branch-aware live.

- Add aggregateEmittedUsage + buildPersistedContextUsage helpers in packages/api; capture the latest visible snapshot and every emitted on_token_usage payload via contextUsageSink/usageEmitSink.
- Attach metadata.contextUsage (Part A) and metadata.usage (Part B) on the agents response message in sendCompletion.
- Carry per-message usage on the token index; add sumTotalUsage/setEntryUsage and branch-scoped usage on sumBranch.
- Repurpose the session accumulator into a single in-flight pending holder; flush it into the index at finalize; hydrate breakdowns on load.
- Render branch cost with a conditional all-branches total in the breakdown.

… deltas

- Persist per-event-normalized display units in metadata.usage (TResponseUsage) so reloaded mixed-provider turns match the live session; client reads them directly instead of re-normalizing with a single stamped provider (P2).
- Persist completedOutputTokens (final call output) on metadata.contextUsage so a reloaded multi-call turn adds the post-snapshot delta, not the full tokenCount the snapshot already counts (P2).
- buildIndex preserves a prior entry's immutable usage when a rebuilt cache message lacks metadata.usage, so a mid-session rebuild (regenerate) keeps a sibling branch's flushed cost (fixes the e2e regenerate failure).
- Track costKnown so turns saved with contextCost off don't render $0.00 when cost display is later enabled (P3).
- Use an epsilon for the all-branches cost comparison to avoid a spurious total row from float summation order (P3).
- Update unit/integration/e2e tests for the new shapes; regenerate e2e asserts the all-branches total after reload (deterministic via persisted metadata).
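
The epsilon comparison mentioned in this commit can be illustrated with a small sketch. The tolerance value and the `showAllBranchesTotal` name are assumptions for illustration; the commit message does not state the actual constant or function used.

```typescript
// Assumed tolerance; the real value used in the PR is not given in the thread.
const COST_EPSILON = 1e-9;

/**
 * Decide whether an "all branches" total row adds information beyond the
 * branch cost. Costs summed in different orders can differ by rounding
 * noise, so they are compared with a tolerance instead of strict equality.
 */
function showAllBranchesTotal(branchCost: number, totalCost: number): boolean {
  return Math.abs(totalCost - branchCost) > COST_EPSILON;
}

// 0.1 + 0.2 is not exactly 0.3 in floating point, yet no extra row is wanted.
const sameWithinNoise = showAllBranchesTotal(0.3, 0.1 + 0.2);
// A genuinely larger total (a sibling branch contributed cost) shows the row.
const siblingAddsCost = showAllBranchesTotal(0.3, 0.45);
```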

- Clear the in-flight pending usage on terminal abort/error (resetLive), so a stopped generation's tokens no longer merge into the next response (P2).
- costKnown now means COMPLETE coverage (ANDed): a branch mixing cost-bearing and cost-less turns is flagged incomplete and the cost row is hidden rather than rendering an under-reported total (P2).
- Drop the tokenCount fallback for completedOutputTokens on reload: only the persisted post-snapshot delta is used, so a multi-call turn whose provider emitted no usage_metadata no longer double-counts earlier output (P2).
- Update tokens.spec for AND coverage semantics + incomplete-cost case.
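
The ANDed costKnown semantics can be sketched as a merge over per-turn totals. The `UsageTotals` shape and the `mergeUsageSketch` name are hypothetical stand-ins; the PR's own mergeUsage may differ in fields and behavior.

```typescript
// Hypothetical per-turn totals; field names are assumptions.
interface UsageTotals {
  input: number;
  output: number;
  cost: number;
  /** True only if every contributing turn carried a cost. */
  costKnown: boolean;
}

/** Merge two totals; cost coverage is complete only if both sides are complete. */
function mergeUsageSketch(a: UsageTotals, b: UsageTotals): UsageTotals {
  return {
    input: a.input + b.input,
    output: a.output + b.output,
    cost: a.cost + b.cost,
    costKnown: a.costKnown && b.costKnown,
  };
}

const priced: UsageTotals = { input: 100, output: 20, cost: 0.25, costKnown: true };
const unpriced: UsageTotals = { input: 50, output: 10, cost: 0, costKnown: false };

// One cost-less turn makes the whole branch incomplete, so the row is hidden.
const mixedBranch = mergeUsageSketch(priced, unpriced);
// Two priced turns keep coverage complete and sum their costs.
const pricedBranch = mergeUsageSketch(priced, { input: 10, output: 5, cost: 0.5, costKnown: true });
```

An OR here would render an under-reported figure whenever a single turn lacked a cost, which is the failure the commit describes.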

…ovider-less cache

- Skip persisting metadata.contextUsage when the response emitted no primary usage event: without a known post-snapshot output the granular gauge would undercount the reply on reload, so fall back to the coarse per-message estimate instead (P2).
- Gate the all-branches cost row on totalUsage.costKnown so an incomplete total (a sibling saved without cost) never renders an under-reported figure (P2).
- aggregateEmittedUsage/finalCallOutputTokens now normalize per-event with the client's magnitude fallback (normalizeEventUnits) instead of billing splitUsage, so provider-less cached events match live on reload (P2).
- Add backend test for the provider-less cached case.

- aggregateEmittedUsage persists cost only when EVERY call was priced; a partial pricing failure now omits cost so the client treats coverage as unknown rather than reading an under-reported sum as authoritative (P2).
- finalizeUsage flushes pending into the response entry only when events were folded this session (eventCount > 0), so a late/second resumable subscriber carrying persisted metadata.usage keeps it instead of being overwritten with an empty pending record (P2).
- On user stop, attribute the in-flight pending usage to the partial response (new attributePending handler) instead of discarding it in resetLive: the stopped reply's billed tokens are kept and still can't leak into the next response; resetLive's discard remains for the error path (P2).
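
The "cost only when every call was priced" rule can be sketched as follows. The `EmittedUsage` and `Rollup` shapes and the `aggregateSketch` name are assumptions; this is not the signature of the PR's aggregateEmittedUsage.

```typescript
// Hypothetical emitted payload: cost is absent when pricing failed.
interface EmittedUsage {
  input: number;
  output: number;
  cost?: number;
}

interface Rollup {
  input: number;
  output: number;
  /** Present only when every event carried a cost. */
  cost?: number;
}

/** Sum tokens across events; attach cost only if all events were priced. */
function aggregateSketch(events: EmittedUsage[]): Rollup {
  const rollup: Rollup = { input: 0, output: 0 };
  let cost = 0;
  let allPriced = events.length > 0;
  for (const event of events) {
    rollup.input += event.input;
    rollup.output += event.output;
    if (typeof event.cost === 'number') {
      cost += event.cost;
    } else {
      allPriced = false; // one unpriced call makes the total unknown
    }
  }
  if (allPriced) {
    rollup.cost = cost;
  }
  return rollup;
}

const partiallyPriced = aggregateSketch([
  { input: 10, output: 5, cost: 0.5 },
  { input: 20, output: 5 },
]);
const fullyPriced = aggregateSketch([
  { input: 10, output: 5, cost: 0.5 },
  { input: 20, output: 5, cost: 0.25 },
]);
```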

…story

Branch cost vanished on switching to a sibling branch (until a new turn), the cost analog of the granularity bug. buildIndex rebuilds the token index from the messages cache; a sibling generated this session whose cache message lacks metadata.usage (and is transiently dropped from the cache during regenerate) lost its live-flushed usage, so sumBranch found none and the cost row hid.

Fix: a sticky per-response usage map (conversationId → messageId → usage), written by setEntryUsage and never rebuilt from the cache; it is the usage counterpart of snapshotsByAnchorFamily for the breakdown. buildIndex/upsertEntries restore an entry's usage from it when the message carries none; cleared on convo switch and migrated with the index.

Add unit coverage for the drop-then-readd regression and an e2e assertion that branch cost survives a branch switch.
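
The sticky usage history can be pictured with a minimal sketch: a map that outlives index rebuilds and fills in usage for messages whose cached copy carries none. All names and shapes below (`stickyUsage`, `rememberUsage`, `buildIndexSketch`, `CachedMessage`) are hypothetical and only mirror the conversationId → messageId → usage layout described in the commit.

```typescript
type Usage = { input: number; output: number; cost?: number };

// Hypothetical cached message: metadata.usage may be missing for live replies.
interface CachedMessage {
  messageId: string;
  metadata?: { usage?: Usage };
}

// conversationId -> messageId -> usage; never rebuilt from the cache.
const stickyUsage = new Map<string, Map<string, Usage>>();

/** Record a response's usage so later rebuilds can restore it. */
function rememberUsage(conversationId: string, messageId: string, usage: Usage): void {
  const byMessage = stickyUsage.get(conversationId) ?? new Map<string, Usage>();
  byMessage.set(messageId, usage);
  stickyUsage.set(conversationId, byMessage);
}

/** Rebuild the index from cache, preferring persisted usage, then sticky history. */
function buildIndexSketch(
  conversationId: string,
  messages: CachedMessage[],
): Map<string, Usage | undefined> {
  const sticky = stickyUsage.get(conversationId);
  const index = new Map<string, Usage | undefined>();
  for (const message of messages) {
    index.set(message.messageId, message.metadata?.usage ?? sticky?.get(message.messageId));
  }
  return index;
}

// A live reply is flushed, dropped from the cache, then re-added without metadata.
rememberUsage('convo-1', 'msg-1', { input: 100, output: 20, cost: 0.5 });
const whileDropped = buildIndexSketch('convo-1', []);
const afterReadd = buildIndexSketch('convo-1', [{ messageId: 'msg-1' }]);
```

As the follow-up commit notes, a map like this can only restore entries that are present in a rebuild, so it does not help if the rebuild itself never runs.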

The sticky usage history alone didn't fix the reported branch-switch cost drop: on a branch switch no cache `updated` event fires, so the index subscriber never re-ran, and the post-regenerate rebuild was skipped while `isSubmitting` was still true — leaving the index stale and missing the now-viewed branch's response entirely (sticky can only restore entries present in a rebuild). Re-index from the messages cache on every tail change (created/finalize AND branch switch), not just while submitting. The cache holds the full message set at switch time, so the viewed branch's response is re-added and its usage restored from metadata.usage or the sticky history → sumBranch finds it and the branch cost renders. Verified locally: the branch-switch e2e now passes (the cost section shows both the branch row and the all-branches total). Also fixed that e2e assertion to target a single cost value (strict-mode safe).

…tadata

Codex round (stop/abort edges):
- Resumable explicit-stop (intentional SSE close) reset UI state but never cleared pendingUsageFamily, so usage folded before the stop leaked into the next response in the conversation. Discard pending on intentional close (resetLive); a resume re-folds via backfillUsage, so nothing is lost.
- The abort save path (abortMiddleware) persisted the stopped response without metadata.usage/contextUsage, so its cost + breakdown vanished on reload. Rebuild both from the job's persisted tokenUsage (emitted payloads incl. cost) and contextUsage snapshot, for parity with the normal sendCompletion path; breakdown gated on a primary usage event like buildResponseMetadata.

Deferred (per scope decision): mid-stream branch-switch transiently shows the streaming branch's pending on the viewed sibling (cosmetic, until finalize).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc8a739b5e


chatgpt-codex-connector · 2026-06-14T14:07:07Z

+  const hasPrimaryUsage = events.some((event) => event && event.usage_type == null);
+  if (snapshot && hasPrimaryUsage) {
+    metadata.contextUsage = buildPersistedContextUsage(snapshot, events);


Gate abort snapshots on the final usage event

When a stopped agent run has already completed an earlier primary call, emits a later context snapshot, and is aborted before that later/final call emits on_token_usage, hasPrimaryUsage is still true. This saves the latest snapshot and buildPersistedContextUsage uses the earlier call's output as completedOutputTokens, even though that output is already included in the latest snapshot; after reload the context gauge over-counts. The abort path needs the same snapshot-count/final-call gate as AgentClient.buildResponseMetadata (or should skip persisting the breakdown) for that case.


chatgpt-codex-connector · 2026-06-14T14:07:07Z

+         *  while generating): discard the in-flight pending usage so it can't
+         *  merge into the next response in this conversation. On a resume the
+         *  collected usage is re-folded via backfillUsage, so nothing is lost. */
+        resetLive({ ...currentSubmission, userMessage });


Let resume backfill refold cleared pending usage

When this intentional-close path runs after some on_token_usage events have already been folded (for example navigating away from a resumable generation), resetLive clears pendingUsageFamily but leaves the conversation's folded usage keys intact. The subsequent resume sync calls backfillUsage, but markUsageFolded treats those persisted events as duplicates and does not rebuild pending usage; if the stream is then stopped or finalized, the branch/total usage for the response can be missing the pre-close calls until a full refetch/reload restores metadata.usage.


…shot gate

Codex round (corrects last round's wrong-path fixes):
- Stopped AGENTS responses are saved by routes/agents/index.js (/chat/abort), not abortMiddleware, so last round's metadata fix never ran for them. Moved the rollup/snapshot builder into packages/api as buildAbortedResponseMetadata (shared, unit-tested) and applied it in BOTH abort save paths, so a stopped agent reply keeps its cost + breakdown on reload.
- Persist the breakdown only when the FINAL visible call emitted usage: track a per-response snapshot count and require primaryUsageCount >= snapshotCount. Previously any earlier primary usage event passed the gate, so a multi-call turn whose final call emitted no usage_metadata used an earlier call's output as completedOutputTokens (already counted by the latest snapshot) → reload over-reported. Now it falls back to the coarse estimate.

Resumable stop pending-reset (prior round, 3cde6fe) already flows through clearAllSubmissions → SSE close → the intentional-close handler's resetLive.

Deferred per scope: mid-stream branch-switch pending attribution (tracked).

Codex round (on the re-applied abort/snapshot work):
- buildAbortedResponseMetadata now persists ONLY the usage/cost rollup, not the context breakdown. The abort path can't tell whether the final call emitted usage (the job stores only the latest snapshot, not a count), so persisting the breakdown risked reusing an earlier call's output as completedOutputTokens (already in the snapshot) → reload over-count. Stopped/incomplete responses now fall back to the coarse gauge estimate, which is safe and apt.
- resetLive now also forgets the conversation's folded usage-event identities (clearUsageFolded). Discarding pending on a terminal/intentional close left the folded keys set, so a later resume's backfillUsage saw the persisted events as duplicates and never rebuilt pending, leaving the response's usage missing until a full reload. Clearing them lets the resume re-fold.

danny-avila · 2026-06-14T14:15:51Z

Two follow-ups in a1a32318aa (plus 0cbeee80e4, which re-applies the agents-abort-route metadata + snapshot-count gate that had dropped off the branch in a rebase):

Abort breakdown over-count (usage.ts): buildAbortedResponseMetadata now persists only the usage/cost rollup, not the context breakdown. The abort path can't tell whether the final call emitted usage (the job stores only the latest snapshot, not a count), so persisting the breakdown risked reusing an earlier call's output as completedOutputTokens (already counted by the latest snapshot) → reload over-count. Stopped/incomplete responses now fall back to the coarse gauge estimate — safe, and apt for an interrupted turn. The accurate cost rollup is still persisted.
Resume re-fold after pending discard (useResumableSSE/resetLive): the terminal/intentional-close discard now also forgets the conversation's folded usage-event identities (clearUsageFolded). Previously the folded keys survived, so a later resume's backfillUsage treated the persisted events as duplicates and never rebuilt pending — leaving the response's usage missing until a full reload. Clearing them lets the resume re-fold.
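
The interaction between the duplicate check and the pending discard can be sketched as below. This is an illustration under assumptions, not the hook's real code: the two maps, `foldOnce`, and `resetLiveSketch` are hypothetical, and only the `runId:seq` key format and the idea of clearing folded identities come from the thread.

```typescript
// Hypothetical in-memory stores keyed by conversation id.
const foldedKeys = new Map<string, Set<string>>();
const pendingOutput = new Map<string, number>();

/** Fold one usage event into pending unless its runId:seq key was already seen. */
function foldOnce(conversationId: string, runId: string, seq: number, output: number): boolean {
  const key = `${runId}:${seq}`;
  const seen = foldedKeys.get(conversationId) ?? new Set<string>();
  if (seen.has(key)) {
    return false; // treated as a duplicate and skipped
  }
  seen.add(key);
  foldedKeys.set(conversationId, seen);
  pendingOutput.set(conversationId, (pendingOutput.get(conversationId) ?? 0) + output);
  return true;
}

/** Discard pending usage; optionally also forget the folded identities. */
function resetLiveSketch(conversationId: string, clearFolded: boolean): void {
  pendingOutput.delete(conversationId);
  if (clearFolded) {
    foldedKeys.delete(conversationId);
  }
}

// Before the fix: pending is discarded but the key survives, so the resume skips it.
foldOnce('convo-1', 'run-1', 0, 50);
resetLiveSketch('convo-1', false);
const refoldedWithoutClear = foldOnce('convo-1', 'run-1', 0, 50);

// After the fix: clearing the folded keys lets the resume rebuild pending.
resetLiveSketch('convo-1', true);
const refoldedWithClear = foldOnce('convo-1', 'run-1', 0, 50);
const pendingAfterResume = pendingOutput.get('convo-1');
```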

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1a32318aa


chatgpt-codex-connector · 2026-06-14T14:33:45Z

+    if (this.contextUsageSink?.latest && snapshotCount > 0 && primaryUsageCount >= snapshotCount) {
+      metadata.contextUsage = buildPersistedContextUsage(this.contextUsageSink.latest, usageEvents);


Skip contextUsage when saving aborted completions

When an agents stream is stopped, sendCompletion still returns through BaseClient.sendMessage, which immediately saves the response with this metadata, and the resumable controller then skips its own save because client.savedMessageIds already contains the id. That means an aborted/unfinished response can still persist metadata.contextUsage here, bypassing the new buildAbortedResponseMetadata path that intentionally writes only the usage rollup for stopped responses; reloads can therefore show the granular persisted breakdown for interrupted turns instead of the safe coarse estimate.


chatgpt-codex-connector · 2026-06-14T14:33:45Z

+      const abortMetadata = buildAbortedResponseMetadata(jobData);
+      if (abortMetadata) {
+        responseMessage.metadata = abortMetadata;


Save token counts for aborted agents responses

For stops that go through /api/agents/chat/abort, this new aborted metadata intentionally contains only metadata.usage, not metadata.contextUsage; after a reload the gauge therefore falls back to the coarse token index, where a missing tokenCount is treated as 0. The response saved in this route still omits tokenCount, so stopped agents replies with partial content are persisted with zero output tokens for the context meter, making the intended safe fallback under-report the interrupted response after reload.


chatgpt-codex-connector · 2026-06-14T14:33:45Z

+  for (let i = events.length - 1; i >= 0; i--) {
+    if (events[i].usage_type == null) {
+      return normalizeEventUnits(events[i]).output;


Sum parallel post-snapshot outputs

When a graph has multiple visible primary agents running in parallel, each call emits its context snapshot before any of those calls has produced output, so the latest snapshot may exclude the outputs from several sibling calls. The live meter adds every post-snapshot primary usage event, but the persisted metadata keeps only the last primary event's output here; reloading such a parallel-agent response will under-report the context used by the omitted sibling outputs.
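
The difference between keeping only the last primary event's output and summing every post-snapshot primary output can be shown with a small sketch. The `afterLatestSnapshot` flag is a hypothetical marker introduced for illustration; the thread does not say how the real code would identify post-snapshot events.

```typescript
interface PrimaryUsage {
  usage_type?: string | null;
  output: number;
  /** Hypothetical marker: the event arrived after the latest context snapshot. */
  afterLatestSnapshot: boolean;
}

/** Behavior described in the review: only the last primary event's output is kept. */
function lastPrimaryOutput(events: PrimaryUsage[]): number {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event && event.usage_type == null) {
      return event.output;
    }
  }
  return 0;
}

/** Suggested alternative: sum every primary output that followed the latest snapshot. */
function summedPostSnapshotOutput(events: PrimaryUsage[]): number {
  return events
    .filter((event) => event.usage_type == null && event.afterLatestSnapshot)
    .reduce((sum, event) => sum + event.output, 0);
}

// Two parallel visible agents finish after the latest snapshot was taken.
const parallelEvents: PrimaryUsage[] = [
  { output: 30, afterLatestSnapshot: true },
  { output: 20, afterLatestSnapshot: true },
];
const keptByLast = lastPrimaryOutput(parallelEvents);
const keptBySum = summedPostSnapshotOutput(parallelEvents);
```

With these inputs the last-event approach keeps 20 tokens while the sum keeps 50, which is the under-report the comment describes.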


github-actions · 2026-06-14T14:45:19Z

GitNexus: ❌ deploy failed

The deploy failed — the previous index (if any) continues to be served.
Deploy run


chatgpt-codex-connector Bot reviewed Jun 14, 2026


danny-avila added 5 commits June 14, 2026 10:04

🧹 chore: Remove orphaned com_ui_session_cost i18n key

a587bcb

danny-avila added 4 commits June 14, 2026 10:04

danny-avila force-pushed the claude/strange-dewdney-6b6d5d branch from cc8a739 to e0328ce on June 14, 2026 14:04

chatgpt-codex-connector Bot reviewed Jun 14, 2026


danny-avila added 2 commits June 14, 2026 10:09

chatgpt-codex-connector Bot reviewed Jun 14, 2026


danny-avila merged commit b03b2a0 into dev Jun 14, 2026
32 checks passed

danny-avila deleted the claude/strange-dewdney-6b6d5d branch June 14, 2026 14:48

This was referenced Jun 14, 2026

💸 feat: Per-Agent Endpoint Token Config in Multi-Endpoint Billing #13738

Merged

🪙 feat: Context Gauge UX, Hover Snapshot, Click Breakdown, Currency, Cost-On-By-Default #13739

Merged

		completedOutputTokens:
		typeof message.tokenCount === 'number' ? message.tokenCount : undefined,

		if (responseId != null) {
		setEntryUsage(realId, responseId, {

		const metadata = this.buildResponseMetadata();
		return metadata ? { completion, metadata } : { completion };

	for (const entry of entries) {
	foldUsage(entry, submission);
	}

		if (this.contextUsageSink?.latest && snapshotCount > 0 && primaryUsageCount >= snapshotCount) {
		metadata.contextUsage = buildPersistedContextUsage(this.contextUsageSink.latest, usageEvents);
