💾 feat: Persist Context Breakdown & Branch/Total Usage Cost #13734

Merged
danny-avila merged 11 commits into dev from claude/strange-dewdney-6b6d5d
Jun 14, 2026
Conversation

@danny-avila

Owner

Summary

I made the context-usage popover durable and branch-aware by persisting the granular breakdown and per-response usage/cost on the response message, then re-deriving branch + total usage/cost from the existing per-message token index instead of a page-session accumulator. Before this, the breakdown reset to the coarse fallback on reload and the cost row showed $0 on an existing conversation until a new turn was generated; the cost was also "events since page load," neither branch-scoped nor a true conversation total.

This builds on #13670 and persists onto the existing metadata field (Mongoose Mixed / z.record), so there is no migration.

  • Added aggregateEmittedUsage and buildPersistedContextUsage pure helpers in packages/api/src/agents/usage.ts (the rollup sums the same emitted on_token_usage payloads the client folds, so reload matches the live session; cost is summed additively and gated on interface.contextCost).
  • Captured the latest visible context snapshot and every emitted usage payload through new contextUsageSink/usageEmitSink wired from callbacks.js → initialize.js → client.js (the subagent emitter folds into the same sink).
  • Attached metadata.contextUsage (Part A — breakdown) and metadata.usage (Part B — usage/cost rollup) on the agents response message in sendCompletion, alongside the existing thoughtSignatures.
  • Extended the per-message token index (tokens.ts): entries now carry normalized usage, sumBranch returns branch usage, and added sumTotalUsage (all branches) + setEntryUsage (live flush) + mergeUsage.
  • Re-architected the live cost path: repurposed the cross-branch session accumulator into a single in-flight pendingUsageFamily holder, added totalUsageFamily, made foldUsage accumulate the current response and finalizeUsage flush it into the index exactly once — so cost is branch-aware live, not just on reload.
  • Added hydrateSnapshots to restore each branch's persisted breakdown on load (merges, never clobbering a live finalized snapshot).
  • Rendered the Input/Output/Cache totals from branch usage and the cost row as the branch cost, with a muted "All branches" total shown only when it differs (new com_ui_context_cost* i18n keys).
  • Verified the read path preserves metadata: GET /:conversationId uses a -_id -__v -user exclusion projection and sanitizeMessageForTransmit only strips fileContext/file text.
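
The branch-versus-total derivation in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in tokens.ts: the `IndexEntry` shape, the `parentId` link, and both function bodies are assumptions; only the names `sumBranch` and `sumTotalUsage` come from the PR.

```typescript
// Hypothetical shapes for illustration; the real index in tokens.ts may differ.
interface Usage {
  input: number;
  output: number;
  cost: number;
}

interface IndexEntry {
  messageId: string;
  parentId: string | null;
  usage?: Usage; // absent for messages saved before metadata.usage existed
}

const emptyUsage = (): Usage => ({ input: 0, output: 0, cost: 0 });

function addUsage(target: Usage, usage?: Usage): void {
  if (!usage) return; // no persisted usage contributes 0
  target.input += usage.input;
  target.output += usage.output;
  target.cost += usage.cost;
}

/** Walks from the viewed tail up to the root, summing only that branch. */
function sumBranch(index: Map<string, IndexEntry>, tailId: string): Usage {
  const total = emptyUsage();
  const seen = new Set<string>(); // guards against a malformed parent cycle
  let current = index.get(tailId);
  while (current && !seen.has(current.messageId)) {
    seen.add(current.messageId);
    addUsage(total, current.usage);
    current = current.parentId ? index.get(current.parentId) : undefined;
  }
  return total;
}

/** Sums every entry in insertion order, regardless of branch. */
function sumTotalUsage(index: Map<string, IndexEntry>): Usage {
  const total = emptyUsage();
  for (const entry of index.values()) addUsage(total, entry.usage);
  return total;
}

// Regenerate scenario: one user message with two sibling responses.
const demoIndex = new Map<string, IndexEntry>([
  ['u1', { messageId: 'u1', parentId: null }],
  ['a1', { messageId: 'a1', parentId: 'u1', usage: { input: 100, output: 20, cost: 0.25 } }],
  ['a2', { messageId: 'a2', parentId: 'u1', usage: { input: 100, output: 30, cost: 0.5 } }],
]);

const branchUsage = sumBranch(demoIndex, 'a2'); // only the viewed sibling
const totalUsage = sumTotalUsage(demoIndex); // both siblings
```

With the viewed tail at `a2`, the branch sum covers only that sibling while the total covers both, which is the case where an "All branches" line would differ from the branch cost.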

Design note

I followed the spec's raw-rollup approach (persist provider counts, normalize once on the client). Cost is therefore always exact (summed additively per event); token rows are exact for the single-provider main path and a minor approximation only when one turn mixes providers (subagents). This can be switched to backend-normalized display units if bit-exact token rows are wanted in that edge case.

Change Type

  • New feature (non-breaking change which adds functionality)

Testing

All changes are covered by unit + integration + e2e tests; tsc --noEmit and ESLint are clean across the 14 touched files.

  • packages/api/src/agents/usage.spec.ts: aggregateEmittedUsage (sum, cost gating, subagent/summarization fold, model/provider precedence) and buildPersistedContextUsage (zero-tool-count trimming).
  • api/server/controllers/agents/__tests__/usageEvents.integration.spec.js — through the real agents pipeline: sinks capture both model calls, the rollup aggregates correctly, and cost folds only when contextCost is on.
  • client/src/utils/tokens.spec.ts: buildIndex reads metadata.usage, sumBranch returns branch usage/cost, sumTotalUsage spans all branches, the regenerate scenario (branch vs. total), backward-compat (no metadata.usage → 0), setEntryUsage live flush, and rebuild idempotency.
  • e2e/specs/mock/usage.spec.ts — send → reload shows the granular breakdown and branch cost (Parts A + B persist); send → regenerate shows branch cost with the all-branches total line.

Test Configuration:

npm run test per workspace (packages/api, client, api); mock Playwright suite via npm run e2e:mock (interface.contextCost is enabled in the harness).

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • My changes do not introduce new warnings
  • I have written tests demonstrating that my changes are effective or that my feature works
  • Local unit tests pass with my changes

@github-actions

Contributor

🚨 Unused i18next Keys Detected

The following translation keys are defined in translation.json but are not used in the codebase:

  • com_ui_session_cost

⚠️ Please remove these unused keys to keep the translation files clean.

@danny-avila

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ec17c623a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread client/src/store/usage.ts Outdated
Comment on lines +155 to +156
completedOutputTokens:
typeof message.tokenCount === 'number' ? message.tokenCount : undefined,

P2: Persist only the post-snapshot output delta

When rehydrating a tool-using or otherwise multi-call agent turn after reload, metadata.contextUsage is the latest pre-invoke snapshot, which already includes tokens from earlier model/tool steps in that same response. The live finalizer only adds liveAtFinalize (output since that snapshot), but this assigns the response's full tokenCount, so earlier output tokens are counted again and the context gauge can overstate usage after reload. Persist/replay the post-snapshot delta instead of using the whole response count.
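
The double count described here can be shown with a small numeric sketch. The field name `snapshotTokens` and both helper functions are hypothetical; only `completedOutputTokens` and `tokenCount` appear in the PR.

```typescript
// Hypothetical shape: `snapshotTokens` stands in for whatever the persisted
// breakdown sums to; it is not a field name taken from the PR.
interface PersistedSnapshot {
  snapshotTokens: number; // context size at the latest pre-invoke snapshot
  completedOutputTokens?: number; // output produced after that snapshot
}

/** Replays the full response count on top of the snapshot (the flagged behaviour). */
function rehydrateWithFullCount(snapshot: PersistedSnapshot, tokenCount: number): number {
  return snapshot.snapshotTokens + tokenCount;
}

/** Replays only the output produced after the snapshot was taken. */
function rehydrateWithDelta(snapshot: PersistedSnapshot): number {
  return snapshot.snapshotTokens + (snapshot.completedOutputTokens ?? 0);
}

// A two-call tool turn: 1000 prompt tokens, 50 output tokens in the first call
// (already inside the second call's snapshot), 30 output tokens in the final call.
const firstCallOutput = 50;
const finalCallOutput = 30;
const toolTurn: PersistedSnapshot = {
  snapshotTokens: 1000 + firstCallOutput,
  completedOutputTokens: finalCallOutput,
};
const responseTokenCount = firstCallOutput + finalCallOutput;

const overstated = rehydrateWithFullCount(toolTurn, responseTokenCount); // 1130
const matchesLive = rehydrateWithDelta(toolTurn); // 1080
```

The 50-token gap between the two results is exactly the first call's output, counted once in the snapshot and again in the full response count.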

</span>
<span className="font-medium text-text-primary">{formatCost(view.branchCost)}</span>
</div>
{view.totalCost !== view.branchCost && (

P3: Use a tolerance before showing all-branches cost

In an unbranched conversation with several paid turns, sumBranch and sumTotalUsage add the same floating-point costs in opposite orders (branch walks tail-to-root, total walks insertion order). A strict !== can therefore be true only due to rounding, causing the popover to show a misleading “All branches” row even when there are no alternate branches. Compare rounded display values or use a small epsilon instead.
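
A short sketch of the effect, using made-up per-turn costs. The helper name `showAllBranches` and the epsilon value are illustrative assumptions, not code from the PR.

```typescript
// Hypothetical per-turn costs; values that are not exact binary fractions.
const turnCosts = [0.1, 0.2, 0.3];

// Branch-style sum (tail to root) versus total-style sum (insertion order).
const tailToRoot = [...turnCosts].reverse().reduce((sum, c) => sum + c, 0);
const insertionOrder = turnCosts.reduce((sum, c) => sum + c, 0);

// Strict comparison is sensitive to summation order:
// 0.3 + 0.2 + 0.1 gives 0.6, while 0.1 + 0.2 + 0.3 gives 0.6000000000000001.
const strictDiffers = tailToRoot !== insertionOrder;

// An epsilon comparison only reports a difference larger than rounding noise.
const COST_EPSILON = 1e-9; // illustrative threshold
function showAllBranches(total: number, branch: number): boolean {
  return total - branch > COST_EPSILON;
}

const spuriousRow = showAllBranches(insertionOrder, tailToRoot); // false
const realRow = showAllBranches(0.75, 0.5); // true
```

Comparing values rounded to the displayed precision would work as well; the epsilon form is just the smaller change.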

Comment thread packages/api/src/agents/usage.ts Outdated
Comment on lines +202 to +203
model ??= event.model;
provider ??= event.provider;

P2: Preserve per-event providers in usage rollups

When summarization is enabled with a different provider/model, its model-end usage can be the first usage event for the response before the primary agent call. These assignments then stamp the whole persisted rollup with the summarizer's provider, and readPersistedUsage later normalizes every token count using that one provider instead of the per-event providers that live foldUsage used. This makes branch input/cache/output totals change after reload for mixed-provider turns; persist normalized display units or keep per-event provider data.

Comment thread client/src/utils/tokens.ts Outdated
output: units.output,
cacheWrite: units.cacheWrite,
cacheRead: units.cacheRead,
cost: event.cost ?? 0,

P3: Hide cost when persisted usage has no cost

When a turn was saved while interface.contextCost was off, the server still persists metadata.usage but omits cost; if cost display is later enabled for that deployment, this coercion turns the unknown cost into 0, and the popover renders Cost $0.00 because token usage exists. Preserve an undefined/has-cost state and only render the cost row for usage that actually carried cost.

@danny-avila

Owner Author

Thanks — addressed all four in af066208f7:

  • Persist post-snapshot output delta (P2, store/usage): the backend now records the final primary call's output as completedOutputTokens on metadata.contextUsage, and rehydration uses that delta instead of the full response tokenCount. A reloaded multi-call/tool turn no longer double-counts earlier steps the snapshot already includes.
  • Preserve per-event providers in rollups (P2, usage.ts): aggregateEmittedUsage now normalizes each event with splitUsage (per its own provider) and sums display units, persisted as TResponseUsage. The client reads them directly — no single stamped provider — so mixed-provider turns (summarization/subagent on a different provider) match the live session on reload.
  • Hide cost when persisted usage has no cost (P3, tokens.ts): cost is omitted from the rollup when contextCost was off at save; a costKnown flag now gates the cost row so it never renders $0.00 for unknown cost.
  • Tolerance for the all-branches row (P3, Breakdown.tsx): the branch-vs-total comparison now uses an epsilon (total − branch > 1e-9) instead of !==, so float summation order can't surface a spurious "All branches" row in an unbranched conversation.

Also fixed the e2e regression: buildIndex now preserves an entry's immutable usage when a rebuilt cache message lacks metadata.usage, so a mid-session rebuild during regenerate keeps the sibling branch's flushed cost. The regenerate e2e asserts the all-branches total after a reload (deterministic via persisted metadata).

@danny-avila

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af066208f7

Comment thread client/src/store/usage.ts Outdated
Comment on lines +158 to +160
const completedOutputTokens =
event.completedOutputTokens ??
(typeof message.tokenCount === 'number' ? message.tokenCount : undefined);

P2: Avoid treating missing deltas as the whole response

When an agent run emits on_context_usage but the provider/model does not return usage_metadata, buildPersistedContextUsage persists a new metadata.contextUsage blob without completedOutputTokens. On reload this fallback then uses the entire assistant tokenCount as the post-snapshot delta; for tool/multi-call turns the latest snapshot already includes earlier assistant/tool-call output in breakdown.messageTokens, so the gauge double-counts those tokens after reload instead of matching the live view. This needs a way to distinguish legacy blobs from new snapshots with no usage event, or avoid applying full tokenCount to multi-call snapshots.

Comment thread client/src/utils/tokens.ts Outdated
Comment on lines +106 to +108
if (usage.costKnown) {
target.costKnown = true;
}

P2: Require complete cost coverage before showing totals

When a branch contains a persisted metadata.usage saved while contextCost was off (so it has token usage but no cost) and a later response saved with cost, this OR marks the aggregate as costKnown and the popover renders Cost for the branch, even though target.cost excludes the earlier usage. This under-reports branch/all-branch cost after toggling cost display or after any cost-computation miss; track partial/unknown cost separately or only render the cost total when every usage-bearing entry contributed a cost.

Comment on lines +151 to +153
const pendingAtom = pendingUsageFamily(convoKey);
const prev = jotai.get(pendingAtom);
jotai.set(pendingAtom, {

P2: Clear pending usage on aborted streams

These folded events now live in pendingUsageFamily until finalizeUsage moves them onto a response, but the terminal cancel/error paths only call resetLive, which still clears just the live token estimate. If a user stops a generation (or a terminal SSE error occurs) after an on_token_usage event but before a final event, the pending totals remain and get merged into the next response in the same conversation, making that response's branch/total usage and cost include tokens from the aborted run.

@danny-avila

Owner Author

Round 2 addressed in 6bfd306113 (all three P2):

  • Pending usage leak on abort/error: resetLive (the terminal cancel/error path) now clears the in-flight pendingUsageFamily, so a stopped generation's folded tokens are discarded instead of merging into the next response in the conversation.
  • Partial cost coverage: costKnown now means complete coverage — it's ANDed across usage-bearing entries (vacuously true when empty, paired with hasUsage). A branch mixing turns saved with and without cost is flagged incomplete and the cost row is hidden rather than rendering an under-reported total.
  • Reload delta fallback: removed the ?? tokenCount fallback for completedOutputTokens. Only the persisted post-snapshot delta is applied, so a multi-call turn whose provider emitted no usage_metadata no longer double-counts earlier output on reload (it contributes 0 — matching the snapshot's pre-final-call base — instead of the full response count).
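
The "complete coverage" rule for costKnown can be sketched as a fold over usage-bearing entries. The `UsageTotals` shape and the helper names `mergeEntry` and `shouldRenderCost` are assumptions for illustration; only the `costKnown` and `hasUsage` flags are named in the PR.

```typescript
// Hypothetical shapes; the real mergeUsage in tokens.ts may differ.
interface UsageTotals {
  input: number;
  output: number;
  cost: number;
  hasUsage: boolean; // at least one usage-bearing entry was merged
  costKnown: boolean; // every usage-bearing entry carried a cost
}

interface EntryUsage {
  input: number;
  output: number;
  cost?: number; // omitted when the turn was saved without cost
}

// costKnown starts true (vacuously) and can only be cleared, never re-set.
const emptyTotals = (): UsageTotals => ({
  input: 0,
  output: 0,
  cost: 0,
  hasUsage: false,
  costKnown: true,
});

function mergeEntry(totals: UsageTotals, entry: EntryUsage): UsageTotals {
  return {
    input: totals.input + entry.input,
    output: totals.output + entry.output,
    cost: totals.cost + (entry.cost ?? 0),
    hasUsage: true,
    costKnown: totals.costKnown && typeof entry.cost === 'number', // AND, not OR
  };
}

/** The cost row needs both some usage and complete cost coverage. */
function shouldRenderCost(totals: UsageTotals): boolean {
  return totals.hasUsage && totals.costKnown;
}

const allCosted = [
  { input: 10, output: 5, cost: 0.25 },
  { input: 20, output: 5, cost: 0.5 },
].reduce(mergeEntry, emptyTotals());

const mixedCoverage = [
  { input: 10, output: 5 }, // saved while cost display was off
  { input: 20, output: 5, cost: 0.5 },
].reduce(mergeEntry, emptyTotals());
```

Pairing the flag with `hasUsage` keeps an empty branch from rendering a cost row just because the vacuous default is true.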

@danny-avila

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bfd306113

Comment thread api/server/controllers/agents/client.js Outdated
Comment on lines +870 to +874
if (this.contextUsageSink?.latest) {
metadata.contextUsage = buildPersistedContextUsage(
this.contextUsageSink.latest,
this.usageEmitSink ?? [],
);

P2: Avoid persisting snapshots without final output

When a provider emits on_context_usage but no usage_metadata, ModelEndHandler returns before emitting any token-usage event, so usageEmitSink is empty here. This still persists metadata.contextUsage, and after reload hydrateSnapshots prefers that pre-invoke snapshot over the coarse tokenCount fallback; because completedOutputTokens is absent, the gauge undercounts by the entire assistant reply for those providers. Either include a response-token fallback or skip persisting the snapshot unless the post-snapshot output is known.

Comment on lines +175 to +179
{hasBranches && (
<div className="flex items-center justify-between text-xs">
<span className="text-text-secondary">{localize('com_ui_context_cost_total')}</span>
<span className="text-text-secondary">{formatCost(view.totalCost)}</span>
</div>

P2: Gate all-branches cost on complete coverage

If the active branch has complete costs but another branch contains a usage-bearing response saved without metadata.usage.cost (for example while interface.contextCost was off) plus any costed sibling, sumTotalUsage marks totalUsage.costKnown false but this row still renders from totalCost. That makes the "All branches" value a partial, under-reported total; hide this row unless view.totalUsage.costKnown is also true.

Comment thread packages/api/src/agents/usage.ts Outdated
let cost = 0;
let hasCost = false;
for (const event of events) {
const split = splitUsage(event);

P2: Match live cache fallback for provider-less usage

For usage payloads that include cache details but omit provider, the live client uses normalizeUsageUnits' magnitude fallback to decide whether input_tokens already includes cache, but this persisted rollup calls backend splitUsage, which treats a missing provider as additive. In that provider-less cached case (for example an OpenAI-compatible/custom event without provider), the usage shown immediately after streaming can exclude cache from Input, then after reload metadata.usage is higher because cache was added back. Use the same fallback as the client or persist enough provider information before normalizing.

@danny-avila

Owner Author

Round 3 addressed in 97c010ce9f (all three P2):

  • No-usage snapshots: buildResponseMetadata now persists metadata.contextUsage only when the response emitted a primary usage event (post-snapshot output is known). For providers that emit no usage_metadata, the snapshot is skipped and reload falls back to the coarse per-message tokenCount estimate (accurate totals) instead of a granular gauge that undercounts the reply.
  • All-branches coverage: the all-branches cost row (and the "(this branch)" label) is now gated on view.totalUsage.costKnown, so an incomplete total — e.g. a sibling branch saved without cost — is hidden rather than shown as an under-reported figure.
  • Provider-less cached events: aggregateEmittedUsage/finalCallOutputTokens now normalize per-event via normalizeEventUnits, which mirrors the client's normalizeUsageUnits magnitude fallback (cache ≤ input ⇒ subset) instead of billing splitUsage (additive for missing provider). A provider-less cached event now yields the same Input live and after reload. Added a backend test.
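
A rough sketch of the magnitude fallback (cache ≤ input ⇒ subset) for an event with no provider. The event shape, the field names, and the function name `normalizeProviderless` are hypothetical, and the choice to report Input net of cache in the subset case is one reading of the review comment rather than something the PR states.

```typescript
// Hypothetical event shape; field names are illustrative, not the emitted payload.
interface UsageEvent {
  provider?: string;
  inputTokens: number;
  outputTokens: number;
  cacheRead?: number;
  cacheWrite?: number;
}

interface DisplayUnits {
  input: number; // uncached input
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

/** Covers only the provider-less case; known providers would use their own rule. */
function normalizeProviderless(event: UsageEvent): DisplayUnits {
  const cacheRead = event.cacheRead ?? 0;
  const cacheWrite = event.cacheWrite ?? 0;
  const cache = cacheRead + cacheWrite;
  // Magnitude fallback: cache that fits inside the reported input is assumed
  // to be a subset of it; cache larger than the input must have been
  // reported separately (additive).
  const inputIncludesCache = cache > 0 && cache <= event.inputTokens;
  return {
    input: inputIncludesCache ? event.inputTokens - cache : event.inputTokens,
    output: event.outputTokens,
    cacheRead,
    cacheWrite,
  };
}

// Cache fits inside input: treated as a subset, so Input is reported net of cache.
const subsetCase = normalizeProviderless({ inputTokens: 1000, outputTokens: 40, cacheRead: 800 });

// Cache exceeds input: treated as additive, so Input is left untouched.
const additiveCase = normalizeProviderless({ inputTokens: 100, outputTokens: 40, cacheRead: 800 });
```

Whatever the exact rule, the point of the fix is that the server and the client apply the same one, so a value computed live and a value read back after reload agree.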

@danny-avila

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97c010ce9f

Comment thread packages/api/src/agents/usage.ts Outdated
Comment on lines +202 to +203
if (hasCost) {
rollup.cost = cost;

P2: Persist cost only with complete event coverage

When a multi-call response has a mix of usage events where one lacks cost (for example, emitTokenUsage catches a pricing error and emits that payload without cost while other calls were priced), this persists the partial sum with a cost field. On reload readPersistedUsage treats any numeric metadata.usage.cost as complete coverage, so the UI renders an under-reported branch/all-branches cost instead of hiding incomplete coverage like the live fold does. Persist cost only if every folded usage event carried a cost.

Comment thread client/src/hooks/SSE/useUsageHandler.ts Outdated
Comment on lines +281 to +282
if (responseId != null) {
setEntryUsage(realId, responseId, {

P2: Preserve metadata usage when no events were folded

When the final event includes responseMessage.metadata.usage but this client has not folded any on_token_usage events, such as a late or second resumable subscriber that receives the stored final event without replayed chunks, upsertEntries has just loaded the persisted usage from the response and this unconditional flush replaces it with an empty pending record. The current session then shows no branch/total usage or cost until a later full reload/re-index; only overwrite the entry when pending.eventCount > 0.

Comment thread client/src/hooks/SSE/useUsageHandler.ts Outdated
Comment on lines +222 to +225
/** Terminal path (abort/error): discard the in-flight response's pending
* usage. Without a final event `finalizeUsage` never flushes it, so it
* would otherwise merge into the next response in this conversation. */
jotai.set(pendingUsageFamily(convoKey), EMPTY_USAGE_TOTALS);

P2: Flush billed usage before clearing on stop

When a user stops a non-resumable generation after at least one on_token_usage event has arrived, such as a tool-using run after the first model call, the cancel path calls resetLive before abortConversation, and that abort path only invokes finalHandler rather than finalizeUsage. Clearing pendingUsageFamily here drops usage that was already emitted/billed, so the branch and total cost under-report the stopped response instead of attaching that pending usage to the partial response or carrying it through finalization.

@danny-avila

Owner Author

Round 4 addressed in cfebd80d9c (all three P2):

  • Complete cost coverage on persist: aggregateEmittedUsage now records cost only when every call carried one. A partial pricing failure (one call's computeUsageCostUSD threw and emitted without cost) omits cost, so readPersistedUsage treats coverage as unknown and hides it — matching the live fold — instead of reading an under-reported sum as authoritative.
  • Preserve metadata when nothing folded: finalizeUsage flushes pending into the response entry only when events were folded this session (eventCount > 0). A late/second resumable subscriber that receives the stored final event (with metadata.usage) but no replayed chunks now keeps the usage upsertEntries loaded, instead of overwriting it with an empty pending record.
  • Attribute billed usage on stop: the user-stop path now calls a new attributePending handler that flushes the in-flight pending usage onto the partial response (the branch tail) before resetting, so the stopped reply's billed tokens are kept on it (and still can't leak into the next response). resetLive's discard remains for the stream-error path, where no salvageable response exists.
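
The pending-usage lifecycle after this round can be modelled in a few lines. This is a simplified stand-in: the PR keeps pending usage in jotai atoms (pendingUsageFamily) and a token index, whereas the `PendingUsageModel` class and its `entries` map below are hypothetical. The method names mirror the handlers named in the thread.

```typescript
// Hypothetical in-memory model; not the hook code from useUsageHandler.ts.
interface UsageTotals {
  input: number;
  output: number;
  cost: number;
  eventCount: number;
}

const EMPTY_TOTALS: UsageTotals = { input: 0, output: 0, cost: 0, eventCount: 0 };

class PendingUsageModel {
  private pending: UsageTotals = { ...EMPTY_TOTALS };
  readonly entries = new Map<string, UsageTotals>();

  /** Accumulates one on_token_usage event into the in-flight response. */
  foldUsage(event: { input: number; output: number; cost: number }): void {
    this.pending = {
      input: this.pending.input + event.input,
      output: this.pending.output + event.output,
      cost: this.pending.cost + event.cost,
      eventCount: this.pending.eventCount + 1,
    };
  }

  /** Final event: load persisted usage, then flush pending only if events were folded. */
  finalizeUsage(responseId: string, persisted?: UsageTotals): void {
    if (persisted) this.entries.set(responseId, persisted);
    this.flushTo(responseId);
  }

  /** User stop: keep billed usage on the partial response, then reset. */
  attributePending(partialResponseId: string | null): void {
    if (partialResponseId != null) this.flushTo(partialResponseId);
    this.pending = { ...EMPTY_TOTALS };
  }

  /** Stream error: no salvageable response, so discard. */
  resetLive(): void {
    this.pending = { ...EMPTY_TOTALS };
  }

  private flushTo(responseId: string): void {
    if (this.pending.eventCount > 0) this.entries.set(responseId, this.pending);
    this.pending = { ...EMPTY_TOTALS };
  }
}

const model = new PendingUsageModel();
model.foldUsage({ input: 100, output: 10, cost: 0.5 });
model.attributePending('stopped'); // user stop after one billed call
model.foldUsage({ input: 200, output: 20, cost: 1 });
model.finalizeUsage('next'); // carries only its own usage
model.finalizeUsage('late', { input: 5, output: 5, cost: 0.25, eventCount: 1 }); // nothing folded
```

In this model every exit path resets the pending holder, which is what keeps one response's usage from merging into the next.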

@danny-avila

Owner Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: cfebd80d9c

@github-actions

Contributor

GitNexus: ❌ deploy failed

The deploy failed — the previous index (if any) continues to be served.
Deploy run

@danny-avila

Owner Author

Fixed a user-reported regression: branch cost disappeared when switching to a sibling branch (until a new turn) — the cost counterpart of the earlier granularity bug.

Root cause: buildIndex rebuilds the token index from the messages cache. A sibling generated this session whose cache message lacks metadata.usage (and which regenerate transiently drops from the cache view) lost its live-flushed usage on rebuild, so sumBranch found none and the cost row hid. The breakdown survived only because it has a durable per-response map (snapshotsByAnchorFamily); usage had no equivalent.

Fix: a sticky per-response usage map (conversationId → messageId → usage) written by setEntryUsage and never rebuilt from the cache — the usage analog of snapshotsByAnchorFamily. buildIndex/upsertEntries restore an entry's usage from it when the message carries none; it's cleared on conversation switch and migrated with the index. Added unit coverage for the drop-then-re-add case and an e2e assertion that branch cost survives a branch switch (live, no reload).
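
A minimal sketch of the sticky-map idea, assuming simplified message and entry shapes. The signatures below are hypothetical and `clearStickyUsage` is an invented name; only `setEntryUsage`, `buildIndex`, and the conversationId → messageId → usage layout come from the description above.

```typescript
// Hypothetical shapes; the real index entries carry more fields.
interface Usage {
  input: number;
  output: number;
  cost: number;
}
interface CachedMessage {
  messageId: string;
  metadata?: { usage?: Usage };
}
interface IndexEntry {
  messageId: string;
  usage?: Usage;
}

// conversationId → messageId → usage; written on live flush, never rebuilt from the cache.
const stickyUsage = new Map<string, Map<string, Usage>>();

function setEntryUsage(
  index: Map<string, IndexEntry>,
  conversationId: string,
  messageId: string,
  usage: Usage,
): void {
  index.set(messageId, { messageId, usage });
  let perConversation = stickyUsage.get(conversationId);
  if (!perConversation) {
    perConversation = new Map<string, Usage>();
    stickyUsage.set(conversationId, perConversation);
  }
  perConversation.set(messageId, usage);
}

function buildIndex(conversationId: string, messages: CachedMessage[]): Map<string, IndexEntry> {
  const index = new Map<string, IndexEntry>();
  const sticky = stickyUsage.get(conversationId);
  for (const message of messages) {
    // Persisted metadata wins; the sticky map fills in when the message carries none.
    const usage = message.metadata?.usage ?? sticky?.get(message.messageId);
    const entry: IndexEntry = { messageId: message.messageId };
    if (usage) entry.usage = usage;
    index.set(message.messageId, entry);
  }
  return index;
}

/** Called on conversation switch so stale usage cannot follow the user. */
function clearStickyUsage(conversationId: string): void {
  stickyUsage.delete(conversationId);
}

// Drop-then-re-add: the sibling disappears from the cache view and comes back.
const convo = 'c1';
let index = buildIndex(convo, [{ messageId: 'a1' }]);
setEntryUsage(index, convo, 'a1', { input: 10, output: 5, cost: 0.5 });
index = buildIndex(convo, []); // regenerate transiently drops a1
const droppedFromIndex = !index.has('a1');
index = buildIndex(convo, [{ messageId: 'a1' }]); // re-added, still without metadata.usage
const restoredUsage = index.get('a1')?.usage;
```

Because the sticky map is written only by the live flush, a rebuild from an incomplete cache view cannot erase usage that was already attributed.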

@danny-avila

Owner Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Nice work!

Reviewed commit: 034f0f93d2

@danny-avila

Owner Author

Follow-up: the branch-switch cost fix needed one more piece. The sticky usage history alone wasn't enough — a branch switch fires no cache updated event, so the index never re-ran, and the post-regenerate rebuild was skipped while isSubmitting was still true, leaving the index stale and missing the viewed branch's entry. Now useTokenUsage re-indexes from the messages cache on every tail change (created/finalize and branch switch), so the viewed branch's response is re-added and its usage restored from metadata.usage or the sticky history. Verified locally: the branch-switch e2e passes and the cost section renders both the branch row and the all-branches total after switching. @codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 532227ef91

const tail = latestMessages?.[latestMessages.length - 1];
const partialResponseId =
tail != null && tail.isCreatedByUser === false ? tail.messageId : null;
attributePending(partialResponseId, { ...submission, userMessage });

P2: Wire pending usage handling into resumable stops

This attribution only runs in the legacy useSSE cancel handler, but useAdaptiveSSE routes the default non-assistant/agents path through useResumableSSE, whose explicit Stop flow clears the submission and closes the stream without calling attributePending or otherwise resetting pendingUsageFamily. When a resumable agents stream is stopped after an on_token_usage event has folded, the per-response pending usage remains live and can be merged into the gauge and flushed into the next response, misattributing cost/tokens; the same attribution/reset needs to be wired into the resumable stop/cleanup path.

Comment on lines +844 to +845
const metadata = this.buildResponseMetadata();
return metadata ? { completion, metadata } : { completion };

P2: Persist usage metadata for stopped agent responses

This metadata is only attached on the normal sendCompletion path, but stopped resumable agent generations are saved by api/server/middleware/abortMiddleware.js instead; that path spends and clears collectedUsage while saving a responseMessage without metadata.usage or metadata.contextUsage. When a user stops a response after usage events were emitted, the live UI may show the billed cost, but after reload the saved incomplete response has no persisted rollup, so branch/total cost and the granular breakdown for that response disappear. The abort save path needs the same rollup/snapshot persistence.

[pendingUsage],
);
const branchUsage = useMemo(
() => mergeUsage(branchTotals.usage, pendingAsUsage),

P2: Keep pending usage on the streaming branch

When a user switches siblings while another branch is still streaming, tailId/branchTotals follow the viewed branch but pendingUsageFamily is still the conversation-wide holder for the in-flight response. This unconditional merge adds the streaming branch's pending input/output/cost to whichever sibling is being viewed until finalize/reset, so the cost section can temporarily show the wrong branch cost; only merge pending when the viewed tail is the streaming response, or scope pending usage per response.
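
One way to read the first suggestion (merge pending only when the viewed tail is the streaming response) is sketched below. The `streamingResponseId` value is hypothetical: the PR has no such tracking, and the function bodies here, including `mergeUsage`, are illustrative assumptions.

```typescript
// Hypothetical usage shape and merge; the real mergeUsage lives in tokens.ts.
interface Usage {
  input: number;
  output: number;
  cost: number;
}

function mergeUsage(a: Usage, b: Usage): Usage {
  return { input: a.input + b.input, output: a.output + b.output, cost: a.cost + b.cost };
}

/**
 * Adds in-flight usage only while the viewed tail is the response that is
 * actually streaming. `streamingResponseId` is an assumed piece of state.
 */
function viewedBranchUsage(
  branch: Usage,
  pending: Usage,
  viewedTailId: string | null,
  streamingResponseId: string | null,
): Usage {
  const viewingStream = streamingResponseId != null && viewedTailId === streamingResponseId;
  return viewingStream ? mergeUsage(branch, pending) : branch;
}

const finishedSibling: Usage = { input: 100, output: 20, cost: 0.25 };
const streamingBranch: Usage = { input: 100, output: 0, cost: 0 };
const inFlight: Usage = { input: 50, output: 10, cost: 0.5 };

// Viewing the finished sibling while another branch streams: pending is ignored.
const siblingView = viewedBranchUsage(finishedSibling, inFlight, 'a1', 'a2');

// Viewing the streaming response itself: pending is merged in.
const streamView = viewedBranchUsage(streamingBranch, inFlight, 'a2', 'a2');
```

This only handles a viewed tail that is the streaming response itself; a tail further down the same branch would need an ancestry check as well.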

@github-actions

Contributor

GitNexus: ❌ deploy failed

The deploy failed — the previous index (if any) continues to be served.
Deploy run

@danny-avila

Owner Author

Addressed the stop/abort findings in 3cde6fe035:

  • Pending leak on resumable stop (P2): the resumable intentional-close handler (useResumableSSE) reset UI state but never cleared pendingUsageFamily, so usage folded before an explicit stop leaked into the next response. It now discards pending on intentional close; a resume re-folds via backfillUsage, so nothing is lost.
  • Stopped-response metadata persistence (P2): abortMiddleware saved the stopped response without metadata.usage/contextUsage, so its cost + breakdown vanished on reload. It now rebuilds both from the job's persisted tokenUsage (emitted payloads incl. cost) and contextUsage snapshot — parity with the normal sendCompletion path, with the breakdown gated on a primary usage event like buildResponseMetadata.

Intentionally deferred (tracked as a follow-up): the mid-stream branch-switch case (useTokenUsage merging the conversation-wide pending onto a sibling viewed while another branch streams) — it's transient (until finalize), the gauge's liveTokens already behaves the same way, and a correct fix needs new per-response streaming-id tracking. Better as its own change than bolted onto this PR.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

for (const entry of entries) {
foldUsage(entry, submission);
}

P2 Badge Restore confirmed output when backfilling resumed usage

When a resumable stream reconnects after a primary on_token_usage was already persisted, this backfill marks the runId:seq as folded without updating confirmedRef. The sync path later replays the same usage event through usageHandler, but it is skipped as a duplicate, so if the stream finalizes before another context snapshot, finalizeUsage records completedOutputTokens from only the seeded character estimate (or 0) rather than the provider-reported output. In that resume-at-final/tool-boundary case the in-session context gauge/snapshot history remains inaccurate even though the final response carries exact metadata.contextUsage.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

/** Persist the usage/cost rollup + context breakdown for the stopped response
* so its branch/total cost and granular rows survive a reload, matching the
* normal completion path. */
const abortMetadata = buildAbortResponseMetadata(jobData);

P2 Badge Persist stopped usage in the agents abort route

This metadata is attached only in abortMiddleware, but the Stop button for agent streams posts to /api/agents/chat/abort via abortStream, and that route builds/saves its own responseMessage in api/server/routes/agents/index.js without calling this helper or setting metadata. As a result, stopped resumable agent responses still lose their metadata.usage and metadata.contextUsage on reload; the helper needs to be reused in that actual abort path too.

Comment thread api/server/controllers/agents/client.js Outdated
* can't size the final reply, so skip persisting the pre-invoke snapshot —
* reload then falls back to the coarse per-message tokenCount estimate
* (accurate totals) rather than a snapshot that undercounts the reply. */
const hasPrimaryUsage = usageEvents.some((event) => event.usage_type == null);

P2 Badge Require latest-call usage before persisting snapshots

In a multi-call agent turn, contextUsageSink.latest is the snapshot for the most recent model call, but this predicate accepts any earlier primary usage event. If the latest/final call emits no usage_metadata (or is interrupted after a prior tool-call LLM did emit usage), buildPersistedContextUsage will use that earlier call's output as completedOutputTokens even though the latest snapshot already includes it, so reload over-reports the context gauge instead of falling back to the coarse tokenCount estimate intended by the comment.

@danny-avila

Owner Author

Addressed in cc8a739b5e — and corrected where my prior fixes hit the wrong path:

  • Stopped-response metadata, real agents path (was wrong file): the agents Stop posts to /api/agents/chat/abort → routes/agents/index.js, which saves its own responseMessage and never hit the abortMiddleware I'd edited. Moved the builder into packages/api as buildAbortedResponseMetadata (shared, unit-tested) and applied it in both abort save paths (agents route + legacy middleware), rebuilding metadata.usage/contextUsage from the job's persisted tokenUsage/contextUsage. Stopped agent replies now keep cost + breakdown on reload.
  • Snapshot gate (new finding, real over-report): buildResponseMetadata now persists the breakdown only when the final visible call emitted usage — it tracks a per-response snapshot count and requires primaryUsageCount >= snapshotCount. Previously any earlier primary usage passed, so a multi-call turn whose final call emitted no usage_metadata used an earlier call's output as completedOutputTokens (already in the latest snapshot) → reload over-reported. Now it falls back to the coarse estimate.
  • Resumable stop pending-reset (re-flagged): this is handled as of 3cde6fe035. The agents Stop flows stopGenerating → /chat/abort → clearAllSubmissions() → the useResumableSSE effect's null-branch sse.close() → the intentional-close abort handler, which now calls resetLive (clears pendingUsageFamily). A resume re-folds via backfillUsage, so nothing is lost.

Deferred (tracked follow-up): mid-stream branch-switch pending attribution (useTokenUsage) — transient, matches the gauge's existing liveTokens behavior, needs per-response streaming-id tracking.
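
As a rough illustration of the snapshot gate described in the comment above, the check can be written as a small predicate. The condition `snapshotCount > 0 && primaryUsageCount >= snapshotCount` and the `usage_type == null` test for a primary event come from the diff excerpts in this thread; the `UsageEvent` shape and the function name `shouldPersistBreakdown` are assumptions made for this sketch.

```typescript
// Minimal event shape for this sketch; the real payload carries more fields.
interface UsageEvent {
  usage_type?: string | null; // null or undefined marks a primary (visible) call
}

// Persist the granular breakdown only if every context snapshot taken for this
// response was followed by a primary usage event, i.e. the final visible call
// also reported usage.
function shouldPersistBreakdown(
  events: UsageEvent[],
  snapshotCount: number,
): boolean {
  let primaryUsageCount = 0;
  for (const event of events) {
    if (event.usage_type == null) {
      primaryUsageCount += 1;
    }
  }
  return snapshotCount > 0 && primaryUsageCount >= snapshotCount;
}
```

With two snapshots and one primary usage event the predicate is false, so reload would fall back to the coarse estimate rather than reuse the earlier call's output.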

@codex review

@danny-avila danny-avila force-pushed the claude/strange-dewdney-6b6d5d branch from cc8a739 to e0328ce on June 14, 2026 14:04

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc8a739b5e

Comment thread packages/api/src/agents/usage.ts Outdated
Comment on lines +337 to +339
const hasPrimaryUsage = events.some((event) => event && event.usage_type == null);
if (snapshot && hasPrimaryUsage) {
metadata.contextUsage = buildPersistedContextUsage(snapshot, events);

P2 Badge Gate abort snapshots on the final usage event

When a stopped agent run has already completed an earlier primary call, emits a later context snapshot, and is aborted before that later/final call emits on_token_usage, hasPrimaryUsage is still true. This saves the latest snapshot and buildPersistedContextUsage uses the earlier call's output as completedOutputTokens, even though that output is already included in the latest snapshot; after reload the context gauge over-counts. The abort path needs the same snapshot-count/final-call gate as AgentClient.buildResponseMetadata (or should skip persisting the breakdown) for that case.

* while generating): discard the in-flight pending usage so it can't
* merge into the next response in this conversation. On a resume the
* collected usage is re-folded via backfillUsage, so nothing is lost. */
resetLive({ ...currentSubmission, userMessage });

P2 Badge Let resume backfill refold cleared pending usage

When this intentional-close path runs after some on_token_usage events have already been folded (for example navigating away from a resumable generation), resetLive clears pendingUsageFamily but leaves the conversation's folded usage keys intact. The subsequent resume sync calls backfillUsage, but markUsageFolded treats those persisted events as duplicates and does not rebuild pending usage; if the stream is then stopped or finalized, the branch/total usage for the response can be missing the pre-close calls until a full refetch/reload restores metadata.usage.

@danny-avila

Owner Author

Two follow-ups in a1a32318aa (plus 0cbeee80e4, which re-applies the agents-abort-route metadata + snapshot-count gate that had dropped off the branch in a rebase):

  • Abort breakdown over-count (usage.ts): buildAbortedResponseMetadata now persists only the usage/cost rollup, not the context breakdown. The abort path can't tell whether the final call emitted usage (the job stores only the latest snapshot, not a count), so persisting the breakdown risked reusing an earlier call's output as completedOutputTokens (already counted by the latest snapshot) → reload over-count. Stopped/incomplete responses now fall back to the coarse gauge estimate — safe, and apt for an interrupted turn. The accurate cost rollup is still persisted.
  • Resume re-fold after pending discard (useResumableSSE/resetLive): the terminal/intentional-close discard now also forgets the conversation's folded usage-event identities (clearUsageFolded). Previously the folded keys survived, so a later resume's backfillUsage treated the persisted events as duplicates and never rebuilt pending — leaving the response's usage missing until a full reload. Clearing them lets the resume re-fold.
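
To make the second bullet concrete, here is a minimal model of the interaction between the pending holder and the folded-event keys. The `runId:seq` key format follows the earlier review comment; the class, its method names, and the `clearFolded` flag are hypothetical stand-ins for resetLive/clearUsageFolded, not the PR's code.

```typescript
interface UsageEvent {
  runId: string;
  seq: number;
  output: number;
}

// Hypothetical model: pending usage plus the identities of folded events.
class PendingUsageTracker {
  private folded: Record<string, true> = {};
  private pendingOutput = 0;

  // Fold an event once; a repeated runId:seq is skipped as a duplicate.
  fold(event: UsageEvent): boolean {
    const key = `${event.runId}:${event.seq}`;
    if (this.folded[key]) {
      return false;
    }
    this.folded[key] = true;
    this.pendingOutput += event.output;
    return true;
  }

  // Discard pending. Passing clearFolded = true also forgets the folded keys,
  // so a later backfill can rebuild pending from the same persisted events.
  resetLive(clearFolded: boolean): void {
    this.pendingOutput = 0;
    if (clearFolded) {
      this.folded = {};
    }
  }

  pending(): number {
    return this.pendingOutput;
  }
}
```

In this model, resetting with `clearFolded = false` reproduces the reported problem: the backfilled event is rejected as a duplicate and pending stays at zero.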

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1a32318aa

Comment on lines +879 to +880
if (this.contextUsageSink?.latest && snapshotCount > 0 && primaryUsageCount >= snapshotCount) {
metadata.contextUsage = buildPersistedContextUsage(this.contextUsageSink.latest, usageEvents);

P2 Badge Skip contextUsage when saving aborted completions

When an agents stream is stopped, sendCompletion still returns through BaseClient.sendMessage, which immediately saves the response with this metadata, and the resumable controller then skips its own save because client.savedMessageIds already contains the id. That means an aborted/unfinished response can still persist metadata.contextUsage here, bypassing the new buildAbortedResponseMetadata path that intentionally writes only the usage rollup for stopped responses; reloads can therefore show the granular persisted breakdown for interrupted turns instead of the safe coarse estimate.

Comment on lines +303 to +305
const abortMetadata = buildAbortedResponseMetadata(jobData);
if (abortMetadata) {
responseMessage.metadata = abortMetadata;

P2 Badge Save token counts for aborted agents responses

For stops that go through /api/agents/chat/abort, this new aborted metadata intentionally contains only metadata.usage, not metadata.contextUsage; after a reload the gauge therefore falls back to the coarse token index, where a missing tokenCount is treated as 0. The response saved in this route still omits tokenCount, so stopped agents replies with partial content are persisted with zero output tokens for the context meter, making the intended safe fallback under-report the interrupted response after reload.

Comment on lines +256 to +258
for (let i = events.length - 1; i >= 0; i--) {
if (events[i].usage_type == null) {
return normalizeEventUnits(events[i]).output;

P2 Badge Sum parallel post-snapshot outputs

When a graph has multiple visible primary agents running in parallel, each call emits its context snapshot before any of those calls has produced output, so the latest snapshot may exclude the outputs from several sibling calls. The live meter adds every post-snapshot primary usage event, but the persisted metadata keeps only the last primary event's output here; reloading such a parallel-agent response will under-report the context used by the omitted sibling outputs.
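
The difference between the two strategies in this comment can be shown with a small sketch. It assumes the events passed in are exactly those emitted after the latest snapshot, and that `output` is already normalized; both function names and the `"hypothetical_secondary"` usage type are invented for illustration.

```typescript
interface UsageEvent {
  usage_type?: string | null; // null or undefined marks a primary call
  output: number; // assumed already normalized to display units
}

// Strategy in the flagged code: take only the last primary event's output.
function lastPrimaryOutput(events: UsageEvent[]): number {
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].usage_type == null) {
      return events[i].output;
    }
  }
  return 0;
}

// Strategy the comment asks for: sum every post-snapshot primary output,
// which is what the live meter is described as doing.
function sumPrimaryOutputs(events: UsageEvent[]): number {
  let total = 0;
  for (const event of events) {
    if (event.usage_type == null) {
      total += event.output;
    }
  }
  return total;
}

// Two parallel primary calls plus one non-primary event.
const postSnapshotEvents: UsageEvent[] = [
  { output: 40 },
  { usage_type: "hypothetical_secondary", output: 7 },
  { output: 25 },
];
```

For `postSnapshotEvents`, the last-only strategy yields 25 while the summed strategy yields 65, which is the under-report the comment describes.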

@github-actions

Contributor

GitNexus: ❌ deploy failed

The deploy failed — the previous index (if any) continues to be served.
Deploy run

@danny-avila danny-avila merged commit b03b2a0 into dev Jun 14, 2026
32 checks passed
@danny-avila danny-avila deleted the claude/strange-dewdney-6b6d5d branch June 14, 2026 14:48
fuuuzzy pushed a commit to fuuuzzy/LibreChat that referenced this pull request Jun 18, 2026
…ila#13734)

* 💾 feat: Persist Context Breakdown & Branch/Total Usage Cost

Persist the granular context breakdown and per-response usage/cost on the
response message metadata, and re-derive branch + total usage/cost from a
per-message index so the popover survives reloads and is branch-aware live.

- Add aggregateEmittedUsage + buildPersistedContextUsage helpers in
  packages/api; capture the latest visible snapshot and every emitted
  on_token_usage payload via contextUsageSink/usageEmitSink.
- Attach metadata.contextUsage (Part A) and metadata.usage (Part B) on the
  agents response message in sendCompletion.
- Carry per-message usage on the token index; add sumTotalUsage/setEntryUsage
  and branch-scoped usage on sumBranch.
- Repurpose the session accumulator into a single in-flight pending holder;
  flush it into the index at finalize; hydrate breakdowns on load.
- Render branch cost with a conditional all-branches total in the breakdown.

* 🧹 chore: Remove orphaned com_ui_session_cost i18n key

* 🩹 fix: Address Codex review — normalize usage server-side, fix reload deltas

- Persist per-event-normalized display units in metadata.usage (TResponseUsage)
  so reloaded mixed-provider turns match the live session; client reads them
  directly instead of re-normalizing with a single stamped provider (P2).
- Persist completedOutputTokens (final call output) on metadata.contextUsage so
  a reloaded multi-call turn adds the post-snapshot delta, not the full
  tokenCount the snapshot already counts (P2).
- buildIndex preserves a prior entry's immutable usage when a rebuilt cache
  message lacks metadata.usage, so a mid-session rebuild (regenerate) keeps a
  sibling branch's flushed cost (fixes the e2e regenerate failure).
- Track costKnown so turns saved with contextCost off don't render $0.00 when
  cost display is later enabled (P3).
- Use an epsilon for the all-branches cost comparison to avoid a spurious total
  row from float summation order (P3).
- Update unit/integration/e2e tests for the new shapes; regenerate e2e asserts
  the all-branches total after reload (deterministic via persisted metadata).

* 🩹 fix: Address Codex round 2 — pending leak, cost coverage, reload delta

- Clear the in-flight pending usage on terminal abort/error (resetLive), so a
  stopped generation's tokens no longer merge into the next response (P2).
- costKnown now means COMPLETE coverage (ANDed): a branch mixing cost-bearing
  and cost-less turns is flagged incomplete and the cost row is hidden rather
  than rendering an under-reported total (P2).
- Drop the tokenCount fallback for completedOutputTokens on reload: only the
  persisted post-snapshot delta is used, so a multi-call turn whose provider
  emitted no usage_metadata no longer double-counts earlier output (P2).
- Update tokens.spec for AND coverage semantics + incomplete-cost case.

* 🩹 fix: Address Codex round 3 — no-usage snapshots, total coverage, provider-less cache

- Skip persisting metadata.contextUsage when the response emitted no primary
  usage event: without a known post-snapshot output the granular gauge would
  undercount the reply on reload, so fall back to the coarse per-message
  estimate instead (P2).
- Gate the all-branches cost row on totalUsage.costKnown so an incomplete total
  (a sibling saved without cost) never renders an under-reported figure (P2).
- aggregateEmittedUsage/finalCallOutputTokens now normalize per-event with the
  client's magnitude fallback (normalizeEventUnits) instead of billing
  splitUsage, so provider-less cached events match live on reload (P2).
- Add backend test for the provider-less cached case.

* 🩹 fix: Address Codex round 4 — abort attribution, complete cost coverage

- aggregateEmittedUsage persists cost only when EVERY call was priced; a partial
  pricing failure now omits cost so the client treats coverage as unknown rather
  than reading an under-reported sum as authoritative (P2).
- finalizeUsage flushes pending into the response entry only when events were
  folded this session (eventCount > 0), so a late/second resumable subscriber
  carrying persisted metadata.usage keeps it instead of being overwritten with
  an empty pending record (P2).
- On user stop, attribute the in-flight pending usage to the partial response
  (new attributePending handler) instead of discarding it in resetLive — the
  stopped reply's billed tokens are kept and still can't leak into the next
  response; resetLive's discard remains for the error path (P2).

* 🐛 fix: Persist branch cost across branch switches via sticky usage history

Branch cost vanished on switching to a sibling branch (until a new turn) — the
cost analog of the granularity bug. buildIndex rebuilds the token index from the
messages cache; a sibling generated this session whose cache message lacks
metadata.usage (and is transiently dropped from the cache during regenerate)
lost its live-flushed usage, so sumBranch found none and the cost row hid.

Fix: a sticky per-response usage map (conversationId → messageId → usage),
written by setEntryUsage and never rebuilt from the cache — the usage counterpart
of snapshotsByAnchorFamily for the breakdown. buildIndex/upsertEntries restore an
entry's usage from it when the message carries none; cleared on convo switch and
migrated with the index. Add unit coverage for the drop-then-readd regression and
an e2e assertion that branch cost survives a branch switch.

* 🐛 fix: Re-index on branch switch so branch cost survives the switch

The sticky usage history alone didn't fix the reported branch-switch cost drop:
on a branch switch no cache `updated` event fires, so the index subscriber never
re-ran, and the post-regenerate rebuild was skipped while `isSubmitting` was
still true — leaving the index stale and missing the now-viewed branch's
response entirely (sticky can only restore entries present in a rebuild).

Re-index from the messages cache on every tail change (created/finalize AND
branch switch), not just while submitting. The cache holds the full message set
at switch time, so the viewed branch's response is re-added and its usage
restored from metadata.usage or the sticky history → sumBranch finds it and the
branch cost renders. Verified locally: the branch-switch e2e now passes (the
cost section shows both the branch row and the all-branches total). Also fixed
that e2e assertion to target a single cost value (strict-mode safe).

* 🩹 fix: Handle stopped-stream usage — reset pending + persist abort metadata

Codex round (stop/abort edges):
- Resumable explicit-stop (intentional SSE close) reset UI state but never
  cleared pendingUsageFamily, so usage folded before the stop leaked into the
  next response in the conversation. Discard pending on intentional close
  (resetLive); a resume re-folds via backfillUsage, so nothing is lost.
- The abort save path (abortMiddleware) persisted the stopped response without
  metadata.usage/contextUsage, so its cost + breakdown vanished on reload.
  Rebuild both from the job's persisted tokenUsage (emitted payloads incl. cost)
  and contextUsage snapshot — parity with the normal sendCompletion path;
  breakdown gated on a primary usage event like buildResponseMetadata.

Deferred (per scope decision): mid-stream branch-switch transiently shows the
streaming branch's pending on the viewed sibling (cosmetic, until finalize).

* 🩹 fix: Persist abort metadata on the real agents route + tighten snapshot gate

Codex round (corrects last round's wrong-path fixes):
- Stopped AGENTS responses are saved by routes/agents/index.js (/chat/abort),
  not abortMiddleware — so last round's metadata fix never ran for them. Moved
  the rollup/snapshot builder into packages/api as buildAbortedResponseMetadata
  (shared, unit-tested) and applied it in BOTH abort save paths, so a stopped
  agent reply keeps its cost + breakdown on reload.
- Persist the breakdown only when the FINAL visible call emitted usage: track a
  per-response snapshot count and require primaryUsageCount >= snapshotCount.
  Previously any earlier primary usage event passed the gate, so a multi-call
  turn whose final call emitted no usage_metadata used an earlier call's output
  as completedOutputTokens (already counted by the latest snapshot) → reload
  over-reported. Now it falls back to the coarse estimate.

Resumable stop pending-reset (prior round, 3cde6fe) already flows through
clearAllSubmissions → SSE close → the intentional-close handler's resetLive.
Deferred per scope: mid-stream branch-switch pending attribution (tracked).

* 🩹 fix: Abort breakdown over-count + resume re-fold after pending discard

Codex round (on the re-applied abort/snapshot work):
- buildAbortedResponseMetadata now persists ONLY the usage/cost rollup, not the
  context breakdown. The abort path can't tell whether the final call emitted
  usage (the job stores only the latest snapshot, not a count), so persisting
  the breakdown risked reusing an earlier call's output as completedOutputTokens
  (already in the snapshot) → reload over-count. Stopped/incomplete responses
  now fall back to the coarse gauge estimate, which is safe and apt.
- resetLive now also forgets the conversation's folded usage-event identities
  (clearUsageFolded). Discarding pending on a terminal/intentional close left
  the folded keys set, so a later resume's backfillUsage saw the persisted
  events as duplicates and never rebuilt pending — leaving the response's usage
  missing until a full reload. Clearing them lets the resume re-fold.
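
Two of the cost rules in the commit message above (costKnown as complete, ANDed coverage, and the epsilon comparison for the all-branches row) can be sketched as follows. The entry shape, function names, and the epsilon value are assumptions for illustration; the commit message does not state the value actually used.

```typescript
interface EntryUsage {
  cost?: number; // absent when the turn was saved without a priced cost
}

interface CostSummary {
  cost: number;
  costKnown: boolean;
}

// costKnown is true only when every entry carries a cost (AND semantics),
// so a mix of priced and unpriced turns is flagged as incomplete.
function summarizeCost(entries: EntryUsage[]): CostSummary {
  let cost = 0;
  let costKnown = entries.length > 0;
  for (const entry of entries) {
    if (entry.cost == null) {
      costKnown = false;
    } else {
      cost += entry.cost;
    }
  }
  return { cost, costKnown };
}

// Show the all-branches row only when the total is fully known and differs
// from the branch cost by more than a small tolerance, so float summation
// order alone cannot produce a spurious row.
function showTotalRow(
  branch: CostSummary,
  total: CostSummary,
  epsilon = 1e-9, // assumed tolerance
): boolean {
  return total.costKnown && Math.abs(total.cost - branch.cost) > epsilon;
}
```

Here a total of `0.1 + 0.2` compared against a branch cost of `0.3` does not trigger the extra row, even though the two floats are not bit-identical.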