💾 feat: Persist Context Breakdown & Branch/Total Usage Cost #13734
Conversation
🚨 Unused i18next Keys Detected
The following translation keys are defined in
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ec17c623a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| completedOutputTokens: | ||
| typeof message.tokenCount === 'number' ? message.tokenCount : undefined, |
Persist only the post-snapshot output delta
When rehydrating a tool-using or otherwise multi-call agent turn after reload, metadata.contextUsage is the latest pre-invoke snapshot, which already includes tokens from earlier model/tool steps in that same response. The live finalizer only adds liveAtFinalize (output since that snapshot), but this assigns the response's full tokenCount, so earlier output tokens are counted again and the context gauge can overstate usage after reload. Persist/replay the post-snapshot delta instead of using the whole response count.
Useful? React with 👍 / 👎.
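A minimal sketch of why the full response count over-reports on reload, assuming a hypothetical `PersistedSnapshot` shape in which `snapshotTokens` stands in for the pre-invoke context total. Only `completedOutputTokens` and `tokenCount` are names from the PR; everything else is illustrative.

```typescript
// Hypothetical shape for illustration; not the PR's actual type.
interface PersistedSnapshot {
  /** Context tokens counted by the latest pre-invoke snapshot. */
  snapshotTokens: number;
  /** Output produced after that snapshot, if it was recorded. */
  completedOutputTokens?: number;
}

/** Context total on reload: the snapshot plus only the post-snapshot delta. */
function rehydratedContextTotal(snapshot: PersistedSnapshot): number | undefined {
  if (snapshot.completedOutputTokens === undefined) {
    // Delta unknown: the caller should fall back to a coarser estimate.
    return undefined;
  }
  return snapshot.snapshotTokens + snapshot.completedOutputTokens;
}

// A two-call turn: the first call produced 300 output tokens (already inside
// the snapshot), the final call produced 120 after the snapshot was taken.
const snapshot: PersistedSnapshot = { snapshotTokens: 5300, completedOutputTokens: 120 };
const fullTokenCount = 420; // whole-response tokenCount (300 + 120)

const deltaTotal = rehydratedContextTotal(snapshot); // 5420
const doubleCounted = snapshot.snapshotTokens + fullTokenCount; // 5720
```

The 300-token gap between the two totals is exactly the earlier output that the snapshot already includes.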
| </span> | ||
| <span className="font-medium text-text-primary">{formatCost(view.branchCost)}</span> | ||
| </div> | ||
| {view.totalCost !== view.branchCost && ( |
Use a tolerance before showing all-branches cost
In an unbranched conversation with several paid turns, sumBranch and sumTotalUsage add the same floating-point costs in opposite orders (branch walks tail-to-root, total walks insertion order). A strict !== can therefore be true only due to rounding, causing the popover to show a misleading “All branches” row even when there are no alternate branches. Compare rounded display values or use a small epsilon instead.
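A small sketch of the tolerance check suggested here. `costsDiffer` and `COST_EPSILON` are hypothetical names and the tolerance value is an assumption, not something taken from the PR.

```typescript
// Assumed tolerance: far below one cent, far above double rounding noise.
const COST_EPSILON = 1e-9;

/** True only when two cost totals differ by more than rounding noise. */
function costsDiffer(a: number, b: number, epsilon: number = COST_EPSILON): boolean {
  return Math.abs(a - b) > epsilon;
}

// The same three per-turn costs summed in opposite orders.
const turnCosts = [0.1, 0.2, 0.3];
const insertionOrder = turnCosts.reduce((sum, c) => sum + c, 0);
const tailToRoot = [...turnCosts].reverse().reduce((sum, c) => sum + c, 0);

const strictlyDifferent = insertionOrder !== tailToRoot; // true: rounding noise only
const showAllBranchesRow = costsDiffer(insertionOrder, tailToRoot); // false
```

Comparing the `formatCost` output of both values would serve the same purpose, since the row only matters when the displayed figures differ.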
| model ??= event.model; | ||
| provider ??= event.provider; |
Preserve per-event providers in usage rollups
When summarization is enabled with a different provider/model, its model-end usage can be the first usage event for the response before the primary agent call. These assignments then stamp the whole persisted rollup with the summarizer's provider, and readPersistedUsage later normalizes every token count using that one provider instead of the per-event providers that live foldUsage used. This makes branch input/cache/output totals change after reload for mixed-provider turns; persist normalized display units or keep per-event provider data.
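To illustrate the mixed-provider drift, here is a rough sketch under a loud assumption: some providers report `input_tokens` inclusive of cached tokens while others report cache separately. The event shape, the `INPUT_INCLUDES_CACHE` table, and `uncachedInput` are all hypothetical and do not reflect the PR's actual normalization code.

```typescript
// Hypothetical event shape and provider table, for illustration only.
interface RawUsageEvent {
  provider: string;
  input_tokens: number;
  cache_read_tokens: number;
}

// Assumption: these providers report input_tokens inclusive of cached tokens.
const INPUT_INCLUDES_CACHE = new Set<string>(['openai']);

/** Uncached input for one event, judged by the given provider's convention. */
function uncachedInput(usage: RawUsageEvent, provider: string): number {
  return INPUT_INCLUDES_CACHE.has(provider)
    ? Math.max(0, usage.input_tokens - usage.cache_read_tokens)
    : usage.input_tokens;
}

const mixedEvents: RawUsageEvent[] = [
  { provider: 'openai', input_tokens: 1000, cache_read_tokens: 400 }, // summarizer, arrives first
  { provider: 'anthropic', input_tokens: 200, cache_read_tokens: 300 }, // primary agent call
];

// Live fold: each event is normalized with its own provider.
const perEventInput = mixedEvents.reduce((sum, e) => sum + uncachedInput(e, e.provider), 0);

// Lossy rollup: every event is normalized with the first event's provider.
const stampedProvider = mixedEvents[0]?.provider ?? '';
const stampedInput = mixedEvents.reduce((sum, e) => sum + uncachedInput(e, stampedProvider), 0);
```

With these made-up numbers the live fold yields 800 input tokens and the stamped rollup yields 600, which is the kind of change after reload the comment describes.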
| output: units.output, | ||
| cacheWrite: units.cacheWrite, | ||
| cacheRead: units.cacheRead, | ||
| cost: event.cost ?? 0, |
Hide cost when persisted usage has no cost
When a turn was saved while interface.contextCost was off, the server still persists metadata.usage but omits cost; if cost display is later enabled for that deployment, this coercion turns the unknown cost into 0, and the popover renders Cost $0.00 because token usage exists. Preserve an undefined/has-cost state and only render the cost row for usage that actually carried cost.
Thanks — addressed all four in
Also fixed the |
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af066208f7
| const completedOutputTokens = | ||
| event.completedOutputTokens ?? | ||
| (typeof message.tokenCount === 'number' ? message.tokenCount : undefined); |
Avoid treating missing deltas as the whole response
When an agent run emits on_context_usage but the provider/model does not return usage_metadata, buildPersistedContextUsage persists a new metadata.contextUsage blob without completedOutputTokens. On reload this fallback then uses the entire assistant tokenCount as the post-snapshot delta; for tool/multi-call turns the latest snapshot already includes earlier assistant/tool-call output in breakdown.messageTokens, so the gauge double-counts those tokens after reload instead of matching the live view. This needs a way to distinguish legacy blobs from new snapshots with no usage event, or avoid applying full tokenCount to multi-call snapshots.
| if (usage.costKnown) { | ||
| target.costKnown = true; | ||
| } |
Require complete cost coverage before showing totals
When a branch contains a persisted metadata.usage saved while contextCost was off (so it has token usage but no cost) and a later response saved with cost, this OR marks the aggregate as costKnown and the popover renders Cost for the branch, even though target.cost excludes the earlier usage. This under-reports branch/all-branch cost after toggling cost display or after any cost-computation miss; track partial/unknown cost separately or only render the cost total when every usage-bearing entry contributed a cost.
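One way to express "complete coverage" is to AND the flag across entries instead of OR-ing it. The sketch below assumes a hypothetical `UsageEntry` whose `cost` is simply absent when it was never computed. `costKnown` is the PR's name, while `sumWithCoverage` and the shapes are illustrative.

```typescript
// Hypothetical per-response usage entry; `cost` is absent when never computed.
interface UsageEntry {
  output: number;
  cost?: number;
}

interface BranchTotals {
  output: number;
  cost: number;
  /** True only if every usage-bearing entry carried a cost. */
  costKnown: boolean;
}

function sumWithCoverage(entries: UsageEntry[]): BranchTotals {
  const totals: BranchTotals = { output: 0, cost: 0, costKnown: entries.length > 0 };
  for (const entry of entries) {
    totals.output += entry.output;
    if (typeof entry.cost === 'number') {
      totals.cost += entry.cost;
    } else {
      totals.costKnown = false; // one uncosted entry makes the total incomplete
    }
  }
  return totals;
}

// A turn saved while cost was off, followed by a costed turn.
const mixedBranch = sumWithCoverage([{ output: 100 }, { output: 50, cost: 0.25 }]);
// Two costed turns.
const completeBranch = sumWithCoverage([
  { output: 100, cost: 0.5 },
  { output: 50, cost: 0.25 },
]);
```

A renderer would then show the cost row only when `costKnown` is true, so `mixedBranch` hides it while `completeBranch` shows 0.75.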
| const pendingAtom = pendingUsageFamily(convoKey); | ||
| const prev = jotai.get(pendingAtom); | ||
| jotai.set(pendingAtom, { |
Clear pending usage on aborted streams
These folded events now live in pendingUsageFamily until finalizeUsage moves them onto a response, but the terminal cancel/error paths only call resetLive, which still clears just the live token estimate. If a user stops a generation (or a terminal SSE error occurs) after an on_token_usage event but before a final event, the pending totals remain and get merged into the next response in the same conversation, making that response's branch/total usage and cost include tokens from the aborted run.
Round 2 addressed in
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bfd306113
| if (this.contextUsageSink?.latest) { | ||
| metadata.contextUsage = buildPersistedContextUsage( | ||
| this.contextUsageSink.latest, | ||
| this.usageEmitSink ?? [], | ||
| ); |
Avoid persisting snapshots without final output
When a provider emits on_context_usage but no usage_metadata, ModelEndHandler returns before emitting any token-usage event, so usageEmitSink is empty here. This still persists metadata.contextUsage, and after reload hydrateSnapshots prefers that pre-invoke snapshot over the coarse tokenCount fallback; because completedOutputTokens is absent, the gauge undercounts by the entire assistant reply for those providers. Either include a response-token fallback or skip persisting the snapshot unless the post-snapshot output is known.
| {hasBranches && ( | ||
| <div className="flex items-center justify-between text-xs"> | ||
| <span className="text-text-secondary">{localize('com_ui_context_cost_total')}</span> | ||
| <span className="text-text-secondary">{formatCost(view.totalCost)}</span> | ||
| </div> |
Gate all-branches cost on complete coverage
If the active branch has complete costs but another branch contains a usage-bearing response saved without metadata.usage.cost (for example while interface.contextCost was off) plus any costed sibling, sumTotalUsage marks totalUsage.costKnown false but this row still renders from totalCost. That makes the "All branches" value a partial, under-reported total; hide this row unless view.totalUsage.costKnown is also true.
| let cost = 0; | ||
| let hasCost = false; | ||
| for (const event of events) { | ||
| const split = splitUsage(event); |
Match live cache fallback for provider-less usage
For usage payloads that include cache details but omit provider, the live client uses normalizeUsageUnits' magnitude fallback to decide whether input_tokens already includes cache, but this persisted rollup calls backend splitUsage, which treats a missing provider as additive. In that provider-less cached case (for example an OpenAI-compatible/custom event without provider), the usage shown immediately after streaming can exclude cache from Input, then after reload metadata.usage is higher because cache was added back. Use the same fallback as the client or persist enough provider information before normalizing.
Round 3 addressed in
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 97c010ce9f
| if (hasCost) { | ||
| rollup.cost = cost; |
Persist cost only with complete event coverage
When a multi-call response has a mix of usage events where one lacks cost (for example, emitTokenUsage catches a pricing error and emits that payload without cost while other calls were priced), this persists the partial sum with a cost field. On reload readPersistedUsage treats any numeric metadata.usage.cost as complete coverage, so the UI renders an under-reported branch/all-branches cost instead of hiding incomplete coverage like the live fold does. Persist cost only if every folded usage event carried a cost.
| if (responseId != null) { | ||
| setEntryUsage(realId, responseId, { |
Preserve metadata usage when no events were folded
When the final event includes responseMessage.metadata.usage but this client has not folded any on_token_usage events, such as a late or second resumable subscriber that receives the stored final event without replayed chunks, upsertEntries has just loaded the persisted usage from the response and this unconditional flush replaces it with an empty pending record. The current session then shows no branch/total usage or cost until a later full reload/re-index; only overwrite the entry when pending.eventCount > 0.
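The guard suggested here can be sketched as follows. `PendingTotals`, `flushPending`, and the plain `Map` index are hypothetical stand-ins for the PR's pending atom and token index. Only the `eventCount > 0` condition comes from the comment.

```typescript
// Hypothetical pending record; only `eventCount` is named in the review.
interface PendingTotals {
  eventCount: number;
  output: number;
  cost: number;
}

const EMPTY_PENDING: PendingTotals = { eventCount: 0, output: 0, cost: 0 };

/** Flush pending usage onto a response entry only if something was folded. */
function flushPending(
  index: Map<string, PendingTotals>,
  responseId: string,
  pending: PendingTotals,
): boolean {
  if (pending.eventCount === 0) {
    return false; // keep whatever was hydrated from persisted metadata
  }
  index.set(responseId, pending);
  return true;
}

// A late subscriber: the entry was hydrated from metadata.usage and no
// on_token_usage events were folded in this session.
const usageIndex = new Map<string, PendingTotals>();
usageIndex.set('resp-1', { eventCount: 2, output: 340, cost: 0.5 });
const lateSubscriberFlushed = flushPending(usageIndex, 'resp-1', EMPTY_PENDING);
```

Without the guard, the final `set` would replace the hydrated 340-token entry with the empty record.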
| /** Terminal path (abort/error): discard the in-flight response's pending | ||
| * usage. Without a final event `finalizeUsage` never flushes it, so it | ||
| * would otherwise merge into the next response in this conversation. */ | ||
| jotai.set(pendingUsageFamily(convoKey), EMPTY_USAGE_TOTALS); |
Flush billed usage before clearing on stop
When a user stops a non-resumable generation after at least one on_token_usage event has arrived, such as a tool-using run after the first model call, the cancel path calls resetLive before abortConversation, and that abort path only invokes finalHandler rather than finalizeUsage. Clearing pendingUsageFamily here drops usage that was already emitted/billed, so the branch and total cost under-report the stopped response instead of attaching that pending usage to the partial response or carrying it through finalization.
Round 4 addressed in
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff. Reviewed commit:
GitNexus: ❌ deploy failed. The previous index (if any) continues to be served.
Fixed a user-reported regression: branch cost disappeared when switching to a sibling branch (until a new turn) — the cost counterpart of the earlier granularity bug. Root cause: Fix: a sticky per-response usage map ( |
@codex review
Codex Review: Didn't find any major issues. Nice work! Reviewed commit:
Follow-up: the branch-switch cost fix needed one more piece. The sticky usage history alone wasn't enough — a branch switch fires no cache |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 532227ef91
| const tail = latestMessages?.[latestMessages.length - 1]; | ||
| const partialResponseId = | ||
| tail != null && tail.isCreatedByUser === false ? tail.messageId : null; | ||
| attributePending(partialResponseId, { ...submission, userMessage }); |
Wire pending usage handling into resumable stops
This attribution only runs in the legacy useSSE cancel handler, but useAdaptiveSSE routes the default non-assistant/agents path through useResumableSSE, whose explicit Stop flow clears the submission and closes the stream without calling attributePending or otherwise resetting pendingUsageFamily. When a resumable agents stream is stopped after an on_token_usage event has folded, the per-response pending usage remains live and can be merged into the gauge and flushed into the next response, misattributing cost/tokens; the same attribution/reset needs to be wired into the resumable stop/cleanup path.
| const metadata = this.buildResponseMetadata(); | ||
| return metadata ? { completion, metadata } : { completion }; |
Persist usage metadata for stopped agent responses
This metadata is only attached on the normal sendCompletion path, but stopped resumable agent generations are saved by api/server/middleware/abortMiddleware.js instead; that path spends and clears collectedUsage while saving a responseMessage without metadata.usage or metadata.contextUsage. When a user stops a response after usage events were emitted, the live UI may show the billed cost, but after reload the saved incomplete response has no persisted rollup, so branch/total cost and the granular breakdown for that response disappear. The abort save path needs the same rollup/snapshot persistence.
| [pendingUsage], | ||
| ); | ||
| const branchUsage = useMemo( | ||
| () => mergeUsage(branchTotals.usage, pendingAsUsage), |
Keep pending usage on the streaming branch
When a user switches siblings while another branch is still streaming, tailId/branchTotals follow the viewed branch but pendingUsageFamily is still the conversation-wide holder for the in-flight response. This unconditional merge adds the streaming branch's pending input/output/cost to whichever sibling is being viewed until finalize/reset, so the cost section can temporarily show the wrong branch cost; only merge pending when the viewed tail is the streaming response, or scope pending usage per response.
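A rough sketch of the first option (merge pending only when the viewed tail is the streaming response). `viewedUsage` and its parameters are hypothetical. The PR's `mergeUsage` and atoms are not reproduced here.

```typescript
// Hypothetical totals shape for illustration.
interface BranchUsage {
  output: number;
  cost: number;
}

/** Add in-flight pending usage only when the viewed branch is the one streaming. */
function viewedUsage(
  branch: BranchUsage,
  pending: BranchUsage,
  viewedTailId: string | null,
  streamingResponseId: string | null,
): BranchUsage {
  const viewingStream = streamingResponseId !== null && viewedTailId === streamingResponseId;
  if (!viewingStream) {
    return branch; // a sibling is on screen: leave its totals untouched
  }
  return { output: branch.output + pending.output, cost: branch.cost + pending.cost };
}

const branchTotals: BranchUsage = { output: 200, cost: 0.5 };
const inFlight: BranchUsage = { output: 80, cost: 0.25 };

const onStreamingBranch = viewedUsage(branchTotals, inFlight, 'resp-a', 'resp-a');
const onSiblingBranch = viewedUsage(branchTotals, inFlight, 'resp-b', 'resp-a');
```

Scoping pending usage per response, the other option mentioned, would make the ID check unnecessary at the cost of a larger state change.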
GitNexus: ❌ deploy failed. The previous index (if any) continues to be served.
Addressed the stop/abort findings in
Intentionally deferred (tracked as a follow-up): the mid-stream branch-switch case (
@codex review
💡 Codex Review
LibreChat/client/src/hooks/SSE/useUsageHandler.ts
Lines 272 to 274 in 3cde6fe
When a resumable stream reconnects after a primary on_token_usage was already persisted, this backfill marks the runId:seq as folded without updating confirmedRef. The sync path later replays the same usage event through usageHandler, but it is skipped as a duplicate, so if the stream finalizes before another context snapshot, finalizeUsage records completedOutputTokens from only the seeded character estimate (or 0) rather than the provider-reported output. In that resume-at-final/tool-boundary case the in-session context gauge/snapshot history remains inaccurate even though the final response carries exact metadata.contextUsage.
| /** Persist the usage/cost rollup + context breakdown for the stopped response | ||
| * so its branch/total cost and granular rows survive a reload, matching the | ||
| * normal completion path. */ | ||
| const abortMetadata = buildAbortResponseMetadata(jobData); |
Persist stopped usage in the agents abort route
This metadata is attached only in abortMiddleware, but the Stop button for agent streams posts to /api/agents/chat/abort via abortStream, and that route builds/saves its own responseMessage in api/server/routes/agents/index.js without calling this helper or setting metadata. As a result, stopped resumable agent responses still lose their metadata.usage and metadata.contextUsage on reload; the helper needs to be reused in that actual abort path too.
| * can't size the final reply, so skip persisting the pre-invoke snapshot — | ||
| * reload then falls back to the coarse per-message tokenCount estimate | ||
| * (accurate totals) rather than a snapshot that undercounts the reply. */ | ||
| const hasPrimaryUsage = usageEvents.some((event) => event.usage_type == null); |
Require latest-call usage before persisting snapshots
In a multi-call agent turn, contextUsageSink.latest is the snapshot for the most recent model call, but this predicate accepts any earlier primary usage event. If the latest/final call emits no usage_metadata (or is interrupted after a prior tool-call LLM did emit usage), buildPersistedContextUsage will use that earlier call's output as completedOutputTokens even though the latest snapshot already includes it, so reload over-reports the context gauge instead of falling back to the coarse tokenCount estimate intended by the comment.
Addressed in
Deferred (tracked follow-up): mid-stream branch-switch pending attribution (
@codex review
Persist the granular context breakdown and per-response usage/cost on the response message metadata, and re-derive branch + total usage/cost from a per-message index so the popover survives reloads and is branch-aware live.
- Add aggregateEmittedUsage + buildPersistedContextUsage helpers in packages/api; capture the latest visible snapshot and every emitted on_token_usage payload via contextUsageSink/usageEmitSink.
- Attach metadata.contextUsage (Part A) and metadata.usage (Part B) on the agents response message in sendCompletion.
- Carry per-message usage on the token index; add sumTotalUsage/setEntryUsage and branch-scoped usage on sumBranch.
- Repurpose the session accumulator into a single in-flight pending holder; flush it into the index at finalize; hydrate breakdowns on load.
- Render branch cost with a conditional all-branches total in the breakdown.
… deltas
- Persist per-event-normalized display units in metadata.usage (TResponseUsage) so reloaded mixed-provider turns match the live session; client reads them directly instead of re-normalizing with a single stamped provider (P2).
- Persist completedOutputTokens (final call output) on metadata.contextUsage so a reloaded multi-call turn adds the post-snapshot delta, not the full tokenCount the snapshot already counts (P2).
- buildIndex preserves a prior entry's immutable usage when a rebuilt cache message lacks metadata.usage, so a mid-session rebuild (regenerate) keeps a sibling branch's flushed cost (fixes the e2e regenerate failure).
- Track costKnown so turns saved with contextCost off don't render $0.00 when cost display is later enabled (P3).
- Use an epsilon for the all-branches cost comparison to avoid a spurious total row from float summation order (P3).
- Update unit/integration/e2e tests for the new shapes; regenerate e2e asserts the all-branches total after reload (deterministic via persisted metadata).
- Clear the in-flight pending usage on terminal abort/error (resetLive), so a stopped generation's tokens no longer merge into the next response (P2).
- costKnown now means COMPLETE coverage (ANDed): a branch mixing cost-bearing and cost-less turns is flagged incomplete and the cost row is hidden rather than rendering an under-reported total (P2).
- Drop the tokenCount fallback for completedOutputTokens on reload: only the persisted post-snapshot delta is used, so a multi-call turn whose provider emitted no usage_metadata no longer double-counts earlier output (P2).
- Update tokens.spec for AND coverage semantics + incomplete-cost case.
…ovider-less cache
- Skip persisting metadata.contextUsage when the response emitted no primary usage event: without a known post-snapshot output the granular gauge would undercount the reply on reload, so fall back to the coarse per-message estimate instead (P2).
- Gate the all-branches cost row on totalUsage.costKnown so an incomplete total (a sibling saved without cost) never renders an under-reported figure (P2).
- aggregateEmittedUsage/finalCallOutputTokens now normalize per-event with the client's magnitude fallback (normalizeEventUnits) instead of billing splitUsage, so provider-less cached events match live on reload (P2).
- Add backend test for the provider-less cached case.
- aggregateEmittedUsage persists cost only when EVERY call was priced; a partial pricing failure now omits cost so the client treats coverage as unknown rather than reading an under-reported sum as authoritative (P2).
- finalizeUsage flushes pending into the response entry only when events were folded this session (eventCount > 0), so a late/second resumable subscriber carrying persisted metadata.usage keeps it instead of being overwritten with an empty pending record (P2).
- On user stop, attribute the in-flight pending usage to the partial response (new attributePending handler) instead of discarding it in resetLive; the stopped reply's billed tokens are kept and still can't leak into the next response. resetLive's discard remains for the error path (P2).
…story
Branch cost vanished on switching to a sibling branch (until a new turn), the cost analog of the granularity bug. buildIndex rebuilds the token index from the messages cache; a sibling generated this session whose cache message lacks metadata.usage (and is transiently dropped from the cache during regenerate) lost its live-flushed usage, so sumBranch found none and the cost row hid.
Fix: a sticky per-response usage map (conversationId → messageId → usage), written by setEntryUsage and never rebuilt from the cache. It is the usage counterpart of snapshotsByAnchorFamily for the breakdown. buildIndex/upsertEntries restore an entry's usage from it when the message carries none; cleared on convo switch and migrated with the index.
Add unit coverage for the drop-then-readd regression and an e2e assertion that branch cost survives a branch switch.
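The sticky map described in this commit might look roughly like the sketch below. The conversationId → messageId → usage nesting is from the commit message, while `rememberUsage`, `rebuildIndex`, and the shapes are hypothetical simplifications of setEntryUsage and buildIndex.

```typescript
// Hypothetical shapes; the real index entries carry more fields.
interface StoredUsage {
  output: number;
  cost?: number;
}

interface CachedMessage {
  messageId: string;
  usage?: StoredUsage; // stands in for metadata.usage
}

// conversationId -> messageId -> usage; written on flush, never rebuilt from the cache.
const stickyUsage = new Map<string, Map<string, StoredUsage>>();

function rememberUsage(conversationId: string, messageId: string, usage: StoredUsage): void {
  let byMessage = stickyUsage.get(conversationId);
  if (!byMessage) {
    byMessage = new Map<string, StoredUsage>();
    stickyUsage.set(conversationId, byMessage);
  }
  byMessage.set(messageId, usage);
}

/** Rebuild the index from cached messages, restoring usage the cache lost. */
function rebuildIndex(conversationId: string, messages: CachedMessage[]): Map<string, StoredUsage> {
  const index = new Map<string, StoredUsage>();
  for (const message of messages) {
    const usage = message.usage ?? stickyUsage.get(conversationId)?.get(message.messageId);
    if (usage) {
      index.set(message.messageId, usage);
    }
  }
  return index;
}

rememberUsage('convo-1', 'resp-b', { output: 120, cost: 0.25 });
const duringRegenerate = rebuildIndex('convo-1', []); // sibling transiently dropped
const afterReadd = rebuildIndex('convo-1', [{ messageId: 'resp-b' }]); // re-added without usage
```

The map can only restore usage for messages that are present in a rebuild, so the sibling must be back in the cache before its cost reappears.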
The sticky usage history alone didn't fix the reported branch-switch cost drop: on a branch switch no cache `updated` event fires, so the index subscriber never re-ran, and the post-regenerate rebuild was skipped while `isSubmitting` was still true, leaving the index stale and missing the now-viewed branch's response entirely (sticky can only restore entries present in a rebuild).
Re-index from the messages cache on every tail change (created/finalize AND branch switch), not just while submitting. The cache holds the full message set at switch time, so the viewed branch's response is re-added and its usage restored from metadata.usage or the sticky history → sumBranch finds it and the branch cost renders.
Verified locally: the branch-switch e2e now passes (the cost section shows both the branch row and the all-branches total). Also fixed that e2e assertion to target a single cost value (strict-mode safe).
…tadata
Codex round (stop/abort edges):
- Resumable explicit-stop (intentional SSE close) reset UI state but never cleared pendingUsageFamily, so usage folded before the stop leaked into the next response in the conversation. Discard pending on intentional close (resetLive); a resume re-folds via backfillUsage, so nothing is lost.
- The abort save path (abortMiddleware) persisted the stopped response without metadata.usage/contextUsage, so its cost + breakdown vanished on reload. Rebuild both from the job's persisted tokenUsage (emitted payloads incl. cost) and contextUsage snapshot, for parity with the normal sendCompletion path; breakdown gated on a primary usage event like buildResponseMetadata.
Deferred (per scope decision): mid-stream branch-switch transiently shows the streaming branch's pending on the viewed sibling (cosmetic, until finalize).
cc8a739 to e0328ce
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cc8a739b5e
| const hasPrimaryUsage = events.some((event) => event && event.usage_type == null); | ||
| if (snapshot && hasPrimaryUsage) { | ||
| metadata.contextUsage = buildPersistedContextUsage(snapshot, events); |
Gate abort snapshots on the final usage event
When a stopped agent run has already completed an earlier primary call, emits a later context snapshot, and is aborted before that later/final call emits on_token_usage, hasPrimaryUsage is still true. This saves the latest snapshot and buildPersistedContextUsage uses the earlier call's output as completedOutputTokens, even though that output is already included in the latest snapshot; after reload the context gauge over-counts. The abort path needs the same snapshot-count/final-call gate as AgentClient.buildResponseMetadata (or should skip persisting the breakdown) for that case.
| * while generating): discard the in-flight pending usage so it can't | ||
| * merge into the next response in this conversation. On a resume the | ||
| * collected usage is re-folded via backfillUsage, so nothing is lost. */ | ||
| resetLive({ ...currentSubmission, userMessage }); |
Let resume backfill refold cleared pending usage
When this intentional-close path runs after some on_token_usage events have already been folded (for example navigating away from a resumable generation), resetLive clears pendingUsageFamily but leaves the conversation's folded usage keys intact. The subsequent resume sync calls backfillUsage, but markUsageFolded treats those persisted events as duplicates and does not rebuild pending usage; if the stream is then stopped or finalized, the branch/total usage for the response can be missing the pre-close calls until a full refetch/reload restores metadata.usage.
…shot gate
Codex round (corrects last round's wrong-path fixes):
- Stopped AGENTS responses are saved by routes/agents/index.js (/chat/abort), not abortMiddleware, so last round's metadata fix never ran for them. Moved the rollup/snapshot builder into packages/api as buildAbortedResponseMetadata (shared, unit-tested) and applied it in BOTH abort save paths, so a stopped agent reply keeps its cost + breakdown on reload.
- Persist the breakdown only when the FINAL visible call emitted usage: track a per-response snapshot count and require primaryUsageCount >= snapshotCount. Previously any earlier primary usage event passed the gate, so a multi-call turn whose final call emitted no usage_metadata used an earlier call's output as completedOutputTokens (already counted by the latest snapshot) → reload over-reported. Now it falls back to the coarse estimate.
Resumable stop pending-reset (prior round, 3cde6fe) already flows through clearAllSubmissions → SSE close → the intentional-close handler's resetLive.
Deferred per scope: mid-stream branch-switch pending attribution (tracked).
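The count-based gate from this commit can be sketched as below. The condition mirrors the `snapshotCount > 0 && primaryUsageCount >= snapshotCount` check quoted in the review. The `EmittedUsage` shape, the `canPersistBreakdown` name, and the `'summarization'` usage type value are assumptions for illustration.

```typescript
// Hypothetical event shape; `usage_type == null` marks a primary (visible) call.
interface EmittedUsage {
  usage_type?: string | null;
  output_tokens: number;
}

/** Persist the breakdown only if every snapshotted call also reported usage. */
function canPersistBreakdown(snapshotCount: number, events: EmittedUsage[]): boolean {
  const primaryUsageCount = events.filter((e) => e.usage_type == null).length;
  return snapshotCount > 0 && primaryUsageCount >= snapshotCount;
}

// Two visible calls took snapshots, but only the first reported usage.
const finalCallSilent = canPersistBreakdown(2, [{ output_tokens: 300 }]);

// Both visible calls reported usage; the non-primary event is ignored.
const finalCallReported = canPersistBreakdown(2, [
  { output_tokens: 300 },
  { usage_type: 'summarization', output_tokens: 40 },
  { output_tokens: 120 },
]);
```

The earlier `some(...)` predicate would have accepted the first case, which is how an earlier call's output ended up reused as the post-snapshot delta.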
Codex round (on the re-applied abort/snapshot work):
- buildAbortedResponseMetadata now persists ONLY the usage/cost rollup, not the context breakdown. The abort path can't tell whether the final call emitted usage (the job stores only the latest snapshot, not a count), so persisting the breakdown risked reusing an earlier call's output as completedOutputTokens (already in the snapshot) → reload over-count. Stopped/incomplete responses now fall back to the coarse gauge estimate, which is safe and apt.
- resetLive now also forgets the conversation's folded usage-event identities (clearUsageFolded). Discarding pending on a terminal/intentional close left the folded keys set, so a later resume's backfillUsage saw the persisted events as duplicates and never rebuilt pending, leaving the response's usage missing until a full reload. Clearing them lets the resume re-fold.
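The interaction between the dedupe keys and the pending reset is easy to miss, so here is a toy model of it. `UsageFolder` is a hypothetical class standing in for the pending atom plus the folded-key set. The `runId:seq` key format follows the earlier review comment.

```typescript
// Toy model: pending totals plus a set of already-folded event identities.
class UsageFolder {
  private readonly folded = new Set<string>();
  pendingOutput = 0;

  /** Fold one usage event; replays of the same runId:seq are skipped. */
  fold(runId: string, seq: number, output: number): boolean {
    const key = `${runId}:${seq}`;
    if (this.folded.has(key)) {
      return false;
    }
    this.folded.add(key);
    this.pendingOutput += output;
    return true;
  }

  /** Discard pending; optionally forget folded identities so a resume can re-fold. */
  resetLive(forgetFolded: boolean): void {
    this.pendingOutput = 0;
    if (forgetFolded) {
      this.folded.clear();
    }
  }
}

function pendingAfterResume(forgetFolded: boolean): number {
  const folder = new UsageFolder();
  folder.fold('run-1', 0, 150); // folded live before the close
  folder.resetLive(forgetFolded); // intentional close
  folder.fold('run-1', 0, 150); // resume backfill replays the same event
  return folder.pendingOutput;
}
```

With the keys left in place the replayed event is treated as a duplicate and pending stays at 0; clearing them restores the 150 tokens on resume.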
Two follow-ups in
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1a32318aa
| if (this.contextUsageSink?.latest && snapshotCount > 0 && primaryUsageCount >= snapshotCount) { | ||
| metadata.contextUsage = buildPersistedContextUsage(this.contextUsageSink.latest, usageEvents); |
Skip contextUsage when saving aborted completions
When an agents stream is stopped, sendCompletion still returns through BaseClient.sendMessage, which immediately saves the response with this metadata, and the resumable controller then skips its own save because client.savedMessageIds already contains the id. That means an aborted/unfinished response can still persist metadata.contextUsage here, bypassing the new buildAbortedResponseMetadata path that intentionally writes only the usage rollup for stopped responses; reloads can therefore show the granular persisted breakdown for interrupted turns instead of the safe coarse estimate.
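One way to express the gate this comment asks for, as a minimal sketch: the three existing conditions from the diff above plus an explicit aborted check. The `GateInput` shape and the `aborted` flag are hypothetical; how the client would actually learn that the stream was stopped is not shown in this PR excerpt.

```ts
// Hypothetical input shape for illustration only.
interface GateInput {
  hasLatestSnapshot: boolean; // stands in for `this.contextUsageSink?.latest`
  snapshotCount: number;
  primaryUsageCount: number;
  aborted: boolean; // assumed flag; not a field shown in the diff
}

// Persist the granular breakdown only for completed responses whose final
// visible call emitted usage; everything else falls back to the coarse estimate.
function shouldPersistContextUsage(gate: GateInput): boolean {
  if (gate.aborted) return false;
  return (
    gate.hasLatestSnapshot &&
    gate.snapshotCount > 0 &&
    gate.primaryUsageCount >= gate.snapshotCount
  );
}
```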
| const abortMetadata = buildAbortedResponseMetadata(jobData); | ||
| if (abortMetadata) { | ||
| responseMessage.metadata = abortMetadata; |
Save token counts for aborted agents responses
For stops that go through /api/agents/chat/abort, this new aborted metadata intentionally contains only metadata.usage, not metadata.contextUsage; after a reload the gauge therefore falls back to the coarse token index, where a missing tokenCount is treated as 0. The response saved in this route still omits tokenCount, so stopped agents replies with partial content are persisted with zero output tokens for the context meter, making the intended safe fallback under-report the interrupted response after reload.
| for (let i = events.length - 1; i >= 0; i--) { | ||
| if (events[i].usage_type == null) { | ||
| return normalizeEventUnits(events[i]).output; |
Sum parallel post-snapshot outputs
When a graph has multiple visible primary agents running in parallel, each call emits its context snapshot before any of those calls has produced output, so the latest snapshot may exclude the outputs from several sibling calls. The live meter adds every post-snapshot primary usage event, but the persisted metadata keeps only the last primary event's output here; reloading such a parallel-agent response will under-report the context used by the omitted sibling outputs.
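A minimal sketch of the summing variant this comment describes, instead of returning only the last primary event's output. It assumes each captured event and the latest snapshot carry a comparable emission order; the `seq` field and `latestSnapshotSeq` parameter are hypothetical, since the diff only shows `usage_type` and a normalized `output`.

```ts
// Hypothetical event shape; `seq` is an assumed emission-order marker.
interface PrimaryUsageSketch {
  usage_type?: string | null; // null/undefined marks a primary (visible) call, as in the diff
  output: number; // output tokens, assumed already normalized
  seq: number;
}

// Sum the output of every primary event emitted after the latest snapshot,
// so parallel sibling calls are all counted on reload.
function postSnapshotOutputTokens(
  events: PrimaryUsageSketch[],
  latestSnapshotSeq: number,
): number {
  let total = 0;
  for (const event of events) {
    if (event.usage_type == null && event.seq > latestSnapshotSeq) {
      total += event.output;
    }
  }
  return total;
}
```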
GitNexus: ❌ deploy failed. The deploy failed; the previous index (if any) continues to be served.
…ila#13734)

* 💾 feat: Persist Context Breakdown & Branch/Total Usage Cost

Persist the granular context breakdown and per-response usage/cost on the response message metadata, and re-derive branch + total usage/cost from a per-message index so the popover survives reloads and is branch-aware live.
- Add aggregateEmittedUsage + buildPersistedContextUsage helpers in packages/api; capture the latest visible snapshot and every emitted on_token_usage payload via contextUsageSink/usageEmitSink.
- Attach metadata.contextUsage (Part A) and metadata.usage (Part B) on the agents response message in sendCompletion.
- Carry per-message usage on the token index; add sumTotalUsage/setEntryUsage and branch-scoped usage on sumBranch.
- Repurpose the session accumulator into a single in-flight pending holder; flush it into the index at finalize; hydrate breakdowns on load.
- Render branch cost with a conditional all-branches total in the breakdown.

* 🧹 chore: Remove orphaned com_ui_session_cost i18n key

* 🩹 fix: Address Codex review - normalize usage server-side, fix reload deltas
- Persist per-event-normalized display units in metadata.usage (TResponseUsage) so reloaded mixed-provider turns match the live session; client reads them directly instead of re-normalizing with a single stamped provider (P2).
- Persist completedOutputTokens (final call output) on metadata.contextUsage so a reloaded multi-call turn adds the post-snapshot delta, not the full tokenCount the snapshot already counts (P2).
- buildIndex preserves a prior entry's immutable usage when a rebuilt cache message lacks metadata.usage, so a mid-session rebuild (regenerate) keeps a sibling branch's flushed cost (fixes the e2e regenerate failure).
- Track costKnown so turns saved with contextCost off don't render $0.00 when cost display is later enabled (P3).
- Use an epsilon for the all-branches cost comparison to avoid a spurious total row from float summation order (P3).
- Update unit/integration/e2e tests for the new shapes; regenerate e2e asserts the all-branches total after reload (deterministic via persisted metadata).

* 🩹 fix: Address Codex round 2 - pending leak, cost coverage, reload delta
- Clear the in-flight pending usage on terminal abort/error (resetLive), so a stopped generation's tokens no longer merge into the next response (P2).
- costKnown now means COMPLETE coverage (ANDed): a branch mixing cost-bearing and cost-less turns is flagged incomplete and the cost row is hidden rather than rendering an under-reported total (P2).
- Drop the tokenCount fallback for completedOutputTokens on reload: only the persisted post-snapshot delta is used, so a multi-call turn whose provider emitted no usage_metadata no longer double-counts earlier output (P2).
- Update tokens.spec for AND coverage semantics + incomplete-cost case.

* 🩹 fix: Address Codex round 3 - no-usage snapshots, total coverage, provider-less cache
- Skip persisting metadata.contextUsage when the response emitted no primary usage event: without a known post-snapshot output the granular gauge would undercount the reply on reload, so fall back to the coarse per-message estimate instead (P2).
- Gate the all-branches cost row on totalUsage.costKnown so an incomplete total (a sibling saved without cost) never renders an under-reported figure (P2).
- aggregateEmittedUsage/finalCallOutputTokens now normalize per-event with the client's magnitude fallback (normalizeEventUnits) instead of billing splitUsage, so provider-less cached events match live on reload (P2).
- Add backend test for the provider-less cached case.

* 🩹 fix: Address Codex round 4 - abort attribution, complete cost coverage
- aggregateEmittedUsage persists cost only when EVERY call was priced; a partial pricing failure now omits cost so the client treats coverage as unknown rather than reading an under-reported sum as authoritative (P2).
- finalizeUsage flushes pending into the response entry only when events were folded this session (eventCount > 0), so a late/second resumable subscriber carrying persisted metadata.usage keeps it instead of being overwritten with an empty pending record (P2).
- On user stop, attribute the in-flight pending usage to the partial response (new attributePending handler) instead of discarding it in resetLive: the stopped reply's billed tokens are kept and still can't leak into the next response; resetLive's discard remains for the error path (P2).

* 🐛 fix: Persist branch cost across branch switches via sticky usage history

Branch cost vanished on switching to a sibling branch (until a new turn), the cost analog of the granularity bug. buildIndex rebuilds the token index from the messages cache; a sibling generated this session whose cache message lacks metadata.usage (and is transiently dropped from the cache during regenerate) lost its live-flushed usage, so sumBranch found none and the cost row hid.

Fix: a sticky per-response usage map (conversationId → messageId → usage), written by setEntryUsage and never rebuilt from the cache. It is the usage counterpart of snapshotsByAnchorFamily for the breakdown. buildIndex/upsertEntries restore an entry's usage from it when the message carries none; cleared on convo switch and migrated with the index. Add unit coverage for the drop-then-readd regression and an e2e assertion that branch cost survives a branch switch.

* 🐛 fix: Re-index on branch switch so branch cost survives the switch

The sticky usage history alone didn't fix the reported branch-switch cost drop: on a branch switch no cache `updated` event fires, so the index subscriber never re-ran, and the post-regenerate rebuild was skipped while `isSubmitting` was still true, leaving the index stale and missing the now-viewed branch's response entirely (sticky can only restore entries present in a rebuild).

Re-index from the messages cache on every tail change (created/finalize AND branch switch), not just while submitting. The cache holds the full message set at switch time, so the viewed branch's response is re-added and its usage restored from metadata.usage or the sticky history → sumBranch finds it and the branch cost renders. Verified locally: the branch-switch e2e now passes (the cost section shows both the branch row and the all-branches total). Also fixed that e2e assertion to target a single cost value (strict-mode safe).

* 🩹 fix: Handle stopped-stream usage - reset pending + persist abort metadata

Codex round (stop/abort edges):
- Resumable explicit-stop (intentional SSE close) reset UI state but never cleared pendingUsageFamily, so usage folded before the stop leaked into the next response in the conversation. Discard pending on intentional close (resetLive); a resume re-folds via backfillUsage, so nothing is lost.
- The abort save path (abortMiddleware) persisted the stopped response without metadata.usage/contextUsage, so its cost + breakdown vanished on reload. Rebuild both from the job's persisted tokenUsage (emitted payloads incl. cost) and contextUsage snapshot, for parity with the normal sendCompletion path; breakdown gated on a primary usage event like buildResponseMetadata.

Deferred (per scope decision): mid-stream branch-switch transiently shows the streaming branch's pending on the viewed sibling (cosmetic, until finalize).

* 🩹 fix: Persist abort metadata on the real agents route + tighten snapshot gate

Codex round (corrects last round's wrong-path fixes):
- Stopped AGENTS responses are saved by routes/agents/index.js (/chat/abort), not abortMiddleware, so last round's metadata fix never ran for them. Moved the rollup/snapshot builder into packages/api as buildAbortedResponseMetadata (shared, unit-tested) and applied it in BOTH abort save paths, so a stopped agent reply keeps its cost + breakdown on reload.
- Persist the breakdown only when the FINAL visible call emitted usage: track a per-response snapshot count and require primaryUsageCount >= snapshotCount. Previously any earlier primary usage event passed the gate, so a multi-call turn whose final call emitted no usage_metadata used an earlier call's output as completedOutputTokens (already counted by the latest snapshot) → reload over-reported. Now it falls back to the coarse estimate.

Resumable stop pending-reset (prior round, 3cde6fe) already flows through clearAllSubmissions → SSE close → the intentional-close handler's resetLive. Deferred per scope: mid-stream branch-switch pending attribution (tracked).

* 🩹 fix: Abort breakdown over-count + resume re-fold after pending discard

Codex round (on the re-applied abort/snapshot work):
- buildAbortedResponseMetadata now persists ONLY the usage/cost rollup, not the context breakdown. The abort path can't tell whether the final call emitted usage (the job stores only the latest snapshot, not a count), so persisting the breakdown risked reusing an earlier call's output as completedOutputTokens (already in the snapshot) → reload over-count. Stopped/incomplete responses now fall back to the coarse gauge estimate, which is safe and apt.
- resetLive now also forgets the conversation's folded usage-event identities (clearUsageFolded). Discarding pending on a terminal/intentional close left the folded keys set, so a later resume's backfillUsage saw the persisted events as duplicates and never rebuilt pending, leaving the response's usage missing until a full reload. Clearing them lets the resume re-fold.
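Two rules from the commit message above, the ANDed costKnown coverage and the epsilon comparison for the all-branches row, can be sketched together. This is an illustration under assumptions, not the code in tokens.ts: the `CostTotals` shape, the function names, and the `1e-9` tolerance are all hypothetical.

```ts
// Hypothetical per-entry cost record for illustration.
interface CostTotals {
  cost: number;
  costKnown: boolean;
}

// Sum costs; coverage is complete only if every entry carried a cost (ANDed).
// An empty list is treated as unknown here so it never renders as $0.00.
function sumCosts(entries: CostTotals[]): CostTotals {
  let cost = 0;
  let costKnown = entries.length > 0;
  for (const entry of entries) {
    cost += entry.cost;
    costKnown = costKnown && entry.costKnown;
  }
  return { cost, costKnown };
}

// Treat two sums as different only beyond a small tolerance, so a different
// floating-point summation order alone does not count as a difference.
function costsDiffer(a: number, b: number, epsilon = 1e-9): boolean {
  return Math.abs(a - b) > epsilon;
}

// Show the all-branches row only when both totals are fully covered and differ.
function showAllBranchesRow(branch: CostTotals, total: CostTotals): boolean {
  return branch.costKnown && total.costKnown && costsDiffer(total.cost, branch.cost);
}
```

For example, summing 0.1, 0.2, 0.3 in insertion order and in reverse gives two values that are not strictly equal in IEEE 754 doubles, yet `showAllBranchesRow` stays false for them because the gap is far below the tolerance.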
Summary
I made the context-usage popover durable and branch-aware by persisting the granular breakdown and per-response usage/cost on the response message, then re-deriving branch + total usage/cost from the existing per-message token index instead of a page-session accumulator. Before this, the breakdown reset to the coarse fallback on reload and the cost row showed $0 on an existing conversation until a new turn was generated; the cost was also "events since page load," neither branch-scoped nor a true conversation total.
This builds on #13670 and persists onto the existing metadata field (Mongoose Mixed/z.record), so there is no migration.
- aggregateEmittedUsage and buildPersistedContextUsage pure helpers in packages/api/src/agents/usage.ts (the rollup sums the same emitted on_token_usage payloads the client folds, so reload matches the live session; cost is summed additively and gated on interface.contextCost).
- contextUsageSink/usageEmitSink wired from callbacks.js → initialize.js → client.js (the subagent emitter folds into the same sink).
- metadata.contextUsage (Part A: breakdown) and metadata.usage (Part B: usage/cost rollup) on the agents response message in sendCompletion, alongside the existing thoughtSignatures.
- … tokens.ts): entries now carry normalized usage, sumBranch returns branch usage, and added sumTotalUsage (all branches) + setEntryUsage (live flush) + mergeUsage.
- … pendingUsageFamily holder, added totalUsageFamily, made foldUsage accumulate the current response and finalizeUsage flush it into the index exactly once, so cost is branch-aware live, not just on reload.
- … hydrateSnapshots to restore each branch's persisted breakdown on load (merges, never clobbering a live finalized snapshot).
- … com_ui_context_cost* i18n keys).
- … metadata: GET /:conversationId uses a -_id -__v -user exclusion projection and sanitizeMessageForTransmit only strips fileContext/file text.
Design note
I followed the spec's raw-rollup approach (persist provider counts, normalize once on the client). Cost is therefore always exact (summed additively per event); token rows are exact for the single-provider main path and a minor approximation only when one turn mixes providers (subagents). This can be switched to backend-normalized display units if bit-exact token rows are wanted in that edge case.
Change Type
Testing
All changes are covered by unit + integration + e2e tests; tsc --noEmit and ESLint are clean across the 14 touched files.
- packages/api/src/agents/usage.spec.ts: aggregateEmittedUsage (sum, cost gating, subagent/summarization fold, model/provider precedence) and buildPersistedContextUsage (zero-tool-count trimming).
- api/server/controllers/agents/__tests__/usageEvents.integration.spec.js (through the real agents pipeline): sinks capture both model calls, the rollup aggregates correctly, and cost folds only when contextCost is on.
- client/src/utils/tokens.spec.ts: buildIndex reads metadata.usage, sumBranch returns branch usage/cost, sumTotalUsage spans all branches, the regenerate scenario (branch vs. total), backward-compat (no metadata.usage → 0), setEntryUsage live flush, and rebuild idempotency.
- e2e/specs/mock/usage.spec.ts: send → reload shows the granular breakdown and branch cost (Parts A + B persist); send → regenerate shows branch cost with the all-branches total line.
Test Configuration:
npm run test per workspace (packages/api, client, api); mock Playwright suite via npm run e2e:mock (interface.contextCost is enabled in the harness).
Checklist