🛰️ feat: Add GPT-5.5 + Frontier OpenAI Models, Drop Deprecated Defaults #13636
@codex review
Pull request overview
This PR updates LibreChat’s default OpenAI model allowlist and aligns token/pricing configuration with the newly added GPT-5.5 family and refreshed GPT-5.x lineup, while removing deprecated/default legacy entries.
Changes:
- Updated OpenAI token pricing and cache token pricing tables to add GPT-5.5 (+ GPT-5.4-mini) and correct GPT-5.4-pro rates.
- Refreshed the default OpenAI model list used by librechat-data-provider to include GPT-5.5 and remove deprecated legacy defaults.
- Added explicit token context and max-output mappings for gpt-5.4-mini to avoid fallback matching to gpt-5.4.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/data-schemas/src/methods/tx.ts | Adds/updates per-model token pricing and cache pricing for new GPT-5.x entries (incl. GPT-5.5) and corrects GPT-5.4-pro pricing. |
| packages/data-provider/src/config.ts | Updates the shared OpenAI default model allowlist to include GPT-5.5 family and remove deprecated legacy defaults. |
| packages/api/src/utils/tokens.ts | Adds explicit context/max-output entries for gpt-5.4-mini to ensure correct token limit resolution. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 549fb38e40
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
const sharedOpenAIModels = [
  'gpt-5.5',
  'gpt-5.5-pro',
  'gpt-5.5-chat-latest',
Use the actual ChatGPT latest model id
OpenAI's current model catalog lists the ChatGPT Instant alias as chat-latest, not gpt-5.5-chat-latest (https://developers.openai.com/api/docs/models/chat-latest). With this default exposed in sharedOpenAIModels, users who select it will send an unsupported model id to OpenAI and get request failures; replace this entry with chat-latest and make sure token/pricing maps handle that alias.
Confirmed and fixed in 3968115 — chat-latest is the correct slug (the gpt-5.5-chat-latest model page 404s). Registered it in the token maps (400K context, 128K output) and pricing maps ($5/$30, cached $0.50). Also pinned explicit pricing keys for the still-live legacy aliases (gpt-5-chat-latest, gpt-5.1-chat-latest, gpt-5.2-chat-latest, gpt-5.3-chat-latest) — without them, the new bare chat-latest key would become their longest substring match and overbill them at GPT-5.5 rates.
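The out-matching risk described in this reply can be illustrated with a minimal sketch. It assumes pricing keys are resolved by picking the longest key that is a substring of the model id, as the reply describes. The helper longestKeyMatch and the key lists are hypothetical names for illustration, not LibreChat's actual implementation.

```typescript
// Hypothetical helper: returns the longest key contained in the model id.
// It mirrors the "longest substring match" behaviour described in the thread,
// but it is an illustration only, not LibreChat's getValueKey.
function longestKeyMatch(model: string, keys: string[]): string | undefined {
  let best: string | undefined;
  for (const key of keys) {
    if (model.indexOf(key) !== -1 && (best === undefined || key.length > best.length)) {
      best = key;
    }
  }
  return best;
}

// Illustrative subsets of the pricing keys, before and after pinning a legacy alias.
const unpinnedKeys = ['gpt-5.1', 'chat-latest'];
const pinnedKeys = unpinnedKeys.concat(['gpt-5.1-chat-latest']);

// Without the pinned key, the bare chat-latest key (11 chars) beats gpt-5.1 (7 chars).
const unpinnedMatch = longestKeyMatch('gpt-5.1-chat-latest', unpinnedKeys);

// With the pinned key, the exact alias wins and keeps its own rates.
const pinnedMatch = longestKeyMatch('gpt-5.1-chat-latest', pinnedKeys);

console.log(unpinnedMatch, pinnedMatch);
```

Under these assumptions, an explicit key per legacy alias is the smallest change that keeps the new bare key from capturing older model ids.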
'gpt-5.4-pro': { prompt: 5, completion: 30 },
'gpt-5.4-pro': { prompt: 30, completion: 180 },
'gpt-5.4-mini': { prompt: 0.75, completion: 4.5 },
'gpt-5.5': { prompt: 5, completion: 30 },
Account for GPT-5.5 long-context pricing
For GPT-5.5, OpenAI publishes separate long-context rates ($10/$45, cached $1) in addition to the short-context rates added here (https://developers.openai.com/api/docs/pricing). Since this commit also gives gpt-5.5 a ~1M context window, any request that enters the long-context tier will still be billed internally at the short-context $5/$30 rates because there is no premiumTokenValues entry keyed by inputTokenCount, causing under-accounting for large prompts.
Confirmed and fixed in 3968115 — added premiumTokenValues entries: gpt-5.5 at $10/$45 and gpt-5.4 at $5/$22.50 (its published long-context tier follows the same 2x input / 1.5x output structure), both with a 272K threshold per the model page: "prompts with >272K input tokens are priced at 2x input and 1.5x output for the full session." Cached-read tiering isn't representable in the current cache map structure, so long-context cached reads still bill at the standard rate — same limitation as the existing gemini entry.
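As a rough sketch of the tiering this reply describes, the lookup below switches to the premium rate once the input token count exceeds the threshold. The rates and the 272K threshold are the ones quoted in this thread; the function name resolveRate, the map names, and the strict greater-than comparison are assumptions made for illustration and may differ from the real premiumTokenValues handling.

```typescript
// Rates are USD per 1M tokens, taken from the figures quoted in this thread.
type Rate = { prompt: number; completion: number };
type PremiumRate = Rate & { threshold: number };

const standardRates: Record<string, Rate> = {
  'gpt-5.4': { prompt: 2.5, completion: 15 },
  'gpt-5.5': { prompt: 5, completion: 30 },
};

const premiumRates: Record<string, PremiumRate> = {
  'gpt-5.4': { threshold: 272000, prompt: 5, completion: 22.5 },
  'gpt-5.5': { threshold: 272000, prompt: 10, completion: 45 },
};

// Hypothetical resolver: premium rate above the threshold, standard rate otherwise.
function resolveRate(model: string, inputTokenCount: number): Rate | undefined {
  const premium = premiumRates[model];
  if (premium && inputTokenCount > premium.threshold) {
    return { prompt: premium.prompt, completion: premium.completion };
  }
  return standardRates[model];
}

// Cost in USD for a given token count at a per-million rate.
function costUSD(tokens: number, ratePerMillion: number): number {
  return (tokens * ratePerMillion) / 1000000;
}

const longTier = resolveRate('gpt-5.5', 300000);
if (longTier) {
  console.log(costUSD(300000, longTier.prompt)); // input cost of a 300K-token prompt
}
```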
const sharedOpenAIModels = [
  'gpt-5.5',
  'gpt-5.5-pro',
Don't expose GPT-5.5 Pro with streaming enabled
OpenAI documents gpt-5.5-pro with streaming as unsupported (https://developers.openai.com/api/docs/models/gpt-5.5-pro), while getOpenAIConfig defaults streaming = true for OpenAI requests. Adding this model to the default list means a user who selects it in the normal OpenAI flow will send a streaming request by default and hit an API failure unless they know to disable streaming first.
Confirmed and fixed in 3968115 — getOpenAILLMConfig now forces streaming: false for pro reasoning models (o1-pro, o3-pro, gpt-5-pro, gpt-5.x-pro), with spec coverage for both the pro and non-pro paths. This also covers gpt-5.4-pro, which was already in the default list before this PR with the same constraint.
Update: the endpoint/handler fix for this was reverted in 1c4d1fd — this PR is intentionally trimmed to model-catalog changes only (lists, token windows, pricing, fetched-model filter). The OpenAI-compat endpoint/handler surface is moving out of LibreChat into the agents SDK, where this behavior belongs.
  'gpt-5.1',
  'gpt-5.1-chat-latest',
  'gpt-5.1-codex',
  'gpt-5.1-codex-max',
Remove deprecated Codex model from defaults
OpenAI now marks gpt-5.1-codex-max as deprecated in its model page (https://developers.openai.com/api/docs/models/gpt-5.1-codex-max). Adding it to the default OpenAI model list exposes a deprecated option to new configurations, so users can select a model that is already on the removal path instead of the current gpt-5.3-codex entry added next to it.
Keeping this one — the model page only marks a dated snapshot as deprecated; the default gpt-5.1-codex-max listing carries no deprecation marking and the page still describes it as the current default for long-running agentic coding tasks, with no recommended replacement. The deprecations page doesn't list it either. (For contrast, gpt-5.2-codex is marked deprecated at the model level, which is why it was deliberately left out of this PR.)
- Replace nonexistent gpt-5.5-chat-latest with the actual chat-latest alias; register its context window, output cap, pricing, and cache rates, and pin explicit rates for legacy gpt-5.x-chat-latest aliases so the new chat-latest key cannot out-match their cheaper pricing
- Add long-context premium tiers (>272K input) for gpt-5.5 and gpt-5.4
- Disable streaming for pro reasoning models (o1-pro, gpt-5.x-pro), which OpenAI does not support, with spec coverage
@codex review Pushed fixes for the review feedback in 3968115: swapped in the real
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3968115670
const sharedOpenAIModels = [
  'gpt-5.5',
  'gpt-5.5-pro',
  'chat-latest',
Include chat-latest in fetched OpenAI model filtering
Adding chat-latest to the default list does not make it available for the common server-key path: getOpenAIModels calls fetchOpenAIModels, and when the OpenAI /models response is non-empty it filters official OpenAI results with /(text-davinci-003|gpt-|o\d+)/ in packages/api/src/endpoints/models.ts:276-279. Because chat-latest does not match that regex and fallback defaults are only used on an empty response, deployments that fetch models from OpenAI will drop this new slug from the model picker.
Confirmed and fixed in ea432bc — added chat-latest to the official-OpenAI filter regex in fetchOpenAIModels, with a spec covering that it survives filtering while dall-e-3/gpt-realtime-2 are still excluded.
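The filter gap can be checked directly against the regex quoted in the review comment. In the sketch below, originalFilter is copied from that comment, while widenedFilter is only an assumed form of the fix; the actual pattern committed in ea432bc is not shown in this thread.

```typescript
// Pattern quoted in the review comment for official OpenAI model ids.
const originalFilter = /(text-davinci-003|gpt-|o\d+)/;

// Assumed shape of the fix: the same pattern with chat-latest added as an alternative.
const widenedFilter = /(text-davinci-003|gpt-|o\d+|chat-latest)/;

// Illustrative ids, as they might appear in a fetched model list.
const fetchedIds = ['gpt-5.5', 'chat-latest', 'dall-e-3', 'o3-mini'];

const keptBefore = fetchedIds.filter((id) => originalFilter.test(id));
const keptAfter = fetchedIds.filter((id) => widenedFilter.test(id));

console.log(keptBefore); // chat-latest is dropped here
console.log(keptAfter); // chat-latest survives, dall-e-3 is still excluded
```

Note that gpt-realtime-2 matches the gpt- alternative in both patterns, so its exclusion presumably comes from a separate rule that this thread does not show.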
/** Pro reasoning models (o1-pro, gpt-5.x-pro) do not support streaming */
if (
  typeof modelOptions.model === 'string' &&
  /\b(?:o[13]|gpt-5(?:\.\d+)?)-pro\b/.test(modelOptions.model)
Drop sampling params for versioned pro models
This identifies gpt-5.4-pro/gpt-5.5-pro as pro reasoning models only for streaming, but those same IDs fall through the later sampling-parameter removal because the regex at packages/api/src/endpoints/openai/llm.ts:668 explicitly excludes versioned gpt-5.x models. When a user selects one of the new pro defaults with the normal temperature/topP settings still present, LibreChat will send unsupported sampling parameters to a reasoning model and the OpenAI request can fail instead of just disabling streaming.
Confirmed and fixed in ea432bc — extracted the pro-model check into an exported isProReasoningModel helper and added it to the sampling-exclusion condition, so versioned pro models (gpt-5.4-pro, gpt-5.5-pro) now drop temperature/topP/penalties like other reasoning models. Both are added to the reasoning-model spec matrix.
Update: the endpoint/handler fix for this was reverted in 1c4d1fd — this PR is intentionally trimmed to model-catalog changes only (lists, token windows, pricing, fetched-model filter). The OpenAI-compat endpoint/handler surface is moving out of LibreChat into the agents SDK, where this behavior belongs.
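Although this change was reverted, the helper described in the thread could look roughly like the sketch below. The regex is copied from the diff hunk quoted above; the function bodies, the parameter shape, and the name shouldDisableStreaming are assumptions, not the code that was committed.

```typescript
// Regex copied from the quoted diff hunk: o1-pro, o3-pro, gpt-5-pro, gpt-5.x-pro.
const PRO_REASONING_PATTERN = /\b(?:o[13]|gpt-5(?:\.\d+)?)-pro\b/;

// Sketch of the exported helper named in the reply; accepts unknown input so that
// callers can pass an optional model field without extra checks.
function isProReasoningModel(model: unknown): boolean {
  return typeof model === 'string' && PRO_REASONING_PATTERN.test(model);
}

// Hypothetical combination of the explicit flag and the pro-model check,
// following the streamingDisabled decision described for the agent routes.
function shouldDisableStreaming(params: { disableStreaming?: boolean; model?: string }): boolean {
  return !!params.disableStreaming || isProReasoningModel(params.model);
}

console.log(isProReasoningModel('gpt-5.5-pro'), isProReasoningModel('gpt-5.5'));
```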
  typeof modelOptions.model === 'string' &&
  /\b(?:o[13]|gpt-5(?:\.\d+)?)-pro\b/.test(modelOptions.model)
) {
  llmConfig.streaming = false;
Propagate pro streaming disable to agent routes
Setting only llmConfig.streaming = false does not disable SSE handling for agent runs: both agent controllers decide whether to stream from model_parameters.disableStreaming (packages/api/src/agents/openai/service.ts:462-464 and api/server/controllers/agents/openai.js:431-432). When a user runs one of these pro models through the agents endpoint with stream: true, the HTTP route still enters the streaming path even though the initialized LLM config says not to stream, so the new guard is not honored in that context.
Confirmed and fixed in ea432bc — both agent chat-completions routes (the TS service and the legacy JS controller) now factor isProReasoningModel(model_parameters.model) into their streamingDisabled decision alongside the explicit disableStreaming flag, so pro models take the aggregator/non-SSE path even when the request asks for streaming.
Update: the endpoint/handler fix for this was reverted in 1c4d1fd — this PR is intentionally trimmed to model-catalog changes only (lists, token windows, pricing, fetched-model filter). The OpenAI-compat endpoint/handler surface is moving out of LibreChat into the agents SDK, where this behavior belongs.
- Allow chat-latest through the official OpenAI fetched-model filter
- Export isProReasoningModel and drop unsupported sampling parameters for versioned pro models (gpt-5.4-pro, gpt-5.5-pro), which the versioned-model exemption previously let through
- Honor the pro-model streaming disable in both agent chat-completions routes, which decide SSE from model_parameters before llmConfig exists
- Update models.spec default-list assertions for the refreshed defaults and cover chat-latest filter retention
@codex review Round-2 fixes pushed in ea432bc:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea432bc56b
const sharedOpenAIModels = [
  'gpt-5.5',
  'gpt-5.5-pro',
  'chat-latest',
Handle max-token conversion for chat-latest
When users select the new chat-latest default and set a max token limit, getOpenAILLMConfig still leaves it on llmConfig.maxTokens because the GPT-5 conversion guard only matches /\bgpt-[5-9]/ in packages/api/src/endpoints/openai/llm.ts. Since this alias is being treated as the latest GPT-5 chat model, those requests will send the legacy max-token field instead of max_completion_tokens/max_output_tokens, unlike the other GPT-5 defaults, and can fail only for conversations that configure a max token limit.
Confirmed and fixed in 13aa046 — the max-token conversion guard now also matches chat-latest, so it sends max_completion_tokens (or max_output_tokens on the Responses API) like the rest of the GPT-5 family. Spec added.
Update: the endpoint/handler fix for this was reverted in 1c4d1fd — this PR is intentionally trimmed to model-catalog changes only (lists, token windows, pricing, fetched-model filter). The OpenAI-compat endpoint/handler surface is moving out of LibreChat into the agents SDK, where this behavior belongs.
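For reference, the guard gap behind this (since reverted) fix can be sketched as follows. The original pattern is the one quoted in the review comment; widenedGuard and maxTokensField are hypothetical names, and the widened pattern is only one plausible way the guard might have been extended.

```typescript
// Guard quoted in the review comment: matches gpt-5 through gpt-9 model ids only.
const originalGuard = /\bgpt-[5-9]/;

// Assumed widening so that the bare chat-latest alias is treated like the GPT-5 family.
const widenedGuard = /\b(?:gpt-[5-9]|chat-latest\b)/;

// Hypothetical helper: which request field carries the max-token limit.
function maxTokensField(model: string, useResponsesApi: boolean): string {
  if (!widenedGuard.test(model)) {
    return 'max_tokens'; // legacy field, left untouched for older models
  }
  return useResponsesApi ? 'max_output_tokens' : 'max_completion_tokens';
}

console.log(originalGuard.test('chat-latest')); // the gap described in the comment
console.log(maxTokensField('chat-latest', false));
```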
const streamingDisabled =
  !!initializedParams?.disableStreaming || isProReasoningModel(initializedParams?.model);
Don't force pro models onto an empty nonstream path
In this TypeScript chat-completions service, setting streamingDisabled for pro models makes a streamed request take the non-streaming branch, but that branch never installs aggregator handlers (eventHandlers is {} unless isStreaming is true), so the later aggregator.getText() response stays empty even though the run produced output. This only affects pro-model requests through this route that ask for stream: true; the JS controller has separate non-streaming aggregation handlers, but this service does not.
Confirmed and fixed in 13aa046 — added createOpenAIAggregatorHandlers (message/reasoning deltas, tool-call accumulation, usage capture feeding the content aggregator) and wired it into the service's non-streaming branch. Worth noting this path was returning empty responses for ALL stream: false requests through this service, not just the new pro-model case — the aggregator existed but no handlers ever fed it. Covered with a spec that drives the handlers through a mocked run and asserts the aggregated body and usage, plus one asserting pro models take the JSON path when stream: true is requested.
Update: the endpoint/handler fix for this was reverted in 1c4d1fd — this PR is intentionally trimmed to model-catalog changes only (lists, token windows, pricing, fetched-model filter). The OpenAI-compat endpoint/handler surface is moving out of LibreChat into the agents SDK, where this behavior belongs.
modelOptions.model &&
/\b(o[13]|gpt-5)(?!\.|-chat)(?:-|$)/.test(modelOptions.model as string)
(/\b(o[13]|gpt-5)(?!\.|-chat)(?:-|$)/.test(modelOptions.model as string) ||
isProReasoningModel(modelOptions.model))
Drop snake_case sampling params for pro models
When endpoint settings or the UI provide OpenAI parameters, the sampling fields use snake_case names such as top_p and logit_bias; this new pro-model branch still only adds the camelCase keys (topP, logitBias) to reasoningExcludeParams, so gpt-5.4-pro/gpt-5.5-pro requests can still carry unsupported sampling parameters and be rejected whenever those options are configured.
Confirmed and fixed in 13aa046 — added top_p, logit_bias, frequency_penalty, and presence_penalty to the reasoning exclusion list, so addParams-sourced snake_case values are stripped from modelKwargs for pro models (and the o-series/gpt-5 base models sharing that branch, which had the same latent gap). Spec covers addParams-provided snake_case params being dropped for gpt-5.5-pro.
Update: the endpoint/handler fix for this was reverted in 1c4d1fd — this PR is intentionally trimmed to model-catalog changes only (lists, token windows, pricing, fetched-model filter). The OpenAI-compat endpoint/handler surface is moving out of LibreChat into the agents SDK, where this behavior belongs.
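The snake_case gap discussed in this thread comes down to an exclusion list that has to cover both spellings of each sampling parameter. A minimal sketch, assuming a flat parameter object: the list contents follow the names mentioned in the thread, while stripExcludedParams is a hypothetical helper rather than the reverted implementation.

```typescript
// Sampling parameters named in the thread, in both camelCase and snake_case.
const reasoningExcludeParams: string[] = [
  'temperature',
  'topP',
  'top_p',
  'frequencyPenalty',
  'frequency_penalty',
  'presencePenalty',
  'presence_penalty',
  'logitBias',
  'logit_bias',
];

// Hypothetical helper: returns a copy of params without any excluded keys.
function stripExcludedParams(
  params: Record<string, unknown>,
  excluded: string[],
): Record<string, unknown> {
  const kept: Record<string, unknown> = {};
  for (const key of Object.keys(params)) {
    if (excluded.indexOf(key) === -1) {
      kept[key] = params[key];
    }
  }
  return kept;
}

const cleaned = stripExcludedParams(
  { top_p: 0.9, logit_bias: {}, temperature: 0.2, max_output_tokens: 1024 },
  reasoningExcludeParams,
);
console.log(Object.keys(cleaned));
```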
- Convert max_tokens for chat-latest, which the gpt-[5-9] guard missed
- Drop snake_case sampling params (top_p, logit_bias, penalties) in the reasoning-model exclusion list so addParams-sourced values are removed
- Add createOpenAIAggregatorHandlers and wire them into the agent chat-completions service's non-streaming branch, which previously ran with no handlers and always returned an empty aggregated response
@codex review Round-3 fixes pushed in 13aa046:
- Sort type import first in service.spec.ts per import-order convention
- Register isProReasoningModel in the openai controller spec's @librechat/api mock factory, whose enumerated exports left the new helper undefined and broke the non-streaming flow under test
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13aa04677a
constructor(private aggregator: OpenAIContentAggregator) {}

handle(_event: string, data: MessageDeltaData): void {
  const content = data?.content;
Read agent deltas from delta.content
In the non-streaming agent path, the new aggregator ignores the shape emitted by the existing agent event handlers: both the legacy OpenAI controller and the Responses service read message/reasoning chunks from data.delta.content, and the stream integration tests use that same envelope. When a normal non-streaming request (and now every forced non-streaming pro reasoning request) receives those events, data.content is undefined here, so all text chunks are dropped and the final chat.completion can be returned with content: null despite the model producing output.
Correct finding — the aggregator handlers did assume the streaming-handler shape rather than the delta.content envelope the agent graph actually emits. However, the aggregator handlers have been removed in 1c4d1fd: the PR scope is trimmed to the model catalog only, and the OpenAI-compat handler surface is moving out of LibreChat into the agents SDK, where this (and the equivalent shape mismatch in the existing streaming handler classes) should be addressed.
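To make the shape mismatch concrete, here is a minimal sketch of an aggregator that reads chunks from the delta.content envelope instead of a top-level content field. The event and part types are assumptions made for illustration (the thread only states that chunks live under data.delta.content); collectText is a hypothetical helper.

```typescript
// Assumed shapes: a delta event wrapping an array of text parts.
type TextPart = { type: 'text'; text: string };
type MessageDeltaEvent = { delta?: { content?: TextPart[] } };

// Hypothetical aggregator: concatenates text parts found under delta.content.
// Events without that envelope contribute nothing instead of throwing.
function collectText(events: MessageDeltaEvent[]): string {
  let text = '';
  for (const event of events) {
    const parts = event.delta?.content ?? [];
    for (const part of parts) {
      if (part.type === 'text') {
        text += part.text;
      }
    }
  }
  return text;
}

const aggregated = collectText([
  { delta: { content: [{ type: 'text', text: 'Hel' }] } },
  { delta: { content: [{ type: 'text', text: 'lo' }] } },
  {}, // an event without a delta is skipped
]);
console.log(aggregated);
```

A handler that reads data.content on these events would see undefined for every chunk, which matches the content: null symptom described in the finding.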
const handlers: Record<string, EventHandler> = {
  [GraphEvents.ON_MESSAGE_DELTA]: new OpenAIAggregatorMessageDeltaHandler(aggregator),
  [GraphEvents.ON_REASONING_DELTA]: new OpenAIAggregatorReasoningDeltaHandler(aggregator),
  [GraphEvents.ON_RUN_STEP_DELTA]: new OpenAIAggregatorRunStepDeltaHandler(aggregator),
Track tool starts for non-streaming calls
For non-streaming agent requests that surface a tool call, this registers only on_run_step_delta, but the existing controller and Responses handlers derive the tool id/name from on_run_step.stepDetails.tool_calls and then append only args from later deltas. With the normal two-event shape, this aggregator never creates the tool call entry unless a delta happens to repeat id, so the final OpenAI-compatible response can omit tool_calls and report finish_reason: "stop" for a tool-call turn.
Correct on the two-event shape (on_run_step carries id/name via stepDetails, deltas append args) — but the aggregator handlers were removed in 1c4d1fd as part of trimming this PR to model-catalog scope. To be handled in the agents SDK migration of this surface.
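The two-event shape acknowledged here can be sketched as a small accumulator: the run-step event registers each tool call's id and name, and later deltas only append argument text. Everything below is a hypothetical illustration. In particular, addressing a call by index in the delta is an assumption, since the thread does not say how deltas reference their tool call.

```typescript
type ToolCall = { id: string; name: string; args: string };

class ToolCallAccumulator {
  private calls: ToolCall[] = [];

  // on_run_step: register id and name (taken from stepDetails.tool_calls).
  onRunStep(toolCalls: { id: string; name: string }[]): void {
    for (const call of toolCalls) {
      this.calls.push({ id: call.id, name: call.name, args: '' });
    }
  }

  // on_run_step_delta: append an argument chunk to an already registered call.
  // A delta for an unregistered call is dropped, which is the failure mode
  // described in the finding when only deltas are handled.
  onRunStepDelta(index: number, argsChunk: string): void {
    const call = this.calls[index];
    if (call) {
      call.args += argsChunk;
    }
  }

  result(): { tool_calls: ToolCall[]; finish_reason: 'tool_calls' | 'stop' } {
    return {
      tool_calls: this.calls,
      finish_reason: this.calls.length > 0 ? 'tool_calls' : 'stop',
    };
  }
}

const withStart = new ToolCallAccumulator();
withStart.onRunStep([{ id: 'call_1', name: 'search' }]);
withStart.onRunStepDelta(0, '{"q":');
withStart.onRunStepDelta(0, '"models"}');

const deltaOnly = new ToolCallAccumulator();
deltaOnly.onRunStepDelta(0, '{"q":"models"}');

console.log(withStart.result().finish_reason, deltaOnly.result().finish_reason);
```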
Revert the OpenAI endpoint and agent handler changes (pro-model streaming, sampling exclusions, non-streaming aggregation) — that surface is moving out of LibreChat into the agents SDK and belongs in its own change. Keep the model list, token windows, pricing, and the fetched-model filter for chat-latest.
@codex review Scope trimmed in 1c4d1fd: this PR is now model-catalog only, covering default model lists, token windows, pricing (incl. gpt-5.4-pro correction, long-context tiers, chat-latest alias keys), and the fetched-model filter. All endpoint/handler changes (pro-model streaming, sampling exclusions, non-streaming aggregation) were reverted; that surface is moving into the agents SDK and the related findings are deferred there. PR description updated to match.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c4d1fd297
'gpt-5.4': 272000, // standard context; 1M experimental available via API opt-in (2x rate)
'gpt-5.4-pro': 272000, // same window as gpt-5.4
Use the full GPT-5.4 context window
For gpt-5.4/gpt-5.4-pro, this value is used by getModelMaxTokens() and then by clients as maxContextTokens, so any prompt above 272K tokens gets truncated before it can use the long-context tier this PR adds. OpenAI’s model/pricing docs list GPT-5.4 and GPT-5.4 Pro with a 1.05M context window and a pricing breakpoint at 272K (https://developers.openai.com/api/docs/models/gpt-5.4), so the max-token map should represent the 1.05M capacity rather than the billing threshold.
Confirmed and fixed in 70c841a — both gpt-5.4 and gpt-5.4-pro now map to the documented 1,050,000 window. The 272K value was carried over from the pre-release note treating long context as an opt-in; the current model page lists 1.05M as standard with 272K as the pricing breakpoint only.
'gpt-5.4': { threshold: 272000, prompt: 5, completion: 22.5 },
'gpt-5.5': { threshold: 272000, prompt: 10, completion: 45 },
Apply long-context pricing to pro models
When a gpt-5.4-pro or gpt-5.5-pro request exceeds 272K input tokens, getValueKey() resolves to the exact pro key added above, but premiumTokenValues only contains the non-pro keys, so getPremiumRate() never applies the long-context pro rates. OpenAI’s pricing table includes long-context columns for both pro models (https://developers.openai.com/api/docs/pricing), so these sessions would be billed at the short-context $30/$180 rate instead of $60/$270.
Partially confirmed, fixed in 70c841a — added gpt-5.4-pro long-context premium rates ($60/$270 above 272K), whose model page states the 2x input / 1.5x output rule verbatim. gpt-5.5-pro was left out deliberately: its model page documents flat $30/$180 pricing with no long-context tier or breakpoint (checked twice), so applying $60/$270 there would overbill. Happy to add it if OpenAI publishes a tier for it.
'gpt-5.3': { write: 1.75, read: 0.175 },
'gpt-5.4': { write: 2.5, read: 0.25 },
'gpt-5.4-mini': { write: 0.75, read: 0.075 },
'gpt-5.5': { write: 5, read: 0.5 },
Charge cached GPT-5.5 long context at the long rate
For gpt-5.5 requests above the 272K-token long-context breakpoint, cached input is billed at $1.00/1M rather than $0.50/1M in OpenAI’s pricing table (https://developers.openai.com/api/docs/pricing), but getCacheMultiplier() has no input-token-count path and will always return this short-context read value. That undercounts any cached portion of long-context GPT-5.5 sessions even though prompt/output long-context rates are handled separately in premiumTokenValues.
Real but deferred — getCacheMultiplier() has no input-token-count path: its callers in the transaction layer don't carry prompt size, so tiered cache pricing isn't representable without extending the billing mechanism (the same limitation applies to the existing gemini premium entry). That's beyond this catalog-scoped PR; flagging it for the billing-mechanism follow-up. The long-context prompt/output rates themselves are handled via premiumTokenValues.
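For the deferred follow-up, a threshold-aware cache lookup might look something like the sketch below. It is purely illustrative: the existing getCacheMultiplier() takes no input token count, so the optional parameter, the TieredCacheRate shape, and the function name cacheReadRate are all assumptions. The gpt-5.5 figures are the ones quoted in this thread.

```typescript
// USD per 1M tokens; longRead applies when the input exceeds the threshold.
type TieredCacheRate = { write: number; read: number; threshold: number; longRead: number };

const tieredCacheRates: Record<string, TieredCacheRate> = {
  'gpt-5.5': { write: 5, read: 0.5, threshold: 272000, longRead: 1 },
};

// Hypothetical lookup: callers that cannot supply a prompt size keep the standard rate,
// which matches the current behaviour described in the reply.
function cacheReadRate(model: string, inputTokenCount?: number): number | undefined {
  const entry = tieredCacheRates[model];
  if (!entry) {
    return undefined;
  }
  if (inputTokenCount !== undefined && inputTokenCount > entry.threshold) {
    return entry.longRead;
  }
  return entry.read;
}

console.log(cacheReadRate('gpt-5.5'), cacheReadRate('gpt-5.5', 300000));
```

Making the token count optional would let existing transaction-layer callers keep working unchanged while tiered callers opt in, though that trade-off is a design guess rather than anything decided in this PR.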
- Set gpt-5.4 and gpt-5.4-pro context to the documented 1,050,000 window; 272K is the long-context pricing breakpoint, not the cap, and using it truncated prompts before they could reach that tier
- Add gpt-5.4-pro long-context premium rates ($60/$270 above 272K) per its model page; gpt-5.5-pro documents no long-context tier
@codex review Round-5 fixes pushed in 70c841a: gpt-5.4/gpt-5.4-pro context windows corrected to the documented 1.05M (272K is the pricing breakpoint, not the cap), and gpt-5.4-pro long-context premium rates added ($60/$270). gpt-5.5-pro intentionally has no premium entry because its model page documents no long-context tier. Tiered cached-input pricing is deferred: the cache multiplier path carries no token counts, so it needs a billing-mechanism change outside this catalog PR.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70c841a946
// TODO: gpt-5.4-pro pricing not yet officially published — verify before release
'gpt-5.4-pro': { prompt: 5, completion: 30 },
'gpt-5.4-pro': { prompt: 30, completion: 180 },
'gpt-5.4-mini': { prompt: 0.75, completion: 4.5 },
Add the missing gpt-5.4-nano pricing key
When OpenAI returns gpt-5.4-nano from /v1/models or an admin configures it manually, this table has no explicit key for it even though the current pricing page lists it as part of the GPT-5.4 family at much lower rates ($0.20 input / $1.25 output, with cached input). Because getValueKey uses the longest substring match over tokenValues, gpt-5.4-nano resolves to the generic gpt-5.4 entry here and is billed at $2.50/$15 instead, so the new catalog refresh will substantially overcharge nano usage.
Confirmed and fixed in ffa6359 — gpt-5.4-nano is registered in the default list, pricing ($0.20/$1.25, cached $0.02), and token maps (400K context; an explicit entry was also needed there since the substring match would have given it gpt-5.4's 1.05M window). Good catch — this model hadn't surfaced in any of the catalog sweeps.
const sharedOpenAIModels = [
  'gpt-5.5',
  'gpt-5.5-pro',
Hide gpt-5.5-pro until streaming is handled
Exposing gpt-5.5-pro in the default OpenAI model list lets users select it even though the OpenAI model page marks streaming as unsupported, while this endpoint still defaults to streaming = true in packages/api/src/endpoints/openai/config.ts. In the default setup (no explicit OPENAI_MODELS override and streaming left enabled), selecting this newly listed model will send a streaming chat request that the model rejects; either omit it from defaults or add the non-streaming/pro-model handling before advertising it.
Keeping it listed, deliberately — the maintainer descoped endpoint/handler behavior from this PR: the OpenAI-compat surface (including pro-model streaming handling) is moving into the agents SDK, and the catalog should reflect the current frontier lineup. The same constraint class already ships today via gpt-5.4-pro, which predates this PR in the default list; users can disable streaming via model parameters until the SDK-side handling lands.
'gemini-3.1': { threshold: 200000, prompt: 4, completion: 18 },
'gpt-5.4': { threshold: 272000, prompt: 5, completion: 22.5 },
'gpt-5.4-pro': { threshold: 272000, prompt: 60, completion: 270 },
'gpt-5.5': { threshold: 272000, prompt: 10, completion: 45 },
Apply long-context rates to gpt-5.5-pro
For gpt-5.5-pro requests over the 272K-token long-context breakpoint, getMultiplier only switches rates when the model has an entry in premiumTokenValues; the current pricing table lists gpt-5.5-pro long-context standard rates as $60 input / $270 output, but this block only adds the tier for gpt-5.5. As a result, any >272K gpt-5.5-pro session falls back to the standard $30/$180 rates from tokenValues and underbills long-context pro usage.
Confirmed and fixed in ffa6359 — gpt-5.5-pro now has the $60/$270 long-context premium entry. This also corrects my reply on the earlier thread: the pricing page's long-context table does list gpt-5.5-pro, even though its model page presents only flat $30/$180 with no breakpoint mention — the pricing table is the canonical source here.
- Register gpt-5.4-nano ($0.20/$1.25, cached $0.02, 400K context) in the model list, pricing, cache, and token maps; the longest-match fallback billed it at gpt-5.4's $2.50/$15
- Add gpt-5.5-pro long-context premium rates ($60/$270 above 272K); the pricing table lists the tier even though the model page omits it
@codex review Round-6 fixes in ffa6359: added gpt-5.4-nano across the catalog (list, pricing, cache, token maps) and the gpt-5.5-pro long-context premium tier ($60/$270; the pricing table lists it even though the model page omits the breakpoint, correcting my earlier reply). gpt-5.5-pro stays in the default list per the maintainer's descope decision; pro-model streaming handling lands with the agents SDK migration.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ffa635927f
| 'gpt-5.4': { threshold: 272000, prompt: 5, completion: 22.5 }, | ||
| 'gpt-5.4-pro': { threshold: 272000, prompt: 60, completion: 270 }, | ||
| 'gpt-5.5': { threshold: 272000, prompt: 10, completion: 45 }, | ||
| 'gpt-5.5-pro': { threshold: 272000, prompt: 60, completion: 270 }, |
Apply long-context pricing to cached input
For GPT-5.4/5.5 requests above the 272K-token breakpoint, these new premium entries only affect uncached prompt and completion multipliers; structured cached reads still go through getCacheMultiplier, which has no inputTokenCount and returns the standard cacheTokenValues read rate. In a long-context request with cached input (e.g. read tokens plus total input >272K), the prompt and completion are billed at the premium tier but cached input remains at the below-threshold rate, undercharging those runs unless the cache multiplier is also made threshold-aware.
Duplicate of the earlier cached-input thread on this file — same answer: getCacheMultiplier has no input-token-count path (its transaction-layer callers don't carry prompt size), so threshold-aware cache rates require a billing-mechanism change that is deferred beyond this catalog-scoped PR. The gemini premium entry shares the same limitation.
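For reference, a threshold-aware cache lookup of the kind deferred here might look roughly like the sketch below. Everything in it is hypothetical: the function name, the extra `inputTokenCount` parameter, the `write`/`read` shape, and the idea of a separate premium cache tier are assumptions for illustration, and the numbers are placeholders rather than real pricing.

```typescript
type CacheType = 'write' | 'read';
type CacheRates = Record<CacheType, number>;
type PremiumCacheTier = CacheRates & { threshold: number };

// Hypothetical variant that accepts the prompt size. The thread states that
// the current getCacheMultiplier has no such input, so callers would have to
// start passing it before anything like this could work.
function sketchCacheMultiplier(
  standard: CacheRates,
  cacheType: CacheType,
  premium?: PremiumCacheTier,
  inputTokenCount?: number,
): number {
  if (
    premium !== undefined &&
    inputTokenCount !== undefined &&
    inputTokenCount > premium.threshold
  ) {
    return premium[cacheType];
  }
  // Without a count (today's situation), the standard rate always applies.
  return standard[cacheType];
}

// Placeholder rates chosen only to exercise both branches; NOT real pricing.
const placeholderStandard: CacheRates = { write: 1, read: 0.1 };
const placeholderPremium: PremiumCacheTier = { threshold: 272000, write: 2, read: 0.2 };

console.log(sketchCacheMultiplier(placeholderStandard, 'read'));
console.log(sketchCacheMultiplier(placeholderStandard, 'read', placeholderPremium, 300000));
```

The first call mirrors the limitation described in the reply: with no token count available, the lookup cannot leave the standard tier.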
| 'gpt-5.1', | ||
| 'gpt-5.1-chat-latest', | ||
| 'gpt-5.1-codex', | ||
| 'gpt-5.1-codex-max', |
Remove deprecated Codex-Max from defaults
When the OpenAI model fetch is skipped or fails, this static list is what users can select, so adding gpt-5.1-codex-max reintroduces a model that the current OpenAI model page marks as deprecated (source). That works against the "drop deprecated defaults" change and can leave fallback/user-provided-key setups presenting a model that may fail as soon as OpenAI disables the alias; it should be omitted from defaults unless it is returned by the live models endpoint.
Re-checked the model page just now: the Deprecated label still applies only to a dated snapshot row in the Snapshots section — the model itself carries no deprecation marking, the page describes it as current ("purpose-built for agentic coding") with active pricing, no replacement is recommended, and it is absent from the deprecations page. Same conclusion as the earlier thread on this entry; keeping it. If OpenAI deprecates the model proper (as with gpt-5.2-codex, which this PR deliberately excludes), it should come out in a catalog follow-up.
…ts (danny-avila#13636)
* 🛰️ feat: Add GPT-5.5 + Frontier OpenAI Models, Drop Deprecated Defaults
* 🛰️ fix: Address Codex Review on OpenAI Model Refresh
- Replace nonexistent gpt-5.5-chat-latest with the actual chat-latest alias; register its context window, output cap, pricing, and cache rates, and pin explicit rates for legacy gpt-5.x-chat-latest aliases so the new chat-latest key cannot out-match their cheaper pricing
- Add long-context premium tiers (>272K input) for gpt-5.5 and gpt-5.4
- Disable streaming for pro reasoning models (o1-pro, gpt-5.x-pro), which OpenAI does not support, with spec coverage
* 🛰️ fix: Address Codex Round-2 Review and CI Spec Failure
- Allow chat-latest through the official OpenAI fetched-model filter
- Export isProReasoningModel and drop unsupported sampling parameters for versioned pro models (gpt-5.4-pro, gpt-5.5-pro), which the versioned-model exemption previously let through
- Honor the pro-model streaming disable in both agent chat-completions routes, which decide SSE from model_parameters before llmConfig exists
- Update models.spec default-list assertions for the refreshed defaults and cover chat-latest filter retention
* 🛰️ fix: Address Codex Round-3 Review
- Convert max_tokens for chat-latest, which the gpt-[5-9] guard missed
- Drop snake_case sampling params (top_p, logit_bias, penalties) in the reasoning-model exclusion list so addParams-sourced values are removed
- Add createOpenAIAggregatorHandlers and wire them into the agent chat-completions service's non-streaming branch, which previously ran with no handlers and always returned an empty aggregated response
* 🛰️ ci: Fix Import Order Drift and Controller Spec Mock
- Sort type import first in service.spec.ts per import-order convention
- Register isProReasoningModel in the openai controller spec's @librechat/api mock factory, whose enumerated exports left the new helper undefined and broke the non-streaming flow under test
* 🛰️ chore: Trim Scope to Model Catalog Changes
Revert the OpenAI endpoint and agent handler changes (pro-model streaming, sampling exclusions, non-streaming aggregation); that surface is moving out of LibreChat into the agents SDK and belongs in its own change. Keep the model list, token windows, pricing, and the fetched-model filter for chat-latest.
* 🛰️ fix: Correct GPT-5.4 Context Windows and Pro Long-Context Pricing
- Set gpt-5.4 and gpt-5.4-pro context to the documented 1,050,000 window; 272K is the long-context pricing breakpoint, not the cap, and using it truncated prompts before they could reach that tier
- Add gpt-5.4-pro long-context premium rates ($60/$270 above 272K) per its model page; gpt-5.5-pro documents no long-context tier
* 🛰️ fix: Add gpt-5.4-nano and gpt-5.5-pro Long-Context Pricing
- Register gpt-5.4-nano ($0.20/$1.25, cached $0.02, 400K context) in the model list, pricing, cache, and token maps; the longest-match fallback billed it at gpt-5.4's $2.50/$15
- Add gpt-5.5-pro long-context premium rates ($60/$270 above 272K); the pricing table lists the tier even though the model page omits it
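The fetched-model filter that the commit notes keep in scope can be sketched like this. It is a hedged approximation only: the pattern is taken from the PR description's wording (IDs matching gpt- or o followed by digits), while `sketchFilterFetchedModels`, `officialModelPattern`, and the `allowedAliases` set are hypothetical names, not the code in fetchOpenAIModels.

```typescript
// Assumed approximation of the ID pattern described in the PR summary.
const officialModelPattern = /gpt-|o\d+/;

// Hypothetical allowlist for aliases that do not follow the pattern.
const allowedAliases = new Set<string>(['chat-latest']);

// Keep IDs that match the pattern or are explicitly allowed; drop the rest.
function sketchFilterFetchedModels(ids: string[]): string[] {
  return ids.filter((id) => officialModelPattern.test(id) || allowedAliases.has(id));
}

const fetched = ['gpt-5.5', 'chat-latest', 'o3', 'whisper-1', 'dall-e-3'];
console.log(sketchFilterFetchedModels(fetched));
```

Without the allowlist check, chat-latest matches neither alternative in the pattern and would be dropped from the fetched list, which is the behavior the PR works around.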
Summary
I refreshed the default OpenAI model catalog to match OpenAI's current API lineup (verified June 2026 against the official models, pricing, and deprecations pages), adding the GPT-5.5 family and frontier models while removing dead and end-of-life entries, and aligned token windows and pricing for the additions. Scope is intentionally limited to the model catalog — endpoint/handler behavior for these models (pro-model streaming constraints, non-streaming aggregation) is deferred to the OpenAI-compat surface moving into the agents SDK.
- Added gpt-5.5, gpt-5.5-pro, and chat-latest (GPT-5.5 Instant) to sharedOpenAIModels, alongside gpt-5.4-mini, gpt-5.3-codex, gpt-5.2, and gpt-5.1-codex-max, all confirmed live API slugs.
- Removed dead entries: gpt-4.5-preview (+ snapshot), gpt-4-0314, gpt-4-32k-0314, gpt-4-0125-preview, gpt-4-1106-preview, gpt-4-turbo-preview, and the gpt-3.5-turbo-0613/-16k variants.
- Removed the end-of-life gpt-4 / gpt-4-turbo / gpt-3.5-turbo legacy block (retires 2026-10-23) and gpt-5-chat-latest / gpt-5.1-chat-latest (retire 2026-07-23, superseded by GPT-5.5 Instant).
- Removed gpt-5.4-thinking, which does not exist as an API slug ("Thinking" is ChatGPT-only branding), resolving the standing TODO on that entry; skipped gpt-5.2-codex (deprecated) and gpt-5.3-codex-spark (not in the API).
- Added pricing for gpt-5.5 ($5/$30), gpt-5.5-pro ($30/$180), gpt-5.4-mini ($0.75/$4.50), and chat-latest ($5/$30) with matching cache-read rates; pro models correctly receive no cache discount.
- Corrected gpt-5.4-pro pricing from the placeholder $5/$30 to the now-published official $30/$180, resolving the TODO and fixing a 6x underbilling.
- Added long-context premium tiers for gpt-5.5 and gpt-5.4 via premiumTokenValues.
- Pinned explicit rates for the legacy gpt-5.x-chat-latest aliases so the new bare chat-latest key cannot out-match their cheaper rates in the longest-substring lookup.
- Added gpt-5.4-mini (400K) and chat-latest (400K) context windows and 128K output caps in tokens.ts; without explicit entries the matcher resolved them incorrectly.
- Allowed chat-latest through the official-OpenAI fetched-model filter in fetchOpenAIModels, which otherwise drops any model ID not matching gpt-/o\d+.
- Updated models.spec default-list assertions and covered chat-latest filter retention.
Change Type
Testing
- packages/data-schemas Jest suite for transaction pricing: 210 tests pass, including premium-tier and chat-latest alias key coverage.
- packages/data-provider Jest suite: 1201 tests pass.
- packages/api models spec covering the fetched-model filter and default-list fallbacks.
- tsc --noEmit; prettier, eslint, and import-order checks pass on all changed files.
Test Configuration:
Checklist