
ChatGPT Bidi 1: What the Leaked OpenAI Bidirectional Voice Model Apparently Does

ChatGPT Bidi 1 is an unreleased OpenAI bidirectional voice model that surfaced in a June 23, 2026 leak by @testingcatalog on X. It was subsequently covered by Android Authority, Crypto Briefing, Techlusive, Let’s Data Science, and other secondary outlets.

The leak reportedly describes three intelligence tiers, labeled High, Medium, and Instant, that mirror the cost-versus-capability structure of OpenAI’s other lineups. The substantive architectural change is simultaneous speech and listening. The model can reportedly be interrupted mid-sentence, respond to a user’s pause without waiting for a turn-taking signal, and switch tasks during an utterance in progress. That would replace the strict speak-then-listen pattern of the generally available ChatGPT voice mode.

Real-time translation is reportedly built in, and the model is expected to land in OpenAI’s Codex agentic coding surface as well as in ChatGPT. OpenAI had made no official announcement and given no release date as of late June 2026.

OpenAI’s next ChatGPT voice model is apparently called Bidi 1, and the substantive change is that it can speak and listen simultaneously rather than taking turns. The information comes from a leak that surfaced on June 23, 2026 via the @testingcatalog account on X and was picked up by Android Authority, Crypto Briefing, Techlusive, and several other AI-focused outlets in the days following. As of this writing, OpenAI has made no official announcement. Every specific claim in the leak should be treated as plausible pending confirmation rather than as a documented product feature. With that important caveat established up front, the leak as reported is substantive enough to be worth covering carefully, because the architectural change it implies (a genuinely concurrent speech-and-listening voice model) would be a meaningful step in what AI voice interaction feels like in practice.

This piece walks through what the leak says, the substantive significance of bidirectional voice if the claims hold up, the reported intelligence-tier structure that mirrors OpenAI’s broader product pattern, what the leak does not say, the competitive context in voice AI in 2026, and the explicit distinction between Bidi 1 (a model) and the Jony-Ive-designed OpenAI hardware device (codename Sweetpea, a separate program that is sometimes confused with the voice model story). We close with the practical question of what to actually do with this information given that OpenAI has not committed to anything yet.

The short version is that Bidi 1, as the leak describes it, would address one of the more frustrating limitations of current AI voice interaction (the strict turn-taking that breaks natural conversational rhythm) by introducing genuine concurrency. If the architectural claims are accurate and the implementation is reliable, the user experience would be noticeably different from today’s ChatGPT voice mode. The "if" is doing real work in that sentence; OpenAI has not confirmed the existence of the model, the timeline for release, or any of the specific features the leak describes. Watch for an official announcement; do not commit to plans that depend on Bidi 1 until then.

What the leak says

The original leak, posted by @testingcatalog on X on or around June 23, 2026, identified an unreleased OpenAI voice model named "Bidi 1." The naming is reportedly OpenAI’s own spelling rather than a community designation; "Bidi" is short for bidirectional, which is the architectural property that distinguishes the model from existing voice systems.

The specifics in the leak, as reported across the secondary coverage:

Three intelligence tiers. Bidi 1 is reportedly offered in three configurations labeled High, Medium, and Instant. The High tier is the most capable and presumably the most expensive; Medium is the balanced everyday option; Instant is the low-latency, low-cost option suited to high-volume use. The tier structure mirrors the cost-versus-capability trade-off pattern that OpenAI has been using across other recent product lines, most visibly the GPT-5.6 Sol/Terra/Luna naming we covered in our recent piece on that release.

Simultaneous speech and listening. This is the architectural substance. The current ChatGPT voice mode operates in a strict turn-taking pattern: the user speaks, the model speaks, the user speaks, and so on. The microphone is typically muted while the model is speaking (or at minimum, the model does not process incoming audio while it generates outgoing audio). Bidi 1 reportedly handles both directions concurrently. The user can interrupt the model mid-sentence and the model will acknowledge the interruption rather than continuing to speak past it. The user can pause mid-question and the model can respond to the partial question without waiting for a clear turn-taking signal. The user can task-switch mid-utterance and the model can adapt.

Real-time translation. Translation is reportedly built into the model rather than handled as a separate pass. A conversation across languages would therefore proceed at speaking pace rather than at translate-then-speak pace.

Expected landing in Codex. Beyond ChatGPT, the leak suggests Bidi 1 will also be available in Codex, OpenAI’s agentic coding surface. The implication is that voice-driven coding (asking Codex to make code changes by talking through the changes) would benefit from the bidirectional capability.
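The leak does not say how Bidi 1 translates in real time. Simultaneous machine translation research has a well-documented approach called the wait-k policy. The system reads k source tokens, then alternates between emitting one target token and reading one more source token. A translate-then-speak pipeline, by contrast, waits for the full sentence.

The minimal Python sketch below only computes that read/write schedule. The function names are hypothetical, and nothing in the leak indicates that Bidi 1 uses wait-k or works on text tokens rather than audio.

```python
def wait_k_schedule(num_source: int, num_target: int, k: int) -> list[str]:
    """Return the READ/WRITE action sequence of a wait-k simultaneous translation policy.

    READ consumes one more source token; WRITE emits one target token.
    Target token t (1-indexed) is written once min(k + t - 1, num_source)
    source tokens have been read.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: list[str] = []
    read = 0
    for t in range(1, num_target + 1):
        needed = min(k + t - 1, num_source)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


def tokens_read_before_first_output(actions: list[str]) -> int:
    """Count the READ actions that precede the first WRITE (the initial lag)."""
    return actions.index("WRITE")


if __name__ == "__main__":
    # A 6-token source sentence translated into 6 target tokens.
    streaming = wait_k_schedule(6, 6, k=2)
    cascaded = wait_k_schedule(6, 6, k=6)  # k >= source length: read everything first
    print(tokens_read_before_first_output(streaming))  # 2
    print(tokens_read_before_first_output(cascaded))   # 6
```

Setting k to the source length reproduces translate-then-speak behavior. Smaller k values trade some context for a much shorter wait before output starts.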

The secondary coverage at Android Authority, Crypto Briefing, Techlusive, and Let’s Data Science generally cites the testingcatalog X thread as the primary source. None of the coverage adds substantively new claims beyond what the original leak said. The corroboration is essentially "this is what testingcatalog reported" repeated across multiple outlets, which provides some confidence that the leak is real but does not provide independent verification of the specific claims.

Why bidirectional voice matters

The substantive significance of Bidi 1, if the architectural claims hold up, is the change in what AI voice interaction feels like. The current state of AI voice interaction is good for the patterns it serves and limited in specific ways:

The turn-taking pattern is unnatural. Real human conversation involves frequent interruptions, mid-sentence pauses, partial overlaps, and backchannel responses ("uh-huh," "right," "mm-hmm"). The current AI voice mode does not handle these well. The user either waits for the model to finish (which feels formal and slow) or interrupts the model (which often confuses the model or causes it to lose context).

The latency of turn-taking compounds. Even when each turn is fast, the cumulative pattern of "user speaks, brief pause, model speaks, brief pause" is noticeably slower than natural conversation. Multi-turn interactions feel labored.

Task-switching mid-conversation is awkward. If a user is asking the model to do something and decides partway through that they want something different, the user typically has to wait for the model to finish (or interrupt and start over) before the new task can begin.
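Part of the "brief pause" in turn-taking systems comes from end-of-turn detection. A common approach is to declare the user’s turn over only after a fixed run of silent audio frames. This is assumed here, since OpenAI has not described the current voice mode’s internals. Under that approach, the silence window is added as dead air at the end of every turn.

The sketch below uses a plain energy threshold as a stand-in for a real voice activity detector. The function name, the 20 ms frame size, and the 500 ms window in the example are illustrative assumptions.

```python
from typing import Optional, Sequence


def detect_end_of_turn(
    energies: Sequence[float], threshold: float, silence_frames: int
) -> Optional[int]:
    """Return the frame index at which a silence-timeout endpointer ends the user's turn.

    A frame counts as speech when its energy exceeds `threshold`. The turn ends
    once `silence_frames` consecutive non-speech frames follow some speech;
    shorter pauses are forgiven because resumed speech resets the counter.
    Returns None if the turn has not ended by the last frame.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run == silence_frames:
                return i
    return None


if __name__ == "__main__":
    FRAME_MS = 20                 # illustrative frame size
    WINDOW = 500 // FRAME_MS      # 25 frames = 500 ms of required silence
    # 1 s of speech, a 200 ms pause, 0.6 s more speech, then silence.
    frames = [0.8] * 50 + [0.0] * 10 + [0.8] * 30 + [0.0] * 40
    end = detect_end_of_turn(frames, threshold=0.5, silence_frames=WINDOW)
    last_speech = max(i for i, e in enumerate(frames) if e > 0.5)
    print(end, (end - last_speech) * FRAME_MS)  # 114 500
```

However fast the model itself responds, that window elapses before it starts. If the leak is accurate, a model that keeps listening while it speaks could begin reacting before such a window has run out.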

A bidirectional model addresses all three of these. Interruptions are first-class behavior rather than awkward edge cases. The model can adjust based on partial input rather than waiting for complete utterances. Task-switching happens fluidly because the model is processing input continuously.
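The concurrency described above can be illustrated with two cooperating tasks. One streams the model’s reply; the other keeps consuming microphone frames and raises an interruption flag when the user starts talking.

The minimal asyncio sketch below is a simulation under stated assumptions, not OpenAI’s design. Audio is replaced by text chunks and per-frame energy values, barge-in is a simple energy threshold, and all names are hypothetical.

```python
import asyncio
from typing import List, Sequence, Tuple


async def speak(chunks: Sequence[str], interrupted: asyncio.Event, spoken: List[str]) -> None:
    """Stream reply chunks, stopping as soon as the listener flags an interruption."""
    for chunk in chunks:
        if interrupted.is_set():
            break
        spoken.append(chunk)      # stand-in for playing one chunk of audio
        await asyncio.sleep(0)    # yield so the listener runs between chunks


async def listen(frames: Sequence[float], interrupted: asyncio.Event, threshold: float) -> None:
    """Consume microphone frames while the model speaks; flag barge-in on loud input."""
    for energy in frames:
        if energy > threshold:
            interrupted.set()
            return
        await asyncio.sleep(0)


async def full_duplex_turn(
    chunks: Sequence[str], frames: Sequence[float], threshold: float = 0.5
) -> Tuple[List[str], bool]:
    """Run speaking and listening concurrently; return what was spoken and whether it was cut off."""
    interrupted = asyncio.Event()
    spoken: List[str] = []
    await asyncio.gather(
        speak(chunks, interrupted, spoken),
        listen(frames, interrupted, threshold),
    )
    return spoken, interrupted.is_set()


if __name__ == "__main__":
    reply = ["Sure,", "the", "capital", "of", "France", "is", "Paris."]
    mic = [0.1, 0.2, 0.9, 0.9, 0.1]  # the user starts talking on the third frame
    print(asyncio.run(full_duplex_turn(reply, mic)))
    # (['Sure,', 'the', 'capital'], True)
```

A turn-taking system would play the whole reply, because it examines the microphone frames only after it has finished speaking.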

The user-experience difference would be substantial if the implementation works. The current voice mode is impressive but feels like a controlled demonstration; bidirectional voice would feel meaningfully more like talking to a person who is actually listening while they think.

The technical challenges that would need to be solved are non-trivial. Maintaining context across simultaneous bidirectional audio is harder than maintaining context across alternating audio. Acoustic echo cancellation has to work well or the model will hear its own output and try to respond to it. The model needs to decide when to pause its own speech to listen rather than talking over the user. Each of these is a hard problem; OpenAI has been working on the voice surface for several years and presumably has answers, but the leak does not detail them.
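Acoustic echo cancellation is the most concrete of these problems. A textbook approach is a normalized least-mean-squares (NLMS) adaptive filter. Because the system knows exactly what audio it is playing (the far-end signal), it can learn the speaker-to-microphone echo path. It then subtracts the predicted echo from the microphone signal, leaving mostly the user’s voice.

The pure-Python sketch below is illustrative only. The leak says nothing about how OpenAI handles echo, and the function name and parameter values are assumptions. Production cancellers also add double-talk detection and nonlinear processing, which are omitted here.

```python
import random
from typing import List, Sequence


def nlms_echo_cancel(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 8,
    mu: float = 0.5,
    eps: float = 1e-6,
) -> List[float]:
    """Remove an adaptively estimated echo of `far_end` from `mic` with an NLMS filter.

    `far_end` is the audio the system is playing (the model's own speech);
    `mic` is what the microphone captures. Returns the residual signal.
    """
    weights = [0.0] * taps        # running estimate of the echo path's impulse response
    history = [0.0] * taps        # most recent far-end samples, newest first
    residual: List[float] = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate                     # mic minus predicted echo
        norm = eps + sum(h * h for h in history)      # input power normalizes the step
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


if __name__ == "__main__":
    random.seed(0)
    played = [random.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for model speech
    echo_path = [0.6, 0.3, -0.1]                            # assumed room response
    captured = [
        sum(echo_path[k] * played[n - k] for k in range(len(echo_path)) if n >= k)
        for n in range(len(played))
    ]
    leftover = nlms_echo_cancel(played, captured)
    tail = slice(-200, None)
    ratio = sum(r * r for r in leftover[tail]) / sum(c * c for c in captured[tail])
    print(f"echo energy remaining: {ratio:.2e}")
```

In this noise-free toy setup the filter converges to the assumed echo path, and the residual echo energy becomes negligible. Real rooms, nonlinear loudspeakers, and a user talking over the model make the problem much harder.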

The three-tier structure

The reported High/Medium/Instant naming follows the broader OpenAI naming pattern for tiered products. The implication is that the same underlying model architecture is offered at multiple price-and-capability points, with users choosing based on workload requirements.

For voice specifically, the tier choice has more dimensions than for text models:

Latency is more visible in voice than in text. A 200-millisecond delay in a text response is barely noticeable; a 200-millisecond delay before the model starts speaking is noticeable. The Instant tier would presumably optimize for sub-100ms time-to-first-audio.

Capability for voice work includes things like accent comprehension across languages, handling of background noise, speaker disambiguation in multi-speaker contexts, and the quality of the synthesized voice. The High tier would presumably lead on all of these.

Cost for voice work is usually expressed per minute of audio rather than per text token, even where the underlying billing unit is an audio token. The High tier would presumably be meaningfully more expensive per minute than the Instant tier.

The right tier for a given workload would depend on the use case. A real-time conversational assistant for consumer use would probably run on Medium or High, depending on the application. A high-volume surface that pairs transcription with quick responses would fit Instant. A specialized professional use case (translation in a clinical setting, conversational tutoring) would call for High.
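No latency or pricing figures have been published, so the tier trade-off can only be sketched with placeholders. The Python sketch below picks the cheapest tier that fits a workload’s time-to-first-audio budget and capability floor. It also estimates a session’s cost from a per-minute price.

The tier names come from the leak. Every number in the example, the ordinal capability scale, and the function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TierProfile:
    name: str
    time_to_first_audio_ms: float   # hypothetical latency figure
    capability: int                 # hypothetical ordinal score, higher is better
    price_per_minute: float         # hypothetical price in dollars


def pick_tier(
    tiers: Sequence[TierProfile], latency_budget_ms: float, min_capability: int
) -> Optional[TierProfile]:
    """Return the cheapest tier meeting both constraints, or None if none qualifies."""
    eligible = [
        t for t in tiers
        if t.time_to_first_audio_ms <= latency_budget_ms and t.capability >= min_capability
    ]
    return min(eligible, key=lambda t: t.price_per_minute, default=None)


def session_cost(tier: TierProfile, minutes: float) -> float:
    """Estimated cost of a session billed per minute of audio."""
    return tier.price_per_minute * minutes


if __name__ == "__main__":
    # Placeholder numbers only; OpenAI has published none of these.
    tiers = [
        TierProfile("Instant", 100, 1, 0.02),
        TierProfile("Medium", 250, 2, 0.06),
        TierProfile("High", 400, 3, 0.15),
    ]
    tutoring = pick_tier(tiers, latency_budget_ms=500, min_capability=3)
    bulk = pick_tier(tiers, latency_budget_ms=150, min_capability=1)
    print(tutoring.name, bulk.name)              # High Instant
    print(round(session_cost(tutoring, 30), 2))  # 4.5
```

Once OpenAI publishes real figures, only the profile values would need replacing. The selection logic stays the same.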

OpenAI has not published the pricing for the three tiers (the leak does not specify), so the cost economics will become clearer when the official announcement lands.

What the leak does not say

Several things that would be useful to know but are not in the leak as reported:

The release date. The leak indicates the model exists in some development state but does not specify when it will be released. Other OpenAI products with similar leak patterns have shipped anywhere from weeks to months after the first leak; some have been substantially delayed; some have been quietly abandoned.

The pricing. Tier structure is reported but specific dollars-per-minute or dollars-per-second pricing is not.

The deployment surface. ChatGPT and Codex are mentioned. The API surface for developer access is not. Whether Bidi 1 will be available through the standard OpenAI API for third-party applications is unclear.

The voice quality and language coverage. The leak focuses on the bidirectional architecture and does not detail the synthesized voice quality, the supported languages for the real-time translation, or the comprehension accuracy across accents and noise conditions.

The relationship with prior voice models. OpenAI has shipped several voice models and products (the Whisper line for transcription, the voice mode in ChatGPT, the Realtime API). How Bidi 1 relates to these (whether it replaces them, supplements them, or shares architecture with them) is unclear.

The independent benchmarks. No third-party evaluation of Bidi 1 has been published because the model is not generally available. Any performance claims that emerge between now and the official launch should be treated as preview marketing rather than as verified evaluation.

The honest reading of all of these gaps is that the leak gives us the headline (a bidirectional voice model exists in OpenAI’s development pipeline) without the operational detail. That’s normal for leaks; verification waits for the official announcement.

Competitive context

The voice AI market in 2026 has several distinct positions worth understanding:

OpenAI ChatGPT voice mode is the current incumbent and the largest by user count. The existing voice mode is turn-taking, has been refined through multiple releases, and is integrated with the broader ChatGPT product. Bidi 1 would presumably be the next-generation replacement.

Google Gemini Live is Google’s voice-first product, available in the Gemini app and integrated with Google Workspace surfaces (Gmail, Calendar, Drive). Gemini Live has been moving in the same general direction as Bidi 1 (more natural interruption handling, better real-time interaction) but as of mid-2026 still operates in a primarily turn-taking pattern.

Sesame’s Maya and Miles are the most-discussed voice models in the conversational-quality category. Sesame is a smaller company specifically focused on voice; its demos through 2025 and 2026 have been widely shared as examples of how natural AI voice can feel. Maya and Miles produce notably natural-sounding speech, with some level of interruption handling and adaptive response. The architectural details are less public than for the major-vendor offerings.

Anthropic’s Claude voice is in beta as of mid-2026. The capability is similar in concept to the other major voice surfaces. Anthropic has been more conservative about releasing voice features at scale. Its current capability is more limited than its competitors’, with Anthropic’s standard safety posture as the differentiator.

Microsoft’s Copilot Voice is the Microsoft-integrated version, primarily available through Windows and through the Microsoft 365 surfaces. Less aggressive on the conversational-naturalness dimension than Gemini Live but more deeply integrated with Microsoft’s productivity tools.

Apple’s Siri has been undergoing substantial revision through 2025 and 2026 with the integration of LLM-based capabilities. Its capabilities have improved, but more slowly than those of the major AI labs.

In this landscape, Bidi 1 (if it ships as described) would be OpenAI’s bid to set the technical standard for voice AI naturalness. The bidirectional architecture is the differentiating claim; whether the implementation matches the architectural promise will determine how it lands in the market.

Not the Jony Ive hardware

A specific point of confusion worth addressing: Bidi 1 is not the OpenAI hardware device. The two product programs are sometimes conflated in coverage but are actually separate.

The OpenAI hardware device is codenamed Sweetpea per public reporting (the Axios coverage of Chris Lehane’s January 2026 comments is the most authoritative source). The device is described as a screenless behind-the-ear wearable, audio-first in its interaction model, with a secondary pen-form-factor product codenamed Gumdrop. The hardware program emerged from OpenAI’s May 2025 acquisition of Jony Ive’s "io" startup. The first device is targeted for an H2 2026 reveal.

Bidi 1 is a software model. Sweetpea is a hardware device. The two may eventually intersect (the audio-first device is the natural form factor for a bidirectional voice model, and using Bidi 1 to power Sweetpea would be an obvious product decision), but they are distinct programs and the public information about each is largely separate.

When coverage refers to "the OpenAI device" it usually means Sweetpea. When coverage refers to a voice model or a ChatGPT update, it usually means Bidi 1 or the existing ChatGPT voice mode. Watch for the specific noun being used.

What to do with this information

For developers, designers, or organizations considering AI voice capabilities, the practical guidance is straightforward:

Do not commit to Bidi 1 yet. The model is not officially announced and is not generally available. Building a product roadmap that depends on Bidi 1’s specific capabilities at a specific date is premature.

Continue with current voice surfaces for production work. The existing ChatGPT voice mode, Gemini Live, and the various competitors are working and shipping. Production workloads should be built on what’s available now.

Watch for the official announcement. When OpenAI ships Bidi 1, the official communication will confirm or correct the leaked details. The specific tier names, pricing, API availability, and feature set may differ from the leak.

Plan architecturally for bidirectional voice. Even before Bidi 1 ships, the trend toward more conversational voice interaction is clear across the major vendors. Voice-product architectures should anticipate that turn-taking limitations will be relaxed over the next 12 to 18 months across the category, not just at OpenAI.

Test with early access if available. Once OpenAI opens access (either as a public release or through a developer preview), the right way to evaluate Bidi 1 is to actually use it on representative workloads. The leak’s claims need to be tested against real applications.
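One way to act on the "plan architecturally" advice is to model a voice session as a stream of events rather than request-and-response pairs. An interruption then becomes an ordinary event the application already handles.

The Python sketch below is a hypothetical design, not an OpenAI API; the event classes and the Transcript class are illustrative names. A turn-taking backend simply never emits ModelInterrupted. The same application code therefore works with today’s voice surfaces and with a bidirectional model later.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ModelAudioChunk:
    text: str  # transcript of one chunk of model audio that was played


@dataclass
class ModelInterrupted:
    pass  # the user barged in and playback stopped


@dataclass
class ModelTurnDone:
    pass  # the model finished its reply normally


VoiceEvent = Union[ModelAudioChunk, ModelInterrupted, ModelTurnDone]


@dataclass
class Transcript:
    """App-side conversation log that records only what the user actually heard."""
    committed: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)

    def handle(self, event: VoiceEvent) -> None:
        if isinstance(event, ModelAudioChunk):
            self.pending.append(event.text)
        elif isinstance(event, (ModelInterrupted, ModelTurnDone)):
            if self.pending:
                line = " ".join(self.pending)
                if isinstance(event, ModelInterrupted):
                    line += " [interrupted]"  # mark where the reply was cut off
                self.committed.append(line)
            self.pending.clear()


if __name__ == "__main__":
    log = Transcript()
    for ev in [ModelAudioChunk("Here are"), ModelAudioChunk("three options"),
               ModelInterrupted(), ModelAudioChunk("Okay, just one:"), ModelTurnDone()]:
        log.handle(ev)
    print(log.committed)
    # ['Here are three options [interrupted]', 'Okay, just one:']
```

Keeping interruption handling in the application layer means that relaxing turn-taking later becomes a backend swap rather than a redesign.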

Frequently asked questions

When will Bidi 1 be released? Unknown. The leak does not specify and OpenAI has not announced anything. Historically, leaks of OpenAI products have preceded releases by weeks to months; some leaked products have shipped much later than expected or not at all. Watch for an official announcement.

Will Bidi 1 replace the current ChatGPT voice mode? Probably eventually, but the specific transition is not described. The likely pattern is that Bidi 1 ships as an opt-in option, becomes the default after a stabilization period, and the prior voice mode is deprecated. Timing for each step is unknown.

Is Bidi 1 the same as the Jony Ive hardware device? No. Bidi 1 is a software model. The Ive hardware (codename Sweetpea, with a Gumdrop pen variant) is a separate program. They may eventually be used together, but they are distinct products.

Will Bidi 1 work in non-English languages? The leak mentions real-time translation, which suggests multi-language support. The specific language coverage is not detailed and will become clearer at the official launch.

Is the leak credible? The original source (testingcatalog on X) has a track record of accurate OpenAI leaks. Multiple secondary outlets picked up the report, though they cite the same original thread rather than independent sources. The specific claims are consistent across that coverage, suggesting a real underlying leak. But the source’s credibility does not amount to verification of the specifics; the actual product may differ from what was leaked.

Can I use Bidi 1 through the OpenAI API? Not yet. The model is not generally available. Whether it will be exposed through the API at launch is one of the things the leak does not specify.

Will Bidi 1 cost more than the current voice mode? Probably, at least for the High tier. If the Instant tier is optimized for cost as its name suggests, it may be cheaper than the current voice mode, and the Medium tier may land in a similar range. Actual pricing will come with the announcement.

Is there a way to get early access? Not publicly. If a developer preview opens, it will likely be announced through the OpenAI developer channels. Watch openai.com, the OpenAI Twitter/X account, and the OpenAI developer documentation.

Should I be skeptical of any specific claim in the leak? The "three intelligence tiers" claim is the one with the most secondary corroboration and the most consistency with OpenAI’s broader naming patterns; this is the most credible part. The "simultaneous speech and listening" claim is the substantive architectural claim and is the one that most needs verification through actual product use. The "expected Codex landing" claim is plausible but less central to the announcement.

Will Anthropic, Google, or others release competing bidirectional voice models? Almost certainly. The category is moving in this direction across all major vendors. The timing and naming will vary; the architectural direction is shared.

This piece is a leak-stage update; we will publish a follow-up with verified information when OpenAI officially announces the model.

Digital Matters

Artificial Intelligence (AI) Desk