Under the hood is Sesame's Conversational Speech Model (CSM), a neural text-to-speech system developed from 2024 through 2026. Unlike traditional TTS pipelines that read text aloud with flat intonation, CSM processes the full dialogue context, including recent conversational turns, and generates audio directly, incorporating timing, tone, and emotional modulation in real time.
Sequoia Capital, an investor, noted that the model "doesn't just translate LLM output into audio — it generates speech directly, capturing the rhythm, emotion, and expressiveness of real dialogue".
The model comes in sizes ranging from 1 billion to 8 billion parameters, a deliberate choice to keep it lightweight enough to eventually run on consumer-grade and wearable hardware. An open-source 1B-parameter version was released on GitHub under an Apache 2.0 license, with checkpoints hosted on Hugging Face.
Sesame describes the app as "Privacy First," with App Store copy stating that conversations stay between the user and Sesame and are "secure and private by design". The company's formal privacy policy, last updated May 7, 2026, explains how user data is collected and processed across its website, app, and services, including its virtual conversational agents.
What the publicly available documents do not reveal are the granular controls that many privacy-conscious users look for, such as manual conversation deletion, opt-out from model training, or fine-grained data retention settings. The privacy policy references procedures for destroying or anonymizing personal information when it is no longer needed, but it does not specify timelines or user-facing controls for requesting deletion.
Given that voice data is inherently sensitive and potentially biometric, this gap will likely draw scrutiny as Sesame scales and as regulators continue to tighten rules around AI data handling. For users seeking exact terms, the full policy is available at sesame.com/privacy.
Sesame's app is a means to an end. The company's long-term strategy revolves around embedding its voice agents into proprietary lightweight smart glasses designed for all-day wear, with a target launch window of 2027.
The logic is both technical and commercial. On the technical side, the CSM's small parameter counts (1B–8B) are deliberately sized for on-device deployment, meaning the glasses could run the voice model locally rather than depending on a cloud round-trip. On the business side, Sesame sees controlling both the software and the hardware as a way to capture subscription fees and higher-margin device sales from the same user.
This "hardware-first" strategy lets Sesame control the full experience (microphone behavior, wake word, latency, battery life, and the subscription bundle) rather than competing inside a third-party app ecosystem. The founding team's track record at Oculus and Meta, where they helped build consumer VR/AR hardware, gives this hardware ambition a credibility that a pure-software startup might lack.
Public statements promise glasses with "high-quality audio" and an AI companion that can "observe the world alongside you". Reports mention eye-tracking integration and real-time conversational feedback, though technical specifications remain sparse.
On October 21, 2025, Sesame closed a $250 million Series B round, following earlier backing from Andreessen Horowitz. Investors include Sequoia Capital, which published a detailed partner article outlining the firm's thesis that voice-first AI represents a fundamental shift in human-computer interaction.
The capital is earmarked for advancing the voice model, expanding the engineering team, and, critically, accelerating the development of companion wearable hardware. The round pushed Sesame toward a reported valuation of roughly $1 billion.
Sesame enters a field where Apple, Google, Amazon, and OpenAI already have voice assistants with massive installed bases. Its path to differentiation rests on three bets:
The risks are real. Well-funded incumbents can add voice improvements over time. Audio hardware is notoriously difficult to design and manufacture at scale, especially when it must be light enough for all-day wear and stylish enough to gain consumer adoption. And the privacy gaps around voice data handling could invite regulatory and user backlash precisely as Sesame tries to build trust.
Whether Sesame's conversational warmth and hardware ambition can carve out a defensible position remains an open question — one that the iOS app launch and the coming glasses release will start to answer.