Moshi, Kyutai's voice-based model, achieves fastest response time

Jul 15

Kyutai, an open-source research lab based in France and funded by French billionaire Xavier Niel and Eric Schmidt, has released a voice-based AI model just six months after its founding.


The model, named Moshi, was built with voice as its native communication method. Other players such as OpenAI typically work through text. In ChatGPT's voice mode running on GPT-4o, for example, the user's speech is first transcribed to text and passed to the model. The model replies in text, which is then converted back to speech.


Building voice capabilities into the model from the start has enabled the lab to achieve a theoretical response time of 160 milliseconds. This lets Moshi hold near-natural conversations: it listens to the user's voice and responds directly with voice, with no intermediate text step.


Like a human speaker, Moshi can interrupt its interlocutor mid-discussion, relying on the predictive capabilities of large language models (LLMs). It can also reproduce any voice from a sample of just seven seconds.


These capabilities open up numerous applications, from more natural virtual assistants to advanced customer service solutions.
