Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz ...
Abstract: As high dynamic range video is gaining popularity, video coding solutions able to efficiently provide both low and high dynamic range video, notably with a single bitstream, are increasingly ...
Abstract: Currently, balancing low bitrate coding with speech quality is a highly debated topic in the research community. At very low bitrates, existing methods often fail to maintain speech ...