Cohere drops its first voice model, and it’s open source

Cohere just entered the speech recognition game. The enterprise AI company launched Transcribe, its first-ever voice model, built specifically for automatic speech recognition (ASR), TechCrunch AI reports.

What makes this interesting: Transcribe is open source, weighs in at just 2 billion parameters, and runs on consumer-grade GPUs. That’s a deliberate move aimed at teams who want to self-host without renting a data center.

What Transcribe brings to the table

  • 14 languages supported: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, and Arabic
  • Speed: processes 525 minutes of audio per minute of processing time, or roughly 525 times faster than real time, which is impressive for a model this size
  • Benchmark leader: claims the top spot on the Hugging Face Open ASR leaderboard with a word error rate (WER) of 5.42% (lower is better), beating Zoom Scribe v1, IBM Granite 4.0 1B, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B Speech
  • Human eval results: 61% average win rate over competitors when human evaluators judged accuracy, coherence, and usability
  • Price: free through Cohere’s API
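
To make the speed and WER figures above concrete, here is a minimal Python sketch. It is not Cohere's or Hugging Face's evaluation code. The function names are hypothetical, and the text normalization (lowercasing and splitting on whitespace) is a simplifying assumption; real leaderboards typically apply more careful normalization before scoring. WER is computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def seconds_to_transcribe(audio_minutes: float, speedup: float = 525.0) -> float:
    """Processing time in seconds, assuming a constant real-time speedup."""
    return audio_minutes * 60.0 / speedup


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.2%}")

    # At the reported 525x speed, an hour of audio takes about 6.9 seconds.
    print(f"60 min of audio: {seconds_to_transcribe(60):.1f} s")
```

Read this way, a WER of 5.42% means roughly five or six word errors per 100 reference words. The 525x figure is the vendor's reported throughput, so actual speed on a given consumer GPU may differ.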

Where it falls short

Cohere acknowledges that Transcribe trails rivals on Portuguese, German, and Spanish transcription. For a model supporting 14 languages, those gaps matter, especially for enterprise customers operating across Europe and Latin America. It is worth watching whether Cohere addresses this in future updates.

How you can access it

Cohere is making Transcribe available in three ways:

  1. Open source for self-hosting on consumer GPUs
  2. Free via Cohere’s API
  3. Model Vault, Cohere’s managed inference platform

The company also plans to integrate Transcribe into North, its enterprise agent orchestration platform. That’s the real play here. A free, capable transcription model becomes a gateway into Cohere’s broader enterprise stack.

Why this matters

Speech recognition is having a moment. Apps like Granola and Wispr Flow are driving demand for accurate note-taking and dictation, and enterprises want models they can run on their own infrastructure. Cohere is positioning Transcribe right at that intersection: open source, lightweight, and enterprise-ready.

The timing also lines up with Cohere’s growth trajectory. According to TechCrunch AI, the company told investors it hit $240 million in annual recurring revenue in 2025, and CEO Aidan Gomez has hinted at a potential IPO “soon.” Launching a competitive open source model builds developer goodwill and expands Cohere’s footprint right when it matters most.

This is a smart entry point. Cohere isn’t trying to compete with massive multimodal voice assistants. It’s targeting the specific, high-demand task of transcription and doing it with a model small enough to run locally. For teams that need reliable speech-to-text without sending audio to third-party servers, Transcribe is worth a serious look.

Full details are available at TechCrunch AI.
