Whisper + Diarization API

  • Get diarized transcripts through a simple API and easily integrate audio AI into your applications or tooling.

  • Bring your own fine-tuned Whisper model

  • Get summaries and more of your transcripts

Everything you need to integrate Audio AI in your application

State-of-the-art transcriptions

Get accurate transcripts in 99+ languages with Whisper Large-V3

Built-in diarization

Latest Pyannote model to diarize audio

Summaries and more

NEW

LLM-based transcript analysis, such as summaries, action items, and sentiment

Custom Whisper model

Bring your own fine-tuned Whisper model. We handle the infrastructure

Simple API

A breeze to build with and easy to use
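As a sketch of what integration can look like — the endpoint URL and field names below are hypothetical placeholders, not the documented schema; consult the API documentation for the real request format:

```python
import json

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/transcribe"

def build_transcribe_request(audio_url: str, diarize: bool = True,
                             model: str = "standard") -> dict:
    """Assemble a transcription request payload (illustrative sketch)."""
    return {
        "audio_url": audio_url,   # publicly reachable audio file
        "diarize": diarize,       # attach speaker labels to segments
        "model": model,           # "standard" or "enhanced"
    }

payload = build_transcribe_request("https://example.com/meeting.mp3")
print(json.dumps(payload))
```

From here, a single POST of this payload to the transcription endpoint would return the diarized transcript.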

Pricing

Simple usage-based pricing. Billed monthly. VAT may apply.

Pay-as-you-go

Transcribe | Standard model

$0.0001 per second of audio

Transcribe | Enhanced model

$0.0005 per second of audio

Analyze | GPT-4o

$0.005 per 1k input tokens + $0.015 per 1k output tokens
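To make the rates concrete, here is a rough cost estimator based on the prices above (rates assumed current at time of writing; actual billing may differ):

```python
# Published rates in USD, taken from the pricing table above.
STANDARD_PER_SECOND = 0.0001
ENHANCED_PER_SECOND = 0.0005
ANALYZE_INPUT_PER_1K = 0.005
ANALYZE_OUTPUT_PER_1K = 0.015

def transcribe_cost(seconds: float, model: str = "standard") -> float:
    """Cost of transcribing `seconds` of audio with the given model."""
    rate = STANDARD_PER_SECOND if model == "standard" else ENHANCED_PER_SECOND
    return seconds * rate

def analyze_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one GPT-4o analysis call."""
    return (input_tokens / 1000) * ANALYZE_INPUT_PER_1K \
         + (output_tokens / 1000) * ANALYZE_OUTPUT_PER_1K

# A one-hour recording costs $0.36 on the standard model, $1.80 enhanced.
print(round(transcribe_cost(3600), 2))              # 0.36
print(round(transcribe_cost(3600, "enhanced"), 2))  # 1.8
print(round(analyze_cost(4000, 500), 4))            # 0.0275
```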

Transcribe audio on your private cloud instance

We are building a solution to run our transcription + diarization pipeline privately on your own cloud environment. Our goal is to deliver a self-hostable package of our API, capable of being easily deployed and scaled on popular cloud providers.


Interested? Let us know.

Changelog

Version 0.4

August 19, 2024

Faster transcript generation

Audio recordings are now transcribed much faster than before. Here's how:

  • Fast GPUs: We’re leveraging faster GPUs with our AI cloud provider.

  • Quicker startup: Cold boot times are dramatically reduced, thanks to enhancements from our AI cloud provider.


Expect to receive results up to 5x faster!

Version 0.3

July 22, 2024

Introducing the Analyze endpoint for processing transcripts.

  • Summarize action: Extracts main points from conversations

  • Immediate results using GPT-4o API with custom prompts

  • Requires pre-processed transcripts


The endpoint is in beta and may be updated in the future. More actions are planned for upcoming releases. For details on usage and implementation, please refer to our API documentation.
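A minimal sketch of what an Analyze call might look like — the endpoint URL and request fields here are hypothetical placeholders, not the documented schema; refer to the API documentation for actual usage:

```python
import json

# Hypothetical endpoint, for illustration only.
ANALYZE_URL = "https://api.example.com/v1/analyze"

def build_analyze_request(transcript_id: str, action: str = "summarize") -> dict:
    """Request an LLM-based action on an already-processed transcript."""
    supported = {"summarize"}  # the only action in the v0.3 beta
    if action not in supported:
        raise ValueError(f"unsupported action: {action}")
    return {"transcript_id": transcript_id, "action": action}

req = build_analyze_request("tr_0123")
print(json.dumps(req))
```

Because the endpoint requires a pre-processed transcript, the `transcript_id` would come from an earlier transcription job.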

Version 0.2.1

July 17, 2024

Adding email-based sign-in and team member invitations.

This update expands access and enhances collaboration capabilities. 

New features: 

  • Email Sign-In: Users can now create accounts and sign in using their email addresses. This option is available alongside the existing GitHub login method. 

  • Team Invitations: Account holders can invite team members to join their projects. This feature allows for easier collaboration and shared access to resources.

Version 0.2

July 12, 2024

Introducing an enhanced transcription model to complement our standard offering. The new model provides improved accuracy at the cost of slower processing times.


Key features of the enhanced model: 

  • The enhanced model builds upon our standard model, using LLM-based post-processing to boost transcription accuracy. It offers more detailed diarization segments and can infer speaker labels or names. 

  • This model is particularly effective for languages that may have lower accuracy with our standard model. Please note that unlike the standard model, the enhanced model does not provide word-level timestamps or confidence scores. 
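The tradeoff above can be summed up in a small helper — purely illustrative, not part of the API:

```python
def choose_model(need_word_timestamps: bool = False,
                 need_speaker_names: bool = False) -> str:
    """Pick a model based on the tradeoffs described in this release note.

    Only the standard model returns word-level timestamps and confidence
    scores; only the enhanced model can infer speaker labels or names.
    """
    if need_word_timestamps and need_speaker_names:
        raise ValueError("no single model offers both features")
    if need_word_timestamps:
        return "standard"
    if need_speaker_names:
        return "enhanced"
    return "standard"  # cheaper and faster default

print(choose_model(need_speaker_names=True))  # enhanced
```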
