Whisper + Diarization API
Get diarized transcripts through a simple API and integrate audio AI into your applications or tooling.
Everything you need to integrate audio AI into your application
State-of-the-art transcriptions
Get accurate transcripts in 99+ languages with Whisper Large-V3-Turbo
Built-in diarization
Speaker diarization with the latest Pyannote model
Simple API
Easy to use, no need to worry about infrastructure
Custom Whisper model
Bring your own fine-tuned Whisper model. We handle the infrastructure
Pricing
Simple usage-based pricing. Billed monthly. VAT may apply.
Standard model
$0.0001
per second of audio
Enhanced model
$0.0005
per second of audio
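As a rough illustration of the per-second rates above, the sketch below estimates the pre-tax cost of a recording. The function name `estimate_cost` and the `RATES` table are illustrative, not part of the API, and the sketch assumes billing is strictly linear in audio duration with no rounding or minimum charge, which the pricing section does not specify.

```python
from decimal import Decimal

# Per-second rates in USD, copied from the pricing section above.
RATES = {
    "standard": Decimal("0.0001"),
    "enhanced": Decimal("0.0005"),
}


def estimate_cost(duration_seconds, model="standard"):
    """Return the estimated pre-tax cost in USD for one recording.

    Assumes cost = rate * seconds, with no rounding or minimum charge.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Convert via str() so float durations do not carry binary noise into Decimal.
    return RATES[model] * Decimal(str(duration_seconds))


one_hour = 60 * 60
standard_cost = estimate_cost(one_hour, "standard")
enhanced_cost = estimate_cost(one_hour, "enhanced")
```

Under these assumptions, one hour of audio comes to about $0.36 with the standard model and $1.80 with the enhanced model, before VAT.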
Transcribe audio on your private cloud instance
We are building a solution to run our transcription + diarization pipeline privately in your own cloud environment. Our goal is to deliver a self-hostable package of our API that is easy to deploy and scale on popular cloud providers.
Get notified
Changelog
Version 0.4.1
November 17, 2024
Improved speed and stability
We have improved the speed and stability of our API. Jobs are now processed more reliably.
Version 0.4
August 19, 2024
Faster transcript generation
Audio recordings are now transcribed much faster than before. Here's how:
- Fast GPUs: We're leveraging faster GPUs with our AI cloud provider.
- Quicker startup: Cold boot times are dramatically reduced, thanks to enhancements from our AI cloud provider.
Expect to receive results up to 5x faster!
Version 0.3
July 22, 2024
Introducing the Analyze endpoint for processing transcripts.
- Summarize action: Extracts main points from conversations
- Immediate results using the GPT-4o API with custom prompts
- Requires pre-processed transcripts
The endpoint is in beta and may be updated in the future. More actions are planned for upcoming releases. For details on usage and implementation, please refer to our API documentation.
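To make the "requires pre-processed transcripts" constraint concrete, here is a minimal client-side sketch that builds a request body for the summarize action only once a transcript has finished processing. Every name in it is an assumption for illustration: the `status`, `id`, `transcript_id`, `action`, and `prompt` fields and the `build_analyze_request` helper are hypothetical and do not come from the API documentation, which remains the authoritative source for the real request shape.

```python
def build_analyze_request(transcript, prompt=None):
    """Build a hypothetical request body for the summarize action.

    `transcript` is assumed to be a dict with "id" and "status" keys;
    the real API may name or structure these fields differently.
    """
    # The Analyze endpoint works on transcripts that were already processed,
    # so refuse to build a request for anything still pending.
    if transcript.get("status") != "completed":
        raise ValueError("transcript must be processed before it can be analyzed")

    body = {"transcript_id": transcript["id"], "action": "summarize"}
    if prompt:
        # Custom prompts are optional; omit the field when none is given.
        body["prompt"] = prompt
    return body


request_body = build_analyze_request(
    {"id": "example-transcript", "status": "completed"},
    prompt="List the action items from this call.",
)
```

Checking the status locally avoids sending a request that the endpoint would have to reject.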
Version 0.2.1
July 17, 2024
Adding email-based sign-in and team member invitations.
This update expands access and enhances collaboration capabilities.
- Email Sign-In: Users can now create accounts and sign in using their email addresses. This option is available alongside the existing GitHub login method.
- Team Invitations: Account holders can invite team members to join their projects. This feature allows for easier collaboration and shared access to resources.
Version 0.2
July 12, 2024
Enhanced transcription model to complement our standard offering.
This new model provides improved accuracy at the cost of slower processing times.
- The enhanced model builds upon our standard model, using LLM-based post-processing to boost transcription accuracy. It offers more detailed diarization segments and can infer speaker labels or names.
- This model is particularly effective for languages that may have lower accuracy with our standard model. Please note that unlike the standard model, the enhanced model does not provide word-level timestamps or confidence scores.
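Because the enhanced model omits word-level timestamps and confidence scores, client code that consumes both models may want to treat that data as optional. The sketch below shows one way to do so. The segment layout (`speaker`, `text`, and a `words` list with `text`, `start`, `end`) is a hypothetical shape chosen for illustration, not the documented response format.

```python
def word_timestamps(segment):
    """Return (word, start, end) tuples, or an empty list when the
    segment carries no word-level data (as with the enhanced model)."""
    # Treat a missing key and an explicit None the same way.
    words = segment.get("words") or []
    return [(w["text"], w["start"], w["end"]) for w in words]


# Hypothetical standard-model segment with word-level timing.
standard_segment = {
    "speaker": "SPEAKER_00",
    "text": "Hello there",
    "words": [
        {"text": "Hello", "start": 0.0, "end": 0.4},
        {"text": "there", "start": 0.5, "end": 0.9},
    ],
}

# Hypothetical enhanced-model segment: inferred speaker name, no word data.
enhanced_segment = {"speaker": "Alice", "text": "Hello there"}

standard_words = word_timestamps(standard_segment)
enhanced_words = word_timestamps(enhanced_segment)
```

Returning an empty list instead of raising lets the same downstream code, such as subtitle alignment, fall back to segment-level text when word timing is unavailable.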