Whisper + Diarization API

Get diarized transcripts through a simple API and easily integrate audio AI into your applications or tooling.

curl --request POST \
  --url https://api.spectropic.ai/v1/transcribe \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/file.mp3",
    "model": "enhanced",
    "numSpeakers": 2,
    "language": "en",
    "vocabulary": "Spectropic, AI, Llama, Mistral, Whisper",
    "webhook": "https://example.com/webhook"
  }'

Everything you need to integrate audio AI into your application

State-of-the-art transcriptions

Get accurate transcripts in 99+ languages with Whisper Large-V3-Turbo

Built-in diarization

Latest Pyannote model to diarize audio

Simple API

Easy to use, with no infrastructure to manage

Custom Whisper model

Bring your own fine-tuned Whisper model. We handle the infrastructure

Pricing

Simple usage-based pricing, billed monthly. VAT may apply.

Standard model

$0.0001

per second of audio

Enhanced model

$0.0005

per second of audio
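
To make the per-second rates concrete, here is a small TypeScript sketch that estimates the cost of a recording from its duration. The rates are taken directly from the pricing above; the function itself is just illustrative arithmetic, not part of the API.

// Estimate transcription cost from audio duration, using the per-second rates above.
const RATES = { standard: 0.0001, enhanced: 0.0005 }; // USD per second of audio

function estimateCost(durationSeconds: number, model: keyof typeof RATES): number {
  return durationSeconds * RATES[model];
}

// A 30-minute (1,800 second) recording:
console.log(estimateCost(1800, "standard")); // ≈ 0.18 USD
console.log(estimateCost(1800, "enhanced")); // ≈ 0.90 USD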

Transcribe audio on your private cloud instance

We are building a solution to run our transcription + diarization pipeline privately on your own cloud environment. Our goal is to deliver a self-hostable package of our API, capable of being easily deployed and scaled on popular cloud providers.

Get notified

Changelog

Version 0.4.1

November 17, 2024

Improved speed and stability

We have improved the speed and stability of our API. Jobs are now processed more reliably.

Version 0.4

August 19, 2024

Faster transcript generation

Audio recordings are transcribed much faster than before. Here's how:

  • Fast GPUs: We're leveraging faster GPUs with our AI cloud provider.
  • Quicker startup: Cold boot times are dramatically reduced, thanks to enhancements from our AI cloud provider.

Expect to receive results up to 5x faster!

Version 0.3

July 22, 2024

Introducing the Analyze endpoint for processing transcripts.

  • Summarize action: Extracts main points from conversations
  • Immediate results using GPT-4o API with custom prompts
  • Requires pre-processed transcripts

The endpoint is in beta and may be updated in the future. More actions are planned for upcoming releases. For details on usage and implementation, please refer to our API documentation.
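
For orientation only, a call to the Analyze endpoint might look like the sketch below. The path (/v1/analyze) and the field names (action, transcript) are assumptions inferred from the description above, not the documented contract, so refer to the API documentation for the actual request shape.

// Hypothetical sketch of the beta Analyze endpoint; the path and field names are assumptions.
const existingTranscript = "..."; // placeholder for a pre-processed transcript you already have

const analyzeResponse = await fetch("https://api.spectropic.ai/v1/analyze", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SPECTROPIC_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    action: "summarize",            // the Summarize action described above
    transcript: existingTranscript, // Analyze requires a pre-processed transcript
  }),
});

console.log(await analyzeResponse.json()); // immediate result, generated with the GPT-4o API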

Version 0.2.1

July 17, 2024

Adding email-based sign-in and team member invitations.

This update expands access and enhances collaboration capabilities.

  • Email Sign-In: Users can now create accounts and sign in using their email addresses. This option is available alongside the existing GitHub login method.
  • Team Invitations: Account holders can invite team members to join their projects. This feature allows for easier collaboration and shared access to resources.

Version 0.2

July 12, 2024

Enhanced transcription model to complement our standard offering.

This new model provides improved accuracy at the cost of slower processing times.

  • The enhanced model builds upon our standard model, using LLM-based post-processing to boost transcription accuracy. It offers more detailed diarization segments and can infer speaker labels or names.
  • This model is particularly effective for languages that may have lower accuracy with our standard model. Please note that unlike the standard model, the enhanced model does not provide word-level timestamps or confidence scores.
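
As a hedged illustration of consuming an enhanced-model result, the sketch below iterates over diarization segments and prints the inferred speaker labels. The field names (segments, speaker, start, end, text) are assumptions for illustration, not the documented schema.

// Hypothetical result shape; field names are assumptions, not the documented schema.
interface Segment {
  speaker: string; // inferred speaker label or name, e.g. "Speaker 1" or "Alice"
  start: number;   // segment start time in seconds
  end: number;     // segment end time in seconds
  text: string;
}

function printConversation(segments: Segment[]): void {
  for (const s of segments) {
    console.log(`[${s.start.toFixed(1)}-${s.end.toFixed(1)}] ${s.speaker}: ${s.text}`);
  }
}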