Get Started

This short tutorial will teach you the basics of making requests to the Voiceloft APIs. The Asynchronous Speech-to-Text API delivers high-quality transcription for pre-recorded audio.

Assumptions

This tutorial assumes that you have an access token.

Turnaround time and chunking

Chunking is the act of breaking audio files into smaller segments. Voiceloft chunks audio files longer than 3 minutes to reduce their turnaround time.
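
To make the idea concrete, here is a minimal sketch of splitting a recording's timeline into fixed-length segments. It is not Voiceloft's implementation. The function name chunk_boundaries and the 180-second segment length are assumptions chosen only to mirror the 3-minute threshold mentioned above; how the service actually chunks audio is not documented here.

```python
from __future__ import annotations


def chunk_boundaries(duration_s: float, chunk_s: float = 180.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover [0, duration_s].

    chunk_s is a hypothetical segment length, not a documented Voiceloft value.
    """
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")

    boundaries = []
    start = 0.0
    while start < duration_s:
        # The last segment is shortened so it ends exactly at the file's end.
        end = min(start + chunk_s, duration_s)
        boundaries.append((start, end))
        start = end
    return boundaries
```

For a 400-second file, this yields three segments: 0 to 180, 180 to 360, and 360 to 400 seconds. Segments like these could, in principle, be processed independently, which is the intuition behind chunking reducing turnaround time.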

For AI transcription jobs, transcripts of shorter files are often ready in 5 minutes or less, and longer audio files generally take no more than 15 minutes.

The expected turnaround time for human transcription jobs is 12 to 24 hours.

ATTENTION

If you require a faster turnaround time, please contact the support team at [email protected]

File formats

The Asynchronous Speech-to-Text API supports every file format that FFmpeg supports. This includes common media formats such as MP3, MP4, Ogg, WAV, PCM, and FLAC, among many others.

API limits

The following default limits apply per user, per endpoint for the Asynchronous Speech-to-Text API:

10,000 transcription requests submitted every 10 minutes.
500 transcriptions processed every 10 minutes. Submissions beyond this limit are accepted but queued, and do not start until the next interval.
Maximum audio duration of 17 hours.
File uploads submitted as multipart/form-data requests to the /upload endpoint have a concurrency limit of 5 and a file size limit of 2 GB per request.
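
The default limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Voiceloft SDK. The function names check_upload and earliest_start_minutes are hypothetical. It assumes that "2 GB" means 2 x 10^9 bytes (the stricter reading) and that queued jobs start in submission order, neither of which is spelled out by the limits themselves.

```python
from __future__ import annotations

MAX_DURATION_S = 17 * 60 * 60     # maximum audio duration: 17 hours
MAX_UPLOAD_BYTES = 2 * 1000**3    # assumption: "2 GB" read as decimal gigabytes
PROCESSED_PER_INTERVAL = 500      # transcriptions processed per interval
INTERVAL_MINUTES = 10             # length of one interval


def check_upload(size_bytes: int, duration_s: float) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 2 GB multipart upload limit")
    if duration_s > MAX_DURATION_S:
        problems.append("audio exceeds the 17-hour duration limit")
    return problems


def earliest_start_minutes(position: int) -> int:
    """Estimate the wait, in minutes, before a queued job can start.

    position is the zero-based position of the job within a burst submitted
    at the start of an interval. Assumes first-in, first-out processing,
    which is an assumption rather than a documented guarantee.
    """
    if position < 0:
        raise ValueError("position must be >= 0")
    return (position // PROCESSED_PER_INTERVAL) * INTERVAL_MINUTES
```

Under these assumptions, if 1,200 jobs are submitted at once, the first 500 start immediately, the next 500 start after about 10 minutes, and the remaining 200 start after about 20 minutes.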

ATTENTION

These default limits are configurable by Voiceloft support. To adjust these limits, contact the support team at [email protected]

File transcription

Two POST request formats can be used to transcribe a file: application/json and multipart/form-data.
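
As an illustration of how the two formats differ, the sketch below builds one request of each kind with Python's standard library, without sending anything. Only the /upload path and the two content types come from this page. The base URL, the /jobs path, the media_url and media field names, and the Bearer authorization scheme are all placeholders assumed for the example, so check them against the API reference before use.

```python
import json
import uuid
import urllib.request

# Hypothetical placeholders: this page does not state the base URL.
BASE_URL = "https://api.example.com"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"


def build_json_request(media_url: str) -> urllib.request.Request:
    """Build an application/json POST that references audio hosted elsewhere."""
    # "media_url" and "/jobs" are assumed names, not confirmed by this page.
    body = json.dumps({"media_url": media_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/jobs",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + ACCESS_TOKEN,  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def build_multipart_request(filename: str, audio: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST that carries the audio bytes inline."""
    boundary = uuid.uuid4().hex
    # One form part named "media" (assumed field name) holding the file.
    # The filename is inserted as-is, so it must not contain quotes.
    disposition = 'Content-Disposition: form-data; name="media"; filename="%s"\r\n' % filename
    body = b"".join([
        b"--" + boundary.encode("ascii") + b"\r\n",
        disposition.encode("utf-8"),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        b"\r\n--" + boundary.encode("ascii") + b"--\r\n",
    ])
    return urllib.request.Request(
        BASE_URL + "/upload",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + ACCESS_TOKEN,  # assumed auth scheme
            "Content-Type": "multipart/form-data; boundary=" + boundary,
        },
    )
```

Either request object could then be passed to urllib.request.urlopen. The JSON form suits audio that is already reachable by URL, while the multipart form sends the file itself and is therefore subject to the upload size and concurrency limits.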