This short tutorial will teach you the basics of making requests to the Voiceloft APIs. The Asynchronous Speech-to-Text API delivers high-quality transcription for pre-recorded audio.
This tutorial assumes that you have an access token.
Chunking is the act of breaking audio files into smaller segments. Voiceloft chunks audio files longer than 3 minutes to reduce their turnaround time.
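To make the idea of chunking concrete, here is a minimal sketch that computes segment boundaries for a file of a given duration. It is an illustration only and not Voiceloft's implementation: the function name `chunk_boundaries` and the 60-second segment length are assumptions, and only the 3-minute threshold comes from the text above.

```python
def chunk_boundaries(duration_s, chunk_s=60.0, threshold_s=180.0):
    """Return (start, end) pairs, in seconds, covering the whole file.

    threshold_s reflects the 3-minute cutoff described above.
    chunk_s is a hypothetical segment length chosen for illustration.
    """
    # Files at or below the threshold are kept as a single segment.
    if duration_s <= threshold_s:
        return [(0.0, duration_s)]

    bounds = []
    start = 0.0
    while start < duration_s:
        # The last segment may be shorter than chunk_s.
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

Segments produced this way could be transcribed in parallel and the results joined in order, which is where the shorter turnaround would come from.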
Often, especially for AI transcription jobs involving shorter files, your transcript will be ready in 5 minutes or less. It generally takes no longer than 15 minutes to return longer audio files.
The expected turnaround time is 12 to 24 hours for human transcription jobs.
ATTENTION
If you require a faster turnaround time, please contact the support team at support@voiceloft.com.
The Asynchronous Speech-to-Text API supports all the file formats supported by FFmpeg, including common media formats such as MP3, MP4, Ogg, WAV, PCM, and FLAC.
The following default limits apply per user, per endpoint for the Asynchronous Speech-to-Text API:
multipart/form-data requests to the /upload endpoint have a concurrency limit of 5 and a file size limit of 2 GB per request.
ATTENTION
These default limits are configurable by Voiceloft support. To adjust these limits, contact the support team at support@voiceloft.com.
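A client can stay within the default limits by checking file sizes before uploading and by capping the number of uploads in flight. The sketch below is one way to do that, under stated assumptions: `upload_one` is a hypothetical caller-supplied function that performs a single upload, and "2 GB" is read conservatively as 2 × 10^9 bytes because the source does not say whether it means decimal or binary gigabytes.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Default limits quoted above. The byte value is an assumption:
# "2 GB" is interpreted as 2 * 10**9 bytes, the stricter reading.
MAX_UPLOAD_BYTES = 2 * 10**9
MAX_CONCURRENT_UPLOADS = 5


def within_size_limit(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True for a non-empty file no larger than the limit."""
    return 0 < size_bytes <= limit


def upload_all(paths, upload_one):
    """Upload every path, never running more than 5 uploads at once.

    upload_one(path) is a hypothetical function supplied by the caller.
    Results are returned in the same order as paths.
    """
    # Reject oversized or empty files before any upload starts.
    for path in paths:
        if not within_size_limit(os.path.getsize(path)):
            raise ValueError(f"{path} is empty or exceeds the size limit")

    # The pool size caps the number of simultaneous uploads.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_UPLOADS) as pool:
        return list(pool.map(upload_one, paths))
```

If support has raised or lowered the limits for your account, adjust the two constants to match.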
Two POST request formats can be used to transcribe a file: application/json or multipart/form-data.
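The difference between the two formats can be sketched with Python's standard library. The example below only builds the requests and does not send them. Several details are assumptions rather than documented values: the host `https://api.example.com`, the `/jobs` path for the JSON request, the `media_url` and `media` field names, and the `Bearer` authorization scheme. Only the `/upload` path and the two content types come from the text above, so check the API reference for the real values.

```python
import json
import uuid
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical placeholder host


def build_json_request(token, media_url, path="/jobs"):
    """application/json: the body points at audio hosted elsewhere.

    The "/jobs" path and "media_url" field are hypothetical names.
    """
    body = json.dumps({"media_url": media_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # scheme is an assumption
            "Content-Type": "application/json",
        },
    )


def build_multipart_request(token, filename, file_bytes, field="media"):
    """multipart/form-data: the body carries the file contents.

    The "media" field name is hypothetical; "/upload" is from the text.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/upload",
        data=head + file_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # scheme is an assumption
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing either object to `urllib.request.urlopen` would send it. The multipart helper holds the whole file in memory, so for files near the size limit a streaming upload would be the better choice.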