Last Update: 7/13/2025
OpenAI Audio Transcription API
The OpenAI Audio Transcription API allows you to convert audio into text using OpenAI's speech recognition models. This document provides an overview of the API endpoints, request parameters, and response structure.
Endpoint
POST https://platform.llmprovider.ai/v1/audio/transcriptions
Request Headers
| Header | Value |
|---|---|
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | multipart/form-data |
Request Body
| Parameter | Type | Description |
|---|---|---|
| file | file | The audio file object (not the file name) to transcribe, in one of these formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, or `webm`. Maximum file size: 20 MB. |
| model | string | ID of the model to use (e.g., `whisper-1`). |
| prompt | string | (Optional) Text to guide the model's style or continue a previous audio segment. |
| response_format | string | (Optional) The format of the transcript output (`json`, `text`, `srt`, `verbose_json`, or `vtt`). Default is `json`. |
| temperature | number | (Optional) The sampling temperature, between 0 and 1. Default is 0. |
| language | string | (Optional) The language of the input audio (e.g., `en`, `es`, `fr`). |
| timestamp_granularities[] | array | (Optional) The timestamp granularities to populate for this transcription (`word` and/or `segment`); requires `response_format` set to `verbose_json`. |
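The constraints above can be checked client-side before uploading. Below is a minimal sketch in Python of enforcing the 20 MB limit and assembling the optional form fields; the helper names are illustrative, not part of the API or any client library:

```python
# Illustrative helpers (not part of the API) that validate an upload
# against the documented 20 MB limit and assemble the optional form
# fields for a transcription request.
MAX_FILE_BYTES = 20 * 1024 * 1024  # documented maximum audio file size


def validate_audio_size(num_bytes):
    """Raise if the audio payload exceeds the documented limit."""
    if num_bytes > MAX_FILE_BYTES:
        raise ValueError(
            f"audio file is {num_bytes} bytes; limit is {MAX_FILE_BYTES}"
        )


def build_transcription_fields(model="whisper-1", response_format="json",
                               language=None, temperature=0.0):
    """Return the non-file form fields; optional ones only when set."""
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    if language is not None:
        fields["language"] = language
    return fields
```

The returned dictionary can be passed as the `data` argument alongside the `file` upload in any HTTP client that supports multipart form data.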
Response Body
Returns a transcription object or, when `response_format` is `verbose_json`, a verbose transcription object.
The transcription object (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | The transcribed text. |

```json
{
  "text": "Hello, this is the transcribed text from the audio file."
}
```
The transcription object (Verbose JSON)
| Parameter | Type | Description |
|---|---|---|
| task | string | The task performed by the model. |
| language | string | The language of the input audio. |
| duration | number | The duration of the audio in seconds. |
| segments | array | Segments of the transcribed text and their corresponding details. |
| text | string | The transcribed text. |
| words | array | Extracted words and their corresponding timestamps. |
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 2.95,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello, this is the transcribed text from the audio file.",
      "tokens": [
        50364,
        2425,
        11,
        359,
        307,
        1161,
        1123,
        422,
        264,
        1467,
        1780
      ],
      "temperature": 0.0,
      "avg_logprob": -0.458,
      "compression_ratio": 0.688,
      "no_speech_prob": 0.0192
    }
  ],
  "text": "Hello, this is the transcribed text from the audio file."
}
```
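A verbose response can be post-processed with only the standard library. The sketch below walks the `segments` array shown above and converts each segment's `avg_logprob` (an average token log-probability) back into a 0–1 confidence score; the function name is illustrative:

```python
import math


def summarize_segments(verbose_response):
    """Yield (start, end, text, confidence) for each segment.

    Confidence is exp(avg_logprob), which maps the average token
    log-probability back into the 0..1 range.
    """
    for seg in verbose_response.get("segments", []):
        confidence = math.exp(seg["avg_logprob"])
        yield seg["start"], seg["end"], seg["text"], confidence
```

For the example response above, the single segment spans 0.0–2.95 seconds with a confidence of roughly 0.63.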
Example Request
- Shell
- Node.js
- Python
```shell
curl -X POST https://platform.llmprovider.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp3" \
  -F model="whisper-1"
```
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Read the API key from the environment rather than hard-coding it.
const YOUR_API_KEY = process.env.YOUR_API_KEY;

const formData = new FormData();
formData.append('file', fs.createReadStream('audio.mp3'));
formData.append('model', 'whisper-1');

axios.post('https://platform.llmprovider.ai/v1/audio/transcriptions', formData, {
  headers: {
    'Authorization': `Bearer ${YOUR_API_KEY}`,
    ...formData.getHeaders()
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error('Error:', error);
  });
```
```python
import os

import requests

# Read the API key from the environment rather than hard-coding it.
YOUR_API_KEY = os.environ["YOUR_API_KEY"]

headers = {
    "Authorization": f"Bearer {YOUR_API_KEY}"
}

# A context manager ensures the file handle is closed after the upload.
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://platform.llmprovider.ai/v1/audio/transcriptions",
        headers=headers,
        files={"file": audio_file},
        data={"model": "whisper-1"},
    )

print(response.json())
```
For more details, refer to the OpenAI API documentation.