The action Transcript audio with AI

Modified on Fri, 15 Aug, 2025 at 9:48 AM

Purpose

Transcribe audio or video files to plain text using an AI speech‑to‑text model. The action sends the file to the selected AI provider (OpenAI) and returns a text transcript that can be saved, inserted into templates, or used by subsequent actions (sending by email, posting to communication tools, updating a CRM, translating, etc.).

Where to add

Add this action in any scenario that receives audio or video files, or after an earlier action that extracts an audio track from other formats.

Settings for transcribing an audio file to text

Main settings

AI model
Select the OpenAI speech model from the AI model dropdown. The dropdown shows the models available in your version. Additional models may be added in future updates, so check your Rofiles version periodically.
API key (required)
Enter your OpenAI API key. You must have an account and active API access with OpenAI.
Note: Get your OpenAI key here >
Prompt
Optionally enter an instruction prompt to guide the transcription (for example: “Transcribe with timestamps and speaker labels” or “Produce a verbatim transcript with punctuation”). You can also use Insert fields to include dynamic metadata in the prompt.
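
For readers who want to see how the three settings above (model, API key, prompt) relate to a direct call to the provider, here is a minimal Python sketch. It is not the action's own implementation. It assumes the official `openai` Python package, whose client exposes `client.audio.transcriptions.create(...)`, and it assumes the model name `whisper-1`; the function name `transcribe_file` is made up for this illustration.

```python
from pathlib import Path


def transcribe_file(client, path, model="whisper-1", prompt=None):
    """Send one audio file to a speech-to-text endpoint and return the text.

    `client` is assumed to behave like an `openai.OpenAI` instance, i.e. it
    exposes `client.audio.transcriptions.create(...)`.
    The default model name is an assumption; use the one you selected.
    """
    kwargs = {"model": model}
    if prompt:
        # Only send a prompt when one was actually provided.
        kwargs["prompt"] = prompt
    with Path(path).open("rb") as audio:
        response = client.audio.transcriptions.create(file=audio, **kwargs)
    return response.text


# Possible usage (requires the `openai` package and a valid API key):
#   from openai import OpenAI
#   client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
#   print(transcribe_file(client, "meeting.mp3", prompt="Verbatim transcript"))
```

The client is passed in as a parameter so the same function can be exercised with a stand-in object during testing, without network access or API credits.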

Supported file formats

The action supports the audio/video types accepted by OpenAI’s speech models: .mp3, .mp4, .mpeg, .mpga, .m4a, .wav, .webm (and other formats shown in the UI). If you supply another format, convert it first or use a prior action that extracts or converts audio.
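
If you prepare files with a script before they reach the scenario, a quick extension check can catch unsupported formats early. The sketch below only covers the extensions listed above; the UI may show more, so treat the set as an assumption to adjust. The names `SUPPORTED_EXTENSIONS` and `is_supported` are illustrative.

```python
from pathlib import Path

# Extensions taken from the list above; extend if the UI shows more.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def is_supported(filename):
    """Return True if the file extension is in the supported set (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("Call.MP3")` returns `True`, while `is_supported("notes.docx")` returns `False`. The check looks at the extension only; it does not inspect the file contents.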

Output

The AI returns a plain-text transcript. This text becomes the action output and can be:
- Saved to disk as .txt
- Sent by email or uploaded to an FTP server
- Published to your Teams, Slack, or Telegram channel
- ...

How to configure — quick steps

Add the Transcript audio with AI action to your scenario.
Select the AI model from the AI model dropdown.
Enter your OpenAI API key in the API key field.
(Optional) Edit the Prompt to control transcription style, include formatting rules, request timestamps or speaker labels, etc.
Save the action and run or test your scenario.

Behavior notes and tips

Long files and size limits: AI providers have input size/time limits. For long recordings, split them into smaller chunks before transcribing and then reassemble transcripts if necessary.
Prompts: use specific prompts to request timestamps, speaker diarization, verbatim transcription, profanity filtering, or formatting (e.g., “Include timestamps every 30 seconds” or “Label speakers as Speaker 1 / Speaker 2”). Results vary by model capability.
Speaker labels: automatic speaker diarization quality varies. For higher accuracy, use a model that explicitly supports speaker separation or include diarization instructions in your prompt.
Cost & quotas: transcription uses API credits. Monitor usage and quotas to control costs.
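
The advice on long files can be sketched as two small helpers: one that plans the time ranges for the chunks, and one that joins the partial transcripts again. This is only an illustration under simple assumptions: the chunk length is something you choose to stay under the provider's limit, the cutting itself is done with an audio tool of your choice (not shown), and the function names are hypothetical.

```python
def chunk_ranges(duration_s, max_chunk_s):
    """Return (start, end) pairs in seconds that cover the whole recording."""
    if duration_s < 0 or max_chunk_s <= 0:
        raise ValueError("duration must be >= 0 and chunk length must be > 0")
    ranges = []
    start = 0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


def join_transcripts(parts):
    """Join chunk transcripts in order, skipping empty ones."""
    return " ".join(p.strip() for p in parts if p.strip())
```

For a 25-second recording and 10-second chunks, `chunk_ranges(25, 10)` gives `[(0, 10), (10, 20), (20, 25)]`. Cutting at fixed times can split a word in two, so cutting at pauses where possible tends to give cleaner joins.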

Privacy, security and compliance

Your audio/video content is transmitted to OpenAI for processing. Confirm that sending data to this third‑party provider complies with your organizational policies and applicable regulations.

Error handling & troubleshooting

Authentication errors: verify the API key is valid and has transcription permissions.
Timeouts / size errors: split the file into smaller segments and retry.
Low accuracy: improve the audio quality, make the prompt more specific, or try one of the example prompts.
No output: confirm the file is a supported format and that the file is readable by the service (not encrypted or corrupted).
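
Several of the problems above can be spotted before the file is sent. The following sketch checks that a file exists, is not empty, and is below a size limit. The limit is a parameter because the exact value depends on the provider: OpenAI's documentation has listed 25 MB for its transcription endpoint, but treat that figure as an assumption and check the current value. The function name `preflight` and its messages are illustrative.

```python
import os


def preflight(path, max_bytes):
    """Return a list of problems found; an empty list means the file looks sendable."""
    if not os.path.isfile(path):
        return ["file not found"]
    problems = []
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    if size > max_bytes:
        problems.append("file exceeds size limit; split it first")
    return problems


# Possible usage, assuming a 25 MB limit:
#   issues = preflight("meeting.mp3", 25 * 1024 * 1024)
```

A check like this cannot detect encrypted or corrupted audio; it only rules out the most common causes of size errors and empty output.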

Example prompts

“Transcribe the following speech verbatim. Include timestamps for each paragraph in [HH:MM:SS] format.”
“Produce a clean, punctuation-correct transcript. Remove filler words such as ‘um’ and ‘uh’.”
“Transcribe and label speakers as Speaker 1, Speaker 2. Include timestamps every 30 seconds.”
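
To illustrate the idea of combining a prompt like these with dynamic metadata, here is a small Python sketch using the standard library's `string.Template`. It does not reproduce the action's Insert fields syntax, which is not described here; the placeholder names `title` and `language` are hypothetical.

```python
from string import Template

# Hypothetical placeholders; the action's own Insert fields may look different.
PROMPT = Template(
    "Transcribe and label speakers as Speaker 1, Speaker 2. "
    'The recording is titled "$title" and is spoken in $language.'
)


def build_prompt(metadata):
    """Fill the prompt template; raises KeyError if a field is missing."""
    return PROMPT.substitute(metadata)
```

Using `substitute` rather than `safe_substitute` means a missing field fails loudly instead of sending a prompt with an unfilled placeholder.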

Notes

The action in your software currently uses OpenAI models for transcription. If additional providers or models are added by an update, they will appear in the AI model dropdown.
The UI includes a help link for API key instructions — use it if you need provider-specific setup details.

This action makes it simple to convert audio/video into text: select the AI model, enter your API key, optionally tune the prompt, and the action returns a transcript ready for downstream use.