Speech-to-Text for Video and Audio Assets

Bynder can automatically generate transcripts for audio and video assets in your Bynder DAM via Speech-to-Text. This feature converts spoken content in multiple languages into text (transcripts), making these assets easily searchable. Users can locate keywords or phrases in videos and audio files without manually adding individual tags. Clicking a word in the generated transcript plays the media from that specific location. In addition, you can improve the accessibility of your content by adding closed captions to your videos. Users can also edit transcripts in-line to improve their accuracy.

Your Customer Success Contact must enable this feature. Once it is enabled, the Bynder Admin can assign permissions to individual users.

Once the feature is enabled, users with the edit assets permission can edit Speech-to-Text transcripts.

Download Transcripts for Video and Audio Assets

The subtitles will be displayed within Bynder only. Subtitles will not appear for assets embedded outside of Bynder or in any other Bynder solutions.

Navigate to your Portal.
Select the Assets tab.
You can use the search bar to search for the video.
Select and open the video.
Select Transcript in the Asset Detail View.
A new window opens where you can view the transcript. Click a word in the transcript to jump to that exact location in the video. To edit the text, double-click a word; this requires the edit assets permission.
You will see two tabs: the Transcript tab and the Details tab.
- On the Details tab, you can view the date generated, length, language, word count, and confidence score.
- On the Transcript tab, you can view the text, the timing, and the file formats in which you can download the transcript.
On the Transcript tab, click one of the three file formats (SRT, VTT, TXT) to download the transcript in that format.
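
Once a transcript has been downloaded as SRT, it can also be searched outside of Bynder. The following is a minimal sketch, assuming the file follows the common SRT layout (a cue number, a timing line, then the cue text, with blank lines between cues). The names `Cue`, `parse_srt`, and `find_keyword` are hypothetical helpers, not part of Bynder, and the sample text is made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: str  # timestamp as written in the file, e.g. "00:00:04,500"
    end: str
    text: str


# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> list[Cue]:
    """Split SRT text into cues; blocks that do not look like cues are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue
        match = TIMING.search(lines[1])
        if match is None:
            continue
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


def find_keyword(cues: list[Cue], keyword: str) -> list[Cue]:
    """Return the cues whose text contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [cue for cue in cues if needle in cue.text.lower()]


# Made-up sample content, for illustration only.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome to the product launch.

2
00:00:04,500 --> 00:00:08,000
Today we are introducing our new brand guidelines.
"""

for cue in find_keyword(parse_srt(SAMPLE_SRT), "brand"):
    print(cue.start, cue.text)
```

To use it with a real download, read the file's contents into a string and pass it to `parse_srt` in place of `SAMPLE_SRT`.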

Speech-to-Text Settings

Navigate to your Assets.
Use the search bar to search for content spoken in the video.
Once you find the video, select next to Transcript.
In the bottom right of the selected video, click and select from the following options.
- Captions: Enable/disable captions
- STT: In-line transcript editing for ease of correction.
- Playback speed: Adjust the speed from Normal to 0.5, 0.75, 1.25 or 1.5.
- Picture-in-picture: View the video in a separate, smaller window if you’d like to switch between tabs while watching (available on all browsers except Firefox).

Supported Languages

Speech-to-Text offers support for 100 languages, with the following as the most common. If the language you're interested in is not included in the list, please reach out to your Customer Success Contact to verify its support status.

Arabic, Modern Standard	Japanese
Belarusian	Korean
Bosnian	Latvian
Bulgarian	Lithuanian
Catalan	Macedonian
Chinese, Simplified	Malay
Chinese, Traditional	Norwegian Bokmål
Croatian	Polish
Czech	Portuguese
Danish	Portuguese, Brazilian
Dutch	Romanian
English	Russian
Estonian	Serbian
Finnish	Slovak
French	Slovenian
German	Spanish
Greek	Tagalog/Filipino
Hebrew	Tamil
Hindi, Indian	Thai
Hungarian	Turkish
Icelandic	Vietnamese
Indonesian
Italian

File Restrictions

The following files cannot be transcribed:

Files larger than 2GB
Files longer than 4 hours
Files shorter than 3 seconds
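
If you want to check files against these limits before uploading, the rules can be sketched as below. This is an illustrative sketch, not a Bynder tool: the function name `transcription_blockers` is hypothetical, and reading "2GB" as 2 × 1024³ bytes is an assumption, since the unit convention is not specified here.

```python
# Limits taken from the list above.
MAX_SIZE_BYTES = 2 * 1024**3        # assumption: "2GB" interpreted as 2 GiB
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours
MIN_DURATION_SECONDS = 3


def transcription_blockers(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return the reasons a file cannot be transcribed (empty list if none)."""
    reasons = []
    if size_bytes > MAX_SIZE_BYTES:
        reasons.append("file is larger than 2GB")
    if duration_seconds > MAX_DURATION_SECONDS:
        reasons.append("file is longer than 4 hours")
    if duration_seconds < MIN_DURATION_SECONDS:
        reasons.append("file is shorter than 3 seconds")
    return reasons


# A 500 MiB, two-minute video passes all three checks.
print(transcription_blockers(500 * 1024**2, 120))
# A 3 GiB, two-second clip fails on size and on minimum length.
print(transcription_blockers(3 * 1024**3, 2))
```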

Confidence Score

Transcripts will not be shown if their confidence score is less than 50 out of 100.

A confidence score indicates the accuracy of a transcript. See below for some of the factors that can affect the confidence score:

Audio Quality: The audio input quality can significantly affect the confidence score. Clear, noise-free audio produces higher confidence, while poor quality or loud background audio results in lower scores.
Speaker Variability: If multiple speakers are in the audio, this can lower the confidence score, as distinguishing between different voices can be challenging.
- Currently, only single-language identification is supported. If two languages are spoken in the media, the predominant language will be transcribed.
Language Complexity: Complex vocabulary, accents, and dialects can impact the confidence score. Uncommon or technical terms may lead to lower scores.
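
The visibility rule in this section (a transcript is hidden when its score is below 50 out of 100) can be expressed as a simple threshold check. This is a minimal sketch with a hypothetical function name, useful for example when reviewing scores noted from the Details tab; it does not describe how Bynder implements the rule internally.

```python
MIN_VISIBLE_SCORE = 50  # transcripts scoring below this are not shown


def is_transcript_shown(confidence_score: float) -> bool:
    """Return True if a transcript with this score (0 to 100) would be shown."""
    if not 0 <= confidence_score <= 100:
        raise ValueError("confidence score must be between 0 and 100")
    return confidence_score >= MIN_VISIBLE_SCORE


for score in (49, 50, 87):
    print(score, is_transcript_shown(score))
```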

Updated April 30, 2026 13:46
