Speech-to-Text for Video and Audio Assets


Bynder can automatically generate transcripts for audio and video assets in your Bynder DAM via Speech-to-Text. This feature converts spoken content in multiple languages into text (transcripts), making these assets easily searchable. Users can locate keywords or phrases used within video and audio files without having to manually add individual tags. Clicking a word in the generated transcript plays the media from that specific location. In addition, you can improve the accessibility of your content by adding closed captions to your videos.

How to Enable the Speech-to-Text Search

Please contact your Customer Success Manager to learn more about enabling this feature and any associated costs.

Download Transcripts for Video and Audio Assets

Note

The subtitles will display within Bynder only. Subtitles will not appear for assets embedded outside of Bynder (i.e., via embed code) or in any other Bynder modules.

  1. Navigate to your Portal.
  2. Select the Assets tab. 
  3. Use the search bar to find the video.
  4. Select and open the video. 
  5. Select Transcript in the Asset Detail View.
  6. A new window will pop up where you can view the transcript.

    Note

    Clicking a word in the transcript takes you to that exact location in the video.

  7. View the date generated, length, language, word count, and confidence score.
  8. Click one of the three file formats (SRT, VTT, TXT) to download the transcript in that format.
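
Once a transcript has been downloaded as SRT, it can also be searched outside of Bynder. The following is a minimal sketch, assuming a standard SRT layout (a cue number, a timing line, then the cue text, with cues separated by blank lines). The function name find_in_srt and the sample text are hypothetical and are not part of Bynder.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def find_in_srt(srt_text: str, keyword: str) -> List[Tuple[str, str]]:
    """Return (start_time, cue_text) for every cue containing keyword.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    hits = []
    # Cues in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                if keyword.lower() in text.lower():
                    hits.append((match.group(1), text))
                break
    return hits


# Made-up sample content, not taken from a real asset.
sample = """1
00:00:01,000 --> 00:00:04,000
Welcome to the product launch.

2
00:00:04,500 --> 00:00:08,000
Today we will cover the new features.
"""

print(find_in_srt(sample, "features"))
# prints [('00:00:04,500', 'Today we will cover the new features.')]
```

The returned start time is the point in the media where the matching cue begins.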

Speech-to-Text Settings

  1. Navigate to your Assets.
  2. Use the search bar to search for content spoken in the video.
  3. Once you find the video, select the icon next to Transcript.
  4. In the bottom right of the selected video, click the icon and select from the following options:
    • Captions: Enable/disable captions
    • Playback speed: Adjust the playback speed from Normal to 0.5, 0.75, 1.25 or 1.5.
    • Picture-in-picture: View the video in a separate, smaller window so that you can switch between tabs while watching (available in all browsers except Firefox).

Supported Languages

Speech-to-Text offers support for 100 languages, with the following as the most common:

Arabic, Modern Standard
Belarusian
Bosnian
Bulgarian
Catalan
Chinese, Simplified
Chinese, Traditional
Croatian
Czech
Danish
Dutch
English
Estonian
Finnish
French
German
Greek
Hebrew
Hindi, Indian
Hungarian
Icelandic
Indonesian
Italian
Japanese
Korean
Latvian
Lithuanian
Macedonian
Malay
Norwegian Bokmål
Polish
Portuguese
Portuguese, Brazilian
Romanian
Russian
Serbian
Slovak
Slovenian
Spanish
Tagalog/Filipino
Tamil
Thai
Turkish
Ukrainian
Vietnamese

If the language you're interested in is not included in the list, please contact your Customer Success Manager to verify its support status.

File Restrictions

The following files cannot be transcribed:

  • Files larger than 2GB
  • Files longer than 4 hours
  • Files shorter than 3 seconds
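
To check files against these limits before uploading, the restrictions can be sketched as a small function. This is an illustration only: the function and constant names are hypothetical, and it assumes that 2GB means decimal gigabytes (2,000,000,000 bytes), which the article does not specify.

```python
from typing import List

MAX_SIZE_BYTES = 2 * 1000**3        # 2 GB; assumes decimal gigabytes
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours
MIN_DURATION_SECONDS = 3            # 3 seconds


def transcription_blockers(size_bytes: int, duration_seconds: float) -> List[str]:
    """Return the reasons a file cannot be transcribed (empty list if none).

    Hypothetical helper; it only mirrors the three restrictions listed above.
    """
    reasons = []
    if size_bytes > MAX_SIZE_BYTES:
        reasons.append("larger than 2 GB")
    if duration_seconds > MAX_DURATION_SECONDS:
        reasons.append("longer than 4 hours")
    if duration_seconds < MIN_DURATION_SECONDS:
        reasons.append("shorter than 3 seconds")
    return reasons


# A 500 MB, 10-minute file passes; a 2-second clip does not.
print(transcription_blockers(500 * 1000**2, 600))
print(transcription_blockers(1_000_000, 2))
```

The comparisons are strict, so a file of exactly 4 hours or exactly 3 seconds is treated as within the limits.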


Confidence Score

Transcripts are not shown if their confidence score is below 50 out of 100.

A confidence score indicates the accuracy of a transcript. See below for some of the factors that can affect the confidence score:

  • Audio Quality: The quality of the audio input can significantly affect the confidence score. Clear, noise-free audio produces a higher score; poor-quality audio or loud background noise results in a lower score.
  • Speaker Variability: Multiple speakers in the audio can produce a lower confidence score, as distinguishing between different voices can be challenging.
    • Currently, only single-language identification is supported. If two languages are spoken in the media, the predominant language will be transcribed.
  • Language Complexity: Complex vocabulary, accents, and dialects can impact the confidence score. Uncommon or technical terms may lead to lower scores.
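
The display threshold can be expressed as a simple check. This is a minimal sketch under the stated rule that transcripts scoring less than 50 out of 100 are hidden; the names are hypothetical and do not reflect how Bynder implements it.

```python
MIN_VISIBLE_CONFIDENCE = 50  # scores below this hide the transcript


def is_transcript_shown(confidence_score: float) -> bool:
    """Return True if a transcript with this score (0 to 100) would be shown.

    Hypothetical helper; a score of exactly 50 is not "less than 50",
    so it counts as shown.
    """
    if not 0 <= confidence_score <= 100:
        raise ValueError("confidence score must be between 0 and 100")
    return confidence_score >= MIN_VISIBLE_CONFIDENCE


print(is_transcript_shown(49.9))  # prints False
print(is_transcript_shown(50))    # prints True
```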
