YouTube Speech-to-Text Recognition Functionality


YouTube, the Google-owned online video service, has decided that rich metadata may be the way to go. The company believes it could make a big splash and change the way videos are indexed.

The hot topic of the year so far has been making video indexing more useful and effective. That means being able to attach more indexable data to the videos themselves. Adobe recently announced the inclusion of automatic speech-to-text transcription in its upcoming video editing tools, including Premiere and Flash, while video search engines like Blinkx and Truveo have been using different technologies to make videos more searchable by looking inside them.

As reported on ReelSEO in July, YouTube is testing speech-to-text on U.S. Presidential campaign videos. The feature has been in action since June and has massive potential to make all video on YouTube easier to search. It should also make search results far more relevant than they have been.

The speech in a political video is transcribed to text, which is made searchable so that you can find all instances of a specific phrase like, say, “video search engine optimization.” Well, not many politicians are talking about VSEO, so perhaps a far better phrase to search for would be ‘tax policy’ or ‘war in Iraq.’ The service is offered on Presidential campaign videos as well as other political videos and can be found here.

Essentially, YouTube is using speech-to-text to listen to the audio and analyze what it hears. The result is transcribed into text and embedded as metadata on the video itself, much like what Adobe is promising for its video editing tools in the near future. While the video is playing, you can place the cursor on the highlighted areas to see the context of the exact phrase. Unfortunately, you cannot get the full transcript displayed at the bottom as the video plays.

The YouChoose service is not foolproof. Searching on ‘video game’ brought erroneous results, and the transcribed text was not exactly the same as what Barack Obama said in his Father’s Day 2008 speech. In the Flint, Michigan speech he again says that “parents must turn off the television set and put away the video games and read to your child,” where the transcription says “television set well the way the video game and read your child.” But it did accurately pinpoint the spot in the videos where he said the phrase.
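To picture how a phrase search can still land on the right spot even when the transcript is imperfect, here is a minimal sketch in Python. It is not YouTube's implementation, which has not been published; it simply assumes the transcript is stored as a list of words with start times. The TimedWord class, the find_phrase function and the timestamps are all hypothetical and invented for illustration. Only the sample words come from the machine transcription quoted above.

```python
import string
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcribed word plus the time (in seconds) at which it is spoken."""
    text: str
    start: float


def normalize(word):
    """Lowercase a word and strip surrounding punctuation for matching."""
    return word.strip(string.punctuation).lower()


def find_phrase(words, phrase):
    """Return the start time of every place the phrase occurs as consecutive words."""
    target = [normalize(w) for w in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w.text) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append(words[i].start)
    return hits


# Sample words from the transcription quoted in the article.
# The timestamps are made up: one word every half second, starting at 12 s.
SAMPLE_TEXT = "television set well the way the video game and read your child"
TRANSCRIPT = [
    TimedWord(word, 12.0 + 0.5 * i) for i, word in enumerate(SAMPLE_TEXT.split())
]

if __name__ == "__main__":
    print(find_phrase(TRANSCRIPT, "video game"))
    print(find_phrase(TRANSCRIPT, "tax policy"))
```

In this sketch the garbled words around the phrase do not matter: as long as the searched words themselves were recognized, the match points to the moment they were spoken. An exact-match approach like this one would, of course, miss a phrase whose own words were misheard.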

The YouTube YouChoose feature is interesting and does hold some promise for the future of video search; however, it does almost nothing to address issues of accessibility, which the Adobe feature may. The transcription cannot be displayed alongside the video and so is essentially only useful for search purposes on YouTube. The Adobe-based feature should allow viewing of the entire transcription during video playback, thanks to the embedded nature of the metadata and added functionality in the Flash player.

Both are a step in the right direction with regard to video search optimization and video indexing on the web. They are the paving stones to the future and the first steps toward a far more searchable video-based web.



©2021 Tubular Insights & Tubular Labs, Inc.