Image and Speech Recognition in Video SEO: The Digitalsmiths Interview


I interviewed Ben Weinberger, Chief Executive Officer and Co-Founder of Digitalsmiths, a video publishing platform and indexing solutions provider, on how the company's combination of image interpretation and speech recognition tools can enhance search results and SEO campaigns for businesses of all sizes.

Challenges with video metadata indexing and search optimization

Perhaps the biggest challenge in pulling data from video content is that the data can be interpreted in so many ways, and someone has to decide what is most relevant to the content provider’s own audience. For some companies, it is neither time-efficient nor cost-efficient for a person to manually sort and add metadata for every video that is produced. So how do you get an intelligent technology, say an artificial intelligence, that can recognize what you need to pull, aggregate, organize and search effectively? It has to work for your own website, but also for the search engines that can index that metadata for optimized listings in keyword searches, making it an ideal SEO resource.

That is where Digitalsmiths comes in, according to its CEO Ben Weinberger. Digitalsmiths is a video indexing and digital content publishing technology provider serving major Hollywood studios, web video destinations, media companies and advertisers. According to the company's media information, Digitalsmiths is based on a proprietary video search algorithm combining image interpretation software with cutting-edge speech recognition tools: “The company’s proprietary computer-vision based video indexing, search and interpretation algorithms empower content owners and publishers to efficiently monetize their digital video content, and advertisers to automatically target ads to thematically relevant video content.”

While there are other Software as a Service (SaaS) video publishing solutions offering speech recognition and automated keyword categorization as part of their SEO features, Digitalsmiths’ image recognition abilities are of particular benefit to video content providers specializing in, well, featuring people. These can be celebrity sites (TMZ), media sites (Warner Brothers), and specialty community sites with recognizable and popular individuals (Essence), all of which are clients of Digitalsmiths. (In fact, TMZ, one of the most popular and advanced media websites providing video content for mass audiences, dropped its account with Brightcove for Digitalsmiths – certainly deserving of real notice.)


People-friendly video search – fun, but with flaws

Digitalsmiths’ video search feature appears within the video itself. A user can first filter their search in a number of combinations: by a drop-down menu of major “celebrity” types, by a menu of “locations” (for the TMZ site, done as “hot spots” where you would be likely to see celebrities hang out), and then by a text entry of “dialogue.”


This is certainly a fun feature, but I think it would be more helpful if selecting a menu item automatically narrowed the other menus to only the options that still have results. As it stands, I often came up with zero results for the combinations I tried. (And in one instance, a “location” was actually the name of a celebrity.)
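The automated filter suggested above can be sketched in a few lines. This is only an illustration of the idea, not how the TMZ or Digitalsmiths search works: the `Clip` type, the sample index, and both function names are hypothetical, and the celebrity and location values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    title: str
    celebrity: str
    location: str


# Hypothetical sample index; every value here is a placeholder.
CLIPS = [
    Clip("Clip A", "Celebrity 1", "Hot Spot X"),
    Clip("Clip B", "Celebrity 1", "Hot Spot Y"),
    Clip("Clip C", "Celebrity 2", "Hot Spot X"),
]


def available_locations(clips, celebrity):
    """Return only the locations that have at least one clip for the chosen celebrity."""
    return sorted({c.location for c in clips if c.celebrity == celebrity})


def search(clips, celebrity=None, location=None):
    """Return clips matching every filter that was actually set."""
    return [
        c for c in clips
        if (celebrity is None or c.celebrity == celebrity)
        and (location is None or c.location == location)
    ]
```

If the location menu is repopulated from `available_locations` each time a celebrity is chosen, the user can no longer pick a combination that returns zero results.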

ReelSEO Interview with Digitalsmiths’ Ben Weinberger

Grant: There are more online video publishing platforms today than ever before. How does Digitalsmiths stand out from them?

Ben: The main difference is that we don’t just focus on the publishing nuts-and-bolts of syndicating content; we actually analyze it and provide more data and information around the content as well.

An example would be when we are working with a content owner: we actually index their video and generate a very deep set of metadata about it, which is all time-based. So at any given time we know a lot of information about the visuals and the audio/speech aspects, which gives the publisher a deep ability to monetize that content through searching and publishing. It also gives their partners more information to help drive results in things like search, obviously, but also in other things like content recommendations or related content.

So how we really stand out is by providing this layer of data and enhancing the user experience for that publisher’s user base, which overall results in extended session times, creating new revenue opportunities.

Explain how your image interpretation software and speech recognition tools help you with the indexing of metadata from each video?

Our CTO and co-founder Matthew Berry has really done a great job in amassing a set of computer vision scientists and a group of PhDs who are experts in their own fields. Examples of all these technologies will start with speech and then work over to visual. On the visual side, it’s things like facial recognition, scene identification and scene classification, object analysis, materials analysis.

On the speech side, there are some technologies that are becoming fairly mainstream, like speech-to-text analysis, but then there is also natural language processing, which takes the nuance of what people are talking about and turns it into subject matter. It’s the basics of actually translating what somebody is saying down to the exact words, but also giving them meaning. So if somebody said the word “knife,” we can tell you by examining the other words they’re saying – are they talking about a crime video, or maybe a cooking video, or an outdoor life video? How is the word knife being used? That’s the basic idea.
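As a rough illustration of the “knife” example, the sketch below guesses a subject from the other words in a transcript. It is a deliberately simple keyword-overlap heuristic, not Digitalsmiths’ natural language processing; the cue-word lists and the `guess_topic` name are assumptions made up for this example.

```python
import re

# Hypothetical cue words per subject; a real system would learn these from data.
TOPIC_CUES = {
    "crime": {"police", "suspect", "victim", "arrest", "weapon"},
    "cooking": {"chop", "onion", "recipe", "kitchen", "slice"},
    "outdoors": {"camp", "trail", "hunting", "fishing", "tent"},
}


def guess_topic(transcript):
    """Pick the subject whose cue words overlap most with the transcript, or None."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    scores = {topic: len(words & cues) for topic, cues in TOPIC_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_topic("Use a sharp knife to chop the onion for this recipe"))
print(guess_topic("Police say the suspect dropped the knife"))
```

The word “knife” appears in both sentences, but the surrounding words push the first toward cooking and the second toward crime.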

On the visuals end, there’s a series of things. Facial recognition is just what it sounds like – we can recognize faces. We can also recognize duplicate faces: if it’s a movie or television show, a face appears hundreds of thousands of times when you get down to the frame level. So we know every time a face is seen and where it is seen in the video – when the person comes on camera, when they go off camera, things like that. Scene analysis is similar – we can break down the video and tell you when scenes start and stop. We can do an environmental type of analysis: is it a scene that takes place in the desert, the mountains, the beach, indoors/outdoors, the city, etc. But then we can also get specific and tell you, for example, whether this is a scene that took place not just in any city but in, say, New York City. In the case of a television show like Seinfeld, did it take place in Seinfeld’s apartment, or Kramer’s apartment, or the diner? So it can get very specific with things like that.
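To make the frame-level point concrete, here is a minimal sketch of how per-frame detections of one face could be collapsed into on-camera intervals. It assumes face recognition has already produced a list of frame numbers; the function name, the `max_gap` parameter, and the input format are illustrative assumptions, not Digitalsmiths’ actual pipeline.

```python
def appearance_intervals(frames, fps, max_gap=1):
    """Collapse frame numbers where a face was detected into (start_sec, end_sec) intervals.

    Frames no more than `max_gap` apart are treated as one continuous appearance.
    Each interval ends at the end of its last detected frame.
    """
    intervals = []
    start = prev = None
    for f in sorted(frames):
        if start is None:
            start = prev = f
        elif f - prev <= max_gap:
            prev = f
        else:
            intervals.append((start / fps, (prev + 1) / fps))
            start = prev = f
    if start is not None:
        intervals.append((start / fps, (prev + 1) / fps))
    return intervals
```

Storing a handful of intervals per person instead of hundreds of thousands of frame hits is what makes “when do they come on camera, when do they go off” cheap to answer.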

It’s sort of a long but broad overview of the combined visual and speech analysis that forms a time-based set of metadata about your video. So at any given point in time, Digitalsmiths can tell you where the video took place, who’s in that particular scene, what they’re saying from the specific word perspective as well as what they’re talking about from a subject matter perspective, and then sometimes objects in the scene depending on what they are – logos, broad objects, things like that.
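One way to picture a time-based set of metadata is as a list of labelled time segments that can be queried at any timestamp. The sketch below is an assumed data model for illustration only; the `Segment` fields, the `kind` labels, and `metadata_at` are hypothetical and do not describe Digitalsmiths’ schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive
    kind: str     # e.g. "location", "person", "topic", "object"
    value: str


def metadata_at(segments, t):
    """Return everything known about the video at time t, grouped by kind."""
    result = {}
    for s in segments:
        if s.start <= t < s.end:
            result.setdefault(s.kind, []).append(s.value)
    return result


# Placeholder index for a single video.
INDEX = [
    Segment(0.0, 60.0, "location", "diner"),
    Segment(0.0, 45.0, "person", "Person A"),
    Segment(20.0, 60.0, "person", "Person B"),
    Segment(10.0, 40.0, "topic", "cooking"),
]

print(metadata_at(INDEX, 30.0))
```

Because every label carries its own start and end time, the same index can answer “who is on screen now” and “where does this scene take place” without re-analyzing the video.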

Is your solution also SEO friendly? Can it be indexed by and optimized for the major search engines like Google?

Digitalsmiths is extremely SEO-friendly. The way that we index your data produces a very granular set of metadata. So we have essentially databases filled with this information that we can publish to your site, so that the text of the actual metadata can be exposed to a search engine. So for example, a face is not a visual element anymore – it becomes a text-based element, like “Tom Cruise” or whoever it is, that can be indexed by the search engines and readily searched by the user on Google, Yahoo!, or another search engine.
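The idea of turning a recognized face into crawlable text can be sketched as follows. This is an assumption about one simple way to publish metadata as plain HTML on a video page; the function name and the markup layout are hypothetical and are not Digitalsmiths’ output format.

```python
from html import escape


def metadata_to_html(title, people, locations, topics):
    """Render video metadata as plain HTML text that a search engine crawler can read."""
    def items(label, values):
        lis = "".join(f"<li>{escape(v)}</li>" for v in values)
        return f"<h3>{escape(label)}</h3><ul>{lis}</ul>"

    return (
        f"<section><h2>{escape(title)}</h2>"
        + items("People", people)
        + items("Locations", locations)
        + items("Topics", topics)
        + "</section>"
    )


page_fragment = metadata_to_html(
    "Example clip",           # placeholder title
    people=["Tom Cruise"],    # the name used in the interview's example
    locations=["New York City"],
    topics=["Q&A"],
)
print(page_fragment)
```

Escaping every value with `html.escape` keeps names or topics containing characters such as `&` from breaking the page markup.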

Is your solution scalable for the smaller-size businesses?

Digitalsmiths can definitely help a digital video business that’s looking to grow. We are always happy to help. The way we built our business model as far as revenue and structure to the actual content owner, it scales well with them. If you’re a content owner looking to publish video via online, mobile, or other type of platform, we can absolutely help you get that set up.



© 2019 Tubular Insights & Tubular Labs, Inc.