A while back, we talked about the kinds of metadata and why it’s so important to write good metadata, a discussion that coincided with our coverage of the metadata section of the YouTube Advertiser Playbook. The big reason it’s important to properly classify and describe your videos with actual words is that facial and speech recognition software isn’t yet good enough to tell you accurately, or with context, what a video is. Well, facial recognition algorithms are getting a lot better: a neural network built at Google has learned to identify cats on YouTube…without ever being told what a cat is.
Facial Recognition Getting Eerily Accurate…Not Nearly Perfect Yet
Google’s highly secretive X lab, whose location people can only narrow down to “possibly the California Bay Area,” built a neural network spanning 16,000 computers with 1 billion connections and let it browse YouTube.
The “brain” simulation was exposed to 10 million randomly selected YouTube video thumbnails over the course of three days and, after being presented with a list of 20,000 different items, it began to recognize pictures of cats using a “deep learning” algorithm, despite never being fed any information about the distinguishing features that might help it identify one.
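To get a feel for how a system can discover categories it was never taught, here’s a toy sketch of unsupervised learning. This is purely illustrative and not what Google built (their system used a far larger neural network): k-means clustering, one of the simplest unsupervised algorithms, finds two hidden groups in unlabeled 2-D points without ever being told the groups exist.

```python
import random

random.seed(42)

# Unlabeled data: two hidden groups (around (0, 0) and (5, 5)) that the
# algorithm knows nothing about -- no labels, no hints.
data = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)] + \
       [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)]
random.shuffle(data)

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k=2, iters=20):
    # Farthest-point initialization keeps this demo deterministic:
    # start at one point, then pick the point farthest from all centers so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its assigned points.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

centers = kmeans(data)
# The algorithm "invents" the two groups on its own.
print(sorted(round(cx) for cx, cy in centers))  # → [0, 5]
```

The same broad principle, scaled up enormously and applied to image features rather than coordinates, is what let Google’s network form a “cat” concept from raw, unlabeled thumbnails.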
In other words:
“We never told it during the training, ‘This is a cat,’” Jeff Dean, the Google fellow who led the study, told the New York Times. “It basically invented the concept of a cat.”
It identified cats with 74.8% accuracy, human faces with 81.7% accuracy, and human body parts with 76.7% accuracy. These findings, plus a whole lot more that we normal humans can’t understand, will be presented this week at the International Conference on Machine Learning in Edinburgh, Scotland. The paper is called “Building high-level features using large scale unsupervised learning.”
Here’s a highlight from that paper:
Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
One of the computer scientists working on the project, Andrew Ng, does not think they are about to stumble upon the perfect algorithm. It should be said that in the study, the computers achieved 15.8% accuracy across the 20,000 different object categories, an amazing 70% improvement over the previous best result. But while the computer was able to identify cat faces without knowing what a cat was, and that’s totally amazing, we’re still a long way off from a computer being able to look at a video and produce keywords that would accurately describe it in even the broadest of terms.
But they’re working on that, aren’t they? We hear 15.8% recognition across 20,000 objects and 74.8% on cats, and we hear “it’s a long way off,” but how long before those numbers start getting so good it’s scary? I wouldn’t put anything past Google.