YouTube is something of a cesspool, with pockets of exceptional quality here and there. Even the higher-quality videos have an ephemeral aspect, mysteriously vanishing or being marked Private from one day to the next. Others succumb to a more prosaic fate: account suspended due to multiple copyright violations. Illegal uploads of major-label recording artists abound, or did. YouTube is also becoming a go-to destination for low-fidelity live concert recordings.
There’s no shortage of fee-based alternatives, so I’m not complaining.
YouTube LOL search algorithm
Google Research developed an aLOLgorithm, described in “Quantifying comedy on YouTube: why the number of o’s in your LOL matter”, to measure YouTube videos’ hilarity. Let’s just refer to it as the LOLgorithm, for my ease of typing. Initially, I thought it was a prior year’s April Fools’ Day post. It isn’t!
I watched three of the five most LOL-inducing videos, as determined by the humor-seeking LOLgorithm. I was pleasantly surprised. The LOLgorithm selected videos with themes having universal appeal: a fisherman arguing with a grizzly bear, Annoying Orange, and a charming (well, sort of) video about an Italian man’s language misunderstandings while vacationing in Malta.
Discovery is challenging
Google began by identifying the humorous videos, which is easier said than done. YouTube’s search engine is not the greatest. I have two theories about that.
First: YouTube was an acquisition. Yes, I realize that many Google services are. There was, still is, a Google Video media player, which offers a better user experience. YouTube just seems… unstable, kludgy. I think, but am not certain, that it crashes less often now with HTML5 than with Adobe Flash (SWF).
Second: The content bar is set low. That is, YouTube channel owners can enter any old thing they want as a title, complete with misspellings or contextual mismatches. My current favorite example of an appalling spelling error is a cover of AC/DC’s Thunderstruck, performed by the Vitamin String Quartet. The title is listed as TUNDERSTRUK. Looks like the LOLgorithm is working, because that’s what I’m doing now.
Another amusing example of contextual/semantic mismatch is a remixed melody from Brittany. The channel owner is from eastern Europe and thought the song’s origin was Scottish. To make matters worse, he labelled it as dubstep, but it was actually hardstyle trance. The comments are full of good-natured corrections, in various languages and alphabets. I haven’t a clue how any algorithm, even the LOLgorithm, could parse that! Admittedly, it is an edge case.
Google started with the semantic meaning of the title, designated by the uploader, and the video description and tags if provided. Next, they used viewer reactions, as indicated by comments, to categorize the humorous videos into sub-genres.
Viewers emphasize their reaction to funny videos in several ways: capitalization (LOL), elongation (loooooool), repetition (lolololol), exclamation (lolllll!!!!!), and combinations thereof.
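Here is a rough idea of how those cues could be pulled out of a comment. This is only my sketch, not Google's code: the regular expression, the function names (lol_features, amusement_score) and the "one point per cue" scoring are all assumptions I made up for illustration.

```python
import re

# One or more "lo" syllables (stretched letters allowed), closed by "l",
# followed by any run of exclamation marks. Illustrative assumption only.
LOL_RE = re.compile(r"\b((?:l+o+)+l+)\b(!*)", re.IGNORECASE)


def lol_features(comment):
    """Return the emphasis cues for each LOL-like token in a comment."""
    features = []
    for match in LOL_RE.finditer(comment):
        token, bangs = match.group(1), match.group(2)
        lowered = token.lower()
        # Length of the longest run of a single repeated letter, e.g. "ooooo".
        longest_run = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", lowered))
        # Number of "lo" syllables, e.g. 3 in "lololol".
        syllables = len(re.findall(r"l+o+", lowered))
        features.append({
            "token": token,
            "capitalized": token.isupper(),
            "elongation": longest_run - 1,
            "repetition": syllables - 1,
            "exclamation": len(bangs),
        })
    return features


def amusement_score(comment):
    """Hypothetical score: one point per LOL, plus one per emphasis cue."""
    total = 0
    for f in lol_features(comment):
        total += (1 + int(f["capitalized"]) + f["elongation"]
                  + f["repetition"] + f["exclamation"])
    return total
```

With this toy scoring, a comment with a longer stretch of o's always outscores the same comment with a shorter one, which is the only property I'm trying to show.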
A “loooooool” indicates greater viewer amusement than a “loool”. The final step was ranking the selected videos by relative funniness. Google described their approach as follows:
We then trained a passive-aggressive ranking algorithm using human-annotated pairwise ground truth and a combination of text and audiovisual features.
Raw view count is insufficient as a ranking metric, as it is biased by video age and possibly by prior viewer exposure on an external website.
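Google's post doesn't say how the ranker was built beyond that one sentence, so what follows is my guess at the general shape, not their implementation. It applies the standard passive-aggressive (PA-I) update to pairs of videos, where a human has marked the first video of each pair as the funnier one. The two features, the toy numbers, and the names train_pa_ranker and score are all invented for illustration.

```python
def train_pa_ranker(pairs, n_features, C=1.0, epochs=5):
    """Learn weights from (funnier_features, less_funny_features) pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for winner, loser in pairs:
            # Work on the feature difference: we want w . diff >= 1.
            diff = [a - b for a, b in zip(winner, loser)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            loss = max(0.0, 1.0 - margin)
            norm_sq = sum(d * d for d in diff)
            if loss == 0.0 or norm_sq == 0.0:
                continue  # passive: the pair is already ranked with enough margin
            # Aggressive: move just far enough to fix this pair, capped by C.
            tau = min(C, loss / norm_sq)
            w = [wi + tau * di for wi, di in zip(w, diff)]
    return w


def score(w, features):
    """Funniness score of one video under the learned weights."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Made-up data. Hypothetical features: [mean LOL emphasis, log view count].
toy_pairs = [
    ([5.0, 1.0], [1.0, 3.0]),
    ([4.0, 2.0], [2.0, 2.5]),
    ([3.0, 0.5], [0.5, 1.0]),
]
weights = train_pa_ranker(toy_pairs, n_features=2)
```

In this toy set the funnier video always has the stronger LOL signal and the lower view count, so the learned weights lean on the comments and away from raw views. Real data would be nowhere near this tidy.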
The Google Research blog post is terse. The LOLgorithm seems accurate to me. There’s an alternative explanation, though. Maybe I enjoy the same kinds of videos as many other YouTube viewers, and we’re an easily amused and homogeneous lot? There’s plenty of pre-selection bias. In other words, most viewers of YouTube comedy videos have a not-too-subtle preference profile, myself included. For example, I’ve been an Annoying Orange channel subscriber on YouTube since 2010.
The video about the Italian tourist reminded me of a literary passage that is hilarious.
Have a look. Maybe it will elicit a LOL or two from you.