Predictive Search Outliers (or “One of these things is not like the Other”)

Speaking of phrases you shouldn't Google with "Safe Search" turned off

I got my wrist slapped today for not cross posting here more often. The slapper (correctly) pointed out that I e-mail friends and family lots of asides, and post in a number of comment threads on things that interest me, and have started dabbling in tweeting – all of which would make fine content for this here page, which (as you all well know) I tend to ignore when I’m in hardcore “project on the go” mode, and don’t feel like writing anything substantive (or as “substantive” as we get around these parts).

This is all absolutely true. So let’s see if I can’t get better about that, by starting with a quick re-post of a comment I made to Denis McGrath’s great Dead Things On Sticks. Denis wrote a post wondering why he was getting a particularly grim predictive search result about killing babies in Google, and since I’ve been dealing with something similar (albeit on the search side) relating to the High Life website, I thought I’d lay down a quick note on why I believe bizarre, shocking outliers can get promoted on Google (particularly in lists of predictive results).

I should have pointed out on Dead Things that this is simply a working theory. Anyone who tells you they know exactly how Google weights results is lying (unless they’re Google engineers), and there’s a whole creepy sub-industry of geek shamans telling you they know how to scatter the entrails and read the bones to influence the mighty Google PageRank in the sky. Many of them are snake-oil salesmen of the worst kind.

What I can say is that this is the best working theory I’ve got, and it fits not only the available data from this case but many similar ones I’ve seen as well. So let’s file this all under “Scientific process in progress”, shall we?

One of the problems with Google’s predictive search is that (like the page results themselves) it tends to equate what people actually click on with quality of result.

For example, if thousands of people searching for “screenwriting tips” click on your site from the resulting list, your site will start to come up higher in the rankings, because clearly it contains what people are looking for when they search for “screenwriting tips”. By comparison, perhaps not a lot of people click on your site when they’re searching for “dead things”, so your site would be demoted on that list. This works pretty well for something like search, where the act of searching requires you to input a complete phrase before getting results.

However, when you apply this to predictive results, what happens is that occasional outlier results (which you get in any search database) are often so disconcerting that a large percentage of people searching for something else entirely (“how do I kill the mice in my attic”) stop their search to click on the outlier instead. This isn’t because it’s what they’re looking for; it’s entertainment, shock value, voyeurism, or just an attempt to figure out why the heck so many people are searching for “x” (did I miss a news story?).

This creates a classic positive feedback loop – Google considers that item “x” is a valuable result for search term “y”, promotes the outlier higher in the list, where it attracts more attention, gets more clicks, and moves higher until it sticks at (or close to) the top of the results list.
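For the tinkerers, here is a toy sketch of that loop in Python. To be loud about it: this is my working theory turned into arithmetic, not anything Google has published. Every name and number in it is made up for illustration, including the `simulate` function, the suggestion labels, the click counts, and the assumption that curiosity clicks shrink in proportion to how far down the list the outlier sits.

```python
def rank_of(scores, name):
    """Return the 1-based position of `name` when suggestions are sorted by clicks."""
    ordered = sorted(scores, key=lambda s: scores[s], reverse=True)
    return ordered.index(name) + 1


def simulate(intent_clicks, outlier, curiosity_at_top, rounds):
    """Track the outlier's rank over time in a click-driven suggestion list.

    intent_clicks:    clicks per round from people who actually wanted each suggestion
    curiosity_at_top: extra "what the heck is that?" clicks the outlier would get
                      per round at position 1 (assumed to fall off as 1/position)
    """
    # Seed the ranking with one round of genuine, intent-driven clicks.
    scores = dict(intent_clicks)
    history = []
    for _ in range(rounds):
        position = rank_of(scores, outlier)
        # The higher the outlier sits, the more people see it and click it.
        curiosity = curiosity_at_top // position
        for name, clicks in intent_clicks.items():
            scores[name] += clicks
        scores[outlier] += curiosity
        history.append(rank_of(scores, outlier))
    return history


# Hypothetical numbers: the outlier is what only 50 people per round really want.
INTENT = {
    "kill the mice in my attic": 500,
    "kill weeds": 300,
    "kill time": 150,
    "shocking outlier": 50,
}

print(simulate(INTENT, "shocking outlier", 1200, 10))  # [3, 3, 2, 2, 2, 2, 2, 1, 1, 1]
print(simulate(INTENT, "shocking outlier", 0, 10))     # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
```

With the curiosity clicks switched off, the outlier stays at the bottom where its genuine popularity puts it. Switch them on and each promotion buys it more visibility, which buys it more clicks, until it parks itself at the top.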

You’ll often notice in these cases that if you search for “thematic synonyms” (say “how do I”, “how can one”, “how to”) there’s usually no sign of the oddity results, while other trends are clearly visible (for example, all three of the above variants generally have results about losing weight in the top couple… which one would expect).

And in related (heh) news, Denis also had a post about the Google Superbowl Ad that I completely agree with.