It seems that one of my favorite themes to write about is how Google works, not on a mathematical level, but on a broader, more abstract, semantic level.
This stuff is interesting because, when you think about it, search engines are trying to use a rigid, formal, logical system like mathematics to do the job of the flexible, often informal, often illogical human brain. In search, this means processing verbal questions that are often ambiguous and hard to interpret correctly.
That’s why Google has developed programs and entities to organize information. Once information is organized that way, separate pieces of it are much easier to connect, relate, and sort on different layers and levels.
I think one of the coolest things they’ve done is the Knowledge Graph. I’ve gushed about it before, but it really is a remarkable undertaking. (So is schema.org, the joint project of the major search engines.)
Without getting into it all over again, these tools are ways to standardize and organize information. Once that is done, the connections and relationships I mentioned above will emerge naturally to a certain extent. The content being posted to the Web does the rest.
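To make the idea of organized information a little more concrete, here is a minimal sketch of facts stored as subject, predicate, object triples. It is only an illustration: the predicate names (`developed_by`, `is_a`) and the `build_index` helper are made up for this post, and nothing here reflects how Google actually stores the Knowledge Graph.

```python
from collections import defaultdict

# Hypothetical, hand-written facts in (subject, predicate, object) form.
# The predicate names are assumptions chosen for illustration only.
triples = [
    ("Knowledge Graph", "developed_by", "Google"),
    ("Hummingbird", "developed_by", "Google"),
    ("Hummingbird", "is_a", "search algorithm update"),
]


def build_index(triples):
    """Group subjects by (predicate, object) so related items are easy to find."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].append(subject)
    return index


index = build_index(triples)

# Everything the toy data says Google developed.
developed_by_google = index[("developed_by", "Google")]
print(developed_by_google)
```

Once the facts share a standard shape, a relationship like "things developed by Google" falls out of a simple lookup instead of having to be guessed from free text.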
Through its algorithm, Google is constantly refining ways to determine associations between words. This doesn’t mean words that always appear right next to one another, but rather words that stand for a noun (a person, place, thing, or idea) and that turn up in the same context across thousands, millions, billions, or more separate instances of co-occurrence.
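As a rough illustration of what counting co-occurrence looks like, here is a small sketch that tallies how often two words show up in the same document. The sample documents and the `cooccurrence_counts` function are invented for this example, and real systems work at a vastly larger scale with far more sophisticated weighting, so treat it as a toy rather than a description of Google's method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each pair of words, how many documents contain both."""
    counts = Counter()
    for doc in documents:
        # Use a sorted set so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        terms = sorted(set(doc.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Made-up sample documents, for illustration only.
docs = [
    "hummingbird google search update",
    "google knowledge graph entities",
    "google search knowledge graph",
]

counts = cooccurrence_counts(docs)

# Pairs that share the most documents come out on top.
for pair, n in counts.most_common(3):
    print(pair, n)
```

Notice that the words in a pair don't have to sit next to each other. They only need to keep appearing in the same context, which is the kind of association described above.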
That’s why I always thought Hummingbird was more or less a non-story. Not only had it been in effect for months before its official announcement several weeks ago, but an update that makes Google better able to parse and interpret longer, more human questions is a natural result of what Google is trying to do in a broader sense. So, no surprises there. If anything, it makes most online copywriters’ jobs easier for now.
Hummingbird is only the tip of a very large information iceberg, which is why I’m always happy to see articles that focus on the real heart of the issue: the patents, the systems, the techniques, and the ideas being used to make a computer program think more like us.