Google knows a lot about us, including how insecure we are about our programming knowledge. A new update to Google’s search recognizes strings of special characters, so those code searches we do now return the right results.
Special characters such as ‘+’ or ‘=’ have worked in search-based mathematical operations for years, but ‘++’ confused Google. In addition to searches related to programming, Google says the change will help cleverly named products:
If you’re searching for the meaning of [c++17], you will get results for the well-known programming language instead of c17, which brings up a Boeing airplane. Additionally, organization and product names that include punctuation, such as She++ and Notepad++, will return more accurate results.
Great for oddly named apps, but what about code? I searched for simple boolean operators across several search engines, and got mixed results. Google may understand what ‘||’ means, but others aren’t quite as smart.
DuckDuckGo’s first result tells me the ‘or’ operator is actually an emoticon that means ‘bowl.’ It didn’t account for the parenthetical of that emoticon, and doesn’t seem to care that my code logic is making me cry and I need help.
Bing and Yahoo? They barely know anything, much less what operators are. DuckDuckGo eventually got there (it tells me ‘||’ is a logical operator in the description of its third result), but Bing and Yahoo just gave up. When I altered my search to ‘swift ||,’ Google and DuckDuckGo got more contextual, but Google was the clear leader. The other two thought I was looking for financial messaging or a transportation provider, with Bing offering the Swift language up only in the sidebar.
If you’re wondering what brought this on (or why it wasn’t already a feature for search), an ex-Googler commenting on a Hacker News thread sheds a bit of light on the matter, hinting that cost may have played a role:
Search engines rely on a data structure known as an inverted index; it’s basically a list, for each token, of every document that contains the token, and for a context-aware search engine like Google it usually contains the position within the document of the token as well. Single-character punctuation marks like periods, commas, parentheses, dashes etc. appear in literally every sentence. That means that the inverted index for periods or commas would have to contain an entry for literally every single sentence on the web.
There’s a similar problem for common words like ‘a’, ‘the’, prepositions, etc, but these are usually already solved by stop-wording.
That’s why this announcement only covers groups of punctuation with 2-3 characters. These don’t appear in ordinary text, and so you can generate posting lists for them that are reasonably-sized. (I suspect that the economics of the index have changed as well, making storage costs cheaper, but this work happened after I left and so I don’t know details.)
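To make the commenter's point concrete, here is a minimal sketch of a positional inverted index in Python. It is not Google's implementation. The stop-word list, the tokenizer rule, and the names `tokenize` and `build_index` are all assumptions made up for illustration. The sketch simply drops stop words and single punctuation marks, and keeps punctuation runs of two or three characters as searchable tokens.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "or", "and", "to"}

# A raw token is either a run of word characters or a run of punctuation.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<punct>[^\w\s]+)")


def tokenize(text):
    """Yield (position, token) pairs for the tokens worth indexing.

    Positions count every raw token, so dropped tokens leave gaps.
    """
    for position, match in enumerate(TOKEN_RE.finditer(text.lower())):
        token = match.group()
        if match.lastgroup == "word":
            if token not in STOP_WORDS:
                yield position, token
        elif 2 <= len(token) <= 3:
            # Assumed rule: index only punctuation runs of 2-3 characters.
            yield position, token


def build_index(docs):
    """Map each token to a posting list of (doc_id, position) entries."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in tokenize(text):
            index[token].append((doc_id, position))
    return dict(index)


docs = {
    1: "The ++ operator in C++17.",
    2: "Use a || b, not a | b.",
}
index = build_index(docs)
print(index["++"])   # [(1, 1), (1, 5)]
print(index["||"])   # [(2, 2)]
print("." in index)  # False
```

The periods, the comma, and the lone ‘|’ never get posting lists, which is the space saving the commenter describes. Meanwhile ‘++’ and ‘||’ become ordinary tokens that a query can look up.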
Hearing that we could have had this feature years ago is frustrating, but it seems Google just didn’t have the foresight to invest in this niche search category. It may also have noticed that developers increasingly find DuckDuckGo’s customizations (like jumping straight into sites such as Stack Overflow using bangs) enticing, and want them back.