Bigdaddy Lays Down the Law at Google

The news of Yahoo!’s impending changes to its pay-per-click (PPC) platform has tended to overshadow similarly sizable news at rival Google. The Bigdaddy update is fully operational, and Daddy’s kicking some inferior-link ass and booting some low-grade duplicate content. And Web operators need to take note of those facts and optimize their sites to avoid his size 13 EEE shoe.

To refresh your memory, Bigdaddy is the code name given to a new data infrastructure installed at Google earlier this year. The aim was to provide the search engine with a boost in computing power that would let its network speed up the job of sorting and indexing Web pages—and in the process, it is hoped, add some new ways to refine and improve that index, making it more useful and relevant to consumers.

That’s why Bigdaddy should be big news for search marketers: now that the upgrade is complete and all the switches have been flipped, Google seems to be looking at Web pages in a new way, and Web operators are already seeing that some tactics that won them high organic rankings in the past no longer help and may in fact hurt their natural placement in search results.

In particular, the new Google algorithm seems to be taking an interest in user behavior as an indication of the relevance and value of the pages in its index. How Google sees visitors interacting with your Web pages may in time grow to be one of the most important determinants of your site’s rank for appropriate keyword searches.

Google is well equipped to keep track of user behavior, for a number of reasons. The company is a Web domain registrar, with visibility into all domains and their history; it owns a set of Web analytics tools thanks to its purchase of Urchin Software in March of last year; and it keeps tabs on every document in its index, including when those documents are called up in a search and what users do while they’re there.

Google follows the actions of its searchers, either through browser cookies or through proprietary products such as the Google toolbar, news alerts and Google bookmarks. Google tracks these user actions in the aggregate, stripped of personally identifiable data (although, as its challenge to a Justice Department subpoena revealed earlier this year, recorded search data could reveal personal information if properly parsed). But the kind of information Google wants to discover really relates to the documents, not to the people visiting them.

“It’s basic demographic information—nothing to freak out about,” says Jim Hedger, a consultant with search engine optimization firm StepForth Placement. “Somebody from Massachusetts accessed this page, they spent x seconds on the indexed page and then moved across this link or that link, moving towards some conversion or from page to page.”

If users tend to hang around on that page, Google makes the assumption that they’ve found it valuable. If they click on links contained on the page, Google assumes those links seemed useful. In its own parlance, those pages and links have demonstrable “trust” among users. And the Google algorithm likes trust. A lot.

“Google bases its ranking algorithm on trust,” Hedger says, paraphrasing one of his own recent blog posts. “That might sound naïve, but we’re talking about one of the most informed entities human hands have ever created. Google trusts what it knows—and it knows a heck of a lot.”

That trust has a certain halo effect: Documents that link to a trusted Web page also see their trust rankings go up. Google has always prided itself on a democratic bent to the way it ranks Web pages for relevance and value, Hedger says, and has long inferred that lots of links to a page indicate that the page’s content is useful.

But that’s a more oblique tactic for evaluation than what the search engine seems to be aiming for now, post-Bigdaddy: watching the actual on-site behavior of Google users to determine from their movements what’s good or bad, worthwhile or worthless on a Web page. If users hang around for a good while and then use the links provided, Google infers the page is well-made and useful; if they move quickly to the links, Google assumes that the page content must be weak.
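For illustration only, here is a minimal sketch, in Python, of the kind of behavior-based heuristic being described: long dwell times and link clicks push a page’s score up, while quick bounces back to the results page pull it down. The record fields, thresholds and weights are invented for the example; nothing here reflects Google’s actual algorithm.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Visit:
    """One aggregated, anonymized visit to an indexed page (hypothetical record)."""
    seconds_on_page: float   # dwell time
    clicked_page_link: bool  # did the visitor follow a link on the page?
    bounced_to_results: bool # did the visitor return quickly to the results page?

def page_trust_score(visits: list[Visit]) -> float:
    """Toy trust heuristic: long dwell times and link clicks raise the score,
    quick bounces back to the search results lower it. Weights are arbitrary."""
    if not visits:
        return 0.0
    avg_dwell = mean(v.seconds_on_page for v in visits)
    click_rate = sum(v.clicked_page_link for v in visits) / len(visits)
    bounce_rate = sum(v.bounced_to_results for v in visits) / len(visits)
    # Normalize dwell time against an arbitrary 60-second "engaged" threshold.
    dwell_signal = min(avg_dwell / 60.0, 1.0)
    return round(0.5 * dwell_signal + 0.4 * click_rate - 0.3 * bounce_rate, 3)

if __name__ == "__main__":
    engaged = [Visit(95, True, False), Visit(40, True, False), Visit(120, False, False)]
    thin = [Visit(4, False, True), Visit(7, False, True), Visit(3, True, True)]
    print("engaged page:", page_trust_score(engaged))  # higher score
    print("thin page:   ", page_trust_score(thin))     # lower score
```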

It’s a judgment that takes place over time—Google’s looking for trends, not blips—but since it has such a wide reach on the Web and such a deep historical record for each page in its index, the company can actually make these behavior-based trust determinations pretty rapidly, Hedger says.

The new emphasis on behavior-based judgments is having a particular impact on Web page rankings in two areas: the elimination of duplicate content, and the progressive eradication of spam pages and “splog” pages that are nothing more than vehicles for PPC ads dressed up with content scraped from other, more authoritative sites.

Web search users know duplicate content when they see it. Who hasn’t been irritated to run a search and find that four of the results all point to the same content, even if on different sites? Search engines have not always made it a priority to go after duplication, but Google and the other majors have grown pretty good at rooting it out, acquiring the ability to look at images, links and text down to the paragraph level to find copycats and cloned content.
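The article’s sources don’t spell out how that paragraph-level comparison works, but one common duplicate-detection technique is to normalize and hash each paragraph, then measure the overlap between two pages. The sketch below illustrates that general idea in Python; it is an assumption about the class of technique, not a description of Google’s implementation.

```python
import hashlib
import re

def paragraph_fingerprints(html_text: str) -> set[str]:
    """Split text into paragraphs, normalize tags/whitespace/case, and hash each one."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n|</p>", html_text) if p.strip()]
    fingerprints = set()
    for p in paragraphs:
        normalized = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", p)).lower().strip()
        if len(normalized) > 20:  # ignore very short fragments (nav, footers)
            fingerprints.add(hashlib.sha1(normalized.encode("utf-8")).hexdigest())
    return fingerprints

def duplication_ratio(page_a: str, page_b: str) -> float:
    """Jaccard overlap of paragraph hashes: 1.0 means the pages share every paragraph."""
    a, b = paragraph_fingerprints(page_a), paragraph_fingerprints(page_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    original = "<p>Our widget is hand-made in Vermont.</p><p>It ships in two days, guaranteed.</p>"
    scraped = "<p>Our widget is hand-made in Vermont.</p><p>Buy cheap ringtones here!</p>"
    # One shared paragraph out of three distinct ones -> about 0.33
    print(duplication_ratio(original, scraped))
```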

So if you’re an e-commerce site using product information pulled from the same product database used by a raft of other merchant sites, you may find that Google has downgraded your ranking or even dropped your page out of its index. By the same token, duplications of links that suggest link trading will also put the hurt on your trust ranking with Google and, over time, lower your standing in organic results.

Lowering a page’s Google ranking or, worse yet, dropping it from the index altogether is a sure way to elicit screams from Web operators. Judging from the static in the search optimization forums, the Bigdaddy update has produced more yowls than a cargo container of cats. Google software engineer Matt Cutts posted a timeline of Bigdaddy rollout events in his blog on May 16 (http://www.mattcutts.com/blog/indexing-timeline/) in which he stepped through some of the trauma cases that Webmasters had brought to his attention. One of these was an East European real estate site with 387 indexed pages but some questionably relevant links included at the bottom of the page.

“Linking to a free ringtone site, an SEO contest and an Omega 3 fish oil site?” Cutts wrote. “I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled. As these indexing changes have rolled out, we’re improving how we handle reciprocal link exchanges and link buying/selling.”

Another real estate site presented a case of falloff from 10,000 pages indexed by Google to just 80. Again, according to Cutts, the problem was link-related; this time the site looked to be trading links with mortgage sites, credit-card sites and sites selling exercise equipment. “If you were getting crawled more before and you’re trading a bunch of reciprocal links, don’t be surprised if the new [Web] crawler has different priorities and doesn’t crawl as much,” he said.

As with every update that Google and the other engines have launched in the past, there will be errors, and there will be Web sites wrongly downgraded because Google judges their content to be duplicative or their links suspect. Most of those mistakes correct themselves in time, but as Hedger points out, “For a lot of our clients two or three days off the index means serious money.”

For Web operators who may find that the bottom has dropped out of their Google rankings thanks to Bigdaddy, Hedger has a few possible fixes:

* Add fresh, useful original content to your pages. “Fresh content rocks,” he says. If that content is useful to users, as evidenced by their behavior on the site, Google will value it accordingly. But don’t go looking for shortcuts by scraping content from other sites or setting up blogs that contain nothing but links to other sites.

* Go through the incoming and outbound links on your site with a fine-tooth comb, cutting back the ones that may be dragging down the trust ranking of your site. “Anything that you might ever have thought questionable, question it, and then question it again,” Hedger says. “Question incoming and outgoing affiliate links big time. Any link going into or coming out of your site that has nothing to do with the topic of your site shouldn’t be there.” (A rough first pass at this kind of audit is sketched after this list.)
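As that rough first pass at the link audit Hedger recommends, a script can at least surface outbound links whose destination domains have nothing to do with the site’s topic keywords. The Python sketch below is a hypothetical helper; the domains and keyword list are made up, and a crude keyword match is no substitute for the human judgment Hedger describes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def flag_outbound_links(html_text: str, own_domain: str, topic_keywords: set[str]) -> list[str]:
    """Return outbound links whose destination domain contains none of the site's
    topic keywords. A crude screening pass for a manual audit, not a verdict."""
    parser = LinkCollector()
    parser.feed(html_text)
    flagged = []
    for href in parser.links:
        host = urlparse(href).netloc.lower()
        if not host or host.endswith(own_domain):
            continue  # internal link, skip
        if not any(kw in host for kw in topic_keywords):
            flagged.append(href)
    return flagged

if __name__ == "__main__":
    page = (
        '<a href="/listings">Listings</a>'
        '<a href="http://free-ringtones.example/">Ringtones</a>'
        '<a href="http://capecod-realty.example/">Partner realtor</a>'
    )
    print(flag_outbound_links(page, "myrealty.example", {"realty", "realestate", "homes"}))
    # -> ['http://free-ringtones.example/']
```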

The content duplication issue will pose a special problem for Web merchants that have a lot of affiliates, and there may be no easy way around it. With one master site and thousands of surrounding sites all showing the same information derived from it, Google’s simply going to want to show that master site. As Cutts wrote of one T-shirt “favorites” site full of affiliate links, whose operator complained that its indexed pages had dropped from 100 to five, “The question I’d be asking is why anyone would choose your ‘favorites’ site instead of going directly to the site that sells the T-shirts?”

“You often hear Webmasters for affiliates of department-store sites complaining that the big guys are blocking their opportunities with the search engines,” Hedger says. “But the search engines aren’t about helping anyone build their business; they’re in it to present quality sets of information to users, based on keywords entered. If that winds up driving the affiliate business down, it’s sort of like a new lower speed limit hurting the trucking industry.”