DOJ Will Get to Peek at Google Data

A U.S. federal judge says he will probably give the Department of Justice some of the search data it has asked for from Google, now that government attorneys have greatly reduced the scope of their request.

The cut-down data sweep, in which the government asked for only 50,000 Web addresses and 5,000 search terms rather than the million of each it was expected to ask for, allowed both sides to announce an apparent win in the case, particularly since the judge may only grant part of the request in his final decision.

As expected, DOJ lawyers argued before U.S. District Court Judge James Ware that Google should be made to comply with an August 2005 subpoena for search data. The DOJ wants to use that information to buttress its case for reinstating a 1998 law shielding children from adult content. Its intent is to show that pornography on the Web is so prevalent that it can’t be filtered with software and will require a legal solution.

Google had opposed that subpoena, arguing that compiling the data would be a financial and operational burden to the company, slowing its service to users, and that revealing the information would make public some trade secrets about the size and scope of its Web indexing functions and could jeopardize user privacy.

After 90 minutes spent hearing the DOJ and Google arguments, Judge Ware said, “It is my intent to grant some relief to the government.” He said he would issue a written decision in the case “very quickly.” That could apparently mean a full decision in a matter of days or weeks.

Yesterday’s testimony was the first public notice that the government had scaled back yet again its data request from Google. The original DOJ request was for all the URLs in Google’s index and all search terms entered for June and July 2005. That request was subsequently reduced to random samplings of 1 million Web pages from the Google index and 1 million search terms from a typical week. MSN, Yahoo! and AOL all agreed to similar DOJ requests for that level of data from their search operations, it was revealed in January.

The stripped-down request seems to remove some of the grounds for Google’s refusal to comply. In pre-hearing filings, the company said that giving the government what it wanted would require new computer code to pull the data and might require eight full days of engineering time. Google also pointed out that running a week’s worth of sampled search terms on Google.com would add that much load to its system, impeding users’ service.

In fact, though, Justice Department attorney Joel McElvain reportedly told the court that the government would only use 10,000 of the Web URLs it was asking for and 1,000 of the search keywords to make its case for restoration of the Child Online Protection Act (COPA), which the U.S. Supreme Court blocked in 2004. A federal court in Philadelphia has slated that case for a hearing in October.

Presumably, the lighter data request would not impose the same burdens on Google’s system. Nor would such a small sample involve exposing operational questions that could put what Google considers trade secrets about its platform into the public forum, for competitors to benefit from.

On the other hand, what purpose would such a relatively small sample accomplish? When asked by Judge Ware if the DOJ had not already received enough data from other search engines to support its porn-law case, McElvain admitted that it probably had but added that “the study would be improved with Google’s data.”

Google outside counsel Albert Gidari, from Perkins Coie LLP, argued that the data the DOJ wants is statistically irrelevant to its proposed porn study. He told the court that government attorneys would be better off looking for relevant information on a metasearch engine that compiles data from numerous separate search indexes, such as Dogpile or Webtracker, and avoid implicating Google.

User privacy is also an issue, Gidari told Ware, adding that release of the data could expose certain content about users’ finances, Social Security numbers and sexual preferences.

According to press reports, Judge Ware made a point of mentioning his own concern over the DOJ request for a random sampling of search requests. The Associated Press quoted him as saying he wanted to avoid creating the impression that government officials could mine the data kept by Internet search engines and other large databases for surveillance use. Like the information provided by its rivals, Google search data would be scrubbed of personally identifying markers.

The AP said Ware asked Gidari whether he would prefer to see the release of Web page URLs or search terms. Gidari replied that handing over search queries could have a “chilling effect” on Internet search usage. That could be an indication that the court is leaning toward granting only half of the DOJ request.

After the hearing, Gidari said he considered the government’s reduction of its data request a “significant victory.”

“We’re very encouraged by the judge’s thoughtful questions and comments,” said Nicole Wong, associate general counsel for Google, in an e-mail statement issued after the hearing. “They reflected our concerns about user privacy and the scope of the government’s subpoena request.”

Andrew Klungness, an attorney with Bryan Cave LLP who specializes in Internet and e-commerce law, said Google’s compliance may have more significance than just the digital ammunition it adds to the DOJ’s anti-pornography case.

“With the Patriot Act and other laws, the government is trying hard to swing the pendulum toward having pretty broad rights to get the information it needs,” he said. “In many ways, this may be an attempt by the government to set a precedent in this regard with respect to Internet service providers.”

Klungness said that winning compliance from the other three search engines the DOJ approached for data probably weakened Google’s case that the request would be unduly burdensome and pointless.

“The other parties capitulated, and I think that, coupled with the reduced request, was designed to make the government’s request appear reasonable and not harmful to Google in any way,” he said. “In these types of cases, you’re not really looking at the black letter of the law but weighing considerations of reasonableness. And when you have three or four exactly similarly situated companies doing something that one lone holdout says is unreasonable, there’s no doubt that would influence a decision.”

Last Friday, the Justice Department requested that Google be held to a three-week timetable for supplying any search data it was prepared to supply, so that government lawyers can make a May 3 deadline for expert evidence imposed by the Philadelphia judge.

The American Civil Liberties Union is a participant in that case, hoping to prevent the reinstatement of COPA on constitutional grounds. Last month, the ACLU filed a brief with the San Jose court that said that if government attorneys were given access to Google search data, the ACLU would also sue for detailed information about Google’s search processes. For example, the filing said the ACLU would want to know “how many total URLs Google has in its database, how often Google updates its database, how and where Google crawls the Web to locate URLs for its database, how many different servers those URLs are stored on, where those servers are located, and how many URLs there are within each server.”

The ACLU also said it would want to know how Google distinguishes a search request from a human being from one made by automated software.

Google has taken some PR lumps recently for insufficiently standing up for free speech and privacy, from cooperating with the Chinese government to exclude Web sites from the Chinese Google, to a new Google Desktop “search across computers” option that stores a user’s indexed data on Google servers for up to 30 days. The Electronic Frontier Foundation warned that the feature could expose users’ data to theft or surveillance.

But the federal subpoena of search data posed a special PR problem because most users were not even aware that their privacy could be an issue in search. According to a January poll by The Ponemon Institute, 77% of respondents did not know that Google recorded and stored any information about their searches.

And a survey by the Center for Survey Research at the University of Connecticut found in February that 65% of respondents opposed government monitoring of Americans’ search behavior, but 60% said they also opposed the search engines permanently storing records of their search behavior. Google currently uses a “lifetime” cookie that expires in 2038. That’s a lot more permanent than the cookie Yahoo! uses, which expires in June 2006.