Mining NATURAL SEARCH Data

May 01, 2008 9:30 PM

Medium and large-size e-commerce sites often have more natural search data than strategies for making effective use of it. Natural search data encompasses not only the volume of queries visitors used prior to finding your site, but also all the metrics associated with behavior as each visit progresses.

The correlation of this data is a valuable asset, which, although it appreciates in value with each visitor to the site, is often ignored or neglected.

It shouldn’t be. The raw data is fairly easy to obtain for e-commerce sites, essentially from at most three distinct sources: Web server logs, shopping-cart contents, and the catalog of available products. Once parsed and normalized, the data should be readily applicable to incremental revenue strategies.
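As a rough illustration of the parsing and normalizing step, the sketch below pulls the search query out of a referrer URL taken from a Web log entry. The function name, the example URL, and the list of query-string parameter names are assumptions made for illustration; check them against the referrers that actually appear in your own logs.

```python
from urllib.parse import urlparse, parse_qs

# Assumed query-string parameter names; adjust to match the referrers
# that actually appear in your logs.
QUERY_PARAMS = ("q", "p", "query")


def extract_search_query(referrer):
    """Return the normalized search query carried by a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values:
            # Normalize: lowercase and collapse runs of whitespace.
            return " ".join(values[0].lower().split())
    return None


# Hypothetical referrer, as it might appear in a Web log entry.
example = extract_search_query(
    "http://search.example.com/search?q=King-Size+Plaid++Flannel+Sheets&hl=en"
)
print(example)
```

Normalizing case and whitespace at this stage means that later counts treat trivially different spellings of the same query as one query.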

Let’s explore in more detail two of the many possible applications of natural search data analysis.

Choosing inventory

Some of the pages on your site will be presented to users because they only partially match a searcher’s query. You should investigate what the use of those unmatched terms may reveal about users’ motivations and your site’s offerings.

For example, imagine some non-trivial number of natural search users arriving at a page on your site using the term “king-size plaid flannel sheets.” Your site’s page was, of course, presented within natural search results because the search engine’s algorithms deemed it relevant for terms in the query.

Now imagine that your site’s page does not mention the term flannel. (In general, every term of a query need not appear on the page selected by search engines in response to that query, as many of us have no doubt experienced.)

Discovering a term such as flannel in your data is an example of a single unit of data-mined information, and it can be exploited in different ways.

First and most obvious is the question: Does your site even offer king-size plaid flannel sheets? If not, and your site is garnering queries from natural search wherein there is frequent use of such a stowaway term among your visitors, you can reasonably infer that unmet consumer demand exists.

This information ought to be relevant to your purchasing department — especially if the data demonstrates cyclical usage over time. For our example term, one can easily imagine the data revealing a seasonal cycle.

If your site does in fact offer one or more flannel sheets, the information may still be of interest to the purchasing department. It may be worth offering an expanded product mix of items made from the material.
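The stowaway-term check described above can be sketched in a few lines. This is a minimal illustration, not a production approach: the function names, the simple tokenizer, and the sample page text are all assumptions, and a real analysis would also need to handle plurals, misspellings, and stop words.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase terms, keeping hyphenated words whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def stowaway_terms(queries, page_text):
    """Count query terms that never appear in the landing page's text."""
    page_terms = set(tokenize(page_text))
    counts = Counter()
    for query in queries:
        # Count each unmatched term once per query.
        for term in set(tokenize(query)) - page_terms:
            counts[term] += 1
    return counts


# Hypothetical landing page and the queries that led visitors to it.
page_text = "King-size plaid cotton sheets, machine washable."
queries = [
    "king-size plaid flannel sheets",
    "plaid flannel sheets",
    "king-size plaid sheets",
]
counts = stowaway_terms(queries, page_text)
print(counts.most_common())
```

Run per landing page and per reporting period, the same counts can be compared over time to look for the cyclical usage mentioned above.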

If all this analysis were performed merely to yield an anecdote about the term flannel, the resource expenditure would hardly be worth the effort. The issue is more general, though.

In this case, you should view the term as a feature term. Each term should be considered relative to other feature terms that also appear in natural search queries, whether partially or completely matched by your site’s content.

To see how this might be revealing, consider that several years ago, the data of a major electronics retailer revealed that the feature term “wireless” as part of the query “wireless speakers” was garnering the most unique visitors from natural search for the period under review. Although the site offered a broad selection of speakers in general, the retailer’s site offered only one product with this high-demand feature.

The selection of individual products that make up a site’s catalog involves many considerations, including margin, appeal to target audience, and logistics. But such observations give the purchasing department an objective metric that may be useful in its deliberations.

At the very least, feature terms whose relative demand volume outpaces other feature terms should be candidates for expanded product offerings.
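One way to put this comparison into practice is to count unique visitors per feature term and flag the terms that stand out from the rest. In the sketch below, the function names and sample data are hypothetical, and "outpaces" is interpreted as "above the mean across feature terms," which is an assumption; your own cutoff may differ.

```python
from collections import defaultdict
from statistics import mean


def feature_demand(visits, feature_terms):
    """Map each feature term to the number of unique visitors whose query used it."""
    visitors = defaultdict(set)
    for visitor_id, query in visits:
        words = set(query.lower().split())
        for term in feature_terms:
            if term in words:
                visitors[term].add(visitor_id)
    return {term: len(ids) for term, ids in visitors.items()}


def outpacing_terms(demand):
    """Return feature terms whose visitor count exceeds the mean (assumed cutoff)."""
    if not demand:
        return []
    cutoff = mean(demand.values())
    return sorted(term for term, count in demand.items() if count > cutoff)


# Hypothetical (visitor ID, query) pairs from natural search.
visits = [
    ("v1", "wireless speakers"),
    ("v2", "wireless speakers"),
    ("v3", "Wireless speakers"),
    ("v1", "wireless speakers"),  # repeat visit, counted once
    ("v4", "bookshelf speakers"),
    ("v5", "outdoor speakers"),
]
demand = feature_demand(visits, {"wireless", "bookshelf", "outdoor"})
candidates = outpacing_terms(demand)
print(demand, candidates)
```

Setting each flagged term against the number of products you offer with that feature gives the purchasing department the comparison described in the electronics retailer example.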

Reducing site abandonment

The primary goal of your e-commerce site is no doubt to sell products. The behavior of visitors who do not purchase should be examined as carefully as the behavior of visitors who do.

Many e-commerce reporting packages will report high abandonment pages. These packages often aggregate the number of non-converting visitors by last page viewed. While this data may identify problem areas, it reveals only a portion of the full story.

The original query used by the abandoning searcher and the nature of the click stream prior to abandonment both contain useful information. As always, the data must be in sufficient quantity to warrant making any inferences.

Abandonment from a natural search point of view is direct or indirect. Direct abandonment occurs when a visitor from natural search makes no additional clicks on your site after viewing a page whose link was presented in natural search results.

Indirect abandonment occurs when a visitor initiates at least one additional click from a page whose link was presented in natural search results, but subsequently abandons the site without completing a conversion action. Your attempts to reduce abandonment should include consideration of the differences reflected in such consumer behavior, as depicted by the data.
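The two definitions translate directly into a rule applied to each visit. The sketch below assumes each visit from natural search has already been reduced to an ordered list of pages viewed plus a flag for whether it converted; the function name and that data shape are assumptions for illustration.

```python
def classify_visit(pages_viewed, converted):
    """Label a natural search visit as 'direct', 'indirect', or None.

    pages_viewed is the ordered, non-empty list of pages in the visit,
    starting with the page reached from natural search results.
    """
    if converted:
        return None  # Not an abandonment.
    if len(pages_viewed) == 1:
        return "direct"  # No additional clicks after the landing page.
    return "indirect"  # At least one additional click, then left.


# Hypothetical visits.
print(classify_visit(["/sheets/plaid"], converted=False))
print(classify_visit(["/sheets/plaid", "/sheets"], converted=False))
print(classify_visit(["/sheets/plaid", "/cart"], converted=True))
```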

You can analyze your site’s click-path data originating from natural search to classify each page with an above-average abandonment rate as direct, indirect, or both. You should exclude pages where abandonment is expected, such as check-out confirmation and contact details; these will be obvious from the data.

Pages are classified as direct when at least some threshold percentage of abandoning visitors do so after having viewed only that page, and no other. Pages are classified as indirect when at least some threshold percentage of abandoning visitors do so having previously viewed other pages on the site.

These classifications are not mutually exclusive; a page can be classified as both directly and indirectly abandoned. In these cases, the same page will simply undergo two sets of considerations for remediation. This classification scheme provides one way, among others, to begin assessing what actions you can take to reduce the abandonment rate.
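The page-level classification can be sketched as follows. The input is assumed to be the click paths of abandoning visits only, already limited to pages with above-average abandonment rates; the function name, the 40% threshold, and the sample paths are all illustrative assumptions. Any threshold at or below 50% allows a page to receive both labels.

```python
from collections import defaultdict


def classify_pages(abandoned_paths, threshold=0.4):
    """Label each last-viewed page as 'direct', 'indirect', or both.

    abandoned_paths holds one ordered, non-empty list of pages per
    abandoning visit; the last entry is the page where the visitor left.
    """
    totals = defaultdict(lambda: [0, 0])  # page -> [direct, indirect]
    for path in abandoned_paths:
        index = 0 if len(path) == 1 else 1
        totals[path[-1]][index] += 1

    labels = {}
    for page, (direct, indirect) in totals.items():
        total = direct + indirect
        page_labels = set()
        if direct / total >= threshold:
            page_labels.add("direct")
        if indirect / total >= threshold:
            page_labels.add("indirect")
        labels[page] = page_labels
    return labels


# Hypothetical click paths of abandoning visitors.
paths = [
    ["/sheets"], ["/sheets"], ["/sheets"], ["/home", "/sheets"],
    ["/lamps", "/rugs"], ["/home", "/rugs"],
    ["/lamps"], ["/home", "/lamps"],
]
labels = classify_pages(paths)
print(labels)
```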

Directly abandoned pages are probably the simplest to analyze, given that there is a non-complicated story to describe the data: Visitors see this page after having directly arrived from natural search, and then leave.

While a reasonable response to such a page would be to offer a promotion or other enticement to keep visitors from retreating, that technique should be viewed as a blind response. The originating queries with which the visitors found your site’s page are the key to understanding abandonment trends on a page-by-page basis.

Directly abandoning visitors should be clustered by originating query to identify the source of the apparent mismatch between their expectations and what the page provided. Distinct explanations will emerge when examining these originating queries.
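A simple first pass at this clustering is to group direct abandonments by landing page and normalized query. In the sketch below, the function name and sample data are hypothetical, and treating queries with the same words in a different order as one cluster is an assumption; more sophisticated clustering may be warranted for large sites.

```python
from collections import Counter, defaultdict


def queries_by_page(direct_abandonments):
    """Group direct abandonments by page, counting each normalized query."""
    grouped = defaultdict(Counter)
    for page, query in direct_abandonments:
        # Assumed normalization: lowercase, ignore word order.
        normalized = " ".join(sorted(query.lower().split()))
        grouped[page][normalized] += 1
    return grouped


# Hypothetical (landing page, originating query) pairs.
records = [
    ("/sheets/cotton", "cotton candy machine"),
    ("/sheets/cotton", "Candy machine cotton"),
    ("/sheets/cotton", "machine washable cotton sheets"),
    ("/home", "standing floor lamp"),
]
grouped = queries_by_page(records)
for page, query_counts in grouped.items():
    print(page, query_counts.most_common(3))
```

Reading the most frequent queries for each page alongside the page itself is usually enough to suggest which explanation applies.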

For e-commerce sites, prominent candidate explanations include insufficient product sets, irrelevant product sets, and non-facilitation, as explained below.

Visitors who intended to browse will directly abandon a site when the product set presented is smaller than they expected. This may occur when a single product is presented where a group is expected. Your remedial efforts following this type of diagnosis might focus on new product sets via categories, for example.

Visitors will directly abandon a site when the product set is plainly irrelevant to their intention, as expressed by their originating query. This often occurs out of your direct control, given the syntactic match that search engines perform against queries.

Consider the confusion of intent arising from a match between “cotton candy machine” and “machine washable cotton sheets.” Out of your control is not out of your influence, however, and ongoing site content maintenance and enhancement can aim to redirect this relevance to newly created content (usually in the form of product sets) designed to satisfy user demand.

Finally, direct abandonment may occur due simply to non-facilitation. That is, a user who lands on an overly broad category page — say, one dedicated to home furnishings, when the intention of the query demonstrates a much more specific item such as standing floor lamp — may quickly assess that the destination simply does not facilitate satisfying his or her search. This data alone can be the basis for effective navigational site enhancements.

The principles involved in analyzing indirect, in contrast to direct, abandoning visitors include all of the above, with additional factors such as click path prior to abandonment. The analyses can be particularly rich and useful to large e-commerce sites, where incremental but pan-site improvement can yield a significant gross revenue figure.

But the elaboration of those issues — which at the least involve multivariate analysis — is beyond the scope of this article.

After you’ve assessed inventory selection and site abandonment, the combined observations of your site’s particular issues may themselves indicate further action. For example, the details of high-abandonment data could identify candidates for items to be removed from the catalog.

Of course, transaction data alone reveals some of these candidates; but for large sites, the nuances of generalized low demand vs. query-originating rejection via abandonment are important considerations.

The two applications we’ve discussed do not begin to survey all the ramifications of data analysis to e-commerce sites in general. But the people who are charged with managing, and are judged by, a site’s revenue metrics should see the possibility of matching general analytics to the practicalities of their specific situation.


Thom Adams is the CEO of SearchDex (www.searchdex.com), a Dallas-based search marketing company.