Installing Speech Recognition Technology in the Warehouse

Why not do it? This kind of talk is a sign of the times. As recently as two or three years ago, fulfillment and distribution executives were still asking various forms of “why” (cost, benefits, effort, stability, ROI) about speech technology. Today, speech technology has come of age, and on both strategic and practical grounds the evidence in its favor is substantial.

The technology is highly effective, freeing workers’ hands and eyes for more accurate and productive work. It is usually quick to install and implement, especially in environments already using radio frequency (RF) systems. For many operations, speech offers an attractive ROI, with payback often measured in months. It also has a positive effect on some of the less tangible human factors, such as morale and turnover.

However, implementation also has to be addressed at a strategic level: Which longer-term goals does this decision serve, and which might it undermine? The questions to ask are: Where will I need to be in “n” years, and how will I get there? What are the implications of this technology for that future, even though I can’t know all of it? What is the true cost of acquisition in light of these questions?

While some of these questions can be answered only in part, there are strategies to optimize the decision and limit the potential penalties. The big pushes in operations today are toward flexibility, information visibility, and cost reduction/productivity improvement. The best decision on speech is one that moves the organization forward on the path to those goals rather than sending it off on a tangent.

In the best circumstances, speech should give mobile warehouse workers flexible, cost-effective access to a rich array of information with which to make decisions and complete all the tasks within their skill range. Speech will improve accuracy and responsiveness and, through tight integration with the host application (a WMS or ERP suite), provide a real-time flow of results to managers, customers, workers, and perhaps even suppliers. Rather than being treated as a separate entity, speech can enhance the productivity benefits of other functions, such as standards or incentives.


There are two somewhat different approaches to talking to your computer or, more accurately, to your WMS or ERP. The best-known approach is the “free-standing” application, which communicates with its host via one or more interfaces or through a so-called middleware layer. The second, a rapidly growing option as WMS systems mature, is voice functionality built into the host system itself: it comes with the host, is fully integrated with it, and shares the host’s database.

We’ll talk first about fully integrated voice, as it is the simpler of the two to implement. Both versions require a careful evaluation (and perhaps a reworking) of warehouse processes. Is this the best way to perform replenishments? How many different ways do we move received product off the dock, and do we want to support all of them? How will we fill orders? And so on.

Once the evaluation is complete, the agreed-upon requirements need to be compared with the existing application features. This leads to a series of decisions about whether to modify the processes or the software. This step also helps define the terminology and vocabulary needed to perform each function. The evaluation should also establish what type and quantity of equipment is required to support voice for the chosen functions, and it should result in an order for that equipment. In some projects, equipment procurement has the longest lead time, so place the order early in the process.

Setting up communications in the facility should be the next step or, better yet, a concurrent one. If the facility is already using RF, that same network (802.11b) will support speech technology, although it may need additional components for full coverage. It is prudent to have the technology vendor perform a site survey to identify any additional coverage needed; if no RF network is in place, the survey is a requirement.

The application vendor usually installs, configures, and tests the system, working closely with key facility or system personnel. This is a fairly short process, especially if a single function is implemented first as a pilot.

Training operators, supervisors, and a system administrator follows. This typically involves one or two short sessions, supported by some documentation. Follow-up reinforcement and training should also be available on an as-needed basis. In total, the training phase accounts for only a few man-days of the project. If voice functions are implemented in sequence rather than all at once, follow-up training can be folded into each rollout without significant added cost.

The steps for fully integrated voice are similar to those needed for free-standing voice applications. In addition, however, free-standing voice applications require creation of a communications path between the voice system and the host system. The size of this step varies by host system and is accomplished in one of two ways: building a custom interface or utilizing middleware whose function in life is to enable independent software applications to “talk” to one another. In either case, it is important to allow time, programming, and testing resources, and, of course, additional dollars for this essential step.
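The communications path between a free-standing voice system and its host can be pictured with a small Python sketch of the middleware role, assuming a simple exchange in which the host issues a pick task and the voice side reports back what the operator said. All names here (`PickTask`, `HostSystem`, `InMemoryHost`, `VoiceMiddleware`) and the prompt wording are hypothetical; real voice products and host systems define their own message formats, and the sketch starts from already-recognized text rather than performing speech recognition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class PickTask:
    task_id: str
    location: str
    quantity: int


class HostSystem(Protocol):
    """Hypothetical host (WMS/ERP) interface; a real host exposes its own API."""

    def next_task(self, operator: str) -> Optional[PickTask]: ...

    def confirm(self, task_id: str, picked: int) -> None: ...


class InMemoryHost:
    """Stand-in host used only to exercise the sketch."""

    def __init__(self, tasks: List[PickTask]) -> None:
        self._tasks = list(tasks)
        self.confirmations: Dict[str, int] = {}

    def next_task(self, operator: str) -> Optional[PickTask]:
        return self._tasks.pop(0) if self._tasks else None

    def confirm(self, task_id: str, picked: int) -> None:
        self.confirmations[task_id] = picked


class VoiceMiddleware:
    """Translates host tasks into spoken prompts and spoken replies into host confirmations."""

    def __init__(self, host: HostSystem) -> None:
        self.host = host
        self.active: Dict[str, PickTask] = {}  # operator -> task awaiting confirmation

    def prompt(self, operator: str) -> str:
        """Fetch the operator's next task from the host and phrase it as a prompt."""
        task = self.host.next_task(operator)
        if task is None:
            return "No more work."
        self.active[operator] = task
        return f"Go to {task.location}. Pick {task.quantity}."

    def hear(self, operator: str, utterance: str) -> str:
        """Handle recognized text: a leading number is taken as the quantity picked."""
        task = self.active.get(operator)
        if task is None:
            return "No active task."
        words = utterance.split()
        if not words or not words[0].isdigit():
            return "Say again."  # unrecognized reply: re-prompt, send nothing to the host
        self.host.confirm(task.task_id, int(words[0]))
        del self.active[operator]
        return self.prompt(operator)  # move straight on to the next task
```

Keeping the host behind a narrow interface like `HostSystem` is the design point of this sketch: the same translation layer could sit in front of either a custom interface or a middleware product, and exercising it against a stand-in host is one way to spend the testing time that this step requires.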


First, look closely at costs. This means both the cost to acquire the technology and the cost to own it over time. Proprietary hardware and software add to the cost, as do components with narrowly defined features or limited flexibility. Ongoing support and maintenance, as well as upgrades, are significant factors in gaining the full benefit of any solution, but they, too, come at a price.
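The distinction between the cost to acquire and the cost to own can be made concrete with a little arithmetic. The Python sketch below assumes a deliberately simple cost model (a one-time acquisition cost plus flat yearly support/maintenance and upgrade costs, with no discounting); `CostProfile`, `total_cost_of_ownership`, and `crossover_year` are hypothetical names, and any figures fed to it should come from your own vendor quotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostProfile:
    """Simplified cost model for one speech option (hypothetical structure)."""
    acquisition: float      # one-time: hardware, software, installation, interfaces
    annual_support: float   # yearly support and maintenance
    annual_upgrades: float  # expected yearly upgrade spend


def total_cost_of_ownership(profile: CostProfile, years: int) -> float:
    """Cost to acquire plus cost to own over `years` (undiscounted)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    yearly = profile.annual_support + profile.annual_upgrades
    return profile.acquisition + years * yearly


def crossover_year(cheap_to_buy: CostProfile,
                   cheap_to_run: CostProfile,
                   horizon: int) -> Optional[int]:
    """First year (0..horizon) in which the cheaper-to-run option's cumulative
    cost is at or below the cheaper-to-buy option's, or None if it never is."""
    for year in range(horizon + 1):
        if (total_cost_of_ownership(cheap_to_run, year)
                <= total_cost_of_ownership(cheap_to_buy, year)):
            return year
    return None
```

An option that is cheaper to buy but carries higher yearly support and upgrade costs can be overtaken within a few years by one with a higher purchase price; `crossover_year` reports the first year in which that happens within the planning horizon, or `None` if it never does.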

Second, select a vendor with experience. Speech has been around for some time, and there are vendors with state-of-the-art products and significant operational experience. Take full advantage of that experience.

Third, run a pilot project at a chosen site and for a chosen function, staffed with champions. Using enthusiasts as the pioneers further improves the odds of hitting a winner the first time.

Finally, start simple. Add bells and whistles later. Varying the application’s features based on the individual user’s skill and experience, for example, is possible but not necessary.

When done right, having a flexible, cost-effective speech technology resource enhances your operational profile. In an age when a broader range of capabilities is seen as a strength, this can be a surprisingly important competitive advantage.

Talk to you (sooner rather than) later.

Ron Hounsell is director of logistics services for Denver-based Cadre Technologies, an innovator of fulfillment systems used by logistics service providers, distributors, and manufacturers. He can be contacted at