Now listen to this

For the past three years Corporate Express, a $4.6 billion supplier of office products, has relied on voice-directed technology to drive its unit-picking applications in its 25 distribution centers in the United States. The company made the shift from a cart and conveyor process for several reasons, says Tim Beauchamp, senior vice president of distribution with the Broomfield, CO-based firm. For one thing, because workers are not juggling a scanner and sheets of paper with item descriptions as they pick, their hands and eyes are free. As a result, picking accuracy has improved from 99.7% to 99.9%. Given that the company processes 100,000-plus orders daily, that seemingly small increase is significant indeed.
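To see why a gain of two-tenths of a percentage point matters at that volume, the arithmetic can be worked through in a few lines. This is a rough sketch only: it assumes accuracy is measured per order and that volume is exactly 100,000 orders a day, neither of which the company specifies, and the function name is made up for illustration.

```python
from decimal import Decimal


def daily_errors(orders_per_day: int, accuracy_pct: str) -> int:
    """Expected number of mispicked orders per day at a given accuracy.

    Assumes accuracy is measured per order (an illustrative assumption,
    not something stated by Corporate Express).
    """
    # Decimal avoids binary floating-point noise in figures like 99.7.
    error_rate = (Decimal(100) - Decimal(accuracy_pct)) / Decimal(100)
    return int(orders_per_day * error_rate)


before = daily_errors(100_000, "99.7")
after = daily_errors(100_000, "99.9")
print(before, after, before - after)
```

Under those assumptions, the number of orders with an error falls from about 300 a day to about 100.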

Productivity also improved, enabling Beauchamp to reduce the number of picking employees across the country by about 100. In addition, Corporate Express has been able to install voice systems for a fraction of the cost of pick-to-light applications. Case in point: Corporate Express spent about $100,000 to launch a pick-to-voice system in its Seattle-area distribution center. Previously, the company had spent $6 million to build a conveyor system in a distribution center in New Jersey.

“This is cost-effective, high quality, and reliable,” Beauchamp says of voice-directed technology. Management is now rolling out the technology within its eight DCs in Canada; the company is also beginning to use the technology for full-case picking in both the U.S. and Canada.

Corporate Express isn’t the only merchant seeing significant benefits from voice recognition technology in the DC. With voice-directed systems, warehouse workers wear headsets and follow spoken instructions transmitted from an application, such as a warehouse management system (WMS), regarding what and how many items to pick. The solutions can be an alternative to pick-to-light and scanning systems, although voice systems can also work in conjunction with these.

“This is the most important technology change since barcoding,” says Ken Ackerman, principal of the Ackerman Co., a Columbus, OH-based consulting firm focused on the warehouse industry. “It has a clear, demonstrable payback.” Industry experts say that seeing a return on investment in less than a year isn’t uncommon.

In addition to helping boost productivity and accuracy in picking applications — often significantly — voice-directed warehouse applications can improve safety in the warehouse, some industry analysts say. Workers are able to monitor their environment as they walk or drive around the warehouse rather than focusing on the equipment they’re using.

To be sure, the public relations efforts of the voice recognition industry have not been as effective as those of other technologies, particularly RFID, Ackerman says. But he argues that while RFID systems show great promise, voice recognition systems have already proven their worth.

In fact, voice recognition systems have been used in automotive warehouse and distribution centers since the 1980s, says Sam Flanders, president of Consulting Group in Portsmouth, NH.

The grocery industry adopted the technology in the early 1990s, says Larry Sweeney, cofounder/vice president of product management with Vocollect, a Pittsburgh-based provider of voice solutions. The tight margins under which most grocers operate compel them to consider any applications that boost productivity. In addition, voice devices hold up well in freezers and other environments in which scanners may be more likely to fail.

Now specialty retailers are looking at voice solutions, Sweeney says. “It costs a lot to build and expand warehouses, and retailers would rather build stores and increase their revenue,” he notes. As a result, they’re looking at tools to boost the productivity of their existing warehouses, rather than having to build more warehouses.

ODW Logistics, a third-party logistics (3PL) provider, began implementing voice recognition technology at its headquarters campus in Columbus, OH, this past July for one of its customers, says Jon Petticrew, vice president of operations. ODW is using the technology to handle order processing as well as picking.

Like the team at Corporate Express, management at ODW appreciated that voice-driven systems are scalable and easy to implement and have been shown to boost accuracy and productivity. But Petticrew admits that there’s also the wow factor. “People say, ‘Look at this technology!’” That can give ODW an edge when potential clients are deciding which 3PL to hire.

With voice recognition, the number of orders ODW completes per hour jumped 12.5%, and accuracy increased from 99.0% to 99.8%. Petticrew is confident that he will see a return on investment in less than one year.


With voice-driven technology in the warehouse, workers wear a headset and a mobile computer attached to their belt. Software converts the picking instructions contained within the WMS or other solution into voice commands.

In some solutions, the software operates within a dedicated text-to-speech server. The computers that the employees wear are wirelessly connected to the server, so that the voice commands travel from the server to the worn computers and on to the workers’ headsets.

In other systems, the software actually works within the portable computers the workers carry, eliminating the need for a separate server. The commands are transmitted from the employees’ computers to the headsets. These computers are regularly plugged into the host system so that information is transmitted from one to the other, keeping both the portable computers and the host system updated.

Another option is terminal emulation, says Scott Medford, a partner with Rockwall, TX-based voice solutions provider Genesta. The portable computer is just a display monitor with no processing functionality, or what’s known as a dumb terminal. It simply passes information back and forth, in real time, to the main computer, which runs the picking application.

With any of these solutions, workers receive verbal instructions, such as “Go to zone B, shelf 5, bin 6, and pick 10 red socks,” through the headset. Once the worker arrives at the location, he often finds what’s known as a check digit, or number. The worker reads the check digit out loud to let the system know that he is in the right place.

When a worker has completed a task, he again speaks into the system, saying something like “Completed. Next.” This information is transmitted, via the portable computers and software, back to the host system, so that the WMS is kept accurate. This also cues the voice system to give the worker his next task.
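The exchange described above (instruction, check-digit confirmation, completion) can be sketched as a small dialogue loop. This is a minimal illustration, not any vendor's implementation: the `PickTask` fields, the prompt wording, and the `run_pick` function are all hypothetical, and the worker's utterances are assumed to have already been recognized as text.

```python
from dataclasses import dataclass


@dataclass
class PickTask:
    location: str      # e.g. "zone B, shelf 5, bin 6"
    check_digit: str   # number posted at the pick location
    quantity: int
    item: str


def run_pick(task: PickTask, spoken: list[str]) -> list[str]:
    """Walk one pick through the dialogue and return the system's prompts.

    `spoken` holds the worker's utterances in order, already converted
    to text (speech recognition itself is out of scope for this sketch).
    """
    prompts = [f"Go to {task.location} and pick {task.quantity} {task.item}"]
    confirmed = False
    for utterance in spoken:
        if not confirmed:
            # The worker must read back the check digit before picking.
            if utterance == task.check_digit:
                confirmed = True
                prompts.append("Location confirmed")
            else:
                prompts.append("Wrong location. Say the check digit again")
        elif utterance.lower() in {"completed", "done"}:
            # Completion would be reported to the host system here.
            prompts.append("Pick recorded. Next task")
            break
    return prompts


task = PickTask("zone B, shelf 5, bin 6", "47", 10, "red socks")
for line in run_pick(task, ["74", "47", "completed"]):
    print(line)
```

In this example the worker first misreads the check digit, is asked to try again, confirms the location, and then reports the pick as completed.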

Voice systems typically work with a limited vocabulary. To keep the system simple and running quickly, about 120 words appears to be the limit. “Yes,” “no,” “repeat,” and “done” are among the most common terms.

Although firm statistics are not available, most industry experts say that the majority of warehouse applications are speaker-dependent, rather than speaker-independent. “The only current voice recognition technology that can support this ubiquitous applicability and reliability in noisy environments and for workers speaking in many different languages and dialects is speaker-dependent,” says Jef Morrow, vice president of marketing with Voxware, a supplier of voice-driven logistics solutions based in Lawrenceville, NJ.

A speaker-dependent system is trained to understand a specific speaker’s voice, inflection, and accent. When that speaker logs on to the system, his speech profile is downloaded to the portable computer so that the system successfully recognizes what he says. When the population of potential users is limited, as it is in a warehouse, speaker-dependent systems typically work better. Accuracy increases, Vocollect’s Sweeney says, because the systems recognize the idiosyncrasies of each speaker’s voice.

In contrast, speaker-independent systems are designed to understand just about anyone’s voice. They often are used in applications designed for the general public, such as airline reservation systems.


Though voice recognition systems have a lot to offer, they’re not the solution for every operation. The biggest question is whether an operation is large enough and has enough activity to justify the investment. Most operations need at least 10-15 pickers to justify the investment in hardware and software, Flanders says.

Voice systems tend to provide the largest payback when used in facilities that have a high SKU count and products that vary in size and weight and in which the cost of inaccurate orders is large, says Elif Kizilkaya, vice president of delivery and support services with Voxware.

The solutions typically are an alternative to such technologies as RFID scanning and pick-to-light. Scanning systems can cost less. Flanders estimates that an inexpensive voice system will require an investment of about $100,000, compared with about $25,000 for a scanning system. But he predicts that, as is the case with most other technologies, costs will come down as the number of voice installations grows. “I think we’ll see that entry costs will become more competitive,” he says.

In fact, costs already have come down, says Genesta’s Medford. He estimates that the price tag to outfit one worker has dropped to several thousand dollars from about $10,000 just a few years ago.

Another advantage that scanners have over voice recognition systems is speed: A scanner can capture information more quickly than a voice system can read it. As a result, a scanning solution can make sense in situations where it is necessary to capture long strings of information, such as serial numbers. “It’s a good technology to capture, verify, and audit data,” says Sweeney. “Voice is good at directing work.”

Voice-directed systems also free up pickers’ hands, an advantage the technology has over scanners and RFID. That was a major consideration when Tractor Supply Co., a chain of more than 630 stores, decided to replace its paper-based picking system three years ago, says Larry Corrigan, director of distribution for the Brentwood, TN-based firm. Tractor Supply had also been considering an RFID system, but it decided in favor of voice recognition to make it easier for employees to pick and carry items, which at Tractor Supply range from small pet toys to lawn equipment.

Picking productivity has since increased 10%, Corrigan says. The company continues to use scanners when receiving and loading orders.

Another alternative to voice-directed picking is a pick-to-light system, in which a worker follows light signals to determine what and how many items to pick. Pick-to-light, like pick-to-voice, tends to generate high accuracy rates. But a pick-to-light system is usually more expensive than a voice recognition system, as it requires the installation of fixed conveyor systems. In addition, conveyor systems can limit the flexibility of a distribution center.


Once you have decided that a voice recognition system is right for your company, you have to determine which type of technology you’ll use to convert written text into speech commands.

One option is text-to-speech, in which written words are converted to computer-generated, synthesized voice commands. The alternative is digitized speech, in which a person records words and phrases as digital audio files that the system then plays back.

Digitized speech sounds more natural than text-to-speech, so users tend to prefer it, says Jason Wilburn, manager of marketing and business development with Lucas Systems, a Pittsburgh-based provider of voice solutions. On the other hand, the vocabulary is limited to those words that have been prerecorded.

Many companies combine the two options. They’ll use digitized speech for the small vocabulary of commonly used commands and text-to-speech to convert product descriptions; when the volume of SKUs gets into the thousands, having someone repeat each description becomes impractical.
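The combined approach can be sketched as a simple lookup with a fallback. This is an illustration under stated assumptions, not a description of any vendor's product: the clip table, the file paths, and the `build_prompt` function are hypothetical, and the actual audio playback and speech synthesis are left out.

```python
# Hypothetical table of prerecorded clips for the small command vocabulary.
RECORDED_CLIPS = {
    "go to": "clips/go_to.wav",
    "pick": "clips/pick.wav",
    "repeat": "clips/repeat.wav",
}


def build_prompt(parts: list[str]) -> list[tuple[str, str]]:
    """Plan how each part of a spoken prompt will be voiced.

    Returns (method, payload) pairs: ("clip", path) when a recording
    exists for that part, otherwise ("tts", text) to be synthesized.
    """
    plan = []
    for part in parts:
        key = part.lower()
        if key in RECORDED_CLIPS:
            plan.append(("clip", RECORDED_CLIPS[key]))
        else:
            # Product descriptions and quantities fall through to text-to-speech.
            plan.append(("tts", part))
    return plan


for step in build_prompt(["Pick", "10", "red crew socks, size large"]):
    print(step)
```

Here the command word is played from a recording, while the quantity and the product description, which vary from pick to pick, are handed to the text-to-speech engine.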


As you would with any other technology investment, you need to thoroughly research the vendors of voice-directed systems before you decide on one. The vendor should have experience and be committed to the market. In addition to checking out other installations, some potential buyers ask vendors how much they are dedicating to research and development.

Before partnering with Vocollect, Tractor Supply worked with a voice vendor that quickly exited the market. “Make sure you’re not the test case,” Corrigan warns.

ODW’s Petticrew recommends looking for nonproprietary solutions so that you can use off-the-shelf headsets and portable terminals. Some systems require the purchase of proprietary headsets and terminals, which can increase costs and reduce flexibility.

In addition, Petticrew says, it makes sense to test any equipment on the warehouse floor, where noise levels are realistic, rather than in the quiet confines of a corporate boardroom.

Training employees on a voice recognition system typically takes no more than 20 minutes. Workers who have used another picking technology usually are back to full speed with a voice system after just a few days and then continue to improve. “The ramp-up time is extremely fast,” says Corrigan.

That’s not to suggest that implementing a system requires no more than an afternoon. Morrow of Voxware suggests allowing six to eight weeks. That provides time to define the system requirements; assemble the project team; configure, install, and test the system; and train users, among other tasks. Most companies also need to allow time to integrate the voice system with their other warehouse applications, such as the WMS.

There are a few additional tactics that can make a voice system function even more effectively. Because the worker isn’t able to see the entire order at once, as he could with a scanner system, it’s critical that the instructions be organized so that items that need to go on the bottom of the box or pallet are picked first, Petticrew says.
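One way to picture that sequencing is as a sort applied to the order lines before they are spoken to the picker. The sketch below is only an illustration: the `OrderLine` fields, the use of weight and fragility as the stacking criteria, and the function name are assumptions, and a real system would also weigh factors such as the walking path through the warehouse.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    sku: str
    weight_kg: float       # hypothetical field used as the stacking criterion
    fragile: bool = False  # hypothetical flag for items that must go on top


def sequence_for_stacking(lines: list[OrderLine]) -> list[OrderLine]:
    """Order picks so sturdy, heavy items come first and fragile items last."""
    # Sort key: non-fragile before fragile, then heaviest first.
    return sorted(lines, key=lambda line: (line.fragile, -line.weight_kg))


order = [
    OrderLine("pet-toy", 0.1),
    OrderLine("light-bulbs", 0.5, fragile=True),
    OrderLine("lawn-mower", 30.0),
]
print([line.sku for line in sequence_for_stacking(order)])
```

With this ordering, the heaviest item is called out first and the fragile item last, so the picker builds the box or pallet from the bottom up without ever seeing the whole order.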

He also recommends regularly modifying the check digit, which the worker uses to verify that the place from which he is about to pick is the right one. ODW does this about every 60 days. That way, Petticrew says, workers won’t start to memorize the numbers and say them before they even get to the picking location.
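A rotation like this could be scheduled with a few lines of code. The sketch below is an assumption-laden illustration rather than ODW's actual procedure: only the 60-day interval comes from Petticrew, while the two-digit format, the function names, and the use of random numbers are hypothetical.

```python
import random
from datetime import date, timedelta

ROTATION_INTERVAL = timedelta(days=60)  # the cadence ODW reports using


def rotation_due(last_rotated: date, today: date) -> bool:
    """Return True once the rotation interval has elapsed."""
    return today - last_rotated >= ROTATION_INTERVAL


def rotate_check_digits(current: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Give every location a new two-digit check number unlike its old one.

    The two-digit format is an assumption made for this sketch.
    """
    updated = {}
    for location, old in current.items():
        new = old
        while new == old:
            new = f"{rng.randrange(100):02d}"
        updated[location] = new
    return updated


digits = {"B-5-6": "47", "B-5-7": "03"}
if rotation_due(date(2024, 1, 1), date(2024, 3, 1)):
    digits = rotate_check_digits(digits, random.Random())
print(digits)
```

Redrawing until the new number differs from the old one guarantees that a memorized check digit stops working after each rotation.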

Although they may not capture headlines like other technology solutions, voice-recognition systems are proven and highly effective, say those who’ve used them. “I’ve never seen something so bulletproof and reliable as this system,” Corporate Express’s Beauchamp says.

Karen M. Kroll, a freelance writer based in Minnetonka, MN, writes for American Way and Business Finance magazines, among other publications.

Voice recognition systems

Like snowflakes and fingerprints, no two voice recognition systems are alike. Here are but a few of the scores of systems on the market.

Cheshire, England

Rockwall, TX

Hatboro, PA

Dallas, TX

Sewickley, PA

Holtsville, NY

Pittsburgh, PA

Lawrenceville, NJ
