Listen Up

We all know the drill. An order comes in, and the picker has to wade through sheets of labels to know what to pick. It takes time, and he also has to report that it’s complete. But there’s another way. The worker gets the order through a headset. When he’s finished, he says “done.”

And the best part is that there’s nobody at the other end. The fact that the items have been picked is transmitted to a database.

What are we talking about? It’s speech recognition, a technology that is starting to catch on with direct merchants.

One early adopter was Awana Clubs International, a nonprofit group that markets educational materials to church groups. It started using technology from Lucas Systems in 2004.

Awana has a large shipping job on its hands. It sends about 200,000 packages a year from its 60,000-sq.-ft. distribution center in Schaumburg, IL.

Its accuracy rate hovered around 90%, but it sometimes fell to 65% when temporary employees helped out during the holiday season. The group felt it could do better.

And it has. Thanks to speech recognition, Awana’s productivity has jumped by 40%. And its accuracy has improved — there are only about 50 exceptions a year, says director of distribution Steve Hale.

How does it work?

In Awana’s case, most orders arrive via the Web and are then funneled into order entry and a database. The Lucas system converts the data to speech, using a programmed vocabulary of 60 words such as “quantity” and “location.” These form the instructions given to the pickers.
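As a rough illustration of how an order line might become a spoken instruction, here is a minimal sketch. It is not the Lucas system's actual code. The function names, the all-numeric location code and the digit-by-digit phrasing are assumptions made for this example; the only vocabulary words taken from the article are "quantity" and "location."

```python
# Hypothetical small vocabulary: the two keywords named in the article
# plus the ten digit words (an assumption for this sketch).
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(value: str) -> str:
    """Turn a string of digits into space-separated digit words."""
    if not value.isdigit():
        raise ValueError(f"expected digits only, got {value!r}")
    return " ".join(DIGIT_WORDS[int(ch)] for ch in value)


def build_pick_prompt(location: str, quantity: int) -> str:
    """Build the text that a text-to-speech engine would read to the picker."""
    return (f"location {spell_digits(location)} "
            f"quantity {spell_digits(str(quantity))}")


if __name__ == "__main__":
    print(build_pick_prompt("4012", 3))
```

Keeping the prompt to a short, fixed set of words is what lets a system like this get by with a vocabulary of a few dozen entries.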

Awana is so pleased that it now wants to use the technology for its receiving, replenishment and returns. “To me, it is just about unlimited as to what you can do with it,” Hale says.

These systems were once used mostly by large packaged goods brands and supermarket chains. But multichannel merchants are turning to them — in part because they are now affordable by mid-size operations. And experts say they’re worth the investment.

For starters, firms that replace paper labels with speech often see double-digit jumps in productivity. “This is hands-free and eyes-free,” claims Tom Kerr, director of applied research with systems provider Vocollect.

And speech technology is more reliable than radio frequency scanning, adds Jeff Slevin, chief operating officer with Lucas Systems. Yes, the scanning devices are accurate, but they’re also handheld — if users put them down, they may start again in the wrong spot.

But don’t be confused by terminology. While often used interchangeably, “voice recognition” and “speech recognition” refer to two different systems, says Dan Ciarcia, principal consultant with CTG.

Voice recognition captures a voice and identifies the speaker — it works like an audio fingerprint. But speech recognition identifies the words spoken in order to complete a process.

Most systems used in warehouses are speaker-dependent. That is, the person records a list of words (usually 100 or fewer) and his speech is easily recognized.

In contrast, systems accessed by consumers tend to be speaker-independent. They’ve been built to work with a range of voices and accents, and aren’t as precise — a request for flights to Boston may prompt a schedule for Austin.

Let’s say you’ve decided to use a speech system in your facility. There are two types available for communicating with employees.

One features recorded or digitized voices. Someone records the words that will tell workers what task to do next, and the recording is stored as a digital audio file.

The other is text-to-speech. A software application interprets text provided by an order entry system and creates the spoken words.

Recorded voice instructions sound more natural, but most systems use text-to-speech because it is less expensive and more flexible. And workers can boost productivity by speeding up the playback.

The headsets are supported by mobile computing devices that pickers usually wear on a belt. They communicate via radio frequency with a host computer.

The application that converts text to voice instructions can reside within the mobile device or in the central computer. If the former, the information tends to be transmitted more quickly.

Most companies use one headset per worker. And the hardware? Companies can either connect to an existing computer or dedicate a server to the system.

ODW Logistics, a third-party logistics provider, purchased a specialized server for its full-pallet picking operation. Active since July 2006, it is now enabling a client to ship about 800 orders per day, according to Jon Petticrew, vice president of operations.

Yes, ODW could have used its existing server. But the new one provided scalability and flexibility, he says.

And Petticrew wants to do more. The firm will next use the system to serve an online cosmetics retailer, and then an apparel client. “When we did programming and integration, we made sure that we weren’t specific to one client or vertical,” he says.


Let’s say you’ve invested in a speech recognition system. How do you get the most out of it?

First, make sure you “know your operation inside-out,” says Larry Landhiser, operations manager with ODW. The system can’t tell workers what to do if you don’t have a clue yourself.

For example, the picker should know whether to head first toward the item that’s farthest from the starting point or the closest one. And if the order requests more units than are in stock, the worker should know whether to pick what is there or nothing.
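The short-stock decision is the kind of rule that has to be settled before a speech system can give the instruction. A minimal sketch, assuming a single hypothetical flag called allow_partial that records the operation's policy:

```python
def units_to_pick(requested: int, in_stock: int, allow_partial: bool) -> int:
    """Return how many units the picker should be told to pick.

    allow_partial is a hypothetical policy flag: True means "pick what
    is there," False means "pick nothing unless the full quantity is on hand."
    """
    if requested < 0 or in_stock < 0:
        raise ValueError("quantities must be non-negative")
    if in_stock >= requested:
        return requested          # enough stock, so pick the full amount
    return in_stock if allow_partial else 0


if __name__ == "__main__":
    print(units_to_pick(requested=2, in_stock=1, allow_partial=True))
    print(units_to_pick(requested=2, in_stock=1, allow_partial=False))
```

Whichever policy a facility chooses, the point is that it is decided once, in advance, rather than left to each worker on the floor.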

As with any big shift in operations, you’ll need employee buy-in. Awana’s Hale explained the benefits of speech recognition to his staff about a year before implementing a system to give them time to prepare for the change.

Managers also need to demonstrate their support by using the system, adds Slevin of Lucas Systems. And the more they understand it, the easier it is for them to troubleshoot.

Before deciding on a particular system, Petticrew tested how well the headsets worked within his facilities. In one building, propane equipment runs during most shifts. He checked that employees could hear the commands over the noise.

You should also test the capacity of your IT network, Petticrew says. He added several access points at ODW to ensure that data was reliably transmitted from the server to employees’ mobile computers. For the system to pay off, the transmission needed to happen quickly enough that workers’ productivity would increase.

Remember to identify each picking location with large signs, Petticrew says. Employees can quickly scan the signs to verify that they’re in the right place. He also changes the check digits at each location after several weeks. Otherwise, workers start memorizing them, and will often begin saying the words into the system before they even get to the location. That increases the potential for error.
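The check-digit routine Petticrew describes could be sketched along these lines. This is an illustration only, not ODW's implementation: the two-digit format, the location names and the function names are all assumptions.

```python
import random


def new_check_digits(rng: random.Random, previous: str) -> str:
    """Draw a two-digit code that differs from the previous one."""
    while True:
        candidate = f"{rng.randrange(100):02d}"
        if candidate != previous:
            return candidate


def rotate_check_digits(check_digits: dict, rng: random.Random) -> dict:
    """Return a new location-to-code map with every code replaced."""
    return {loc: new_check_digits(rng, old)
            for loc, old in check_digits.items()}


def confirm_location(check_digits: dict, location: str, spoken: str) -> bool:
    """True only if the spoken digits match the code posted at that location."""
    return check_digits.get(location) == spoken


if __name__ == "__main__":
    current = {"A-01": "17", "A-02": "42"}   # hypothetical locations and codes
    rotated = rotate_check_digits(current, random.Random())
    # A worker reciting the memorized old code is rejected after rotation.
    print(confirm_location(rotated, "A-01", "17"))
```

Because each new code is forced to differ from the old one, a memorized code stops working as soon as the signs are changed.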


How many speech recognition systems are in use? It’s hard to say. But experts say they’re growing in number — and that they’re being applied to more things.

Yes, picking is the most labor-intensive activity. And that’s where the greatest savings will result, says Scott Yetter, CEO of voice technology provider Voxware.

But speech systems can also help with functions like receiving and put-away.

Sam Flanders, president of Consulting Group, agrees that standardized products eventually will gain ground. “Voice technology is so good, there will be demand for it.”

Meanwhile, the technology is improving. Text-to-speech voices now sound more human, Ciarcia says. And the systems are being built to recognize new languages.

Better yet, the portable devices are getting smaller and more comfortable, Kerr notes. And the batteries can go for longer periods of time between charges. Most now last for at least eight hours, while some will stay active for up to 12.

The best news of all? Costs are coming down. Ciarcia estimates that they’ve dropped by up to 20% over the past 18 months. That said, the systems are far from cheap. They run at least $100,000 on the low end, Flanders says.

That includes $50,000 to $60,000 for software, about $20,000 for implementation support, and several thousand dollars apiece for the computing devices. To generate enough savings, most companies must have at least 10 users for their systems, Slevin says.
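To see how those components can add up to a six-figure total, here is a back-of-the-envelope sketch. The software and implementation figures come from the estimates above; the $3,000 device price is purely an assumption standing in for "several thousand dollars apiece," and the function name is hypothetical.

```python
def estimate_system_cost(users: int,
                         software: int,
                         implementation: int,
                         device_unit_cost: int) -> int:
    """Sum software, implementation support and one device per user."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return software + implementation + users * device_unit_cost


if __name__ == "__main__":
    # 10 users, low-end software estimate, assumed $3,000 per device.
    low = estimate_system_cost(10, 50_000, 20_000, 3_000)
    # Same setup with the high-end software estimate.
    high = estimate_system_cost(10, 60_000, 20_000, 3_000)
    print(low, high)
```

Under that assumed device price, a 10-user installation lands between $100,000 and $110,000, in line with the low-end figure cited.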

But speech recognition isn’t always the answer. RFID or barcode scanning may be quicker when the worker must repeat long strings of numbers or letters to identify a product. The next wave of devices will combine voice and scanning capabilities, Slevin predicts.

And pick-to-light may be more effective in DCs that fulfill many SKUs, Flanders says. That’s because workers can quickly scan the shelf and see which items to pick.

But speech recognition is worth exploring. Indeed, Hale wants to use it for cycle counting and inventory. “Anything you can do, you can do with voice,” he says.

Karen M. Kroll is a freelance business writer based in Chanhassen, MN.
