What’s Next in Speech-to-Text

Scott Bakken / Oct 2015 / Technology, Workforce Optimization

Innovative technology is a game-changer for speech analytics solutions.

A new era has dawned. According to industry leaders I spoke with, speech analytics solutions may soon be referred to as BV and AV: Before Voci and After Voci. As in Voci Technologies, the Pittsburgh-based firm that focuses on high-speed solutions for large-scale speech analytics. Voci’s innovative technology has revolutionized speech-to-text transcription and customer data analysis. How so? Let me count the ways.

1. An Open Platform

Three words: No more silos. With Voci’s Automated Voice Transcription solution producing speech-to-text transcription in an open systems format, audio file data can now be merged with any other text data and subsequently analyzed by any application that analyzes text data.

Until now, a company’s customer data in text format (from website comments to social media feeds) would be analyzed by one application, while the call center’s audio files would be analyzed through a different analytics application. “It became difficult, if not impossible, to merge all that customer data into one analytics application for a full 360-degree view of the customer,” says Voci’s Mary Pat Lambke, Director of Marketing & Business Development. “What we do augments, rather than replaces, the analytics that a client is already performing.”

This analytic innovation has enabled companies like Clarabridge, which offers an omnichannel customer intelligence platform, to include their customers’ audio files in the overall analysis of customer data. “Analytics are a great way to distill information, but getting the information in the right format in order to distill it has been a challenge,” notes Brendon O’Donovan, Senior Product Marketing Manager at Clarabridge. “The transcripts that Voci produces are valuable because it’s raw information, it’s raw emotion and it is literally the voice of the customer.”

Voci also integrates with IBM’s Streams, which is an open platform system; audio files are analyzed inside streaming analytic applications together with other data in motion. “One of the reasons we partnered with Voci is because their specialized tool for performing the speech detect function supports our approach to deliver an open analytics environment,” says IBM’s Stewart Hanna, North American Streams Client Success Leader for IBM’s Analytics Organization brand.

Voci’s Automated Voice Transcription signals a brighter future for analytics. “An open system enables an organization to pay a lot more attention to what’s going on in their business,” O’Donovan says. “It informs the enterprise in a very rich way—maybe one of the richest ways—about what customers are feeling and saying. Prior to Voci, the piece that was missing was the ability to pull out the issues that customers are calling about and distributing those to the right people throughout the organization.”

2. A New Standard of Accuracy

I’m a skeptic at heart. So I naturally raised an eyebrow or two when I read Voci’s claim that the accuracy of their speech-to-text transcription was outperforming the industry by 15% to 25%.

After uploading some audio files into V-Spark, Voci’s voice analytics tool supported by V-Blaze, their speech-to-text processing platform, I was a believer. Even though the audio files I fed in were of substandard quality, with music in the background and distinct accents, the output was amazingly accurate. For high-quality recordings, accuracy can reach even higher levels. I’ll go on the record and state that Voci is by far the most accurate tool we’ve seen.

Incidentally, V-Spark only analyzes audio; however, you can export the transcripts that V-Spark or V-Blaze produces to any third-party text analytics application. In the meantime, the operations manager of a call center can use V-Spark to see the topics that are trending for that particular day and take immediate action. Those transcriptions can then be stored for deeper analytics insights in the days and weeks to come.

When I asked Brian Timmons, a cofounder and managing partner of TopBox, why Voci became their vendor of choice, his answer mirrored my own experience. “It’s actually pretty simple,” said Timmons, whose company specializes in analyzing customer interactions to identify root causes of business issues. “We sent the same sample call files to a handful of transcription companies and Voci had the highest accuracy rate. The files we sent were difficult in that the compression rates were high and they were mono-channel recorded calls so you didn’t have the benefit of separating the agent and the customer. Voci’s accuracy was at the top of the scale.”

IBM’s Hanna echoed Timmons’ conclusion. “Voci produced one of the most accurate results we’ve ever come across,” Hanna says. “We were engaged with customers on analytics projects prior to working with Voci but low accuracy was a big roadblock to delivering the results that customers were looking for. When we identified Voci as a provider of a nice set of capabilities, which included accuracy, it started to bring reality to life.”

A relentless focus on results also informed Clarabridge’s decision. “One of our core tenets is delivering accurate information to our customers,” O’Donovan says. “We pride ourselves on having very accurate text analytics and sentiment results, so we needed a partner that was the best when it came to accuracy. That was our largest driving factor when it came to choosing Voci as our partner.”

3. Off-the-Charts Speed

Another big differentiator is the speed of Voci’s transcription. “Through the hardware acceleration we’ve done with our appliances, we can process up to 100 hours of audio in one hour,” Lambke says. Considering that many industry players transcribe audio at a 1:1 ratio (e.g., taking 40 seconds to transcribe a 40-second audio file), a hundredfold increase in speed is significant.
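
To put those ratios in concrete terms, here is a back-of-the-envelope sketch in Python. It simply divides audio duration by a speed multiple. The function name and parameters are my own illustration, not part of any Voci product, and the 100x figure is the one Lambke quotes above.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock seconds needed to transcribe audio at a given speed multiple.

    speed_factor=1 models a 1:1 engine; speed_factor=100 models the
    "100 hours of audio in one hour" figure quoted in the article.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# The 40-second file from the example above, at 1:1 and at 100x.
one_to_one = processing_seconds(40, 1)
accelerated = processing_seconds(40, 100)

# 100 hours of audio at 100x, expressed in seconds.
hundred_hours = processing_seconds(100 * 3600, 100)
```

Under these assumptions, the 40-second file that occupies a 1:1 engine for 40 seconds clears a 100x engine in 0.4 seconds, and 100 hours of audio takes one hour.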

The speed of Voci’s speech-to-text transcription completely destroys the longstanding LVCSR vs. phonetics debate. Previously, when people were deciding between LVCSR (Large Vocabulary Continuous Speech Recognition) and phonetic-based speech recognition engines, the basic argument was that phonetics was much faster while LVCSR was 10 to 20 times more accurate. That argument is now obsolete. Simply put, Voci has enabled LVCSR to leapfrog phonetics in both accuracy and speed. Voci can also customize an LVCSR library to account for various dialects and for the particular terminology used in a client’s environment or industry.

Clarabridge has leveraged that speed to deliver the wow factor to its customers. “Whenever you’re dealing with analysis and data sources, it’s all about how fast you can get the insights to the business owner,” O’Donovan says. “Voci is able to transcribe hours and hours of transcriptions in minutes. The ability to get that data transferred into our system and get it out to an end user in hours, not days, was a differentiator for us. Telling a customer, ‘We can process your entire call center’s records and have a report ready for you in the morning,’ is a very powerful statement to be able to make.”

Hanna affirms the impact on customers. “In basically three days of ingesting transcripts from Voci and applying natural language processing analytics delivered in IBM Streams, we were able to produce a series of reports that were originally scoped to take weeks if not months to complete,” he says.

In making the decision to go with Voci, TopBox prioritized the top line over the bottom line, which, of course, inevitably boosts the bottom line in the long run. “We could have built it ourselves cheaper or bought it cheaper from somebody else,” Timmons says. “But there was enough of a delta between Voci’s accuracy that we chose them. Also, since Voci is purely transcription and their analytics is modular, there are no competitive issues in working with them.”

4. Built-in Scalability

Voci’s ability to scale its speech-to-text transcription based on the needs of its clients also sets it apart. “We’re able to accelerate the processing speed of what’s normally in our appliance and then scale it based on the number of hours,” Lambke says. “We know that V-Blaze goes at 100X speed. If there are more hours needed but we need to maintain that 100X, we’ll put a second appliance in the client’s server rack. That 100X processing speed is actually the starting point. We have not seen a cap.”
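
Lambke’s description suggests roughly linear scaling: each added appliance contributes another 100X of capacity. The sketch below estimates how many appliances a given workload would call for under that assumption. The function, its parameters and the linear-scaling model are my own simplification for illustration, not a Voci sizing tool.

```python
import math


def appliances_needed(audio_hours: float,
                      window_hours: float,
                      speed_factor: float = 100.0) -> int:
    """Estimate appliances required to transcribe audio_hours within window_hours.

    Assumes each appliance processes speed_factor hours of audio per
    wall-clock hour and that capacity adds up linearly across appliances.
    """
    if window_hours <= 0 or speed_factor <= 0:
        raise ValueError("window_hours and speed_factor must be positive")
    capacity_per_appliance = speed_factor * window_hours
    return max(1, math.ceil(audio_hours / capacity_per_appliance))


# 100 hours of calls due within one hour fits on a single appliance.
single = appliances_needed(100, 1)

# 150 hours in the same hour exceeds one appliance, so a second is added.
double = appliances_needed(150, 1)

# 2,000 hours of calls processed in an 8-hour overnight window.
overnight = appliances_needed(2000, 8)
```

In this model, a single appliance covers up to 800 hours of audio in an eight-hour overnight window, so a 2,000-hour backlog would call for three.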

The capacity to analyze a greater number of calls was important for the IBM and Voci partnership. “Second to accuracy was the ability to do this at scale,” Hanna says. “We don’t want to look at 5% of the calls. We need to look at 50% or 75% or maybe even 100% of calls to disrupt the marketplace.”

Lambke makes it clear that Voci will continue its tight focus going forward. “We have no desire to become a full-blown text analytics company like Clarabridge; that’s not our direction,” she says. “Our core is processing audio files and doing speech-to-text transcription in a highly accurate, speedy and scalable fashion.”

5. Real-Time Results

For some customers, real time is the only time that matters. “Deriving analytics after the fact to support a predictive application is essential for business,” IBM’s Hanna says. “But what we hear quite frequently is that organizations are operating with blind spots. These blind spots could be months, weeks, days or a 24-hour period. Customers are looking to eliminate blind spots so they know when customers contact them or raise certain issues of interest. Most organizations are correlating data with past behavior and what they know. Now they want a decision very quickly to effect change in the moment. Voci’s hardware acceleration clearly performs in real time.”

Voci’s Lambke is used to prospects who suspect her company’s claims are too good to be true. “I was working with a prospect who wanted to see what the transcription looked like, so they sent us some files, we processed them and we sent them back,” she recalls with a smile. “But they wanted to make sure we weren’t doing anything behind the scenes. So while we were on a conference call, they sent us more files. We processed them while they were still on the phone and fired them right back. They signed up pretty quickly.”

That’s about as real as it gets.

6. No Barriers to Transcription

Some legacy recording companies are notorious for not allowing customers access to their own recorded audio files. As a workaround, Voci can install an adapter so that the audio of a customer-agent conversation reaches a Y connector; one path goes to the closed recording system and the other path goes to Voci’s V-Blaze transcription application. “The customer can still record the conversation with their closed recording system and save the resulting recordings, but at the same time we’re doing the speech-to-text transcription as the recording is happening,” Lambke says. “In other words, the adapter is connecting right to the voice system, which is in front of where the recording system is.”

Since Voci is tapping into the voice system before the audio reaches the legacy recording system, the audio is uncompressed and in stereo; that dual-channel format separates the agent and customer for higher-quality transcription.
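
For readers wondering what “dual-channel” buys you at the data level, here is a minimal Python sketch. In interleaved stereo PCM audio, samples alternate between the two channels, so each speaker can be pulled out as a separate mono track. This is a generic illustration, not Voci’s implementation; which channel carries the agent and which the customer is an assumption that depends on the telephony setup.

```python
from array import array
from typing import Tuple


def split_stereo(interleaved: array) -> Tuple[array, array]:
    """Split interleaved stereo samples (ch1, ch2, ch1, ch2, ...) into two mono tracks."""
    if len(interleaved) % 2 != 0:
        raise ValueError("stereo data must contain an even number of samples")
    first_channel = interleaved[0::2]   # assumed here to carry the agent
    second_channel = interleaved[1::2]  # assumed here to carry the customer
    return first_channel, second_channel


# Six signed 16-bit samples, i.e. three stereo frames.
frames = array("h", [10, -10, 20, -20, 30, -30])
agent, customer = split_stereo(frames)
```

A mono recording offers no such shortcut: both voices are already mixed into one stream, which is the difficulty Timmons describes with mono-channel calls.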

Even if your recording system vendor provided open access to your recorded audio files, those files would probably be mono-channel and compressed; both of those factors would diminish audio quality and hence the quality of the subsequent transcriptions.

7. Applicable to Multiple Markets

As more analytics companies leverage Voci’s speech-to-text transcription prowess for their own customers, I expect Voci’s market share to grow. I’m not the only one who sees a bright future for this technology. “It’s underutilized in marketing and I would say it’s also underutilized in compliance,” affirms Clarabridge’s O’Donovan. “Both are good areas for growth of this technology immediately because Marketing would love to hear exactly what the customers are saying on the phone in their own words, and Compliance would absolutely love to have transcriptions of all their calls to find out how well everything’s in compliance.”

Indeed, Voci has steadily been making inroads into the compliance and eDiscovery space. For instance, professional services firms can use Voci’s platform to help financial services companies comply with federal regulations such as the Dodd-Frank Act, as well as to satisfy internal fraud regulations by converting audio files to accurate, searchable text.

IBM’s Hanna also sees value in applying IBM and Voci’s technology to surveillance. “Everyone is very concerned about data leakage given the constant threat of cyber-security attacks,” he says. “So monitoring outbound calls is now a key requirement. And being able to do that across channels—whether it’s an outbound phone call, an instant message, an e-mail or other communication—allows IBM and Voci to identify issues and raise an appropriate alert if need be.”

Closing Thoughts

I had suspected that an open platform format for the output of speech applications would be beneficial for analyzing customer data. However, after hearing such enthusiastic endorsements from executives at Clarabridge, IBM and TopBox who are actually employing this technology, I’m now convinced it’s a game-changer.

About MainTrax

MainTrax is a leading provider of speech analytics managed services to end users and industry partners. Free of allegiance to any one solution or supplier, MainTrax has earned a reputation as an independent, unbiased resource for consulting expertise across a variety of products and providers.

Scott Bakken

As CEO of MainTrax, Scott helps organizations harness business intelligence through speech and voice analytics technology to optimize contact center performance. MainTrax has worked with more than 20 speech technologies, helping more than 350 contact centers get the most from their speech analytics software. The company also created Intelligent Redaction, which scrubs and replaces sensitive information such as PCI, PII, or PHI so that organizations can continue to access their data in their preferred formats while maintaining compliance.

Free of allegiance to any one speech technology, Scott is recognized as an independent voice in the speech analytics industry and was named an Ernst & Young Entrepreneur of the Year finalist. He’s facilitated numerous workshops and has been published in a number of industry trade journals.