Speech Recognition with an Enterprise PBX and Its Applications

This article discusses what speech recognition is, what you should know about its types, how it is applied at the enterprise level, what components a speech recognition solution needs, and the advantages and limitations of using speech recognition for enterprise/commercial applications.

What is speech recognition?

Speech recognition is a wide topic, and it is used in many ways today: from navigating a computer by issuing spoken commands (for blind users, people who cannot use a keyboard, etc.) to dictation software that lets you dictate emails, documents, etc. instead of typing them. Put simply, speech recognition is a technology that lets you speak to a computer system and have the system take further action based on what you said. To do this, the speech recognition software converts the audio signal into a mathematical representation, analyses it using statistical methods, and finds the word that most closely matches that signal.
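The "closest matching word" idea can be illustrated with a toy nearest-template matcher. In this Python sketch the feature vectors are invented numbers standing in for features extracted from audio, and plain Euclidean distance stands in for the statistical scoring real engines use; it shows the matching step only, not how production recognizers work.

```python
import math

# Toy vocabulary: each word is represented by a made-up feature vector.
# Real engines extract features from short audio frames and score them with
# statistical models; these numbers are purely illustrative.
TEMPLATES = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "operator": [0.5, 0.5, 0.9],
}


def closest_word(features: list[float], templates: dict[str, list[float]]) -> str:
    """Return the vocabulary word whose template is nearest (Euclidean distance)."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))


print(closest_word([0.85, 0.15, 0.35], TEMPLATES))  # yes
```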

Applications of Speech Recognition in Enterprise/Commercial establishments:

Integrating speech recognition with an IP PBX enables you to automate certain key IVR (Interactive Voice Response) processes that are beyond the reach of DTMF (button-press) applications, or to make those processes easier than their DTMF equivalents. For example, when a caller dials your company's main number, you might prompt them to say the name of the person they want to talk to; once they reply, the IP PBX can connect them directly to the right extension. In a bank's call centre, you could use speech recognition with your IVR to let callers update customer database records such as email address, location, phone number and PIN simply by speaking, rather than waiting for an operator for simple tasks that cannot be done by pressing buttons (DTMF). Speech recognition can also be used to automate commercial billing applications, such as a pizza ordering process, or to provide basic help-desk assistance at night.
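The dial-by-name flow described above boils down to a lookup from the name the speech engine returns to a PBX extension. The sketch below is a minimal Python illustration; the directory entries, the extension numbers and the fallback to an operator extension "0" are all assumptions, and a real IP PBX would perform the actual transfer.

```python
from typing import Optional

OPERATOR_EXTENSION = "0"  # hypothetical: where unrecognized callers are sent

# Hypothetical company directory; these names would also form the recognizer's grammar.
DIRECTORY = {
    "alice smith": "2101",
    "bob jones": "2102",
    "carol lee": "2205",
}


def route_call(recognized_name: Optional[str]) -> str:
    """Return the extension to transfer the caller to."""
    if recognized_name is None:  # the engine could not match any name
        return OPERATOR_EXTENSION
    return DIRECTORY.get(recognized_name.strip().lower(), OPERATOR_EXTENSION)


print(route_call("Alice Smith"))  # 2101
```

Sending unmatched callers to a human keeps the automated path from becoming a dead end.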

What you need to know about the types of Speech Recognition:

A speech engine is at the heart of speech recognition software. It works together with an IP PBX and a speech application (an IVR, for example). The speech engine recognizes the words spoken by a caller and passes them to the application, and the application decides what to do next.
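The division of labour between the speech engine and the application can be sketched as two small Python classes. The `SpeechEngine` interface, its `recognize` method and the `FakeEngine` stand-in are hypothetical names invented for illustration, not the API of any real engine product; the point is only that the engine returns recognized words and the application decides the next step.

```python
from typing import Optional, Protocol


class SpeechEngine(Protocol):
    """What the application expects from an engine (hypothetical interface)."""

    def recognize(self, audio: bytes, grammar: list[str]) -> Optional[str]: ...


class FakeEngine:
    """Test stand-in: pretends the audio bytes are already the spoken text."""

    def recognize(self, audio: bytes, grammar: list[str]) -> Optional[str]:
        text = audio.decode("utf-8").strip().lower()
        return text if text in grammar else None


class IvrApplication:
    """Receives recognized words from the engine and decides the next step."""

    def __init__(self, engine: SpeechEngine) -> None:
        self.engine = engine
        self.grammar = ["sales", "support", "billing"]

    def handle(self, audio: bytes) -> str:
        word = self.engine.recognize(audio, self.grammar)
        if word is None:
            return "reprompt"  # nothing in the grammar was recognized
        return f"transfer:{word}"


print(IvrApplication(FakeEngine()).handle(b"Support"))  # transfer:support
```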

Speech recognition is different from voice recognition. In voice recognition, the user's voice itself is analysed using statistical methods, and the result is used for biometric verification: if the user's voice matches a previously stored record, the system can grant access to certain applications.
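Voice (speaker) verification compares the caller's voice against a stored record rather than working out which words were said. One common way to frame this is to reduce each voice sample to a numeric "voiceprint" vector and accept the caller if it is similar enough to the enrolled one. The Python sketch below assumes such vectors already exist; the cosine-similarity measure and the 0.95 threshold are illustrative choices, not a recommendation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; 0.0 means they are unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def verify_speaker(sample: list[float], enrolled: list[float], threshold: float = 0.95) -> bool:
    """Grant access only if the new voiceprint is close enough to the stored one."""
    return cosine_similarity(sample, enrolled) >= threshold


print(verify_speaker([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # True
```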

There are basically two types of speech recognition: speaker-dependent and speaker-independent.

Speaker-Dependent Speech Recognition: This type of speech recognition is used in dictation software, etc., where a user trains the speech recognition engine by reading some pre-loaded texts for a certain number of hours. Based on that user's particular speech patterns, the engine learns how the user pronounces certain words and converts their speech into text. This is useful where a single user will use an application with a wider range of words (grammar) and has time to train the engine.
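Training a speaker-dependent engine can be illustrated, in a very simplified way, as building one personal template per word from the user's own recordings. This Python toy assumes each recording has already been turned into a short feature vector and simply averages them; real dictation engines adapt statistical models instead, so treat this as an analogy only.

```python
import math
from statistics import fmean


def train_user_templates(samples: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average each word's training vectors, dimension by dimension, into one template."""
    return {
        word: [fmean(dimension) for dimension in zip(*vectors)]
        for word, vectors in samples.items()
    }


def recognize(features: list[float], templates: dict[str, list[float]]) -> str:
    """Pick the word whose personal template is nearest to the new sample."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))


# Hypothetical enrollment data: two readings of each word by the same user.
templates = train_user_templates({
    "open": [[1.0, 0.0], [0.5, 0.5]],
    "close": [[0.0, 1.0], [0.5, 0.5]],
})
print(templates["open"])                # [0.75, 0.25]
print(recognize([0.7, 0.3], templates))  # open
```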

Speaker-Independent Speech Recognition: What about applications like IVR, where a large number of people call in and have no time to train the speech engine? There you could use speaker-independent speech recognition, which limits the grammar (vocabulary) callers can use to what the application needs. For example, if a caller reaches a reception desk connected to an IP PBX with a speech recognition system, they can simply say the name of the person they want to talk to, and the IVR can connect them to the right extension automatically. In this situation, the limited vocabulary of names in the company database can be stored in the speech recognition software's grammar, so that it can easily identify which name the caller is saying. This is a compromise that limits the number of words that can be recognized, but it is essential for accuracy in situations like this, especially when there is no time to train the system.
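A speaker-independent IVR constrains the engine with a small grammar and then checks how confident the engine is. The sketch below assumes, hypothetically, that the engine returns an "n-best" list of (text, confidence) pairs; the application accepts the highest-confidence hypothesis that is in the grammar and above a threshold (0.6 here, an arbitrary value), and otherwise treats the attempt as a failure.

```python
from typing import Optional


def pick_in_grammar(
    nbest: list[tuple[str, float]],
    grammar: set[str],
    min_confidence: float = 0.6,
) -> Optional[str]:
    """Return the best hypothesis that is both in the grammar and confident enough."""
    for text, confidence in sorted(nbest, key=lambda h: h[1], reverse=True):
        if text in grammar and confidence >= min_confidence:
            return text
    return None  # caller should be reprompted or transferred


names = {"alice smith", "bob jones"}  # hypothetical grammar of employee names
print(pick_in_grammar([("alan smythe", 0.8), ("alice smith", 0.7)], names))  # alice smith
```

The out-of-grammar guess "alan smythe" is skipped even though it scored higher, which is exactly how a restricted vocabulary protects accuracy.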

There are two types of response that can be expected in a speech recognition system: natural language and directed response.

Natural Language: In this type of response, the IVR asks the caller an open question such as "What would you like to do?", and the caller might respond "I want to change my email address". This is great for business, but all the probable answers from callers may have to be analysed in advance and fed into the system, which is quite a complex task.
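The "pre-analysed answers" needed for natural language can be pictured as a table of phrases per intent. The Python sketch below uses simple whole-word keyword matching as a stand-in for the statistical language understanding that real natural-language IVRs use; the intent names and phrases are made up for illustration.

```python
import re
from typing import Optional

# Hypothetical phrases collected in advance for each intent.
INTENT_PHRASES = {
    "change_email": ["email", "e-mail"],
    "change_phone": ["phone number", "mobile number", "cell number"],
    "change_pin": ["pin"],
}


def detect_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrase appears as whole words in the utterance."""
    text = utterance.lower()
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            if re.search(rf"\b{re.escape(phrase)}\b", text):
                return intent
    return None  # unanticipated answer: the IVR has to ask again


print(detect_intent("I want to change my email address"))  # change_email
```

Every answer nobody thought of falls through to `None`, which is why the phrase table has to be built so carefully.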

Directed Response: In this type of response, the IVR asks the caller a question and also offers options, for example: "You can change your email address, change your cell phone number, or change your PIN. Please say what you want to do." This can be more accurate for business systems, as the vocabulary the speech recognition software has to recognize is considerably smaller.
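With a directed response, the grammar is essentially the list of options read to the caller, which is why it is easier to recognize accurately. A minimal Python sketch, assuming a hypothetical menu of three option phrases:

```python
from typing import Optional

# Hypothetical menu: option phrase the caller may say -> internal action name.
MENU = {
    "change email address": "change_email",
    "change cell phone number": "change_phone",
    "change pin": "change_pin",
}


def build_prompt(menu: dict[str, str]) -> str:
    """Read every option aloud so the caller knows exactly what to say."""
    return "You can say: " + ", ".join(menu) + ". Please say what you want to do."


def match_option(recognized: str, menu: dict[str, str]) -> Optional[str]:
    """The grammar is just the menu phrases, so matching is an exact lookup."""
    return menu.get(recognized.strip().lower())


print(build_prompt(MENU))
print(match_option("Change PIN", MENU))  # change_pin
```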

Components of a Speech Recognition Solution:

You need a speech recognition engine (a vocabulary with a pre-defined or custom-built grammar), speech recognition licenses (based on the number of simultaneous channels, i.e. concurrent calls, that will use speech recognition), an IP PBX and IVR system, a speech recognition application, and integration between all of the above systems.
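Because licenses are counted in simultaneous channels, one way to estimate how many you need is to look at historical call records and find the highest number of calls that overlapped. The Python sketch below assumes you can export each call's start and end times (for example, from PBX call detail records); it is a rough sizing aid, not a licensing rule from any vendor.

```python
def peak_concurrent_calls(calls: list[tuple[float, float]]) -> int:
    """Return the largest number of calls in progress at the same moment.

    Each call is (start_time, end_time) in any consistent unit, e.g. seconds.
    """
    events = []
    for start, end in calls:
        events.append((start, 1))  # call starts: one more channel busy
        events.append((end, -1))   # call ends: one channel freed
    # Tuples sort by time, then delta, so -1 comes before +1 at the same timestamp:
    # a call ending exactly when another starts is not counted as overlapping.
    events.sort()
    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


print(peak_concurrent_calls([(0, 10), (5, 15), (12, 20)]))  # 2
```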

Advantages and Disadvantages:

While this is an interesting and up-and-coming technology, it is advisable not to replace the existing button-press (DTMF) options in an Interactive Voice Response (IVR) application with speech recognition entirely. Instead, a section of the IVR can be enabled with speech recognition to extend its capabilities. The advantages are the automation of key processes through a natural speech interface, which may result in faster responses and easier use in certain cases. The main disadvantage is that the technology is not error-proof (unlike the older DTMF button press), and errors need to be handled without irritating the caller (for example, by asking them to repeat or spell certain phrases). Even so, speech recognition would be a welcome option in IVRs today, for example when a caller is responding to an IVR on a touch-screen phone in daylight, where the on-screen keypad is hard to see. The technology has matured, and it is up to each business to determine whether it is useful in its processes.
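The error management described above is often implemented as a limited number of polite reprompts followed by a fallback to button presses, which also keeps DTMF in place as the text recommends. The Python sketch below assumes a hypothetical `listen` callback that plays a prompt and returns the recognized word (or `None`); two attempts and the `DTMF_FALLBACK` result are illustrative choices.

```python
from typing import Callable, Optional


def collect_choice(
    listen: Callable[[str], Optional[str]],
    valid: set[str],
    max_attempts: int = 2,
) -> str:
    """Ask for a spoken choice, reprompt politely, then fall back to keypad input."""
    for attempt in range(1, max_attempts + 1):
        if attempt == 1:
            prompt = "Please say what you want to do."
        else:
            prompt = "Sorry, I didn't catch that. Please say it once more."
        result = listen(prompt)  # hypothetical: plays prompt, returns recognized word
        if result in valid:
            return result
    return "DTMF_FALLBACK"  # switch the caller to "press 1 for..." menus


answers = iter([None, "billing"])
print(collect_choice(lambda prompt: next(answers), {"billing", "sales"}))  # billing
```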

