Voice Gateway

Technical description

VoiceGateway (VG) extends text chatbot systems with voice calls conducted in natural language. The system makes decisions based on the words spoken by the interlocutor, so the caller does not need to use the telephone keypad to select options; the experience is the same as a normal telephone conversation.

This functionality relies on the cooperation of several services: TTS (text-to-speech), which converts text into speech; ASR (automatic speech recognition), which converts speech into text; and the NLU engine, which interprets the data provided by the interlocutor and produces the replies defined in the FLOW module.
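
The division of labour between these services can be pictured as a single dialogue turn. The following is a minimal sketch, not VG's actual interfaces: the `Asr`, `Nlu` and `Tts` protocols, their method names and the stub classes are hypothetical stand-ins for the ASR service, the NLU engine with its FLOW scenario, and the TTS service.

```python
from dataclasses import dataclass
from typing import Protocol


class Asr(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Nlu(Protocol):
    def reply(self, utterance: str) -> str: ...


class Tts(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Turn:
    heard: str    # what ASR recognized
    answer: str   # reply chosen by the NLU engine
    audio: bytes  # reply rendered by TTS


def handle_turn(audio_in: bytes, asr: Asr, nlu: Nlu, tts: Tts) -> Turn:
    """One dialogue turn: caller speech -> text -> reply text -> reply speech."""
    heard = asr.transcribe(audio_in)     # ASR: speech-to-text
    answer = nlu.reply(heard)            # NLU: interpret and pick a reply (FLOW scenario)
    audio_out = tts.synthesize(answer)   # TTS: text-to-speech
    return Turn(heard, answer, audio_out)


# Hypothetical stubs so the sketch runs without any real ASR/NLU/TTS provider.
class EchoAsr:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class KeywordNlu:
    def reply(self, utterance: str) -> str:
        if "balance" in utterance.lower():
            return "Your balance is available in the mobile app."
        return "Sorry, could you repeat that?"


class BytesTts:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    turn = handle_turn(b"What is my balance?", EchoAsr(), KeywordNlu(), BytesTts())
    print(turn.answer)
```

In a real deployment each stub would be replaced by a client for the corresponding service; keeping the three roles behind separate interfaces mirrors how VG lets different ASR and TTS providers be plugged in.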

VG integrates these functions and extends them with the ability to cooperate with telephony service providers and to integrate with existing telephone exchanges and call-center systems.

Description of tasks and functions

The basic function of VG is to conduct automated voice calls that follow a defined conversation scenario, either to achieve a specific goal (such as obtaining information from the interlocutor) or to give the interlocutor specific information they are looking for. Such conversations can be conducted at high volume: a single VG server instance can handle up to several hundred simultaneous connections, depending on the performance of the hardware environment and the complexity of the designed dialogues.

VG lets you view transcripts of recognized utterances and logs the entire conversation, including retrieving and recording the status of the phone call. It handles incoming calls on request and also provides outgoing calling (a dialer) with extensive options for configuring the calling algorithm. Conversations can be recorded digitally as audio files.
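
The idea of a dialer that keeps many conversations in progress while respecting a capacity limit can be illustrated with an asyncio semaphore. This is a hypothetical sketch: `place_call`, `run_campaign` and the `max_concurrent` parameter are illustrative names, not VG's dialer configuration, and each call is simulated with a short sleep instead of real signalling.

```python
import asyncio


async def place_call(number: str, stats: dict) -> str:
    """Simulated outbound call; a real dialer would signal the PBX here."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        await asyncio.sleep(0.01)  # stand-in for the conversation itself
        return f"{number}: completed"
    finally:
        stats["active"] -= 1


async def run_campaign(numbers: list[str], max_concurrent: int) -> tuple[list[str], int]:
    """Dial every number, never exceeding max_concurrent simultaneous calls."""
    limit = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}

    async def guarded(number: str) -> str:
        async with limit:  # wait for a free "line" before dialing
            return await place_call(number, stats)

    results = await asyncio.gather(*(guarded(n) for n in numbers))
    return list(results), stats["peak"]


if __name__ == "__main__":
    numbers = [f"+4800000{i:04d}" for i in range(20)]
    results, peak = asyncio.run(run_campaign(numbers, max_concurrent=5))
    print(len(results), "calls, peak concurrency", peak)
```

The cap plays the role of the per-instance connection limit discussed above: raising it only helps as long as the hardware and the dialogue complexity allow the extra sessions.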

Figure 1. General block diagram of modern call-center systems

Figure 1 shows the architecture of telephone systems and call centers most often found in modern deployments. The central element of such a system is always a telephone exchange (PBX), which makes and receives telephone calls through integration with a telecommunications service provider. Such an exchange can offer many different services and functions, generally in modular form, for example:

  • Call recording module
  • IVR module
  • Call center system

VoiceGateway can be treated as an additional module of the call-center system that replaces a 'live' agent with an automated bot conducting dialogues according to a given scenario. VoiceGateway is designed to take the place of a human agent in most scenarios, so from the perspective of the overall system the call-center system generally does not need to be adapted for voice bots.


The VoiceGateway system is designed and built on a microservices architecture. Each service is a separate software product written in Java; the services communicate with each other using Akka and run as Docker containers. In the current configuration, the multi-container system is launched with the docker-compose tool.

Logical description of the system

Figure 2. Logical diagram of the Voice Gateway system

List of modules (containers) included in the system:

  • Charon - a SIP client, with its dependencies, that connects to the parent exchange over SIP as a client (usually as another extension number of the exchange with which Voice Gateway is to cooperate). Integration with the exchange is also possible by establishing a permanent SIP trunk connection.
  • Croccota - a service handling TTS, with dependencies and integrations for a number of TTS (text-to-speech) providers. It works with the TTS service to convert text into speech during an automated dialogue.
  • Pytia - a service handling the Dictate ASR system, with the required dependencies, that enables Voice Gateway to work with speech recognition and transcription services in order to determine the interlocutor's intentions and take appropriate action according to a given scenario. Integrations with the major ASR providers currently on the market are available.
  • Gall - service for recording conversations
  • Dialog - a service that mediates two-way data exchange with Automate

If VoiceGateway cannot be integrated with the existing telephone infrastructure, or if outbound campaigns must be run through VoiceGateway alone without involving the customer's telephone exchanges, the list of available modules is extended with a proprietary PBX. This PBX has a number of integrations with individual VoiceGateway modules and can be delivered and launched if required.

Hardware requirements and installation description

Because the application is delivered as a ready-to-run set of Docker containers, running Voice Gateway should not depend on the operating system: the host only needs to support Docker and have docker-compose installed. However, we do not have substantial production experience with operating systems other than Linux. The system does not depend on a particular Linux distribution; most installations have been carried out on Debian (version 8 or later).
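
Before installation, the host prerequisites named above can be checked with a short script. This is a minimal sketch under stated assumptions: the `preflight` function is hypothetical, it looks only for the standalone `docker-compose` binary the text mentions (newer Docker releases also ship Compose as the `docker compose` plugin, which this check does not detect), and the `which`/`system` parameters exist only so the check can be tested without a real host.

```python
import platform
import shutil
from typing import Callable, Optional


def preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
    system: Callable[[], str] = platform.system,
) -> list[str]:
    """Return a list of problems found on this host (empty list means ready)."""
    problems = []
    for tool in ("docker", "docker-compose"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    host = system()
    if host != "Linux":
        # Not a hard failure: production experience is limited to Linux.
        problems.append(f"{host} host: production experience is limited to Linux")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```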

Voice Gateway requires a database instance. PostgreSQL is preferred; the system has been tested with PostgreSQL 12 or later. To begin the installation, copy the provided system files or download the latest versions of the individual containers from the designated repository.
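
The minimum database version can be verified from PostgreSQL's `server_version_num` setting, which reports the version as an integer (for PostgreSQL 10 and later, major × 10000 + minor, e.g. `120017` for 12.17). The helper names below are hypothetical, and the connection string in the comment is a placeholder, not a VG default.

```python
MIN_SERVER_VERSION_NUM = 120000  # PostgreSQL 12.0, the minimum version VG was tested with


def major_version(server_version_num: str) -> int:
    """Major version from `SHOW server_version_num` (for 10+, e.g. '120017' -> 12)."""
    return int(server_version_num) // 10000


def postgres_supported(server_version_num: str) -> bool:
    """True if the server reports PostgreSQL 12 or later."""
    return int(server_version_num) >= MIN_SERVER_VERSION_NUM


# Reading the value from a live server, e.g. with psycopg 3 (placeholder DSN, not run here):
#   import psycopg
#   with psycopg.connect("dbname=vg user=vg") as conn:
#       num = conn.execute("SHOW server_version_num").fetchone()[0]
#   print(major_version(num), postgres_supported(num))
```

The same value is available interactively with `SHOW server_version_num;` in psql.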

The Voice Gateway working directory contains a directory with configuration files for each component service, the global docker-compose.yml configuration file and the .env configuration file.
Each component service has an application.conf configuration file that specifies all parameters required for that service to operate.
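
A quick check that the working directory is complete can catch a missing file before `docker-compose` is started. This is a sketch under an explicit assumption: the per-service directory names in `SERVICES` are hypothetical lowercase versions of the module names; the actual names come with the delivered system files.

```python
from pathlib import Path

# Hypothetical directory names; replace with the layout of the delivered system files.
SERVICES = ("charon", "croccota", "pytia", "gall", "dialog")


def missing_config(workdir: Path, services=SERVICES) -> list[str]:
    """List expected configuration files that are absent from the working directory."""
    expected = [workdir / "docker-compose.yml", workdir / ".env"]
    expected += [workdir / name / "application.conf" for name in services]
    # as_posix() keeps the reported paths identical on every operating system
    return [p.relative_to(workdir).as_posix() for p in expected if not p.is_file()]


if __name__ == "__main__":
    for path in missing_config(Path.cwd()):
        print("missing:", path)
```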

The hardware requirements of VoiceGateway depend heavily on the environment in which it will operate. When sizing the hardware, consider primarily the number of simultaneous voice sessions, the nature of the dialogues, and how often a dialogue needs to query the database; data access time is the key factor here. It is therefore essential to provide the best possible database access performance and to monitor the system, especially during periods of increased traffic.
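
To turn expected call traffic into the number of simultaneous voice sessions to plan for, one standard telephony technique is the Erlang B formula. It is offered here as a general capacity-planning aid, not as a VG sizing rule: offered load in erlangs is calls per hour × mean call duration in seconds / 3600, and the function finds the smallest number of sessions that keeps the probability of a call finding every session busy at or below a chosen target. Busy-hour figures should be used as input, and the resulting session count still has to be matched against dialogue complexity and database access time.

```python
def offered_load_erlangs(calls_per_hour: float, mean_duration_s: float) -> float:
    """Average number of calls in progress (Little's law): arrival rate x holding time."""
    return calls_per_hour * mean_duration_s / 3600.0


def erlang_b(load: float, sessions: int) -> float:
    """Blocking probability for `sessions` lines offered `load` erlangs (iterative Erlang B)."""
    b = 1.0
    for n in range(1, sessions + 1):
        b = load * b / (n + load * b)
    return b


def sessions_needed(load: float, max_blocking: float) -> int:
    """Smallest number of simultaneous sessions with blocking probability <= max_blocking."""
    if not 0 < max_blocking:
        raise ValueError("max_blocking must be positive")
    n, b = 0, 1.0
    while b > max_blocking:
        n += 1
        b = load * b / (n + load * b)
    return n


if __name__ == "__main__":
    load = offered_load_erlangs(calls_per_hour=120, mean_duration_s=60)  # 2.0 erlangs
    print(load, sessions_needed(load, max_blocking=0.01))
```

For example, 120 calls per hour lasting 60 seconds on average is 2 erlangs of load; keeping blocking at or below 1% then calls for 7 simultaneous sessions, while a 5% target needs 5.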

For example, in a virtual environment a small machine with 4 vCPUs, 16-32 GB of RAM and 60-100 GB of disk space will properly support a system handling a few simultaneous connections (approx. 2-5).

More demanding configurations require correspondingly greater resources, which are designed during the installation work. The environment can be rescaled over time based on analysis of current usage, which is why a monitoring system is an important part of the deployment.


Looking for more specific hardware recommendations?

Check out example configurations for different workloads in the Platform scaling section.