Pytia

Speech-to-text (ASR) integration

Pytia is a client for ASR (automatic speech recognition) services. It provides a connector for each supported external service: the connector streams audio to the service and receives the recognized text in return.

Configuration

voice.gateway {

  pytia {
    technology = TECHMO
    api {
      base-url = "0.0.0.0:8081"
      api-key: "xxx"
    }

    service-tags: ["techmo-pl"]
    
    grpc-endpoint-port = 6443
    
    service-availability {
      ping-interval = 10d
      failure-limit = 2
      failure-time-window = 5m
      enabled = true
    }
  }
  techmo.engines = [
    {
      host = asr3-istio.example.com
      use-plain-text: false
      port = 8443
      name = "techmo-1"
    }
  ]
}
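
The configuration above does not spell out how failure-limit and failure-time-window interact. One plausible reading, shown here purely as an assumption, is a sliding window: the service is treated as unavailable once failure-limit failed pings fall within failure-time-window. The class and method names below are hypothetical and are not taken from Pytia.

```python
import time
from collections import deque
from typing import Optional


class AvailabilityTracker:
    """Sliding-window failure counter (illustrative sketch, not Pytia's code)."""

    def __init__(self, failure_limit: int, failure_time_window_s: float):
        self.failure_limit = failure_limit
        self.window_s = failure_time_window_s
        self._failures = deque()  # timestamps of recent failed pings

    def record_ping(self, ok: bool, now: Optional[float] = None) -> None:
        """Register the outcome of one availability ping."""
        now = time.monotonic() if now is None else now
        if not ok:
            self._failures.append(now)
        self._evict(now)

    def is_available(self, now: Optional[float] = None) -> bool:
        """Assumed rule: unavailable once failure_limit failures are in the window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._failures) < self.failure_limit

    def _evict(self, now: float) -> None:
        # Drop failures that are older than the time window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()


# failure-limit = 2, failure-time-window = 5m (300 seconds)
tracker = AvailabilityTracker(failure_limit=2, failure_time_window_s=300)
tracker.record_ping(ok=False, now=0)
tracker.record_ping(ok=False, now=60)
print(tracker.is_available(now=61))   # False: two failures within five minutes
print(tracker.is_available(now=400))  # True: both failures have left the window
```

Timestamps are passed in explicitly in the usage lines only to keep the example deterministic; in a real check they would come from the clock.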

Field description

  • technology - name of the ASR vendor; selects the connector used for the service. One of: GOOGLE, TECHMO, PHONEXIA. When choosing a technology, make sure the matching technology subconfig (e.g. techmo.engines) is also present.
  • service-tags - tags that are propagated in Akka service discovery. They can be used to route a given dialog to a particular ASR service (e.g. for tests or to choose a specific engine). The technology tag is always appended to this list. For example:
pytia {
  technology = "GOOGLE"
  service-tags = ["some-tag"]
}

This results in Pytia registering itself in the cluster with the tags: ["some-tag", "GOOGLE"]

  • api - REST API parameters of the service (base-url and api-key)
  • grpc-endpoint-port - direct access port to the speech recognition service (used in the Automate chat tester)
  • service-availability - configuration of the ASR service availability check
  • techmo.engines - access configuration for the Techmo ASR engines: each entry has a name for identification, a host and port, and the communication method (use-plain-text option)
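
The tag rule described for service-tags can be sketched as a small function. This is only an illustration of the documented outcome; the function name is hypothetical, and skipping a technology tag that is already listed is an assumption, since the source does not say how duplicates are handled.

```python
from typing import List


def effective_tags(technology: str, service_tags: List[str]) -> List[str]:
    """Return the configured tags followed by the technology tag."""
    tags = list(service_tags)  # copy so the caller's list is not modified
    # Assumption: the technology tag is not added twice if already present.
    if technology not in tags:
        tags.append(technology)
    return tags


print(effective_tags("GOOGLE", ["some-tag"]))  # ['some-tag', 'GOOGLE']
```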

Phonexia - example config file

voice.gateway {

  pytia {
    technology = "PHONEXIA"
    service-tags: ["Phonexia-pl"]

    api.base-url = "http://0.0.0.0:8080"
    service-availability {
      ping-interval = 10d
      failure-limit = 2
      failure-time-window = 5m
    }
  }

  # Phonexia dictate subconfiguration.
  phonexia.engines = [{
    # Login to access Phonexia's API.
    login = "my-login"
  
    # Password to access Phonexia's API.
    password = "secret-password"
    name = "phonexia-pl"
    # URL address of Phonexia's REST API.
    address = "http://1.2.3.4:8600"

    # Name of speech recognition model. Should be supplied by Phonexia.
    model = "PL_PL_6"
  
    # Interval at which the Phonexia API is polled for the transcription.
    polling-interval = 1s
  }]
}
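
The polling-interval setting implies a loop that asks the API for the transcription at a fixed interval. A minimal sketch of such a loop follows. The fetch callable is a hypothetical stand-in for the real request, no actual Phonexia endpoint or response format is shown, and the max_attempts cap is an assumption added so the example terminates.

```python
import time
from typing import Callable, Optional


def poll_transcription(
    fetch: Callable[[], Optional[str]],
    polling_interval_s: float = 1.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() every polling_interval_s seconds until it returns text.

    Returns None if no text arrived within max_attempts calls.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(polling_interval_s)
    return None


# Stand-in for the real API call: the transcription is ready on the third poll.
responses = iter([None, None, "hello world"])
print(poll_transcription(lambda: next(responses), sleep=lambda s: None))
```

The sleep function is injectable so the loop can be exercised without real delays.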