SSML

Speech Synthesis Markup Language

What is SSML

Speech Synthesis Markup Language (SSML) is an XML-based markup language that controls how a text-to-speech system generates speech.

It lets you customize the speech and add effects such as:

  • pronunciation
  • speaking rate
  • volume
  • intonation
  • pause


How to use it

The <speak> tag tells a text-to-speech system that the words contained within it are intended to be spoken.

For your text to be spoken, wrap it in the <speak> tag. You can then add other SSML tags inside it to create the speech effect you want.

<speak>Hi, my name is Philipp.</speak>
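
Other tags go inside the <speak> element. As a minimal sketch (the second sentence is illustrative, and tag support can vary between text-to-speech engines), a short pause and a slower phrase can be combined like this:

```xml
<speak>
  Hi, my name is Philipp.
  <break time="500ms"/>
  <prosody rate="slow">How can I help you today?</prosody>
</speak>
```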

On automate, you can apply these tags directly in the message tool.



Test & Preview SSML tags



SSML tags

Here is a list of the most common tags.

< break >

You can set the length of the break:

By time
- by seconds <break time="3s"/>
- by milliseconds <break time="250ms"/>

By strength
- <break strength="x-weak"/>
- <break strength="weak"/>
- <break strength="medium"/>
- <break strength="strong"/>
- <break strength="x-strong"/>

Example:

<speak> Hello, I'm now controlled by the SSML language thanks to the speak command.

Now I'm going to take a break for 3 seconds. <break time="3s"/></speak>
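
A break can also be set by strength instead of time. A minimal sketch, assuming the text-to-speech engine supports the strength attribute (the actual pause length for each value depends on the engine, and the wording is illustrative):

```xml
<speak>
  Please hold the line. <break strength="x-strong"/>
  Thank you for waiting.
</speak>
```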


< emphasis >

Emphasizing words to make them louder and slower. The level attribute takes one of the following values:

  • strong
  • moderate
  • none
  • reduced

🚧

This tag should only be used around a full sentence. Enclosing words within a sentence may cause unwanted pauses in speech.

Example:

<speak>Hello! My name is <emphasis level="strong">Bianca</emphasis>.
I am your virtual <emphasis level="moderate">bank assistant</emphasis>.</speak>
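
In line with the note above about unwanted pauses, here is a sketch that applies emphasis to a whole sentence instead of to words inside one (the wording is illustrative):

```xml
<speak>
  Hello! My name is Bianca.
  <emphasis level="strong">Please do not share your PIN with anyone.</emphasis>
</speak>
```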


< lang >

Specifies the language the voice should speak, using a language code such as fr-FR. See the list of supported language codes.

Example:

<speak>
Hello! I am your virtual bank assistant Bianca. 
Would you like to continue in English? 
<lang xml:lang="fr-FR">Ou voulez-vous continuer en français ?</lang></speak>


< p >

Adding a pause between paragraphs. It is equivalent to specifying a pause with <break strength="x-strong"/>.

< s >

Adding a pause between sentences. This is equivalent to:

  • Ending a sentence with a period (.).
  • Specifying a pause with <break strength="strong"/>.

👍

  • Use <s>...</s> tags to wrap full sentences, especially if they contain SSML elements that change prosody.
  • If a break in speech is intended to be long enough that you can hear it, use <s>...</s> tags and put that break between sentences.
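
As a minimal sketch of the tips above (the wording is illustrative), each sentence is wrapped in <s>, sentences are grouped into paragraphs with <p>, and the audible break sits between two sentences:

```xml
<speak>
  <p>
    <s>Welcome to our bank.</s>
    <s>My name is Bianca.</s>
  </p>
  <p>
    <s>I will now check your balance.</s>
    <break time="2s"/>
    <s>Thank you for waiting.</s>
  </p>
</speak>
```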


Using phonetic pronunciation

< say-as >

Controlling how special types of words, phrases or numbers are spoken.

Currency

<speak>
  <say-as interpret-as='currency' language='en-US'>$42.01</say-as>
</speak>

Telephone
In the example below, 39 is the country code for the local number.

<speak><say-as interpret-as="telephone" format="39">0117577577</say-as></speak>

Verbatim or spelling

<speak>
  <say-as interpret-as="verbatim">abcdefg</say-as>
  <say-as interpret-as="verbatim">CA20/2543/21</say-as>
</speak>

or

<speak>Hello. Now I will spell your name slowly. Is your last name <prosody rate="x-slow">
        <say-as interpret-as="characters">Baziński</say-as>
    </prosody>?</speak>

Numbers

<speak>Now numbers. Your position in the queue is <prosody rate="slow">
    <say-as interpret-as="cardinal">10</say-as>, alternatively
    <say-as interpret-as="ordinal">10</say-as> or I can also say
    <say-as interpret-as="characters">10</say-as> or
    <say-as interpret-as="digits">10</say-as></prosody>.

Maybe a little about fractions. Did you know that
    <say-as interpret-as="fraction">3/4</say-as> of people in Poland want to be vaccinated, and
    <say-as interpret-as="fraction">2/7</say-as> think that Bill
    <say-as interpret-as="expletive">Gates</say-as> of Mic.
    <say-as interpret-as="expletive">rosoft</say-as> will then control them.</speak>

Ordinal

<speak>
  <say-as interpret-as="ordinal">1</say-as>
</speak>

Date

<speak>Good morning, I will now say the date:
<say-as interpret-as="date" format="yyyymmdd">1960-09-10</say-as>.
<say-as interpret-as="date" format="dmy" detail="2">10-9-1960</say-as>
<say-as interpret-as="date" format="dm">10-9</say-as>
</speak>

Time

<speak>
  <say-as interpret-as="time" format="hms12">2:30pm</say-as>
</speak>

Units

<speak>
  <say-as interpret-as="unit">10 foot</say-as>
</speak>


< prosody >

Controlling volume, speaking rate, and pitch.

Rate:

  • x-slow, slow, medium, fast, x-fast
  • > 100% increases the rate
  • < 100% decreases the rate
  • 100% is the normal rate

Example:

<speak>I hope you still like me.
    <emphasis level="strong">Announcement: I can also modulate my voice</emphasis>.
    <break time="1s"/>
    <prosody rate="slow" pitch="-10st">quite low</prosody>.
    <break time="1s"/>
    <prosody rate="slow">speak normally like a robot</prosody>.
    <break time="1s"/>
    <prosody rate="slow" pitch="+10st">a little higher so you don't waste time</prosody>.
    <break time="1s"/>
</speak>
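
The rate can also be given as a percentage. A minimal sketch, assuming the text-to-speech engine accepts percentage values for rate (the sentences are illustrative):

```xml
<speak>
  <prosody rate="80%">This sentence is spoken a little slower than normal.</prosody>
  <prosody rate="120%">This sentence is spoken a little faster than normal.</prosody>
</speak>
```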

Pitch:

  • x-low, low, medium, high, x-high
  • from +1% to +50% (+50% is the maximum value)
  • from -1% to -33.3% (-33.3% is the lowest value)
  • using semitones: +1st or -1st

Example:

<speak><prosody rate="slow" pitch="-2st">Can you hear me now?</prosody></speak>

Volume:

  • silent, x-soft, soft, medium, loud, x-loud
  • in decibels, from +1dB to +4dB
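
Here is a minimal sketch of the volume attribute, using a named level and a decibel value. The sentences are illustrative, and the supported range can differ between text-to-speech engines:

```xml
<speak>
  <prosody volume="soft">This is spoken quietly.</prosody>
  <prosody volume="+2dB">This is spoken a little louder.</prosody>
</speak>
```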


< sub >

Pronouncing acronyms and abbreviations.

Example 1:
<speak><sub alias="World Wide Web Consortium">W3C</sub></speak>

Example 2:
<speak>Now add 200 <sub alias="milliliters">ml</sub> of flour.
</speak>


Full SSML reference guide