The Speak-Tweak tool lets you type someone’s name into a text field and hear how Jibo is going to pronounce it. If he mispronounces it, you can try different spellings to see whether one of them corrects Jibo’s pronunciation. Failing that, you can give exact phonemes to try to build a correct pronunciation that way, but that method can be a bit difficult.
Once you have found better pronunciation text for a name, there is a little “copy” icon to click on, and the exact text (or phonemes) for that version will be copied to your clipboard. That change then needs to be communicated to us (a Google sheet with the participant name, participant ID, and new pronunciation text is a good method) and we will modify our database on the back-end to use the new pronunciation.
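If you are collecting a lot of corrections, it can help to build the rows in a script and paste or import them into the Google sheet. Here is a minimal sketch of that, assuming a simple three-column layout. The column names, the `make_sheet_rows` helper, and the sample participant are all made up for illustration; only the three fields themselves come from the description above.

```python
import csv
import io

# Hypothetical column names; the only requirement stated above is that the
# sheet carries the participant name, participant ID, and new pronunciation text.
FIELDS = ["participant_name", "participant_id", "pronunciation_text"]


def make_sheet_rows(entries):
    """Return CSV text (header plus one row per entry).

    `entries` is a list of (name, participant_id, pronunciation_text) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for name, participant_id, text in entries:
        # Keep the copied text exactly as the tool produced it, markup included.
        writer.writerow([name, participant_id, text])
    return buffer.getvalue()


# Made-up example data, not a real participant.
print(make_sheet_rows([("Siobhan", "P-001", "Shivawn")]))
```

The `csv` module takes care of quoting, so pronunciation text that contains commas or quote characters should still land in a single cell when the file is imported.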
Note: “plain” versions will copy just the exact text, while the phonetic versions will be wrapped in <phoneme>...</phoneme> markup. This is normal and expected, and it is the correct text to send to us.
Here is what the tool looks like:
The large empty box is where you can type in the names to test out. Press to have Jibo speak the name. You can keep pressing multiple times to have him repeat the current name as you listen to see if you think it needs any modifying.
As you enter names they will appear in a list below. This is so you can enter different spellings (or different phonetics) and not lose track of the ones you’ve tried. You can click on the little blue circle for a given name to jump between them and hear the differences side by side (you can also just click on the name itself). The text box will be populated with the one you clicked. You can propose several alternatives and quickly play them against each other to help settle on the best one.
Once you have settled on one you can click on the double-boxes icon at the end of the name and that one will be copied to your clipboard.
The “plain” names are the default. You can click on the “phonetic” toggle below the input box to switch to phonetic mode.
“l eh k s ih k ah n” is an example phonetic string that is supposed to say “lexicon”.
When you copy a phonetic version, the tool wraps everything in markup that indicates it’s a phonetic string. That will look something like this: <phoneme>...</phoneme> around the phonetic text. That’s normal, and we want that whole string to be sent to us, please. Note: the plain versions will not have this additional markup.
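Before sending a batch of copied strings, you may want a quick check that the phonetic ones still have their wrapper and that nothing was trimmed by accident. The sketch below is one way to do that. It assumes only that a copied phonetic string starts with a `<phoneme ...>` tag and ends with `</phoneme>`; the `copied_kind` helper is hypothetical and the sample strings are illustrative, so the real copied text may carry attributes that are not shown here.

```python
import re

# Assumption: copied phonetic text starts with a <phoneme ...> opening tag and
# ends with </phoneme>. Attributes inside the opening tag are not inspected.
PHONEME_WRAPPER = re.compile(r"^\s*<phoneme\b[^>]*>.*</phoneme>\s*$", re.DOTALL)


def copied_kind(text):
    """Classify a copied string as 'phonetic' (wrapped) or 'plain'."""
    return "phonetic" if PHONEME_WRAPPER.match(text) else "plain"


for candidate in ["Shivawn", "<phoneme>l eh k s ih k ah n</phoneme>"]:
    print(candidate, "->", copied_kind(candidate))
```

A bare phoneme string with no wrapper is reported as “plain”, which is a hint that the markup was lost somewhere between the copy and the sheet.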
The Clear button will clear out your list of names. Also, you can press the key when inside the text box to clear the text in the box. Handy when you are trying different things, or when you switch from “plain” mode to “phonetic” mode.
Phonetics
Regarding phonetics, our archive of documentation from Jibo Inc. says:
“The phonetic set used is ARPABET https://en.wikipedia.org/wiki/Arpabet . It is compulsory to use stress markers such as 0 or 1 for vowels. However the system doesn’t support stress marker “2” which is secondary stress.”
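The stress rule in that quote can be checked mechanically before you try a string in the tool. Here is a rough sketch, assuming space-separated tokens and the standard ARPABET vowel symbols; the set Jibo actually supports is listed in the ESML PDF and may differ. The `check_stress` helper is hypothetical and the stress digits in the usage example are only illustrative. Note that the “lexicon” example on this page has no stress markers at all, so the tool may be more lenient than the quoted documentation; this checker only reports what the quote says.

```python
import re

# Assumption: the standard ARPABET vowel symbols. The set Jibo supports is
# described in the ESML PDF and may not match this list exactly.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def check_stress(phonetic):
    """Return a list of human-readable problems found in a phonetic string."""
    problems = []
    for token in phonetic.split():
        match = re.fullmatch(r"([A-Za-z]+)(\d*)", token)
        if match is None:
            problems.append(f"{token}: not a letters-plus-digit token")
            continue
        symbol, stress = match.group(1).upper(), match.group(2)
        if symbol in VOWELS:
            if stress == "":
                problems.append(f"{token}: vowel has no stress marker")
            elif stress not in ("0", "1"):
                problems.append(f"{token}: stress marker {stress} is not supported")
        elif stress:
            # Anything outside VOWELS is treated as a consonant here.
            problems.append(f"{token}: stress marker on a non-vowel")
    return problems


print(check_stress("l eh1 k s ih0 k ah0 n"))  # no problems reported
print(check_stress("l eh2 k s ih k ah0 n"))   # flags eh2 and ih
```

An empty list means every vowel carries a 0 or 1 and nothing else looks off. It does not mean the pronunciation is right, so you still need to listen to it in the tool.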
Here is a PDF describing our Embodied Speech Markup Language (ESML). It has a section at the very bottom describing the supported phonemes, of which I have attached screenshots below. It also describes additional things you can do with ESML that might be useful, like <pitch>, <duration>, and <style>, which we are not going to go into here.
SDK-SDK - ESML-121023-203758.pdf
That document also talks about a number of other speech modifiers, like <pause>/<break>, <duration>, and <pitch> (and <phoneme>). <style> and <say-as> might work too. You should be able to use those specific tags in the Speak-Tweak tool if they are useful. You will need to type out the full tags as described, in “plain” mode. Many of the other tags are not going to work (things like animations, playing sound files, body movement, etc.) because they are part of a higher-level system, the embodied speech system, which uses ESML (Embodied Speech Markup Language). With Speak-Tweak we are only dealing with the TTS system, and that uses SSML (Speech Synthesis Markup Language), which is a subset of ESML.