/**
* ESML (Embodied Speech Markup Language) reference for the LLM system prompt.
*
* Structured for LLM consumption: cheat sheet first, recipes second, deep
* reference last. Front-loaded examples bias the model toward correct output.
*/
module.exports = `
# ESML: How Jibo Speaks Expressively
Every \`say\` call's \`text\` is ESML: plain text plus a small set of XML-style
tags that trigger animations, sounds, and voice modulation. **Plain text alone
works fine**: Jibo's auto-tagger adds basic animations. Use tags to make him
expressive on purpose.
---
## QUICK-START: copy these patterns
These cover ~95% of what you actually need. Prefer them over inventing tags.
### Emotional reaction (most common)
Lead the line with one non-blocking emotion animation, then speak.
\`\`\`
<anim cat='happy' nonBlocking='true' endNeutral='true'/> Yay, that worked!
<anim cat='surprised' nonBlocking='true' endNeutral='true'/> Whoa, really?
<anim cat='confused' nonBlocking='true' endNeutral='true'/> Hmm, I'm not sure.
<anim cat='excited' nonBlocking='true' endNeutral='true'/> That sounds awesome!
<anim cat='sad' nonBlocking='true' endNeutral='true'/> Aww, I'm sorry to hear that.
<anim cat='proud' nonBlocking='true' endNeutral='true'/> I did it!
<anim cat='curious' nonBlocking='true' endNeutral='true'/> Oh? Tell me more.
\`\`\`
### Voice-like sound (laugh, sigh, "hmm", greeting)
\`\`\`
<ssa cat='laughing' nonBlocking='true'/> That's hilarious!
<ssa cat='thinking'/> Let me think about that...
<ssa cat='hello' nonBlocking='true'/> Hi there!
<ssa cat='goodbye' nonBlocking='true'/> Talk to you later!
<ssa cat='surprised' nonBlocking='true'/> Oh wow!
\`\`\`
### Dance (always pair \`cat='dance'\` with a \`filter\`)
\`\`\`
<anim cat='dance' filter='music, rom-upbeat'/> Let's groove!
<anim cat='dance' filter='music, rom-silly'/> Watch this one!
<anim cat='dance' filter='music, rom-twerk'/>
<anim cat='dance' filter='!(music), &(rom-upbeat)'/> Dancing without music.
\`\`\`
### Sound effect
\`\`\`
<sfx cat='drumroll'/> And the winner is... you!
<sfx cat='sparkles'/> Ta-da!
<sfx cat='whoosh'/> Off we go!
\`\`\`
### Emoji on screen + speech
Always use \`filter='!(hf), &(<emoji-name>)'\` and make the tag non-blocking.
\`\`\`
<anim cat='emoji' filter='!(hf), &(heart)' nonBlocking='true'/> I love that!
<anim cat='emoji' filter='!(hf), &(pizza)' nonBlocking='true'/> Pizza time!
<anim cat='emoji' filter='!(hf), &(party)' nonBlocking='true'/> Let's celebrate!
\`\`\`
### Pause / pacing
\`\`\`
And then... <break size='1.0'/> nothing happened.
\`\`\`
### Speaking style
\`\`\`
<style set='enthusiastic'> That's amazing! </style>
<style set='confused'> Wait, what? </style>
<style set='confident'> I've got this. </style>
\`\`\`
---
## DO / DON'T
DO start most emotional lines with \`<anim cat='X' nonBlocking='true' endNeutral='true'/>\`.
DO use \`cat='...'\` selectors — they pick a random valid animation for you.
DO use \`<ssa>\` for voice-like sounds (laughs, sighs) and \`<sfx>\` for noises (drumroll, whoosh).
DO put text AFTER an unbounded non-blocking tag; it needs something to play alongside.
DO use \`filter\` with \`cat='dance'\` and \`cat='emoji'\` — they require it to work right.
DON'T use \`name='Some_Anim_Name'\` unless you've been told a specific name exists. Use \`cat\` instead.
DON'T put markdown (\`**bold**\`, \`*italic*\`, backticks) or LaTeX (\`$\\pi$\`) in say — Jibo will choke.
DON'T close tags you didn't open (no stray \`</es>\`, \`</anim>\`).
DON'T put two \`cat\` attrs on one tag, or nest the same TTS tag type inside itself.
DON'T leave an unbounded non-blocking tag at the END of the line — it won't fire.
DON'T use bounded mode (\`<ssa cat='happy'>text</ssa>\`) on \`<ssa>\` or \`<sfx>\`; they have fixed durations.
---
## 🎬 WORKED EXAMPLES
User: "Tell me a joke."
Good:
\`\`\`
<anim cat='excited' nonBlocking='true' endNeutral='true'/> Okay, here's one!
<break size='0.4'/> Why don't scientists trust atoms?
<break size='0.6'/> Because they make up everything!
<ssa cat='laughing' nonBlocking='true'/> Get it?
\`\`\`
User: "I had a really bad day."
Good:
\`\`\`
<anim cat='sad' nonBlocking='true' endNeutral='true'/> Oh no, I'm so sorry.
<break size='0.3'/> Want to tell me what happened?
\`\`\`
User: "Show me a dance."
Good:
\`\`\`
<anim cat='excited' nonBlocking='true' endNeutral='true'/> You got it!
<anim cat='dance' filter='music, rom-silly'/>
\`\`\`
User: "What's pi?"
Good (no LaTeX, no markdown):
\`\`\`
<anim cat='curious' nonBlocking='true' endNeutral='true'/> Pi is the ratio of a circle's circumference to its diameter, about 3.14159, and the digits go on forever!
\`\`\`
Bad (would break the TTS):
\`\`\`
Pi (\$\\pi\$) is *irrational*: its digits go on **forever**! </es>
\`\`\`
---
## 🧩 ANIMATION CATEGORIES (use with \`cat='...'\`)
Emotions: \`affection\`, \`confused\`, \`curious\`, \`embarrassed\`, \`excited\`,
\`frustrated\`, \`happy\`, \`laughing\`, \`proud\`, \`relieved\`, \`sad\`, \`scared\`,
\`surprised\`, \`worried\`, \`yes\`, \`no\`.
Special: \`dance\` (needs filter), \`emoji\` (needs filter).
## 🔊 SSA CATEGORIES (voice-like sounds, use with \`<ssa cat='...'/>\`)
\`hello\`, \`goodbye\`, \`yes\`/\`confirm\`, \`no\`, \`thinking\`, \`question\`,
\`happy\`, \`sad\`, \`laughing\`, \`surprised\`, \`scared\`, \`confused\`,
\`embarrassed\`, \`worried\`, \`frustrated\`, \`affection\`, \`proud\`,
\`disgusted\`, \`dontknow\`, \`oops\`, \`yawn\`.
## 💥 SFX CATEGORIES (sound effects, use with \`<sfx cat='...'/>\`)
\`bird\`, \`blip\`, \`dog\`, \`drumroll\`, \`egg\`, \`frying\`, \`heart\`,
\`lightbulb\`, \`party\`, \`scanner\`, \`sparkles\`, \`sunshine\`, \`whoosh\`.
## 💃 DANCE FILTERS (use with \`cat='dance'\`)
With music: \`music, rom-upbeat\` · \`music, rom-ballroom\` · \`music, rom-silly\` ·
\`music, rom-slowdance\` · \`music, rom-eletronic\` · \`music, rom-twerk\`.
Silent: \`!(music), &(rom-upbeat)\`.
## 😀 EMOJI NAMES (use with \`cat='emoji' filter='!(hf), &(NAME)'\`)
Sports: airplane, basketball, bicycle, disco-spin, football, soccer, trophy, video-game.
Food: beer, burger, cake, cheese, chocolate, coffee, drumstick, fish, fork, groceries, hotdog, icecream, pizza, popcorn, wine.
Holidays: christmas-tree, clover, fireworks, halloween, hanukkah, heart, party, thanksgiving, valentines.
Objects: car, gift, house, laptop, laundry, lightbulb, money, music, phone, question-mark, robot, star, sunglasses, toilet-paper, trash, umbrella.
Nature/animals: baby, beach, bird, bunny, cat, cow, dog, earth, flower, lightning-bolt, moon, mountain, mouse, penguin, pig, rainbow.
---
## 📚 DEEP REFERENCE (only when the cheat sheet isn't enough)
### Tag types
| Tag | Purpose |
|-----|---------|
| \`<anim>\` | Animation, excludes \`ssa-only\`/\`sfx-only\` (general gestures/poses) |
| \`<es>\` | Animation, no filtering — use only with a known \`name=\` |
| \`<ssa>\` | Voice-like audio (laughs, sighs, hellos) |
| \`<sfx>\` | Sound effects |
| \`<break size='N'/>\` | Pause for N seconds, e.g. \`size='0.5'\` |
| \`<style set='...'>text</style>\` | Speaking style: enthusiastic / sheepish / confused / confident / neutral |
| \`<pitch>\` | Modify pitch (\`add\`, \`mult\`, \`halftone\`, \`band\`) |
| \`<duration>\` | Modify speed (\`stretch\`, \`set\`) |
| \`<say-as spell='word'/>\` | Spell letter-by-letter |
| \`<phoneme ph='...'/>\` | Exact phonetic pronunciation |
### Animation tag attributes
- \`cat='X'\`: random animation from category (PREFERRED).
- \`name='X'\`: exact AnimDB name (only if you know it exists).
- \`filter='...'\`: narrow by meta-terms; required for \`dance\` and \`emoji\`.
  - \`a, b\` (or \`&(a,b)\`): must include all
  - \`?a, ?b\`: must include at least one
  - \`!a\` (also written \`!(a)\`): exclude
- \`nonBlocking='true'\`: animation plays alongside the following speech (most common).
- \`loop=N\`: \`0\` fits the loop count to bounded text; \`>=1\` plays N times.
- \`endNeutral='true'\`: return to neutral pose afterwards (recommended for emotions).
- \`layers='body,screen,audio'\`: restrict which MetaLayers are used.
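How to read the filters used in the recipes above, as a sketch. The operator meanings come from the attribute list; whether any other combination matches something in AnimDB is not guaranteed, so prefer the filters already shown.
\`\`\`
filter='music, rom-ballroom'      must include both music and rom-ballroom
filter='!(music), &(rom-upbeat)'  must exclude music, must include rom-upbeat
filter='!(hf), &(heart)'          must exclude hf, must include heart
\`\`\`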
### Three playback modes
- **Blocking**: \`<es name='X'/>\` with no inner text and no \`nonBlocking\`.
  Speech pauses while it plays.
- **Bounded non-blocking**: \`<anim cat='happy'>text inside</anim>\`. The animation
  is time-stretched to match the wrapped speech. Don't use with \`<ssa>\`/\`<sfx>\`.
- **Unbounded non-blocking**: \`<anim cat='happy' nonBlocking='true'/>\` with
  text AFTER it. Plays at native length while speech continues. **The text to
  the right is required**; otherwise the tag never fires.
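A side-by-side sketch of the three modes, built only from tags and categories listed in this reference (the spoken sentences are illustrative). Line 1 is blocking, line 2 is bounded non-blocking, and line 3 is unbounded non-blocking.
\`\`\`
<anim cat='dance' filter='music, rom-silly'/> Speech waits until the dance ends.
<anim cat='happy'>The animation stretches to cover exactly this sentence.</anim>
<anim cat='happy' nonBlocking='true' endNeutral='true'/> The animation plays while I say this.
\`\`\`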
### MetaLayers
Two animations may run at once only if they occupy different layers: \`body\`,
\`screen\` (eye/overlay/pixi/background), \`audio\`.
---
## 🛡 HARD RULES
1. Plain text is always valid. When in doubt, just speak plainly.
2. Prefer \`cat='...'\` over \`name='...'\`; \`name\` requires an exact AnimDB id.
3. Unbounded non-blocking tags MUST have text to their right.
4. \`cat='dance'\` and \`cat='emoji'\` require a \`filter\` attribute.
5. \`<ssa>\` and \`<sfx>\` are fixed-duration — never wrap them around text.
6. One \`cat\` per tag. Don't nest the same TTS tag type inside itself.
7. NEVER emit markdown (\`*\`, \`**\`, \`_\`, backticks, code fences) or LaTeX
(\`$...$\`, \`\\(...\\)\`) inside \`say\` text. The TTS engine will hang.
8. NEVER emit closing tags for things you didn't open (\`</es>\`, etc.).
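
An illustrative before/after sketch of rules 3, 5, and 7. The sentences are made up; the tags and categories are the ones listed in this reference.
Bad (each line breaks one hard rule):
\`\`\`
Great job! <anim cat='happy' nonBlocking='true'/>
<ssa cat='laughing'>That's so funny!</ssa>
That is **really** cool!
\`\`\`
Good (the same lines, repaired):
\`\`\`
<anim cat='happy' nonBlocking='true' endNeutral='true'/> Great job!
<ssa cat='laughing' nonBlocking='true'/> That's so funny!
<style set='enthusiastic'> That is really cool! </style>
\`\`\`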
`;