**Embodied Speech Markup Language (ESML)** is a specialized XML-based markup language designed to control how virtual humans, avatars, or robots communicate. Unlike standard text-to-speech (TTS), which focuses only on audio, ESML "embodies" the speech by synchronizing the voice with non-verbal behaviors such as gestures, facial expressions, and posture.
It acts as a bridge between the "brain" of an AI (the text it wants to say) and the "body" of the character (how it should move while saying it).

---
### 1. Key Components of ESML
ESML allows developers to tag text with specific instructions that the animation engine interprets in real time.
- **Prosody Control:** Adjusting pitch, rate, and volume to make the voice sound more human and less robotic.
- **Gestural Markers:** Telling the avatar exactly when to point, shrug, or nod during a sentence.
- **Facial Expression Tags:** Triggering emotions like `<smile>` or `<frown>` that coincide with the spoken words.
- **Synchronization:** Ensuring that a "pointing" gesture happens exactly when the avatar says the word "there."
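
The components above can be sketched with a small Python example. This is a minimal illustration, not the actual ESML schema: the tag and attribute names (`<speak>`, `<prosody>`, `<gesture type="point"/>`, self-closing `<smile/>`) and the `timeline` helper are assumptions made for this sketch, so check [[ESML-SDK.pdf]] for the real tag set. The helper walks the markup and records the word index at which each cue starts, which is roughly the information an animation engine would need to line up a gesture with a spoken word.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative only
# and are not taken from the ESML SDK.
SAMPLE = (
    '<speak>'
    '<prosody rate="slow" pitch="high">Hello!</prosody> '
    '<smile/> Your keys are over <gesture type="point"/> there.'
    '</speak>'
)


def timeline(markup):
    """Return (words, cues) for a piece of XML markup.

    words is the list of spoken words in order.
    cues is a list of (word_index, tag, attributes) tuples, where
    word_index is the index of the next spoken word when the cue starts.
    """
    root = ET.fromstring(markup)  # raises ParseError if not well-formed
    words, cues = [], []

    def visit(elem, is_root=False):
        if not is_root:
            cues.append((len(words), elem.tag, dict(elem.attrib)))
        # Text inside the element, before its first child.
        words.extend((elem.text or "").split())
        for child in elem:
            visit(child)
            # Text that follows the child, still inside this element.
            words.extend((child.tail or "").split())

    visit(root, is_root=True)
    return words, cues


words, cues = timeline(SAMPLE)
for index, tag, attrs in cues:
    next_word = words[index] if index < len(words) else None
    print(f"{tag} {attrs} starts at word {index}: {next_word}")
```

In this sketch the pointing cue lands on the word "there.", which mirrors the synchronization example in the list above. A real engine would work with audio timestamps rather than word indices.
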
> [!warning]
> The above explanation is AI-generated. Learn more at: [[ESML-SDK.pdf]]