**Embodied Speech Markup Language (ESML)** is a specialized XML-based markup language designed to control how virtual humans, avatars, or robots communicate. Unlike standard text-to-speech (TTS), which focuses only on audio, ESML "embodies" the speech by synchronizing the voice with non-verbal behaviors such as gestures, facial expressions, and posture.
It acts as a bridge between the "brain" of an AI (the text it wants to say) and the "body" of the character (how it should move while saying it).
---
### 1. Key Components of ESML
ESML allows developers to tag text with specific instructions that the animation engine interprets in real-time.
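For example, a marked-up utterance might look like the following (the tag names and attributes here are illustrative assumptions, not necessarily the vocabulary defined by the ESML-SDK):

```xml
<esml>
  <prosody rate="0.9" pitch="+10%">
    <face expression="smile">Welcome back!</face>
    Your meeting room is
    <gesture type="point" target="door">right over there.</gesture>
  </prosody>
</esml>
```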
- **Prosody Control:** Adjusting pitch, rate, and volume to make the voice sound more human and less robotic.
- **Gestural Markers:** Telling the avatar exactly when to point, shrug, or nod during a sentence.
- **Facial Expression Tags:** Triggering emotions like `<smile>` or `<frown>` that coincide with the spoken words.
- **Synchronization:** Ensuring that a "pointing" gesture happens exactly when the avatar says the word "there."
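Because ESML is XML, a runtime only needs a standard XML parser to recover this structure. The sketch below is a minimal, hypothetical illustration of the idea: it walks a small ESML document depth-first and collects each behavior tag as an event the animation engine could then schedule. The tag names and the `schedule` helper are assumptions for illustration, not the official ESML-SDK API.

```python
# Minimal sketch: walk an ESML document and collect behavior events.
# Tag vocabulary and event format are illustrative assumptions,
# not the official ESML-SDK interface.
import xml.etree.ElementTree as ET

ESML_DOC = """
<esml>
  <prosody rate="0.9" pitch="+10%">Welcome!</prosody>
  <gesture type="point" target="door">Your meeting is <mark name="there"/> there.</gesture>
  <face expression="smile">It's good to see you.</face>
</esml>
"""

def walk(element, events):
    """Depth-first walk that records each behavior tag, its attributes,
    and the text it wraps. The root <esml> container itself is skipped."""
    if element.tag != "esml":
        events.append((element.tag, dict(element.attrib), (element.text or "").strip()))
    for child in element:
        walk(child, events)

def schedule(doc):
    """Parse an ESML string and return the flat list of behavior events
    in document order, ready for a scheduler to time-align with the TTS audio."""
    root = ET.fromstring(doc)
    events = []
    walk(root, events)
    return events

if __name__ == "__main__":
    for tag, attrs, text in schedule(ESML_DOC):
        print(f"{tag:8} attrs={attrs} text={text!r}")
```

In a real engine, a second pass would map each event to a timestamp from the TTS timing data (e.g. the `<mark>` element above would anchor the pointing gesture to the word "there"), but that timing step is engine-specific.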
> [!warning]
> The above explanation is AI-generated. Learn more at: [[ESML-SDK.pdf]]