Initial commit: jibo-llm hotword-triggered agent
Hotword-triggered LLM conversation loop for Jibo with tool-calling agent loop, ESML expressive speech, web search/fetch, and per-conversation abort handling.
9
.env.example
Normal file
@@ -0,0 +1,9 @@
# Jibo robot IP address
JIBO_IP=192.168.1.217

# LLM API configuration (OpenAI-compatible chat completions endpoint)
# LLM_BASE_URL is the base URL *without* /chat/completions
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_TOKEN=sk-your-api-key-here
LLM_MODEL_ID=gpt-4o
BRAVE_API_KEY=brave-api-key
5
.gitignore
vendored
Normal file
@@ -0,0 +1,5 @@
node_modules/
.env
.env.local
*.log
.DS_Store
291
README.md
Normal file
@@ -0,0 +1,291 @@
# jibo-llm

> **Give Jibo a brain again.** A hotword-triggered, LLM-powered conversational agent that turns Jibo into an expressive, tool-using social robot — complete with speech, vision, web search, animations, and more.

---

## Overview

**jibo-llm** connects a Jibo robot to any OpenAI-compatible LLM that supports tool calling (GPT-4o, Claude, local models via Ollama/LM Studio, etc.) through a real-time agent loop. When someone says **"Hey Jibo"**, the system:

1. **Listens** for the user's speech via Jibo's on-board microphone.
2. **Sends** the transcript to an LLM along with a rich system prompt and tool definitions.
3. **Executes** tool calls the LLM makes — speaking, animating, taking photos, searching the web, and more.
4. **Loops** until the conversation naturally ends or the user triggers a new hotword.

Conversations are fully interruptible: saying "Hey Jibo" mid-conversation aborts the current exchange and starts a fresh one via `AbortController`.
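
A minimal sketch of that interrupt pattern, assuming a hypothetical `runConversation(signal)` that checks the signal between steps (the real loop lives in `index.js` and uses helpers such as `throwIfAborted()`). Each hotword aborts the previous conversation's `AbortController`, waits for it to unwind, then starts a fresh one. `onHotword`, `runConversation`, and the step timing are illustrative names and values, not the project's code.

```javascript
// Hypothetical sketch: one AbortController per conversation.
let current = null; // { controller, done } for the active conversation, if any

function abortError() {
  const err = new Error('Conversation aborted');
  err.code = 'CONVERSATION_ABORTED';
  return err;
}

// Stand-in for the agent loop: does `steps` units of work,
// checking the signal before each one.
async function runConversation(signal, steps = 5, log = []) {
  for (let i = 0; i < steps; i++) {
    if (signal.aborted) throw abortError();
    log.push(i);
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
  return 'completed';
}

// Called on every hotword event.
async function onHotword(log = []) {
  if (current) {
    current.controller.abort(); // cancel the running conversation
    await current.done.catch(() => {}); // wait until it has unwound
  }
  const controller = new AbortController();
  const done = runConversation(controller.signal, 5, log);
  current = { controller, done };
  try {
    return await done;
  } catch (err) {
    if (err.code === 'CONVERSATION_ABORTED') return 'aborted';
    throw err;
  } finally {
    // Only clear the slot if a newer conversation hasn't replaced us.
    if (current && current.controller === controller) current = null;
  }
}
```

Waiting for the old conversation to unwind before starting the next keeps two agent loops from driving the same robot at once.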

---

## Architecture

```
┌──────────────┐   hotword   ┌───────────────────┐   tool calls    ┌───────────────┐
│  Jibo Robot  │ ──────────▶ │     index.js      │ ◀─────────────▶ │  LLM (OpenAI  │
│  (rom-ctrl)  │ ◀────────── │    Agent Loop     │                 │  compatible)  │
│              │  say/listen │                   │                 └───────────────┘
│  • mic       │  photo/look │     tools.js      │   web search    ┌───────────────┐
│  • speaker   │   display   │    (executor)     │ ──────────────▶ │ Brave Search  │
│  • camera    │             │                   │                 └───────────────┘
│  • screen    │             │ esml-reference.js │
│  • motors    │             │   (prompt ref)    │
└──────────────┘             └───────────────────┘
```

| File | Purpose |
|------|---------|
| `index.js` | Entry point — connects to Jibo, listens for hotword, runs the agent loop with the LLM. |
| `tools.js` | Defines all tool schemas (OpenAI function-calling format) and the `executeTool()` dispatcher. |
| `esml-reference.js` | ESML (Embodied Speech Markup Language) cheat sheet injected into the system prompt so the LLM knows how to animate Jibo expressively. |

---

## Features

- 🗣️ **Natural conversation** — multi-turn dialogue with speech recognition and TTS.
- 🎭 **Expressive animations** — the LLM uses ESML tags to trigger emotions, dances, emojis, and sound effects inline with speech.
- 📷 **Vision** — Jibo can take photos and the LLM receives the image for visual understanding.
- 🔍 **Web search** — real-time Brave Search integration for up-to-date answers.
- 🌐 **URL fetching** — reads web pages (with Cloudflare Markdown for Agents support) so Jibo can summarize articles.
- 🖥️ **Display control** — show text, images, or restore the default eye on Jibo's screen.
- 🤖 **Head movement** — point Jibo's head at specific angles (yaw / pitch).
- 🔊 **Volume control** — adjust speaker volume on the fly.
- ⚡ **Interruptible** — new hotword instantly aborts a running conversation via `AbortController`.
- 🔄 **Retry logic** — automatic retry with exponential backoff for transient LLM errors (429, 5xx, network).
- 🧹 **Context management** — old photos are pruned from context to control token cost.
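
The retry behavior can be sketched as a generic wrapper, assuming an arbitrary async `fn` rather than the OpenAI client. `withRetry`, `isTransient`, and the delay values (`baseMs`, jitter) are illustrative assumptions; only the 429 / 5xx / network-code classification mirrors `isTransientLLMError()` in `index.js`.

```javascript
// Illustrative retry helper with exponential backoff (not the project's exact code).
const NETWORK_CODES = ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'EAI_AGAIN'];

// Same classification as isTransientLLMError(): 429, any 5xx, or a network error code.
function isTransient(err) {
  if (!err) return false;
  const status = err.status ?? err.response?.status;
  if (status === 429) return true;
  if (typeof status === 'number' && status >= 500) return true;
  return NETWORK_CODES.includes(err.code);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { maxRetries = 2, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up on permanent errors or once the retry budget is spent.
      if (!isTransient(err) || attempt >= maxRetries) throw err;
      // With the default baseMs: 500 ms, 1000 ms, 2000 ms, ... plus up to 100 ms of jitter.
      await sleep(baseMs * 2 ** attempt + Math.random() * 100);
    }
  }
}
```

With `maxRetries = 2` a call is attempted at most three times, matching the `attempt <= LLM_MAX_RETRIES` loop bound in `index.js` where `LLM_MAX_RETRIES = 2`.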

---

## Prerequisites

- **Node.js** ≥ 18 (for native `fetch` and `AbortController`)
- **A Jibo robot** running with int-developer mode enabled
- **An OpenAI-compatible API endpoint** whose model supports tool (function) calling (OpenAI, Anthropic via proxy, Ollama, LM Studio, etc.)
- *(Optional)* **Brave Search API key** for the `web_search` tool

---

## Quick Start

### 1. Clone & install

```bash
git clone https://github.com/niceduckdev/jibo-llm.git
cd jibo-llm
npm install
```

### 2. Configure environment

```bash
cp .env.example .env
```

Edit `.env` with your values:

```env
# Jibo robot IP address on your local network
JIBO_IP=192.168.1.217

# LLM API configuration (any OpenAI-compatible endpoint)
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_TOKEN=sk-your-api-key-here
LLM_MODEL_ID=gpt-4o

# Optional: enables the web_search tool
BRAVE_API_KEY=your-brave-api-key
```

### 3. Run

```bash
npm start
# or: node index.js
```

You'll see:

```
[jibo-llm] Connecting to Jibo at 192.168.1.217…
[jibo-llm] Connected — session abc123
[jibo-llm] Ready — listening for "Hey Jibo"…
```

Say **"Hey Jibo"** and start talking!

---

## Configuration

All configuration is done via environment variables (loaded from `.env` by [dotenv](https://www.npmjs.com/package/dotenv)):

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `JIBO_IP` | No | `192.168.1.217` | Jibo's IP address on your LAN |
| `LLM_BASE_URL` | No | `https://api.openai.com/v1` | Base URL for the chat completions API (without `/chat/completions`) |
| `LLM_API_TOKEN` | **Yes** | — | API key for the LLM provider |
| `LLM_MODEL_ID` | No | `gpt-4o` | Model identifier to use |
| `BRAVE_API_KEY` | No | — | Brave Search API key (enables `web_search` tool) |
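
The table maps naturally onto a small config loader. The sketch below is an assumption about structure, not code from `index.js` (which reads the same variables inline at the top of the file): it applies the documented defaults, fails fast when `LLM_API_TOKEN` is missing, and only offers `web_search` when `BRAVE_API_KEY` is set. `loadConfig` and `enabledTools` are hypothetical names.

```javascript
// Hypothetical config loader mirroring the variables and defaults in the table above.
function loadConfig(env = process.env) {
  if (!env.LLM_API_TOKEN) {
    throw new Error('LLM_API_TOKEN is not set. Copy .env.example to .env and fill it in.');
  }
  return {
    jiboIp: env.JIBO_IP || '192.168.1.217',
    // Base URL without /chat/completions; the OpenAI SDK appends the endpoint path.
    llmBaseUrl: env.LLM_BASE_URL || 'https://api.openai.com/v1',
    llmApiToken: env.LLM_API_TOKEN,
    llmModelId: env.LLM_MODEL_ID || 'gpt-4o',
    braveApiKey: env.BRAVE_API_KEY || null,
  };
}

// Assumption: web_search is only exposed to the model when a Brave key is configured.
function enabledTools(config, allTools) {
  return allTools.filter((name) => name !== 'web_search' || Boolean(config.braveApiKey));
}
```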

### Using alternative LLM providers

Since jibo-llm uses the OpenAI SDK, any provider that exposes a compatible chat completions endpoint should work, provided the chosen model supports tool calling (Jibo speaks, listens, and acts only through tool calls):

```env
# Ollama (local; pick a model with tool-calling support)
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_TOKEN=ollama
LLM_MODEL_ID=llama3.1

# LM Studio (local)
LLM_BASE_URL=http://localhost:1234/v1
LLM_API_TOKEN=lm-studio
LLM_MODEL_ID=local-model

# OpenRouter
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_TOKEN=sk-or-...
LLM_MODEL_ID=anthropic/claude-sonnet-4
```

---

## Available Tools

The LLM can call any of these tools during a conversation:

### Communication
| Tool | Description |
|------|-------------|
| `say` | Speak ESML-formatted text through Jibo's speaker. Queued and chained so multiple `say` calls play in order. |
| `listen` | Open the microphone and transcribe user speech. Waits for pending speech to finish first. |
| `end_conversation` | Gracefully end the conversation (no further listening). |
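
The queueing that `say` and `listen` describe can be sketched as a single promise chain. Here `speakOnRobot` and `openMic` are hypothetical stand-ins for the real rom-control calls (not shown in this excerpt), and `createSpeechQueue` is an illustrative name.

```javascript
// Sketch of chained speech: each say() starts only after the previous one ends.
function createSpeechQueue(speakOnRobot) {
  let chain = Promise.resolve();

  return {
    // Enqueue text; returns right away, and utterances play in call order.
    say(text) {
      const next = chain.then(() => speakOnRobot(text));
      // Keep the chain usable even if one utterance fails.
      chain = next.catch(() => {});
      return next;
    },
    // Resolves once everything queued so far has been spoken.
    drain() {
      return chain;
    },
  };
}

// Hypothetical listen(): wait for pending speech, then open the mic.
async function listen(queue, openMic) {
  await queue.drain();
  return openMic();
}
```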

### Camera
| Tool | Description |
|------|-------------|
| `take_photo` | Capture a photo from Jibo's camera. The image is sent to the LLM as a base64 JPEG for visual understanding. |
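
In the OpenAI chat-completions format, an image reaches the model as an `image_url` content part holding a base64 `data:` URL. The helper below is a hypothetical sketch of that message shape (how `take_photo` actually packages its result lives in `tools.js`/`index.js`, not shown here in full); it produces the `role: 'user'` message with an `image_url` part that `pruneOldImages()` in `index.js` searches for.

```javascript
// Hypothetical helper: wrap JPEG bytes as a user message the LLM can "see".
function photoMessage(jpegBuffer, caption = 'Here is the photo from my camera.') {
  const b64 = Buffer.from(jpegBuffer).toString('base64');
  return {
    role: 'user',
    content: [
      { type: 'text', text: caption },
      { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${b64}` } },
    ],
  };
}
```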

### Display
| Tool | Description |
|------|-------------|
| `show_text` | Display word-wrapped text on Jibo's screen. |
| `show_image` | Display an image from a URL on Jibo's screen. |
| `show_eye` | Restore the default eye animation. |

### Movement
| Tool | Description |
|------|-------------|
| `look_at_angle` | Turn Jibo's head — `theta` (yaw ±180°) and `psi` (pitch ±30°). |

### Audio
| Tool | Description |
|------|-------------|
| `set_volume` | Set speaker volume from 0.0 to 1.0. |

### Web
| Tool | Description |
|------|-------------|
| `web_search` | Search the web via Brave Search API. Supports result count and freshness filters. |
| `fetch_url` | Fetch and read a web page. Prefers markdown via Cloudflare content negotiation, falls back to HTML→text conversion. |
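
A minimal sketch of the `fetch_url` strategy, assuming Node 18's global `fetch`: ask for `text/markdown` first through the `Accept` header (sites using Cloudflare's Markdown for Agents can answer in markdown), otherwise fall back to a crude HTML-to-text pass. `fetchReadable`, `htmlToText`, and the 20,000-character cap are illustrative choices, not the tool's actual implementation.

```javascript
// Illustrative fetch_url: prefer markdown via content negotiation, else strip HTML.
function htmlToText(html) {
  return html
    .replace(/<(script|style|noscript)[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop non-content blocks
    .replace(/<br\s*\/?>|<\/(p|div|li|h[1-6])>/gi, '\n') // keep rough line breaks
    .replace(/<[^>]+>/g, ' ') // remove remaining tags
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&') // decode &amp; last to avoid double-decoding
    .replace(/[ \t]+/g, ' ')
    .replace(/\s*\n\s*/g, '\n')
    .trim();
}

async function fetchReadable(url, { maxChars = 20000, signal } = {}) {
  const res = await fetch(url, {
    headers: { Accept: 'text/markdown, text/html;q=0.9, */*;q=0.8' },
    signal,
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching ${url}`);
  const type = res.headers.get('content-type') || '';
  const body = await res.text();
  // Use the server's markdown as-is; otherwise convert HTML to plain text.
  const text = type.includes('text/markdown') ? body : htmlToText(body);
  return text.slice(0, maxChars);
}
```

Capping the returned text keeps a long article from crowding the conversation out of the model's context window.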

---

## ESML (Embodied Speech Markup Language)

ESML is how Jibo speaks expressively. The system prompt includes a full reference (`esml-reference.js`) that teaches the LLM to use these tags inside `say` calls:

```xml
<!-- Emotional reaction (most common pattern) -->
<anim cat='happy' nonBlocking='true' endNeutral='true'/> That's great news!

<!-- Voice sound (laugh, sigh, greeting) -->
<ssa cat='laughing' nonBlocking='true'/> That's hilarious!

<!-- Sound effect -->
<sfx cat='drumroll'/> And the answer is...

<!-- Dance (always needs a filter) -->
<anim cat='dance' filter='music, rom-silly'/> Watch this!

<!-- Emoji on screen -->
<anim cat='emoji' filter='!(hf), &(heart)' nonBlocking='true'/> I love that!

<!-- Dramatic pause -->
And then... <break size='1.0'/> nothing happened.
```

A `sanitizeForTTS()` function in `tools.js` provides defense-in-depth by stripping markdown, LaTeX, and invalid tags before they reach Jibo's TTS engine.
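
`tools.js` is not part of this excerpt, so the following is only a sketch of what such a sanitizer might do under the ESML hard rules: strip code fences, inline backticks, `**bold**` / `*italic*`, and `$...$` / `\(...\)` LaTeX, then drop closing tags that were never opened. The name `sanitizeForTTSSketch` and every regex here are assumptions, not the project's implementation.

```javascript
// Hypothetical sanitizer sketch; the real sanitizeForTTS() in tools.js may differ.
function sanitizeForTTSSketch(text) {
  let out = text
    .replace(/```[\s\S]*?```/g, ' ') // code fences
    .replace(/`([^`]*)`/g, '$1') // inline code -> plain text
    .replace(/\$\$?([^$]+)\$\$?/g, '$1') // $...$ and $$...$$ -> contents
    .replace(/\\\(([\s\S]*?)\\\)/g, '$1') // \( ... \) -> contents
    .replace(/\\([a-zA-Z]+)/g, '$1') // \pi -> pi
    .replace(/(\*\*|__)(.+?)\1/g, '$2') // **bold** / __bold__
    .replace(/(\*|_)([^*_\s][^*_]*?)\1/g, '$2'); // *italic* / _italic_

  // Drop closing tags whose opener never appeared (e.g. a stray </es>).
  const open = [];
  out = out.replace(/<(\/?)([a-zA-Z-]+)[^>]*?(\/?)>/g, (tag, closing, name, selfClosing) => {
    if (selfClosing) return tag; // <anim .../> is complete on its own
    if (!closing) {
      open.push(name);
      return tag;
    }
    const idx = open.lastIndexOf(name);
    if (idx === -1) return ''; // stray closing tag
    open.splice(idx, 1);
    return tag;
  });
  return out.replace(/\s{2,}/g, ' ').trim();
}
```

One trade-off of the `$...$` rule as written: it also swallows currency, so "between $5 and $10" loses its dollar signs.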

---

## How the Agent Loop Works

```
User says "Hey Jibo" ──▶ hotword event fires
          │
          ▼
Play acknowledgment animation
          │
          ▼
Listen for initial speech (15s timeout)
          │
          ▼
Build message history [system prompt, user text]
          │
          ▼
┌─── Agent Loop (max 25 turns) ◀─────────┐
│                                        │
│  1. Prune old images from context      │
│  2. Call LLM                           │
│  3. If no tool calls → done            │
│  4. Sort tools: say → actions → listen │
│  5. Execute each tool                  │
│  6. Push results to messages           │
│  7. If end_conversation → done         │
│  8. Loop ──────────────────────────────┘
          │
          ▼
Conversation complete
Resume hotword listening
```

Key behaviors:

- **Speech chaining**: Multiple `say` calls are queued via a promise chain so they play sequentially without overlap.
- **Tool ordering**: `say` executes first, then actions (photo, search, etc.), then `listen`/`end_conversation` last.
- **Graceful limits**: At turn 24 of 25, a system message nudges the LLM to wrap up.
- **Image pruning**: Only the 2 most recent photos are kept in context to manage token usage.
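
A condensed sketch of the loop in the diagram, with the LLM call and tool executor injected so it runs without a robot or API key. `runAgentLoop`, `toolRank`, the nudge wording, and the message shapes are assumptions modeled on the OpenAI chat-completions tool-calling format; the real loop in `index.js` also prunes images, honors aborts, and chains speech.

```javascript
// Sketch of the agent loop: call LLM, order tool calls, execute, repeat.
const MAX_AGENT_TURNS = 25;

// say first, listen / end_conversation last, every other tool in between.
function toolRank(name) {
  if (name === 'say') return 0;
  if (name === 'listen' || name === 'end_conversation') return 2;
  return 1;
}

async function runAgentLoop(messages, { callLLM, executeTool, maxTurns = MAX_AGENT_TURNS }) {
  for (let turn = 1; turn <= maxTurns; turn++) {
    if (turn === maxTurns - 1) {
      // Nudge the model to wrap up before the hard limit (turn 24 of 25 by default).
      messages.push({ role: 'system', content: 'You are almost out of turns; say goodbye and end the conversation.' });
    }
    const reply = await callLLM(messages); // expected shape: { role: 'assistant', content, tool_calls? }
    messages.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) return 'no_tool_calls';

    // Array.prototype.sort is stable, so equal-rank calls keep the model's order.
    const ordered = [...calls].sort((a, b) => toolRank(a.function.name) - toolRank(b.function.name));
    let ended = false;
    for (const call of ordered) {
      const args = JSON.parse(call.function.arguments || '{}');
      const result = await executeTool(call.function.name, args);
      messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
      if (call.function.name === 'end_conversation') ended = true;
    }
    if (ended) return 'ended';
  }
  return 'turn_limit';
}
```

Injecting `callLLM` and `executeTool` keeps the control flow testable with scripted replies, as in the checks below.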

---

## Project Structure

```
jibo-llm/
├── .env.example       # Template for environment variables
├── .env               # Your local config (git-ignored)
├── index.js           # Entry point: connection, hotword handling, agent loop
├── tools.js           # Tool schemas + executeTool() dispatcher
├── esml-reference.js  # ESML documentation injected into the system prompt
├── package.json       # Dependencies and scripts
└── node_modules/      # Installed dependencies
```

---

## Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| [rom-control](https://github.com/niceduckdev/rom-control) | ^2.0.1 | Jibo robot control client (speech, camera, display, motors) |
| [openai](https://www.npmjs.com/package/openai) | ^4.73.0 | OpenAI-compatible chat completions SDK |
| [dotenv](https://www.npmjs.com/package/dotenv) | ^16.4.5 | Load `.env` configuration |

---

## License

MIT
228
esml-reference.js
Normal file
@@ -0,0 +1,228 @@
/**
 * ESML (Embodied Speech Markup Language) reference for the LLM system prompt.
 *
 * Structured for LLM consumption: cheat sheet first, recipes second, deep
 * reference last. Front-loaded examples bias the model toward correct output.
 */

module.exports = `
# ESML — How Jibo Speaks Expressively

Every \`say\` call's \`text\` is ESML: plain text plus a small set of XML-style
tags that trigger animations, sounds, and voice modulation. **Plain text alone
works fine** — Jibo's auto-tagger adds basic animations. Use tags to make him
expressive on purpose.

---

## ⚡ QUICK-START — copy these patterns

These cover ~95% of what you actually need. Prefer them over inventing tags.

### Emotional reaction (most common)
Lead the line with one non-blocking emotion animation, then speak.
\`\`\`
<anim cat='happy' nonBlocking='true' endNeutral='true'/> Yay, that worked!
<anim cat='surprised' nonBlocking='true' endNeutral='true'/> Whoa, really?
<anim cat='confused' nonBlocking='true' endNeutral='true'/> Hmm, I'm not sure.
<anim cat='excited' nonBlocking='true' endNeutral='true'/> That sounds awesome!
<anim cat='sad' nonBlocking='true' endNeutral='true'/> Aww, I'm sorry to hear that.
<anim cat='proud' nonBlocking='true' endNeutral='true'/> I did it!
<anim cat='curious' nonBlocking='true' endNeutral='true'/> Oh? Tell me more.
\`\`\`

### Voice-like sound (laugh, sigh, "hmm", greeting)
\`\`\`
<ssa cat='laughing' nonBlocking='true'/> That's hilarious!
<ssa cat='thinking'/> Let me think about that...
<ssa cat='hello' nonBlocking='true'/> Hi there!
<ssa cat='goodbye' nonBlocking='true'/> Talk to you later!
<ssa cat='surprised' nonBlocking='true'/> Oh wow!
\`\`\`

### Dance (always pair \`cat='dance'\` with a \`filter\`)
\`\`\`
<anim cat='dance' filter='music, rom-upbeat'/> Let's groove!
<anim cat='dance' filter='music, rom-silly'/> Watch this one!
<anim cat='dance' filter='music, rom-twerk'/>
<anim cat='dance' filter='!(music), &(rom-upbeat)'/> Dancing without music.
\`\`\`

### Sound effect
\`\`\`
<sfx cat='drumroll'/> And the winner is... you!
<sfx cat='sparkles'/> Ta-da!
<sfx cat='whoosh'/> Off we go!
\`\`\`

### Emoji on screen + speech
Always use \`filter='!(hf), &(<emoji-name>)'\` and non-blocking.
\`\`\`
<anim cat='emoji' filter='!(hf), &(heart)' nonBlocking='true'/> I love that!
<anim cat='emoji' filter='!(hf), &(pizza)' nonBlocking='true'/> Pizza time!
<anim cat='emoji' filter='!(hf), &(party)' nonBlocking='true'/> Let's celebrate!
\`\`\`

### Pause / pacing
\`\`\`
And then... <break size='1.0'/> nothing happened.
\`\`\`

### Speaking style
\`\`\`
<style set='enthusiastic'> That's amazing! </style>
<style set='confused'> Wait, what? </style>
<style set='confident'> I've got this. </style>
\`\`\`

---

## ✅ DO / ❌ DON'T

✅ DO start most emotional lines with \`<anim cat='X' nonBlocking='true' endNeutral='true'/>\`.
✅ DO use \`cat='...'\` selectors — they pick a random valid animation for you.
✅ DO use \`<ssa>\` for voice-like sounds (laughs, sighs) and \`<sfx>\` for noises (drumroll, whoosh).
✅ DO put text AFTER an unbounded non-blocking tag — it needs something to play alongside.
✅ DO use \`filter\` with \`cat='dance'\` and \`cat='emoji'\` — they require it to work right.

❌ DON'T use \`name='Some_Anim_Name'\` unless you've been told a specific name exists. Use \`cat\` instead.
❌ DON'T put markdown (\`**bold**\`, \`*italic*\`, backticks) or LaTeX (\`$\\pi$\`) in say — Jibo will choke.
❌ DON'T close tags you didn't open (no stray \`</es>\`, \`</anim>\`).
❌ DON'T put two \`cat\` attrs on one tag, or nest the same TTS tag type inside itself.
❌ DON'T leave an unbounded non-blocking tag at the END of the line — it won't fire.
❌ DON'T use bounded mode (\`<ssa cat='happy'>text</ssa>\`) on \`<ssa>\` or \`<sfx>\` — they have fixed durations.

---

## 🎬 WORKED EXAMPLES

User: "Tell me a joke."
Good:
\`\`\`
<anim cat='excited' nonBlocking='true' endNeutral='true'/> Okay, here's one!
<break size='0.4'/> Why don't scientists trust atoms?
<break size='0.6'/> Because they make up everything!
<ssa cat='laughing' nonBlocking='true'/> Get it?
\`\`\`

User: "I had a really bad day."
Good:
\`\`\`
<anim cat='sad' nonBlocking='true' endNeutral='true'/> Oh no, I'm so sorry.
<break size='0.3'/> Want to tell me what happened?
\`\`\`

User: "Show me a dance."
Good:
\`\`\`
<anim cat='excited' nonBlocking='true' endNeutral='true'/> You got it!
<anim cat='dance' filter='music, rom-silly'/>
\`\`\`

User: "What's pi?"
Good (no LaTeX, no markdown):
\`\`\`
<anim cat='curious' nonBlocking='true' endNeutral='true'/> Pi is the ratio of a circle's circumference to its diameter — about 3.14159, and the digits go on forever!
\`\`\`
Bad (would break the TTS):
\`\`\`
Pi (\$\\pi\$) is *irrational* — its digits go on **forever**! </es>
\`\`\`

---

## 🧩 ANIMATION CATEGORIES (use with \`cat='...'\`)

Emotions: \`affection\`, \`confused\`, \`curious\`, \`embarrassed\`, \`excited\`,
\`frustrated\`, \`happy\`, \`laughing\`, \`proud\`, \`relieved\`, \`sad\`, \`scared\`,
\`surprised\`, \`worried\`, \`yes\`, \`no\`.

Special: \`dance\` (needs filter), \`emoji\` (needs filter).

## 🔊 SSA CATEGORIES (voice-like sounds, use with \`<ssa cat='...'/>\`)

\`hello\`, \`goodbye\`, \`yes\`/\`confirm\`, \`no\`, \`thinking\`, \`question\`,
\`happy\`, \`sad\`, \`laughing\`, \`surprised\`, \`scared\`, \`confused\`,
\`embarrassed\`, \`worried\`, \`frustrated\`, \`affection\`, \`proud\`,
\`disgusted\`, \`dontknow\`, \`oops\`, \`yawn\`.

## 💥 SFX CATEGORIES (sound effects, use with \`<sfx cat='...'/>\`)

\`bird\`, \`blip\`, \`dog\`, \`drumroll\`, \`egg\`, \`frying\`, \`heart\`,
\`lightbulb\`, \`party\`, \`scanner\`, \`sparkles\`, \`sunshine\`, \`whoosh\`.

## 💃 DANCE FILTERS (use with \`cat='dance'\`)

With music: \`music, rom-upbeat\` · \`music, rom-ballroom\` · \`music, rom-silly\` ·
\`music, rom-slowdance\` · \`music, rom-eletronic\` · \`music, rom-twerk\`.
Silent: \`!(music), &(rom-upbeat)\`.

## 😀 EMOJI NAMES (use with \`cat='emoji' filter='!(hf), &(NAME)'\`)

Sports: airplane, basketball, bicycle, disco-spin, football, soccer, trophy, video-game.
Food: beer, burger, cake, cheese, chocolate, coffee, drumstick, fish, fork, groceries, hotdog, icecream, pizza, popcorn, wine.
Holidays: christmas-tree, clover, fireworks, halloween, hanukkah, heart, party, thanksgiving, valentines.
Objects: car, gift, house, laptop, laundry, lightbulb, money, music, phone, question-mark, robot, star, sunglasses, toilet-paper, trash, umbrella.
Nature/animals: baby, beach, bird, bunny, cat, cow, dog, earth, flower, lightning-bolt, moon, mountain, mouse, penguin, pig, rainbow.

---

## 📚 DEEP REFERENCE (only when the cheat sheet isn't enough)

### Tag types

| Tag | Purpose |
|-----|---------|
| \`<anim>\` | Animation, excludes \`ssa-only\`/\`sfx-only\` (general gestures/poses) |
| \`<es>\` | Animation, no filtering — use only with a known \`name=\` |
| \`<ssa>\` | Voice-like audio (laughs, sighs, hellos) |
| \`<sfx>\` | Sound effects |
| \`<break size='Ns'/>\` | Pause for N seconds |
| \`<style set='...'/>\` | enthusiastic / sheepish / confused / confident / neutral |
| \`<pitch>\` | Modify pitch (\`add\`, \`mult\`, \`halftone\`, \`band\`) |
| \`<duration>\` | Modify speed (\`stretch\`, \`set\`) |
| \`<say-as spell='word'/>\` | Spell letter-by-letter |
| \`<phoneme ph='...'/>\` | Exact phonetic pronunciation |

### Animation tag attributes

- \`cat='X'\` — random animation from category (PREFERRED).
- \`name='X'\` — exact AnimDB name (only if you know it exists).
- \`filter='...'\` — narrow by meta-terms; required for \`dance\` and \`emoji\`.
  - \`a, b\` (or \`&(a,b)\`) — must include all
  - \`?a, ?b\` — at least one of
  - \`!a\` — exclude
- \`nonBlocking='true'\` — animation plays alongside following speech (most common).
- \`loop=N\` — \`0\` fits the loop count to bounded text; \`>=1\` plays N times.
- \`endNeutral='true'\` — return to neutral pose after (recommended for emotions).
- \`layers='body,screen,audio'\` — restrict which MetaLayers are used.

### Three playback modes

- **Blocking** — \`<es name='X'/>\` with no inner text and no \`nonBlocking\`.
  Speech pauses while it plays.
- **Bounded non-blocking** — \`<anim cat='happy'>text inside</anim>\`. Animation
  is time-stretched to match the wrapped speech. Don't use with \`<ssa>\`/\`<sfx>\`.
- **Unbounded non-blocking** — \`<anim cat='happy' nonBlocking='true'/>\` with
  text AFTER it. Plays at native length while speech continues. **The text to
  the right is required**, otherwise the tag never fires.

### MetaLayers

Two animations may run at once only if they occupy different layers: \`body\`,
\`screen\` (eye/overlay/pixi/background), \`audio\`.

---

## 🛡️ HARD RULES

1. Plain text is always valid. When in doubt, just speak plainly.
2. Prefer \`cat='...'\` over \`name='...'\` — \`name\` requires an exact AnimDB id.
3. Unbounded non-blocking tags MUST have text to their right.
4. \`cat='dance'\` and \`cat='emoji'\` require a \`filter\` attribute.
5. \`<ssa>\` and \`<sfx>\` are fixed-duration — never wrap them around text.
6. One \`cat\` per tag. Don't nest the same TTS tag type inside itself.
7. NEVER emit markdown (\`*\`, \`**\`, \`_\`, backticks, code fences) or LaTeX
   (\`$...$\`, \`\\(...\\)\`) inside \`say\` text. The TTS engine will hang.
8. NEVER emit closing tags for things you didn't open (\`</es>\`, etc.).
`;
426
index.js
Normal file
@@ -0,0 +1,426 @@
require('dotenv').config();
const { Client, AttentionMode } = require('rom-control');
const OpenAI = require('openai');
const { TOOL_SCHEMAS, executeTool, wrapForScreen } = require('./tools');
const ESML_REFERENCE = require('./esml-reference');

// ── Config ─────────────────────────────────────────────────────────────────────
const JIBO_IP = process.env.JIBO_IP || '192.168.1.217';
const LLM_BASE_URL = process.env.LLM_BASE_URL || 'https://api.openai.com/v1';
const LLM_API_TOKEN = process.env.LLM_API_TOKEN;
const LLM_MODEL_ID = process.env.LLM_MODEL_ID || 'gpt-4o';

if (!LLM_API_TOKEN) {
  console.error('ERROR: LLM_API_TOKEN is not set. Copy .env.example to .env and fill it in.');
  process.exit(1);
}

const openai = new OpenAI({
  apiKey: LLM_API_TOKEN,
  baseURL: LLM_BASE_URL,
});

// ── System prompt ──────────────────────────────────────────────────────────────
const SYSTEM_PROMPT = [
  'You are Jibo, a friendly, warm, expressive social robot with a physical body.',
  'You have a camera, a screen, a speaker, and a motorized head.',
  '',
  '═══ HOW TO TALK (READ THIS FIRST) ═══',
  'Every "say" call\'s `text` is ESML — plain words plus expressive tags.',
  'Almost every spoken line should LEAD with one expressive tag, then the words.',
  'You are a robot with a body, not a chatbot — show emotion through animation.',
  '',
  'Default template for any normal reply:',
  ' <anim cat=\'EMOTION\' nonBlocking=\'true\' endNeutral=\'true\'/> The actual words.',
  ' …where EMOTION is one of: happy, excited, curious, surprised, confused,',
  ' proud, sad, affection, laughing, worried, scared, frustrated, embarrassed,',
  ' yes, no.',
  '',
  'Other go-to patterns (pick the one that fits):',
  ' • Voice sound first: <ssa cat=\'thinking\'/> Hmm, let me think…',
  ' • Greet/farewell: <ssa cat=\'hello\' nonBlocking=\'true\'/> Hi there!',
  ' • Celebrate w/ emoji: <anim cat=\'emoji\' filter=\'!(hf), &(party)\' nonBlocking=\'true\'/> Yay!',
  ' • Dance request: say a quick line, then a separate say with',
  ' <anim cat=\'dance\' filter=\'music, rom-silly\'/>',
  ' • Sound effect: <sfx cat=\'drumroll\'/> And the answer is…',
  ' • Drama beat: A pause… <break size=\'0.6\'/> like that.',
  '',
  'HARD RULES for `say` text:',
  ' 1. NO markdown anywhere: no *italics*, **bold**, _underscores_, backticks, code fences.',
  ' 2. NO LaTeX: no $...$, no \\(...\\), no \\frac{}, no math markup. Spell numbers/symbols out.',
  ' 3. NO closing tags you did not open (no stray </es>, </anim>).',
  ' 4. Use cat=\'...\' (random valid animation) over name=\'...\' unless you know the exact name.',
  ' 5. Unbounded non-blocking tags MUST have text to their right or they will not fire.',
  ' 6. cat=\'dance\' and cat=\'emoji\' REQUIRE a filter attribute.',
  ' 7. <ssa> and <sfx> have fixed durations — never wrap text inside them.',
  ' 8. Keep each `say` call under 500 characters; split long replies into multiple `say` calls.',
  '',
  '═══ INTERACTION MODEL ═══',
  '• "say" — speak (ESML). You can call it multiple times in one turn; they\'ll be',
  ' spoken in order. Other tools (search, fetch, look) run in parallel with speech.',
  '• "listen" — open the mic for the user\'s reply. Always call this after speaking',
  ' unless the conversation has clearly ended.',
  '• "end_conversation" — call this (NOT listen) after a farewell to end gracefully.',
  '',
  '═══ OTHER TOOLS ═══',
  '• "take_photo" — see what\'s in front of you (image returned to you).',
  '• "show_text" — put short text on the screen (auto-wrapped).',
  '• "show_image" — display an image URL on the screen.',
  '• "show_eye" — restore the default eye animation on screen.',
  '• "look_at_angle" — turn the head: theta=yaw ±180°, psi=pitch ±30°.',
  '• "set_volume" — 0.0 to 1.0.',
  '• "web_search" — Brave search; use whenever you\'re unsure of a fact or need fresh info.',
  '• "fetch_url" — read a specific page (often follows web_search).',
  '',
  '═══ STYLE ═══',
  '• Be personable, concise, expressive — a few sentences, not an essay.',
  '• Animate every emotional line; vary your reactions so they feel alive.',
  '• If a tool errors, acknowledge it briefly and adapt.',
  '• If you searched the web, briefly tell the user what you found rather than dumping links.',
].join('\n') + '\n\n' + ESML_REFERENCE;

const MAX_AGENT_TURNS = 25; // safety limit
const MAX_IMAGES_IN_CONTEXT = 2; // prune older photo messages to control cost
const LLM_MAX_RETRIES = 2;
|
|
||||||
|
// ── Abort helper ───────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
/** Throw if the signal is already aborted. */
|
||||||
|
function throwIfAborted(signal) {
|
||||||
|
if (signal?.aborted) {
|
||||||
|
const err = new Error('Conversation aborted');
|
||||||
|
err.code = 'CONVERSATION_ABORTED';
|
||||||
|
throw err;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Return a promise that rejects when the signal fires. */
|
||||||
|
function onAbort(signal) {
|
||||||
|
if (!signal) return new Promise(() => { });
|
||||||
|
return new Promise((_, reject) => {
|
||||||
|
const handler = () => {
|
||||||
|
const err = new Error('Conversation aborted');
|
||||||
|
err.code = 'CONVERSATION_ABORTED';
|
||||||
|
reject(err);
|
||||||
|
};
|
||||||
|
if (signal.aborted) return handler();
|
||||||
|
signal.addEventListener('abort', handler, { once: true });
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Sleep that rejects on abort, including when the signal is already aborted. */
function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    let t;
    const onAbortEvent = () => {
      clearTimeout(t);
      const err = new Error('Conversation aborted');
      err.code = 'CONVERSATION_ABORTED';
      reject(err);
    };
    // 'abort' never fires again on an already-aborted signal, so check first.
    if (signal?.aborted) return onAbortEvent();
    t = setTimeout(() => {
      // Detach the listener so a long-lived signal doesn't accumulate them.
      signal?.removeEventListener('abort', onAbortEvent);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbortEvent, { once: true });
  });
}

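// Optional sketch (hypothetical helper; nothing in this file calls it yet):
// a drop-in alternative to `Promise.race([promise, onAbort(signal)])` that
// detaches its 'abort' listener once `promise` settles. onAbort() leaves its
// listener attached until the signal fires, so one listener accumulates per
// raced call over a long conversation. Uses the same CONVERSATION_ABORTED
// error shape as throwIfAborted().
function raceAbort(promise, signal) {
  if (!signal) return Promise.resolve(promise);
  return new Promise((resolve, reject) => {
    const onAbortEvent = () => {
      const err = new Error('Conversation aborted');
      err.code = 'CONVERSATION_ABORTED';
      reject(err);
    };
    if (signal.aborted) {
      // Ignore a late rejection from the abandoned promise.
      Promise.resolve(promise).catch(() => { });
      onAbortEvent();
      return;
    }
    signal.addEventListener('abort', onAbortEvent, { once: true });
    Promise.resolve(promise).then(
      (value) => {
        signal.removeEventListener('abort', onAbortEvent);
        resolve(value);
      },
      (err) => {
        signal.removeEventListener('abort', onAbortEvent);
        reject(err);
      },
    );
  });
}
// Example, mirroring the hotword handler below:
//   const speech = await raceAbort(
//     client.audio.awaitSpeech({ mode: 'local', time: 15000 }), signal);
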
/** True for HTTP 429 / 5xx / network-class errors that benefit from retry. */
function isTransientLLMError(err) {
  if (!err) return false;
  if (err.code === 'CONVERSATION_ABORTED') return false;
  const status = err.status ?? err.response?.status;
  if (status === 429) return true;
  if (typeof status === 'number' && status >= 500) return true;
  // network-class
  return ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'EAI_AGAIN'].includes(err.code);
}

/** Drop image_url blocks from old user messages, keeping only the most recent N. */
function pruneOldImages(messages, keep) {
  const imageMsgIndices = [];
  for (let i = 0; i < messages.length; i++) {
    const m = messages[i];
    if (m.role === 'user' && Array.isArray(m.content) &&
      m.content.some((c) => c?.type === 'image_url')) {
      imageMsgIndices.push(i);
    }
  }
  const toStrip = imageMsgIndices.slice(0, Math.max(0, imageMsgIndices.length - keep));
  for (const i of toStrip) {
    const textParts = messages[i].content
      .filter((c) => c?.type === 'text')
      .map((c) => c.text);
    messages[i] = {
      role: 'user',
      // Keep any caption, but say the image is gone so the model doesn't
      // assume it can still see it.
      content: [...textParts, '[earlier photo omitted to save context]'].join(' '),
    };
  }
}

/** Call the LLM with retry on transient errors. */
async function callLLM(messages, signal) {
  let lastErr;
  for (let attempt = 0; attempt <= LLM_MAX_RETRIES; attempt++) {
    throwIfAborted(signal);
    try {
      // Note: unless the client was built with a different maxRetries, the
      // openai v4 SDK already retries 429/5xx/connection errors twice on its
      // own, so this loop adds retries on top of the SDK's.
      return await openai.chat.completions.create(
        {
          model: LLM_MODEL_ID,
          messages,
          tools: TOOL_SCHEMAS,
          temperature: 0.8,
        },
        { signal },
      );
    } catch (err) {
      // The SDK throws its own abort error when `signal` fires; normalize it so
      // callers see CONVERSATION_ABORTED instead of a generic failure.
      throwIfAborted(signal);
      lastErr = err;
      if (!isTransientLLMError(err) || attempt === LLM_MAX_RETRIES) throw err;
      const backoff = 500 * 2 ** attempt;
      console.warn(`[agent] LLM transient error (${err.status || err.code}); retrying in ${backoff}ms…`);
      await sleep(backoff, signal);
    }
  }
  throw lastErr;
}

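// Optional sketch (hypothetical helper; callLLM() above still uses the fixed
// 500 * 2 ** attempt schedule, i.e. 500 ms then 1000 ms): "full jitter"
// exponential backoff, which waits a random time in
// [0, min(cap, base * 2^attempt)) so that several clients sharing one
// rate-limited API key don't retry in lockstep. baseMs and capMs are
// illustrative defaults, not project settings.
function jitteredBackoffMs(attempt, baseMs = 500, capMs = 8000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}
// Example, inside callLLM()'s catch block:
//   await sleep(jitteredBackoffMs(attempt), signal);
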
// ── Agent loop ─────────────────────────────────────────────────────────────────

/**
 * Run the tool-calling agent loop until the LLM stops calling tools, calls
 * end_conversation, or MAX_AGENT_TURNS is reached. Throws a
 * CONVERSATION_ABORTED error once `signal` fires.
 *
 * @param {import('rom-control').Client} client
 * @param {Array} messages Chat history (mutated in place)
 * @param {AbortSignal} signal Cancellation signal
 * @param {string} [initialHeard] The user's opening utterance, shown on screen while the LLM works
 */
async function agentLoop(client, messages, signal, initialHeard) {
  let wrapUpInjected = false;
  const ctx = { speechChain: Promise.resolve(), lastHeard: initialHeard || '' };

  for (let turn = 0; turn < MAX_AGENT_TURNS; turn++) {
    throwIfAborted(signal);
    pruneOldImages(messages, MAX_IMAGES_IN_CONTEXT);
    console.log(`[agent] turn ${turn + 1} — calling LLM…`);

    let response;
    try {
      const heard = (ctx.lastHeard || '').trim();
      const raw = heard
        ? `Heard: "${heard}"\n\nProcessing...`
        : 'Processing...';
      client.display.showText(wrapForScreen(raw, 40, 10));
    } catch (_) { }
    try {
      response = await callLLM(messages, signal);
    } finally {
      try { client.display.showEye(); } catch (_) { }
    }
    const assistantMsg = response.choices[0].message;
    messages.push(assistantMsg);

    // Surface any inner-monologue text the model emitted alongside tool calls.
    if (assistantMsg.content && typeof assistantMsg.content === 'string') {
      console.log(`[agent] assistant: ${assistantMsg.content.slice(0, 200)}`);
    }

    const toolCalls = assistantMsg.tool_calls;

    // ── No tool calls → conversation turn complete ────────────────────────
    if (!toolCalls || toolCalls.length === 0) {
      console.log('[agent] loop complete (no tool calls).');
      await ctx.speechChain.catch(() => { });
      return;
    }

    // ── Execute tool calls sequentially ──────────────────────────────────
    // Order: say → other actions → listen/end_conversation last.
    // Array#sort is stable (ES2019+), so calls of equal priority keep the
    // order the model emitted them in.
    const sorted = [...toolCalls].sort((a, b) => {
      const priority = (tc) => {
        const n = tc.function.name;
        if (n === 'say') return 0;
        if (n === 'listen' || n === 'end_conversation') return 2;
        return 1;
      };
      return priority(a) - priority(b);
    });

    let endRequested = false;

    const pendingImages = [];
    for (const tc of sorted) {
      throwIfAborted(signal);

      let args;
      let parseError = null;
      try {
        args = tc.function.arguments ? JSON.parse(tc.function.arguments) : {};
      } catch (e) {
        parseError = e.message;
        args = {};
      }

      let result;
      if (parseError) {
        console.error(` [tool:${tc.function.name}] bad JSON args:`, parseError);
        result = {
          content: `Error: tool arguments were not valid JSON (${parseError}). ` +
            `Please retry with well-formed arguments.`,
        };
      } else {
        try {
          result = await executeTool(client, tc.function.name, args, signal, ctx);
        } catch (err) {
          if (err.code === 'CONVERSATION_ABORTED') throw err;
          // A tool may fail with its own abort error (e.g. a cancelled HTTP
          // request); report that as a conversation abort, not a tool failure.
          throwIfAborted(signal);
          console.error(` [tool:${tc.function.name}] error:`, err.message);
          result = { content: `Error: ${err.message}` };
        }
      }

      messages.push({
        role: 'tool',
        tool_call_id: tc.id,
        content: result.content,
      });

      // Photo: queue a follow-up user message (tool messages can't carry images).
      if (result.image) {
        pendingImages.push({
          role: 'user',
          content: [
            { type: 'text', text: "Photo from Jibo's camera:" },
            {
              type: 'image_url',
              image_url: { url: `data:image/jpeg;base64,${result.image}` },
            },
          ],
        });
      }

      if (result.endConversation) endRequested = true;
    }

    // Append photos only after every tool result: OpenAI-style APIs expect the
    // assistant's tool_calls to be answered by consecutive tool messages, so a
    // user message between two tool results can get the next request rejected.
    messages.push(...pendingImages);

    if (endRequested) {
      console.log('[agent] end_conversation requested — exiting loop.');
      await ctx.speechChain.catch(() => { });
      return;
    }

    // Approaching the safety limit: nudge the model to wrap up gracefully
    // on its next turn instead of getting cut off mid-thought.
    if (!wrapUpInjected && turn === MAX_AGENT_TURNS - 2) {
      messages.push({
        role: 'system',
        content:
          'You are about to hit the turn limit. On your next turn, give a brief ' +
          'farewell via "say" and call "end_conversation". Do not call "listen".',
      });
      wrapUpInjected = true;
    }
  }

  console.warn('[agent] hit MAX_AGENT_TURNS — forcing exit.');
  await ctx.speechChain.catch(() => { });
  // Don't speak over a newer conversation if this one was interrupted.
  throwIfAborted(signal);
  try {
    await client.behavior.say("Let's pick this up another time. Bye!");
  } catch (_) { }
}

// ── Main ───────────────────────────────────────────────────────────────────────

async function main() {
  const client = new Client({ host: JIBO_IP, autoSubscribe: false });

  client.once('ready', () => {
    console.log(`[jibo-llm] Connected — session ${client.sessionID}`);
  });

  client.on('error', (err) => {
    console.error('[jibo-llm] Client error:', err.message);
  });

  // ── Connect ────────────────────────────────────────────────────────────────
  console.log(`[jibo-llm] Connecting to Jibo at ${JIBO_IP}…`);
  await client.connect();
  await client.behavior.setAttention(AttentionMode.Engaged);

  // Start wakeword listener
  client.audio.watchWakeword();
  console.log('[jibo-llm] Ready — listening for "Hey Jibo"…');

  // ── Hotword → agent conversation ───────────────────────────────────────────
  /** @type {AbortController|null} */
  let activeController = null;

  client.on('hotword', async (event) => {
    // ── Cancel any running conversation ──────────────────────────────────
    if (activeController) {
      console.log('[hotword] Aborting previous conversation…');
      activeController.abort();
      activeController = null;
    }

    const controller = new AbortController();
    activeController = controller;
    const { signal } = controller;

    console.log(`\n[hotword] "${event.utterance}" (score ${event.score})`);

    try {
      // Acknowledge
      throwIfAborted(signal);
      await Promise.race([
        client.behavior.playAnimCat('excited', { nonBlocking: true }),
        onAbort(signal),
      ]);

      // Listen for the user's initial speech
      throwIfAborted(signal);
      let userText;
      client.display.showText('Listening...');
      try {
        const speech = await Promise.race([
          client.audio.awaitSpeech({ mode: 'local', time: 15000 }),
          onAbort(signal),
        ]);
        userText = speech.content;
        console.log(`[jibo-llm] User said: "${userText}"`);
      } catch (err) {
        if (err.code === 'CONVERSATION_ABORTED') throw err;
        if (err.code === 'SPEECH_TIMEOUT') {
          throwIfAborted(signal);
          await client.behavior.say("I didn't hear anything. Talk to me anytime!");
          return;
        }
        throw err;
      } finally {
        client.display.showEye();
      }

      // Build initial message history and run the agent
      const messages = [
        { role: 'system', content: SYSTEM_PROMPT },
        { role: 'user', content: userText },
      ];

      await agentLoop(client, messages, signal, userText);
    } catch (err) {
      // An abort can also surface as a library-specific error (for example the
      // OpenAI SDK's own abort error), so check the signal as well as the code.
      if (err.code === 'CONVERSATION_ABORTED' || signal.aborted) {
        console.log('[jibo-llm] Conversation was interrupted by new hotword.');
        return;
      }
      console.error('[jibo-llm] Agent error:', err.message);
      try { await client.behavior.say("Sorry, something went wrong."); } catch (_) { }
    } finally {
      // Only clear if we're still the active conversation
      if (activeController === controller) {
        activeController = null;
        console.log('[jibo-llm] Conversation ended. Listening for "Hey Jibo"…\n');
      }
    }
  });
}

main().catch((err) => {
  console.error('[jibo-llm] Fatal:', err);
  process.exit(1);
});
497
package-lock.json
generated
Normal file
@@ -0,0 +1,497 @@
{
  "name": "jibo-llm",
  "version": "1.0.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "jibo-llm",
      "version": "1.0.0",
      "dependencies": {
        "dotenv": "^16.4.5",
        "openai": "^4.73.0",
        "rom-control": "^2.0.1"
      }
    },
    "node_modules/@types/node": {
      "version": "18.19.130",
      "resolved": "https://registry.npmjs.org/@types/node/-/node-18.19.130.tgz",
      "integrity": "sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg==",
      "license": "MIT",
      "dependencies": {
        "undici-types": "~5.26.4"
      }
    },
    "node_modules/@types/node-fetch": {
      "version": "2.6.13",
      "resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
      "integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
      "license": "MIT",
      "dependencies": {
        "@types/node": "*",
        "form-data": "^4.0.4"
      }
    },
    "node_modules/abort-controller": {
      "version": "3.0.0",
      "resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz",
      "integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==",
      "license": "MIT",
      "dependencies": {
        "event-target-shim": "^5.0.0"
      },
      "engines": {
        "node": ">=6.5"
      }
    },
    "node_modules/agentkeepalive": {
      "version": "4.6.0",
      "resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.6.0.tgz",
      "integrity": "sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ==",
      "license": "MIT",
      "dependencies": {
        "humanize-ms": "^1.2.1"
      },
      "engines": {
        "node": ">= 8.0.0"
      }
    },
    "node_modules/asynckit": {
      "version": "0.4.0",
      "resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
      "integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==",
      "license": "MIT"
    },
    "node_modules/call-bind-apply-helpers": {
      "version": "1.0.2",
      "resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz",
      "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==",
      "license": "MIT",
      "dependencies": {
        "es-errors": "^1.3.0",
        "function-bind": "^1.1.2"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/combined-stream": {
      "version": "1.0.8",
      "resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
      "integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
      "license": "MIT",
      "dependencies": {
        "delayed-stream": "~1.0.0"
      },
      "engines": {
        "node": ">= 0.8"
      }
    },
    "node_modules/delayed-stream": {
      "version": "1.0.0",
      "resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
      "integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
      "license": "MIT",
      "engines": {
        "node": ">=0.4.0"
      }
    },
    "node_modules/dotenv": {
      "version": "16.6.1",
      "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
      "integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
      "license": "BSD-2-Clause",
      "engines": {
        "node": ">=12"
      },
      "funding": {
        "url": "https://dotenvx.com"
      }
    },
    "node_modules/dunder-proto": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
      "integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==",
      "license": "MIT",
      "dependencies": {
        "call-bind-apply-helpers": "^1.0.1",
        "es-errors": "^1.3.0",
        "gopd": "^1.2.0"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/es-define-property": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz",
      "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/es-errors": {
      "version": "1.3.0",
      "resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz",
      "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/es-object-atoms": {
      "version": "1.1.1",
      "resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz",
      "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==",
      "license": "MIT",
      "dependencies": {
        "es-errors": "^1.3.0"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/es-set-tostringtag": {
      "version": "2.1.0",
      "resolved": "https://registry.npmjs.org/es-set-tostringtag/-/es-set-tostringtag-2.1.0.tgz",
      "integrity": "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==",
      "license": "MIT",
      "dependencies": {
        "es-errors": "^1.3.0",
        "get-intrinsic": "^1.2.6",
        "has-tostringtag": "^1.0.2",
        "hasown": "^2.0.2"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/event-target-shim": {
      "version": "5.0.1",
      "resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz",
      "integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==",
      "license": "MIT",
      "engines": {
        "node": ">=6"
      }
    },
    "node_modules/form-data": {
      "version": "4.0.5",
      "resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.5.tgz",
      "integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==",
      "license": "MIT",
      "dependencies": {
        "asynckit": "^0.4.0",
        "combined-stream": "^1.0.8",
        "es-set-tostringtag": "^2.1.0",
        "hasown": "^2.0.2",
        "mime-types": "^2.1.12"
      },
      "engines": {
        "node": ">= 6"
      }
    },
    "node_modules/form-data-encoder": {
      "version": "1.7.2",
      "resolved": "https://registry.npmjs.org/form-data-encoder/-/form-data-encoder-1.7.2.tgz",
      "integrity": "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A==",
      "license": "MIT"
    },
    "node_modules/formdata-node": {
      "version": "4.4.1",
      "resolved": "https://registry.npmjs.org/formdata-node/-/formdata-node-4.4.1.tgz",
      "integrity": "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==",
      "license": "MIT",
      "dependencies": {
        "node-domexception": "1.0.0",
        "web-streams-polyfill": "4.0.0-beta.3"
      },
      "engines": {
        "node": ">= 12.20"
      }
    },
    "node_modules/function-bind": {
      "version": "1.1.2",
      "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz",
      "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==",
      "license": "MIT",
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/get-intrinsic": {
      "version": "1.3.0",
      "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz",
      "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==",
      "license": "MIT",
      "dependencies": {
        "call-bind-apply-helpers": "^1.0.2",
        "es-define-property": "^1.0.1",
        "es-errors": "^1.3.0",
        "es-object-atoms": "^1.1.1",
        "function-bind": "^1.1.2",
        "get-proto": "^1.0.1",
        "gopd": "^1.2.0",
        "has-symbols": "^1.1.0",
        "hasown": "^2.0.2",
        "math-intrinsics": "^1.1.0"
      },
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/get-proto": {
      "version": "1.0.1",
      "resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz",
      "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==",
      "license": "MIT",
      "dependencies": {
        "dunder-proto": "^1.0.1",
        "es-object-atoms": "^1.0.0"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/gopd": {
      "version": "1.2.0",
      "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz",
      "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/has-symbols": {
      "version": "1.1.0",
      "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
      "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/has-tostringtag": {
      "version": "1.0.2",
      "resolved": "https://registry.npmjs.org/has-tostringtag/-/has-tostringtag-1.0.2.tgz",
      "integrity": "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw==",
      "license": "MIT",
      "dependencies": {
        "has-symbols": "^1.0.3"
      },
      "engines": {
        "node": ">= 0.4"
      },
      "funding": {
        "url": "https://github.com/sponsors/ljharb"
      }
    },
    "node_modules/hasown": {
      "version": "2.0.3",
      "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.3.tgz",
      "integrity": "sha512-ej4AhfhfL2Q2zpMmLo7U1Uv9+PyhIZpgQLGT1F9miIGmiCJIoCgSmczFdrc97mWT4kVY72KA+WnnhJ5pghSvSg==",
      "license": "MIT",
      "dependencies": {
        "function-bind": "^1.1.2"
      },
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/humanize-ms": {
      "version": "1.2.1",
      "resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz",
      "integrity": "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==",
      "license": "MIT",
      "dependencies": {
        "ms": "^2.0.0"
      }
    },
    "node_modules/math-intrinsics": {
      "version": "1.1.0",
      "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
      "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.4"
      }
    },
    "node_modules/mime-db": {
      "version": "1.52.0",
      "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
      "integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/mime-types": {
      "version": "2.1.35",
      "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
      "integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
      "license": "MIT",
      "dependencies": {
        "mime-db": "1.52.0"
      },
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/ms": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
      "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==",
      "license": "MIT"
    },
    "node_modules/node-domexception": {
      "version": "1.0.0",
      "resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
      "integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
      "deprecated": "Use your platform's native DOMException instead",
      "funding": [
        {
          "type": "github",
          "url": "https://github.com/sponsors/jimmywarting"
        },
        {
          "type": "github",
          "url": "https://paypal.me/jimmywarting"
        }
      ],
      "license": "MIT",
      "engines": {
        "node": ">=10.5.0"
      }
    },
    "node_modules/node-fetch": {
      "version": "2.7.0",
      "resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
      "integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
      "license": "MIT",
      "dependencies": {
        "whatwg-url": "^5.0.0"
      },
      "engines": {
        "node": "4.x || >=6.0.0"
      },
      "peerDependencies": {
        "encoding": "^0.1.0"
      },
      "peerDependenciesMeta": {
        "encoding": {
          "optional": true
        }
      }
    },
    "node_modules/openai": {
      "version": "4.104.0",
      "resolved": "https://registry.npmjs.org/openai/-/openai-4.104.0.tgz",
      "integrity": "sha512-p99EFNsA/yX6UhVO93f5kJsDRLAg+CTA2RBqdHK4RtK8u5IJw32Hyb2dTGKbnnFmnuoBv5r7Z2CURI9sGZpSuA==",
      "license": "Apache-2.0",
      "dependencies": {
        "@types/node": "^18.11.18",
        "@types/node-fetch": "^2.6.4",
        "abort-controller": "^3.0.0",
        "agentkeepalive": "^4.2.1",
        "form-data-encoder": "1.7.2",
        "formdata-node": "^4.3.2",
        "node-fetch": "^2.6.7"
      },
      "bin": {
        "openai": "bin/cli"
      },
      "peerDependencies": {
        "ws": "^8.18.0",
        "zod": "^3.23.8"
      },
      "peerDependenciesMeta": {
        "ws": {
          "optional": true
        },
        "zod": {
          "optional": true
        }
      }
    },
    "node_modules/rom-control": {
      "version": "2.0.1",
      "resolved": "https://registry.npmjs.org/rom-control/-/rom-control-2.0.1.tgz",
      "integrity": "sha512-1Sek28UGWbsdOPiUbTxzqRFMCKDnv912vgsOd2OhdgM+wKvSCZdAZnLZgNjfeindBmC161Bu9uGCPvx9y6y/LA==",
      "license": "MIT",
      "dependencies": {
        "ws": "^8.14.2"
      },
      "engines": {
        "node": ">=16"
      }
    },
    "node_modules/tr46": {
      "version": "0.0.3",
      "resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
      "integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==",
      "license": "MIT"
    },
    "node_modules/undici-types": {
      "version": "5.26.5",
      "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-5.26.5.tgz",
      "integrity": "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA==",
      "license": "MIT"
    },
    "node_modules/web-streams-polyfill": {
      "version": "4.0.0-beta.3",
      "resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
      "integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
      "license": "MIT",
      "engines": {
        "node": ">= 14"
      }
    },
    "node_modules/webidl-conversions": {
      "version": "3.0.1",
      "resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
      "integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==",
      "license": "BSD-2-Clause"
    },
    "node_modules/whatwg-url": {
      "version": "5.0.0",
      "resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
      "integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
      "license": "MIT",
      "dependencies": {
        "tr46": "~0.0.3",
        "webidl-conversions": "^3.0.0"
      }
    },
    "node_modules/ws": {
      "version": "8.20.0",
      "resolved": "https://registry.npmjs.org/ws/-/ws-8.20.0.tgz",
      "integrity": "sha512-sAt8BhgNbzCtgGbt2OxmpuryO63ZoDk/sqaB/znQm94T4fCEsy/yV+7CdC1kJhOU9lboAEU7R3kquuycDoibVA==",
      "license": "MIT",
      "engines": {
        "node": ">=10.0.0"
      },
      "peerDependencies": {
        "bufferutil": "^4.0.1",
        "utf-8-validate": ">=5.0.2"
      },
      "peerDependenciesMeta": {
        "bufferutil": {
          "optional": true
        },
        "utf-8-validate": {
          "optional": true
        }
      }
    }
  }
}
14
package.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "jibo-llm",
  "version": "1.0.0",
  "description": "Hotword-triggered LLM conversation loop for Jibo",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "dotenv": "^16.4.5",
    "openai": "^4.73.0",
    "rom-control": "^2.0.1"
  }
}
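The `openai` dependency drives the agent loop in `index.js`, which is not part of this section. As a hedged sketch of how such a loop might feed tool results back to the model: the helper names `parseToolArgs`, `toolResultToMessages`, and `buildFollowUpMessages` are hypothetical, and the code assumes the `{ content, image? }` result shape documented for `executeTool` in tools.js. Chat Completions `tool` messages carry text only, so the sketch forwards a photo as a separate `user` message whose `image_url` part holds a base64 data URL. Because the API expects every tool call in an assistant turn to be answered by `tool` messages before anything else, image messages are appended after all tool messages.

```javascript
// Hypothetical helpers (not from index.js): turn tool calls plus their
// executeTool-style results ({ content, image? }) into Chat Completions messages.

// Tool arguments arrive as a JSON string and may be malformed.
function parseToolArgs(toolCall) {
  try {
    return JSON.parse(toolCall.function.arguments || '{}');
  } catch {
    return {};
  }
}

function toolResultToMessages(toolCall, result) {
  const toolMessage = { role: 'tool', tool_call_id: toolCall.id, content: result.content };
  // Tool messages are text-only, so a photo travels as a user image part.
  const imageMessage = result.image
    ? {
        role: 'user',
        content: [
          { type: 'text', text: 'Photo from the take_photo tool:' },
          { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${result.image}` } },
        ],
      }
    : null;
  return { toolMessage, imageMessage };
}

// Answer every tool call first, then append any image messages.
function buildFollowUpMessages(toolCalls, results) {
  const converted = toolCalls.map((call, i) => toolResultToMessages(call, results[i]));
  return [
    ...converted.map((c) => c.toolMessage),
    ...converted.map((c) => c.imageMessage).filter(Boolean),
  ];
}
```

Sending the photo as its own user turn is one workaround; a loop could also keep only the most recent image in history to limit context size.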
569
tools.js
Normal file
@@ -0,0 +1,569 @@
/**
 * Tool definitions and executor for the Jibo LLM agent.
 *
 * Most tools map to a rom-control capability the LLM can invoke; web_search
 * and fetch_url call HTTP endpoints directly, and end_conversation only
 * signals the agent loop to stop.
 */

// ── OpenAI function-tool schemas ───────────────────────────────────────────────

const TOOL_SCHEMAS = [
  {
    type: 'function',
    function: {
      name: 'say',
      description:
        "Speak text aloud through Jibo's speaker. Plain text plus valid ESML tags only " +
        '(e.g. <anim cat="happy" nonBlocking="true"/>, <break size="0.3"/>). ' +
        'NEVER include markdown (no *italics*, **bold**, backticks), LaTeX ($...$), ' +
        'unmatched/closing tags like </es>, or other symbols Jibo cannot pronounce. ' +
        'Malformed input can hang the TTS engine. Keep each call under 200 chars.',
      parameters: {
        type: 'object',
        properties: {
          text: { type: 'string', description: 'Text (or ESML) to speak.' },
        },
        required: ['text'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'listen',
      description:
        "Listen for the user's speech and return a transcript. " +
        'Call this after speaking if you want to continue the conversation.',
      parameters: {
        type: 'object',
        properties: {
          timeout: {
            type: 'number',
            description: 'Max seconds to wait. Default 15.',
          },
        },
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'take_photo',
      description:
        "Take a photo with Jibo's camera. The image is returned so you can see what's in front of you.",
      parameters: {
        type: 'object',
        properties: {
          resolution: {
            type: 'string',
            enum: ['medium', 'low'],
            description: 'Default: medium.',
          },
        },
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'show_text',
      description: "Display text on Jibo's screen.",
      parameters: {
        type: 'object',
        properties: {
          text: { type: 'string', description: 'Text to show.' },
        },
        required: ['text'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'show_image',
      description: "Display an image on Jibo's screen from a URL.",
      parameters: {
        type: 'object',
        properties: {
          url: { type: 'string', description: 'Image URL.' },
        },
        required: ['url'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'show_eye',
      description: "Reset Jibo's screen to the default eye animation.",
      parameters: { type: 'object', properties: {} },
    },
  },
  {
    type: 'function',
    function: {
      name: 'look_at_angle',
      description: "Turn Jibo's head. theta = yaw (±180°, positive right), psi = pitch (±30°, positive up).",
      parameters: {
        type: 'object',
        properties: {
          theta: { type: 'number', description: 'Yaw degrees.' },
          psi: { type: 'number', description: 'Pitch degrees.' },
        },
        required: ['theta', 'psi'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'set_volume',
      description: "Set Jibo's speaker volume (0.0 – 1.0).",
      parameters: {
        type: 'object',
        properties: {
          level: { type: 'number', description: 'Volume 0.0 to 1.0.' },
        },
        required: ['level'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'web_search',
      description:
        'Search the web via Brave Search. Use for current events, facts you are unsure of, ' +
        'or anything that may have changed since training. Returns titles, URLs, and snippets.',
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'The search query.' },
          count: {
            type: 'number',
            description: 'How many results to return (1–10). Default 5.',
          },
          freshness: {
            type: 'string',
            enum: ['pd', 'pw', 'pm', 'py'],
            description:
              'Optional recency filter: pd=past day, pw=past week, pm=past month, py=past year.',
          },
        },
        required: ['query'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'fetch_url',
      description:
        'Fetch the contents of a web page by URL. Prefers markdown via content ' +
        'negotiation (Cloudflare Markdown for Agents) and falls back to HTML→text. ' +
        'Use after web_search to read a result, or to traverse linked pages.',
      parameters: {
        type: 'object',
        properties: {
          url: { type: 'string', description: 'Absolute http(s) URL to fetch.' },
          max_chars: {
            type: 'number',
            description: 'Truncate the body to this many characters. Default 4000.',
          },
        },
        required: ['url'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'end_conversation',
      description:
        'Call this when the conversation has reached a natural end and you do NOT want to ' +
        'listen for another reply. Pair it with a final "say" in the same turn for a farewell.',
      parameters: { type: 'object', properties: {} },
    },
  },
];

// ── Resolution map ─────────────────────────────────────────────────────────────

// 'high' is mapped for completeness but is not offered in the take_photo schema.
const RES_MAP = { high: 'highRes', medium: 'medRes', low: 'lowRes' };

// ── Screen text helpers ────────────────────────────────────────────────────────

/**
 * Word-wrap text for Jibo's small screen. Breaks oversized words, respects
 * existing newlines, and truncates with an ellipsis past `maxLines`.
 */
function wrapForScreen(text, width = 40, maxLines = 10) {
  const out = [];
  for (const para of String(text).split('\n')) {
    if (para === '') { out.push(''); continue; }
    let line = '';
    for (const word of para.split(/\s+/).filter(Boolean)) {
      if (word.length > width) {
        // Hard-break the long word: full-width chunks get their own lines,
        // and any shorter remainder starts the next line.
        if (line) { out.push(line); line = ''; }
        for (let i = 0; i < word.length; i += width) {
          const chunk = word.slice(i, i + width);
          if (chunk.length === width) out.push(chunk);
          else line = chunk;
        }
        continue;
      }
      const candidate = line ? `${line} ${word}` : word;
      if (candidate.length > width) {
        out.push(line);
        line = word;
      } else {
        line = candidate;
      }
    }
    if (line) out.push(line);
  }
  if (out.length > maxLines) {
    // Keep maxLines - 1 lines and use the last slot for the ellipsis.
    return out.slice(0, maxLines - 1).concat('…').join('\n');
  }
  return out.join('\n');
}

/**
 * Strip markup the Jibo TTS engine chokes on (markdown, LaTeX, unknown or
 * stray tags such as </es>). Tags whose name is on the ESML allow-list, like
 * <anim .../> and <break .../>, are kept. Defense-in-depth against models
 * that ignore the instructions in the say tool description.
 */
function sanitizeForTTS(text) {
  const ESML_TAGS = /^(anim|break|prosody|emph|phoneme|phrase|style|voice)\b/i;
  return text
    // Remove LaTeX inline math: $...$ and $$...$$. Caveat: this also eats the
    // text between two dollar amounts ("$5 or $10" becomes "10").
    .replace(/\${1,2}[^$]{0,200}\${1,2}/g, '')
    // Strip code fences and inline backticks
    .replace(/```[\s\S]*?```/g, '')
    .replace(/`+/g, '')
    // Strip markdown emphasis markers but keep the words
    .replace(/(\*\*|__)(.*?)\1/g, '$2')
    .replace(/(\*|_)(?=\S)(.+?)(?<=\S)\1/g, '$2')
    // Drop any tag that isn't a known ESML tag (e.g. </es>, <br>, etc.)
    .replace(/<\/?([a-zA-Z][^\s>/]*)\b[^>]*\/?>/g, (m, name) =>
      ESML_TAGS.test(name) ? m : '')
    // Collapse extra whitespace
    .replace(/[ \t]+/g, ' ')
    .trim();
}

// ── Abort helpers ──────────────────────────────────────────────────────────────

/** Throw a CONVERSATION_ABORTED error if the signal has already fired. */
function throwIfAborted(signal) {
  if (signal?.aborted) {
    const err = new Error('Conversation aborted');
    err.code = 'CONVERSATION_ABORTED';
    throw err;
  }
}

/**
 * Return a promise that rejects with CONVERSATION_ABORTED when the signal
 * fires; without a signal it never settles. Race it against slow operations.
 * The listener stays attached until the signal fires.
 */
function onAbort(signal) {
  if (!signal) return new Promise(() => { }); // never resolves
  return new Promise((_, reject) => {
    const handler = () => {
      const err = new Error('Conversation aborted');
      err.code = 'CONVERSATION_ABORTED';
      reject(err);
    };
    if (signal.aborted) return handler();
    signal.addEventListener('abort', handler, { once: true });
  });
}

// ── Tool executor ──────────────────────────────────────────────────────────────

/**
 * Execute a single tool call against the Jibo client.
 *
 * Returns { content, image?, endConversation? }.
 * - content: text string for the tool-result message
 * - image: optional base64 JPEG (only for take_photo)
 * - endConversation: true only for end_conversation
 *
 * @param {import('rom-control').Client} client
 * @param {string} name Tool function name
 * @param {object} args Parsed arguments
 * @param {AbortSignal} [signal] Cancellation signal
 * @param {object} [ctx] Per-conversation state (speechChain, lastHeard). Pass the
 *   same object for every call in a conversation; otherwise listen cannot wait
 *   for speech queued by earlier say calls.
 * @returns {Promise<{ content: string, image?: string, endConversation?: boolean }>}
 */
async function executeTool(client, name, args, signal, ctx) {
  throwIfAborted(signal);
  ctx = ctx || {};
  if (!ctx.speechChain) ctx.speechChain = Promise.resolve();
  switch (name) {
    // ── Communication ──────────────────────────────────────────────────────
    case 'say': {
      const text = sanitizeForTTS(String(args.text || ''));
      console.log(` [tool:say] "${text}" (queued)`);
      // Estimate ~80ms per char + 5s base, capped at 60s. Anything longer
      // is almost certainly Jibo's TTS hung on bad ESML/markup; we'd rather
      // log a warning and unblock the conversation than deadlock listen.
      const estimateMs = Math.min(60000, 5000 + text.length * 80);

      ctx.speechChain = ctx.speechChain
        .then(() => {
          // Don't start speaking if the conversation was aborted while this
          // utterance was still waiting in the queue.
          throwIfAborted(signal);
          const started = Date.now();
          console.log(` [tool:say] speaking… (timeout ${estimateMs}ms)`);
          let timer;
          const timeout = new Promise((resolve) => {
            timer = setTimeout(() => {
              console.warn(` [tool:say] timed out after ${estimateMs}ms — continuing.`);
              resolve();
            }, estimateMs);
          });
          return Promise.race([
            client.behavior.say(text, { signal }),
            onAbort(signal),
            timeout,
          ]).finally(() => {
            clearTimeout(timer);
            console.log(` [tool:say] done in ${Date.now() - started}ms`);
          });
        })
        .catch((err) => {
          if (err.code === 'CONVERSATION_ABORTED') return;
          console.error(' [tool:say] error:', err.message);
        });
      return { content: 'Speech queued — Jibo will speak it shortly. Continue with other tools; listen will wait for it.' };
    }

    case 'listen': {
      const ms = (args.timeout || 15) * 1000;
      // Make sure pending speech finishes before we open the mic, otherwise
      // Jibo will hear his own voice.
      console.log(' [tool:listen] awaiting pending speech…');
      await Promise.race([ctx.speechChain, onAbort(signal)]);
      throwIfAborted(signal);
      console.log(` [tool:listen] waiting ${ms}ms…`);
      client.display.showText('Listening...');
      try {
        const speech = await Promise.race([
          client.audio.awaitSpeech({ mode: 'local', time: ms }),
          onAbort(signal),
        ]);
        console.log(` [tool:listen] heard: "${speech.content}"`);
        ctx.lastHeard = speech.content;
        return { content: `User said: "${speech.content}"` };
      } catch (err) {
        if (err.code === 'CONVERSATION_ABORTED') throw err;
        if (err.code === 'SPEECH_TIMEOUT') {
          console.log(' [tool:listen] timed out');
          return { content: 'No speech detected — user did not respond.' };
        }
        throw err;
      } finally {
        client.display.showEye();
      }
    }

    // ── Camera ─────────────────────────────────────────────────────────────
    case 'take_photo': {
      const res = RES_MAP[args.resolution] || 'medRes';
      console.log(` [tool:take_photo] ${res}…`);
      const photo = await Promise.race([
        client.camera.takePhoto({ resolution: res, timeout: 30000 }),
        onAbort(signal),
      ]);
      const buf = await photo.fetchBuffer();
      console.log(` [tool:take_photo] ${buf.length} bytes captured`);
      return {
        content: "Photo captured from Jibo's camera.",
        image: buf.toString('base64'),
      };
    }

    // ── Display ────────────────────────────────────────────────────────────
    case 'show_text': {
      console.log(` [tool:show_text] "${args.text}"`);
      client.display.showText(wrapForScreen(args.text, 40, 10));
      return { content: 'Text displayed on screen.' };
    }

    case 'show_image': {
      console.log(` [tool:show_image] ${args.url}`);
      client.display.showImage(args.url);
      return { content: 'Image displayed on screen.' };
    }

    case 'show_eye': {
      console.log(' [tool:show_eye]');
      client.display.showEye();
      return { content: 'Eye animation restored on screen.' };
    }

    case 'look_at_angle': {
      console.log(` [tool:look_at_angle] θ=${args.theta}° ψ=${args.psi}°`);
      await client.behavior.lookAtAngle(args.theta, args.psi);
      return { content: `Now looking at θ=${args.theta}°, ψ=${args.psi}°.` };
    }

    case 'set_volume': {
      console.log(` [tool:set_volume] ${args.level}`);
      await client.audio.setVolume(args.level);
      return { content: `Volume set to ${args.level}.` };
    }

    // ── Web search ─────────────────────────────────────────────────────────
    case 'web_search': {
      const apiKey = process.env.BRAVE_API_KEY;
      if (!apiKey) {
        return {
          content:
            'web_search is unavailable: BRAVE_API_KEY environment variable is not set.',
        };
      }
      const query = String(args.query || '').trim();
      if (!query) {
        return { content: 'web_search error: query is required.' };
      }
      const count = Math.max(1, Math.min(10, Number(args.count) || 5));
      const params = new URLSearchParams({
        q: query,
        count: String(count),
        extra_snippets: 'true',
        safesearch: 'moderate',
      });
      if (args.freshness) params.set('freshness', String(args.freshness));

      console.log(` [tool:web_search] "${query}" (count=${count})`);
      const url = `https://api.search.brave.com/res/v1/web/search?${params.toString()}`;
      const ac = new AbortController();
      const onAbortHandler = () => ac.abort();
      signal?.addEventListener('abort', onAbortHandler, { once: true });
      try {
        const res = await fetch(url, {
          headers: {
            Accept: 'application/json',
            'Accept-Encoding': 'gzip',
            'X-Subscription-Token': apiKey,
          },
          signal: ac.signal,
        });
        if (!res.ok) {
          const body = await res.text().catch(() => '');
          return {
            content: `web_search error: ${res.status} ${res.statusText}. ${body.slice(0, 200)}`,
          };
        }
        const data = await res.json();
        const results = data?.web?.results || [];
        if (results.length === 0) {
          return { content: `No web results found for "${query}".` };
        }
        const lines = results.slice(0, count).map((r, i) => {
          const title = r.title || '(untitled)';
          const u = r.url || '';
          const desc = (r.description || '').replace(/\s+/g, ' ').trim();
          const extras = Array.isArray(r.extra_snippets)
            ? r.extra_snippets.slice(0, 2).map((s) => s.replace(/\s+/g, ' ').trim())
            : [];
          const tail = extras.length ? `\n • ${extras.join('\n • ')}` : '';
          return `${i + 1}. ${title}\n ${u}\n ${desc}${tail}`;
        });
        return {
          content: `Web results for "${query}":\n\n${lines.join('\n\n')}`,
        };
      } catch (err) {
        if (err.name === 'AbortError') throw Object.assign(new Error('Conversation aborted'), { code: 'CONVERSATION_ABORTED' });
        return { content: `web_search error: ${err.message}` };
      } finally {
        signal?.removeEventListener('abort', onAbortHandler);
      }
    }

    case 'fetch_url': {
      const target = String(args.url || '').trim();
      if (!/^https?:\/\//i.test(target)) {
        return { content: 'fetch_url error: url must be an absolute http(s) URL.' };
      }
      const maxChars = Math.max(200, Math.min(20000, Number(args.max_chars) || 4000));
      console.log(` [tool:fetch_url] ${target}`);

      const ac = new AbortController();
      const onAbortHandler = () => ac.abort();
      signal?.addEventListener('abort', onAbortHandler, { once: true });
      const timeoutId = setTimeout(() => ac.abort(), 20000);
      try {
        const res = await fetch(target, {
          headers: {
            // Prefer markdown (Cloudflare Markdown for Agents); accept HTML/text fallback.
            Accept: 'text/markdown, text/plain;q=0.9, text/html;q=0.8, */*;q=0.1',
            'Accept-Encoding': 'gzip',
            'User-Agent': 'jibo-llm/1.0 (+agent)',
          },
          redirect: 'follow',
          signal: ac.signal,
        });
        if (!res.ok) {
          return {
            content: `fetch_url error: ${res.status} ${res.statusText} from ${target}`,
          };
        }
        const ctype = (res.headers.get('content-type') || '').toLowerCase();
        if (!/^(text\/|application\/(json|xml|xhtml))/.test(ctype) && ctype) {
          return {
            content: `fetch_url: refusing non-text content (${ctype}) from ${target}`,
          };
        }
        let body = await res.text();
        const isMarkdown = ctype.includes('markdown');
        const isHtml = ctype.includes('html') || /<html[\s>]/i.test(body.slice(0, 500));

        if (!isMarkdown && isHtml) {
          // Lightweight HTML→text: strip scripts/styles/tags, decode common
          // entities, collapse whitespace.
          body = body
            .replace(/<script[\s\S]*?<\/script>/gi, ' ')
            .replace(/<style[\s\S]*?<\/style>/gi, ' ')
            .replace(/<noscript[\s\S]*?<\/noscript>/gi, ' ')
            .replace(/<!--[\s\S]*?-->/g, ' ')
            .replace(/<br\s*\/?>|<\/(p|div|li|h[1-6]|tr)>/gi, '\n')
            .replace(/<[^>]+>/g, ' ')
            .replace(/&nbsp;/g, ' ')
            .replace(/&lt;/g, '<')
            .replace(/&gt;/g, '>')
            .replace(/&quot;/g, '"')
            .replace(/&#39;|&apos;/g, "'")
            // Decode &amp; last so "&amp;lt;" yields "&lt;", not "<".
            .replace(/&amp;/g, '&')
            .replace(/[ \t]+/g, ' ')
            .replace(/ *\n */g, '\n')
            .replace(/\n{3,}/g, '\n\n')
            .trim();
        }

        const truncated = body.length > maxChars;
        const out = truncated ? body.slice(0, maxChars) + '\n…[truncated]' : body;
        const finalUrl = res.url || target;
        const fmt = isMarkdown ? 'markdown' : isHtml ? 'html→text' : 'text';
        return {
          content: `Fetched ${finalUrl} (${fmt}, ${body.length} chars${truncated ? `, truncated to ${maxChars}` : ''}):\n\n${out}`,
        };
      } catch (err) {
        if (err.name === 'AbortError') {
          if (signal?.aborted) {
            throw Object.assign(new Error('Conversation aborted'), { code: 'CONVERSATION_ABORTED' });
          }
          return { content: `fetch_url error: timeout fetching ${target}` };
        }
        return { content: `fetch_url error: ${err.message}` };
      } finally {
        clearTimeout(timeoutId);
        signal?.removeEventListener('abort', onAbortHandler);
      }
    }

    case 'end_conversation': {
      console.log(' [tool:end_conversation] awaiting pending speech…');
      await Promise.race([ctx.speechChain, onAbort(signal)]);
      return { content: 'Conversation ended.', endConversation: true };
    }

    default:
      return { content: `Unknown tool "${name}".` };
  }
}

module.exports = { TOOL_SCHEMAS, executeTool, wrapForScreen };
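`say` returns immediately and appends the utterance to `ctx.speechChain`, while `listen` and `end_conversation` await that chain before proceeding. The standalone sketch below reproduces the pattern with a hypothetical `fakeSay` that "speaks" by sleeping, so the ordering can be observed without a robot; it illustrates the idea and is not the module's code.

```javascript
// Hypothetical stand-in for client.behavior.say: "speaks" by waiting ms.
function fakeSay(log, text, ms) {
  return new Promise((resolve) => setTimeout(() => { log.push(text); resolve(); }, ms));
}

// Queue speech without blocking the caller, mirroring ctx.speechChain.
function queueSay(ctx, log, text, ms) {
  ctx.speechChain = (ctx.speechChain || Promise.resolve())
    .then(() => fakeSay(log, text, ms))
    .catch((err) => console.error('say failed:', err.message)); // keep the chain alive
  return 'queued';
}

// Listening waits for everything queued so far, so the mic never hears Jibo.
async function listenAfterSpeech(ctx, log) {
  await (ctx.speechChain || Promise.resolve());
  log.push('<listening>');
}

async function demo() {
  const ctx = {};
  const log = [];
  queueSay(ctx, log, 'first', 30);
  queueSay(ctx, log, 'second', 5); // shorter, but still plays after "first"
  await listenAfterSpeech(ctx, log);
  return log;
}

demo().then((log) => console.log(log)); // [ 'first', 'second', '<listening>' ]
```

The `.catch` at the end of each link matters: without it, one failed utterance would reject the chain and every later `say` and `listen` would inherit the failure.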
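`onAbort` leaves its listener attached until the signal fires, so a long conversation accumulates one `abort` listener per raced step, which Node may report as a possible EventTarget memory leak once more than ten are attached. A variant can detach the listener as soon as the race settles. `raceWithAbort` below is a hypothetical helper, not part of tools.js, and assumes Node 15 or later for the global `AbortController`.

```javascript
// Race a promise against an AbortSignal and remove the abort listener afterwards.
function raceWithAbort(promise, signal) {
  if (!signal) return promise;
  const abortError = () =>
    Object.assign(new Error('Conversation aborted'), { code: 'CONVERSATION_ABORTED' });
  if (signal.aborted) return Promise.reject(abortError());
  let handler;
  const aborted = new Promise((_, reject) => {
    handler = () => reject(abortError());
    signal.addEventListener('abort', handler, { once: true });
  });
  // Whichever side wins, drop the listener so it cannot pile up on the signal.
  return Promise.race([promise, aborted]).finally(() => {
    signal.removeEventListener('abort', handler);
  });
}

async function demo() {
  const ac = new AbortController();
  const slow = new Promise((resolve) => setTimeout(() => resolve('done'), 50));
  setTimeout(() => ac.abort(), 10); // abort before the slow work finishes
  try {
    await raceWithAbort(slow, ac.signal);
    return 'finished';
  } catch (err) {
    return err.code;
  }
}

demo().then(console.log); // CONVERSATION_ABORTED
```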
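The HTML→text fallback in `fetch_url` decodes a few common named entities. The order matters: `&amp;` has to be decoded last, otherwise an escaped sequence such as `&amp;lt;` collapses all the way to `<`. A minimal standalone version of that step (the name `decodeBasicEntities` is illustrative) covering the same small entity set:

```javascript
// Decode a handful of common HTML entities. &amp; goes last so that an
// escaped entity such as "&amp;lt;" becomes the literal text "&lt;".
function decodeBasicEntities(text) {
  return text
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;|&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

console.log(decodeBasicEntities('Fish &amp; chips &lt;3')); // Fish & chips <3
console.log(decodeBasicEntities('write &amp;lt;b&amp;gt; for bold')); // write &lt;b&gt; for bold
```

This is deliberately not a full HTML entity decoder; numeric entities other than `&#39;` and the long tail of named entities pass through unchanged.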