Initial commit: jibo-llm hotword-triggered agent

Hotword-triggered LLM conversation loop for Jibo with tool-calling agent
loop, ESML expressive speech, web search/fetch, and per-conversation
abort handling.
pasketti
2026-04-26 00:05:39 -04:00
commit 8955f21ab4
8 changed files with 2039 additions and 0 deletions

9
.env.example Normal file

@@ -0,0 +1,9 @@
# Jibo robot IP address
JIBO_IP=192.168.1.217
# LLM API configuration (OpenAI-compatible chat completions endpoint)
# LLM_BASE_URL is the base URL *without* /chat/completions
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_TOKEN=sk-your-api-key-here
LLM_MODEL_ID=gpt-4o
BRAVE_API_KEY=brave-api-key

5
.gitignore vendored Normal file

@@ -0,0 +1,5 @@
node_modules/
.env
.env.local
*.log
.DS_Store

291
README.md Normal file

@@ -0,0 +1,291 @@
# jibo-llm
> **Give Jibo a brain again.** A hotword-triggered, LLM-powered conversational agent that turns Jibo into an expressive, tool-using social robot — complete with speech, vision, web search, animations, and more.
![Node.js](https://img.shields.io/badge/Node.js-18%2B-339933?logo=node.js&logoColor=white)
![License](https://img.shields.io/badge/license-MIT-blue)
---
## Overview
**jibo-llm** connects a Jibo robot to any OpenAI-compatible LLM (GPT-4o, Claude, local models via Ollama/LM Studio, etc.) through a real-time agent loop. When someone says **"Hey Jibo"**, the system:
1. **Listens** for the user's speech via Jibo's on-board microphone.
2. **Sends** the transcript to an LLM along with a rich system prompt and tool definitions.
3. **Executes** tool calls the LLM makes — speaking, animating, taking photos, searching the web, and more.
4. **Loops** until the conversation naturally ends or the user triggers a new hotword.
Conversations are fully interruptible: saying "Hey Jibo" mid-conversation aborts the current exchange and starts a fresh one via `AbortController`.
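A minimal sketch of this interruption pattern (illustrative only — names like `startConversation` are hypothetical, not the actual index.js API):

```js
// A new hotword aborts the in-flight conversation before starting a fresh one.
let activeController = null;

function startConversation(run) {
  if (activeController) activeController.abort(); // cancel the previous exchange
  const controller = new AbortController();
  activeController = controller;
  return run(controller.signal);
}

async function demoConversation(signal) {
  // Long-running work checks the signal between awaits and bails out early.
  for (let turn = 0; turn < 5; turn++) {
    if (signal.aborted) return 'aborted';
    await new Promise((resolve) => setTimeout(resolve, 5));
  }
  return 'finished';
}
```

The key property: aborting is cooperative — each `await` point in the conversation races against the signal, so a new hotword takes effect within one step rather than killing the process.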
---
## Architecture
```
┌──────────────┐   hotword   ┌──────────────┐  tool calls   ┌───────────────┐
│  Jibo Robot  │ ──────────▶ │   index.js   │ ◀───────────▶ │  LLM (OpenAI  │
│  (rom-ctrl)  │ ◀────────── │  Agent Loop  │               │  compatible)  │
│              │  say/listen │              │               └───────────────┘
│  • mic       │  photo/look │   tools.js   │  web search   ┌───────────────┐
│  • speaker   │   display   │  (executor)  │ ────────────▶ │  Brave Search │
│  • camera    │             │              │               └───────────────┘
│  • screen    │             │ esml-ref.js  │
│  • motors    │             │ (prompt ref) │
└──────────────┘             └──────────────┘
```
| File | Purpose |
|------|---------|
| `index.js` | Entry point — connects to Jibo, listens for hotword, runs the agent loop with the LLM. |
| `tools.js` | Defines all tool schemas (OpenAI function-calling format) and the `executeTool()` dispatcher. |
| `esml-reference.js` | ESML (Embodied Speech Markup Language) cheat sheet injected into the system prompt so the LLM knows how to animate Jibo expressively. |
---
## Features
- 🗣️ **Natural conversation** — multi-turn dialogue with speech recognition and TTS.
- 🎭 **Expressive animations** — the LLM uses ESML tags to trigger emotions, dances, emojis, and sound effects inline with speech.
- 📷 **Vision** — Jibo can take photos and the LLM receives the image for visual understanding.
- 🔍 **Web search** — real-time Brave Search integration for up-to-date answers.
- 🌐 **URL fetching** — reads web pages (with Cloudflare Markdown for Agents support) so Jibo can summarize articles.
- 🖥️ **Display control** — show text, images, or restore the default eye on Jibo's screen.
- 🤖 **Head movement** — point Jibo's head at specific angles (yaw / pitch).
- 🔊 **Volume control** — adjust speaker volume on the fly.
- **Interruptible** — new hotword instantly aborts a running conversation via `AbortController`.
- 🔄 **Retry logic** — automatic retry with exponential backoff for transient LLM errors (429, 5xx, network).
- 🧹 **Context management** — old photos are pruned from context to control token cost.
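The retry behavior can be sketched like this (a simplified illustration of the policy — `withRetry` and `isTransient` are illustrative names; the real logic lives in `callLLM()` in index.js):

```js
// Transient errors (429, 5xx, network) back off exponentially; others rethrow.
const TRANSIENT_CODES = new Set(['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'EAI_AGAIN']);

function isTransient(err) {
  const status = err.status ?? err.response?.status;
  return status === 429 ||
    (typeof status === 'number' && status >= 500) ||
    TRANSIENT_CODES.has(err.code);
}

async function withRetry(fn, { retries = 2, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      // 500ms, 1000ms, 2000ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** attempt));
    }
  }
}
```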
---
## Prerequisites
- **Node.js** ≥ 18 (for native `fetch` and `AbortController`)
- **A Jibo robot** running with int-developer mode enabled
- **An OpenAI-compatible API endpoint** (OpenAI, Anthropic via proxy, Ollama, LM Studio, etc.)
- *(Optional)* **Brave Search API key** for the `web_search` tool
---
## Quick Start
### 1. Clone & install
```bash
git clone https://github.com/niceduckdev/jibo-llm.git
cd jibo-llm
npm install
```
### 2. Configure environment
```bash
cp .env.example .env
```
Edit `.env` with your values:
```env
# Jibo robot IP address on your local network
JIBO_IP=192.168.1.217
# LLM API configuration (any OpenAI-compatible endpoint)
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_TOKEN=sk-your-api-key-here
LLM_MODEL_ID=gpt-4o
# Optional: enables the web_search tool
BRAVE_API_KEY=your-brave-api-key
```
### 3. Run
```bash
npm start
# or: node index.js
```
You'll see:
```
[jibo-llm] Connecting to Jibo at 192.168.1.217…
[jibo-llm] Connected — session abc123
[jibo-llm] Ready — listening for "Hey Jibo"…
```
Say **"Hey Jibo"** and start talking!
---
## Configuration
All configuration is done via environment variables (loaded from `.env` by [dotenv](https://www.npmjs.com/package/dotenv)):
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `JIBO_IP` | No | `192.168.1.217` | Jibo's IP address on your LAN |
| `LLM_BASE_URL` | No | `https://api.openai.com/v1` | Base URL for the chat completions API |
| `LLM_API_TOKEN` | **Yes** | — | API key for the LLM provider |
| `LLM_MODEL_ID` | No | `gpt-4o` | Model identifier to use |
| `BRAVE_API_KEY` | No | — | Brave Search API key (enables `web_search` tool) |
### Using alternative LLM providers
Since jibo-llm uses the OpenAI SDK, any provider with a compatible chat completions endpoint works:
```env
# Ollama (local)
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_TOKEN=ollama
LLM_MODEL_ID=llama3
# LM Studio (local)
LLM_BASE_URL=http://localhost:1234/v1
LLM_API_TOKEN=lm-studio
LLM_MODEL_ID=local-model
# OpenRouter
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_TOKEN=sk-or-...
LLM_MODEL_ID=anthropic/claude-sonnet-4
```
---
## Available Tools
The LLM can call any of these tools during a conversation:
### Communication
| Tool | Description |
|------|-------------|
| `say` | Speak ESML-formatted text through Jibo's speaker. Queued and chained so multiple `say` calls play in order. |
| `listen` | Open the microphone and transcribe user speech. Waits for pending speech to finish first. |
| `end_conversation` | Gracefully end the conversation (no further listening). |
### Camera
| Tool | Description |
|------|-------------|
| `take_photo` | Capture a photo from Jibo's camera. The image is sent to the LLM as a base64 JPEG for visual understanding. |
### Display
| Tool | Description |
|------|-------------|
| `show_text` | Display word-wrapped text on Jibo's screen. |
| `show_image` | Display an image from a URL on Jibo's screen. |
| `show_eye` | Restore the default eye animation. |
### Movement
| Tool | Description |
|------|-------------|
| `look_at_angle` | Turn Jibo's head — `theta` (yaw ±180°) and `psi` (pitch ±30°). |
### Audio
| Tool | Description |
|------|-------------|
| `set_volume` | Set speaker volume from 0.0 to 1.0. |
### Web
| Tool | Description |
|------|-------------|
| `web_search` | Search the web via Brave Search API. Supports result count and freshness filters. |
| `fetch_url` | Fetch and read a web page. Prefers markdown via Cloudflare content negotiation, falls back to HTML→text conversion. |
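A minimal sketch of that markdown-first strategy (illustrative, not the actual tools.js implementation):

```js
// Crude HTML→text fallback for pages with no markdown representation.
function htmlToText(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ')                        // strip remaining tags
    .replace(/\s+/g, ' ')
    .trim();
}

// Ask for markdown first via content negotiation (Cloudflare's
// "Markdown for Agents" serves text/markdown when requested), else strip HTML.
async function fetchReadable(url) {
  const res = await fetch(url, {
    headers: { Accept: 'text/markdown, text/html;q=0.8' },
  });
  const type = res.headers.get('content-type') || '';
  const body = await res.text();
  return type.includes('text/markdown') ? body : htmlToText(body);
}
```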
---
## ESML (Embodied Speech Markup Language)
ESML is how Jibo speaks expressively. The system prompt includes a full reference (`esml-reference.js`) that teaches the LLM to use these tags inside `say` calls:
```xml
<!-- Emotional reaction (most common pattern) -->
<anim cat='happy' nonBlocking='true' endNeutral='true'/> That's great news!
<!-- Voice sound (laugh, sigh, greeting) -->
<ssa cat='laughing' nonBlocking='true'/> That's hilarious!
<!-- Sound effect -->
<sfx cat='drumroll'/> And the answer is...
<!-- Dance (always needs a filter) -->
<anim cat='dance' filter='music, rom-silly'/> Watch this!
<!-- Emoji on screen -->
<anim cat='emoji' filter='!(hf), &(heart)' nonBlocking='true'/> I love that!
<!-- Dramatic pause -->
And then... <break size='1.0'/> nothing happened.
```
A `sanitizeForTTS()` function in `tools.js` provides defense-in-depth by stripping markdown, LaTeX, and invalid tags before they reach Jibo's TTS engine.
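An illustrative pass in that spirit (the real `sanitizeForTTS()` in tools.js is more thorough — it also handles invalid ESML tags):

```js
// Defense-in-depth: strip markdown and LaTeX before text reaches the TTS engine.
// Hypothetical name; sketches the idea, not the actual tools.js function.
function stripForTTS(text) {
  return text
    .replace(/```[\s\S]*?```/g, ' ')          // fenced code blocks
    .replace(/`([^`]+)`/g, '$1')              // inline code spans
    .replace(/\*{1,2}([^*]+)\*{1,2}/g, '$1')  // *italic* / **bold**
    .replace(/\$[^$]+\$/g, ' ')               // inline $LaTeX$ math
    .replace(/[ \t]{2,}/g, ' ')
    .trim();
}
```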
---
## How the Agent Loop Works
```
User says "Hey Jibo" ──▶ hotword event fires
Play acknowledgment animation
Listen for initial speech (15s timeout)
Build message history [system prompt, user text]
┌─── Agent Loop (max 25 turns) ◀────────┐
│                                       │
│ 1. Prune old images from context      │
│ 2. Call LLM                           │
│ 3. If no tool calls → done            │
│ 4. Sort tools: say → actions → listen │
│ 5. Execute each tool                  │
│ 6. Push results to messages           │
│ 7. If end_conversation → done         │
│ 8. Loop ──────────────────────────────┘
Conversation complete
Resume hotword listening
```
Key behaviors:
- **Speech chaining**: Multiple `say` calls are queued via a promise chain so they play sequentially without overlap.
- **Tool ordering**: `say` executes first, then actions (photo, search, etc.), then `listen`/`end_conversation` last.
- **Graceful limits**: At turn 24 of 25, a system message nudges the LLM to wrap up.
- **Image pruning**: Only the 2 most recent photos are kept in context to manage token usage.
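The speech chaining can be sketched as a promise queue (illustrative — index.js threads this through `ctx.speechChain` rather than a standalone helper):

```js
// Each say() appends to a promise chain so utterances play strictly in order,
// and a failed utterance doesn't block the ones queued after it.
function makeSpeechQueue(speak) {
  let chain = Promise.resolve();
  return function enqueue(text) {
    chain = chain.catch(() => {}).then(() => speak(text));
    return chain; // awaiting this waits for everything queued so far
  };
}
```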
---
## Project Structure
```
jibo-llm/
├── .env.example # Template for environment variables
├── .env # Your local config (git-ignored)
├── index.js # Entry point: connection, hotword handling, agent loop
├── tools.js # Tool schemas + executeTool() dispatcher
├── esml-reference.js # ESML documentation injected into the system prompt
├── package.json # Dependencies and scripts
└── node_modules/ # Installed dependencies
```
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| [rom-control](https://github.com/niceduckdev/rom-control) | ^2.0.1 | Jibo robot control client (speech, camera, display, motors) |
| [openai](https://www.npmjs.com/package/openai) | ^4.73.0 | OpenAI-compatible chat completions SDK |
| [dotenv](https://www.npmjs.com/package/dotenv) | ^16.4.5 | Load `.env` configuration |
---
## License
MIT

228
esml-reference.js Normal file

@@ -0,0 +1,228 @@
/**
* ESML (Embodied Speech Markup Language) reference for the LLM system prompt.
*
* Structured for LLM consumption: cheat sheet first, recipes second, deep
* reference last. Front-loaded examples bias the model toward correct output.
*/
module.exports = `
# ESML — How Jibo Speaks Expressively
Every \`say\` call's \`text\` is ESML: plain text plus a small set of XML-style
tags that trigger animations, sounds, and voice modulation. **Plain text alone
works fine** — Jibo's auto-tagger adds basic animations. Use tags to make him
expressive on purpose.
---
## ⚡ QUICK-START — copy these patterns
These cover ~95% of what you actually need. Prefer them over inventing tags.
### Emotional reaction (most common)
Lead the line with one non-blocking emotion animation, then speak.
\`\`\`
<anim cat='happy' nonBlocking='true' endNeutral='true'/> Yay, that worked!
<anim cat='surprised' nonBlocking='true' endNeutral='true'/> Whoa, really?
<anim cat='confused' nonBlocking='true' endNeutral='true'/> Hmm, I'm not sure.
<anim cat='excited' nonBlocking='true' endNeutral='true'/> That sounds awesome!
<anim cat='sad' nonBlocking='true' endNeutral='true'/> Aww, I'm sorry to hear that.
<anim cat='proud' nonBlocking='true' endNeutral='true'/> I did it!
<anim cat='curious' nonBlocking='true' endNeutral='true'/> Oh? Tell me more.
\`\`\`
### Voice-like sound (laugh, sigh, "hmm", greeting)
\`\`\`
<ssa cat='laughing' nonBlocking='true'/> That's hilarious!
<ssa cat='thinking'/> Let me think about that...
<ssa cat='hello' nonBlocking='true'/> Hi there!
<ssa cat='goodbye' nonBlocking='true'/> Talk to you later!
<ssa cat='surprised' nonBlocking='true'/> Oh wow!
\`\`\`
### Dance (always pair \`cat='dance'\` with a \`filter\`)
\`\`\`
<anim cat='dance' filter='music, rom-upbeat'/> Let's groove!
<anim cat='dance' filter='music, rom-silly'/> Watch this one!
<anim cat='dance' filter='music, rom-twerk'/>
<anim cat='dance' filter='!(music), &(rom-upbeat)'/> Dancing without music.
\`\`\`
### Sound effect
\`\`\`
<sfx cat='drumroll'/> And the winner is... you!
<sfx cat='sparkles'/> Ta-da!
<sfx cat='whoosh'/> Off we go!
\`\`\`
### Emoji on screen + speech
Always use \`filter='!(hf), &(<emoji-name>)'\` and non-blocking.
\`\`\`
<anim cat='emoji' filter='!(hf), &(heart)' nonBlocking='true'/> I love that!
<anim cat='emoji' filter='!(hf), &(pizza)' nonBlocking='true'/> Pizza time!
<anim cat='emoji' filter='!(hf), &(party)' nonBlocking='true'/> Let's celebrate!
\`\`\`
### Pause / pacing
\`\`\`
And then... <break size='1.0'/> nothing happened.
\`\`\`
### Speaking style
\`\`\`
<style set='enthusiastic'> That's amazing! </style>
<style set='confused'> Wait, what? </style>
<style set='confident'> I've got this. </style>
\`\`\`
---
## ✅ DO / ❌ DON'T
✅ DO start most emotional lines with \`<anim cat='X' nonBlocking='true' endNeutral='true'/>\`.
✅ DO use \`cat='...'\` selectors — they pick a random valid animation for you.
✅ DO use \`<ssa>\` for voice-like sounds (laughs, sighs) and \`<sfx>\` for noises (drumroll, whoosh).
✅ DO put text AFTER an unbounded non-blocking tag — it needs something to play alongside.
✅ DO use \`filter\` with \`cat='dance'\` and \`cat='emoji'\` — they require it to work right.
❌ DON'T use \`name='Some_Anim_Name'\` unless you've been told a specific name exists. Use \`cat\` instead.
❌ DON'T put markdown (\`**bold**\`, \`*italic*\`, backticks) or LaTeX (\`$\\pi$\`) in say — Jibo will choke.
❌ DON'T close tags you didn't open (no stray \`</es>\`, \`</anim>\`).
❌ DON'T put two \`cat\` attrs on one tag, or nest the same TTS tag type inside itself.
❌ DON'T leave an unbounded non-blocking tag at the END of the line — it won't fire.
❌ DON'T use bounded mode (\`<ssa cat='happy'>text</ssa>\`) on \`<ssa>\` or \`<sfx>\` — they have fixed durations.
---
## 🎬 WORKED EXAMPLES
User: "Tell me a joke."
Good:
\`\`\`
<anim cat='excited' nonBlocking='true' endNeutral='true'/> Okay, here's one!
<break size='0.4'/> Why don't scientists trust atoms?
<break size='0.6'/> Because they make up everything!
<ssa cat='laughing' nonBlocking='true'/> Get it?
\`\`\`
User: "I had a really bad day."
Good:
\`\`\`
<anim cat='sad' nonBlocking='true' endNeutral='true'/> Oh no, I'm so sorry.
<break size='0.3'/> Want to tell me what happened?
\`\`\`
User: "Show me a dance."
Good:
\`\`\`
<anim cat='excited' nonBlocking='true' endNeutral='true'/> You got it!
<anim cat='dance' filter='music, rom-silly'/>
\`\`\`
User: "What's pi?"
Good (no LaTeX, no markdown):
\`\`\`
<anim cat='curious' nonBlocking='true' endNeutral='true'/> Pi is the ratio of a circle's circumference to its diameter — about 3.14159, and the digits go on forever!
\`\`\`
Bad (would break the TTS):
\`\`\`
Pi (\$\\pi\$) is *irrational* — its digits go on **forever**! </es>
\`\`\`
---
## 🧩 ANIMATION CATEGORIES (use with \`cat='...'\`)
Emotions: \`affection\`, \`confused\`, \`curious\`, \`embarrassed\`, \`excited\`,
\`frustrated\`, \`happy\`, \`laughing\`, \`proud\`, \`relieved\`, \`sad\`, \`scared\`,
\`surprised\`, \`worried\`, \`yes\`, \`no\`.
Special: \`dance\` (needs filter), \`emoji\` (needs filter).
## 🔊 SSA CATEGORIES (voice-like sounds, use with \`<ssa cat='...'/>\`)
\`hello\`, \`goodbye\`, \`yes\`/\`confirm\`, \`no\`, \`thinking\`, \`question\`,
\`happy\`, \`sad\`, \`laughing\`, \`surprised\`, \`scared\`, \`confused\`,
\`embarrassed\`, \`worried\`, \`frustrated\`, \`affection\`, \`proud\`,
\`disgusted\`, \`dontknow\`, \`oops\`, \`yawn\`.
## 💥 SFX CATEGORIES (sound effects, use with \`<sfx cat='...'/>\`)
\`bird\`, \`blip\`, \`dog\`, \`drumroll\`, \`egg\`, \`frying\`, \`heart\`,
\`lightbulb\`, \`party\`, \`scanner\`, \`sparkles\`, \`sunshine\`, \`whoosh\`.
## 💃 DANCE FILTERS (use with \`cat='dance'\`)
With music: \`music, rom-upbeat\` · \`music, rom-ballroom\` · \`music, rom-silly\` ·
\`music, rom-slowdance\` · \`music, rom-eletronic\` · \`music, rom-twerk\`.
Silent: \`!(music), &(rom-upbeat)\`.
## 😀 EMOJI NAMES (use with \`cat='emoji' filter='!(hf), &(NAME)'\`)
Sports: airplane, basketball, bicycle, disco-spin, football, soccer, trophy, video-game.
Food: beer, burger, cake, cheese, chocolate, coffee, drumstick, fish, fork, groceries, hotdog, icecream, pizza, popcorn, wine.
Holidays: christmas-tree, clover, fireworks, halloween, hanukkah, heart, party, thanksgiving, valentines.
Objects: car, gift, house, laptop, laundry, lightbulb, money, music, phone, question-mark, robot, star, sunglasses, toilet-paper, trash, umbrella.
Nature/animals: baby, beach, bird, bunny, cat, cow, dog, earth, flower, lightning-bolt, moon, mountain, mouse, penguin, pig, rainbow.
---
## 📚 DEEP REFERENCE (only when the cheat sheet isn't enough)
### Tag types
| Tag | Purpose |
|-----|---------|
| \`<anim>\` | Animation, excludes \`ssa-only\`/\`sfx-only\` (general gestures/poses) |
| \`<es>\` | Animation, no filtering — use only with a known \`name=\` |
| \`<ssa>\` | Voice-like audio (laughs, sighs, hellos) |
| \`<sfx>\` | Sound effects |
| \`<break size='N'/>\` | Pause for N seconds |
| \`<style set='...'/>\` | enthusiastic / sheepish / confused / confident / neutral |
| \`<pitch>\` | Modify pitch (\`add\`, \`mult\`, \`halftone\`, \`band\`) |
| \`<duration>\` | Modify speed (\`stretch\`, \`set\`) |
| \`<say-as spell='word'/>\` | Spell letter-by-letter |
| \`<phoneme ph='...'/>\` | Exact phonetic pronunciation |
### Animation tag attributes
- \`cat='X'\` — random animation from category (PREFERRED).
- \`name='X'\` — exact AnimDB name (only if you know it exists).
- \`filter='...'\` — narrow by meta-terms; required for \`dance\` and \`emoji\`.
- \`a, b\` (or \`&(a,b)\`) — must include all
- \`?a, ?b\` — at least one of
- \`!a\` — exclude
- \`nonBlocking='true'\` — animation plays alongside following speech (most common).
- \`loop=N\` — \`0\` fits the loop count to bounded text; \`>=1\` plays N times.
- \`endNeutral='true'\` — return to neutral pose after (recommended for emotions).
- \`layers='body,screen,audio'\` — restrict which MetaLayers are used.
### Three playback modes
- **Blocking** — \`<es name='X'/>\` with no inner text and no \`nonBlocking\`.
Speech pauses while it plays.
- **Bounded non-blocking** — \`<anim cat='happy'>text inside</anim>\`. Animation
is time-stretched to match the wrapped speech. Don't use with \`<ssa>\`/\`<sfx>\`.
- **Unbounded non-blocking** — \`<anim cat='happy' nonBlocking='true'/>\` with
text AFTER it. Plays at native length while speech continues. **The text to
the right is required**, otherwise the tag never fires.
### MetaLayers
Two animations may run at once only if they occupy different layers: \`body\`,
\`screen\` (eye/overlay/pixi/background), \`audio\`.
---
## 🛡️ HARD RULES
1. Plain text is always valid. When in doubt, just speak plainly.
2. Prefer \`cat='...'\` over \`name='...'\` — \`name\` requires an exact AnimDB id.
3. Unbounded non-blocking tags MUST have text to their right.
4. \`cat='dance'\` and \`cat='emoji'\` require a \`filter\` attribute.
5. \`<ssa>\` and \`<sfx>\` are fixed-duration — never wrap them around text.
6. One \`cat\` per tag. Don't nest the same TTS tag type inside itself.
7. NEVER emit markdown (\`*\`, \`**\`, \`_\`, backticks, code fences) or LaTeX
(\`$...$\`, \`\\(...\\)\`) inside \`say\` text. The TTS engine will hang.
8. NEVER emit closing tags for things you didn't open (\`</es>\`, etc.).
`;

426
index.js Normal file

@@ -0,0 +1,426 @@
require('dotenv').config();
const { Client, AttentionMode } = require('rom-control');
const OpenAI = require('openai');
const { TOOL_SCHEMAS, executeTool, wrapForScreen } = require('./tools');
const ESML_REFERENCE = require('./esml-reference');
// ── Config ─────────────────────────────────────────────────────────────────────
const JIBO_IP = process.env.JIBO_IP || '192.168.1.217';
const LLM_BASE_URL = process.env.LLM_BASE_URL || 'https://api.openai.com/v1';
const LLM_API_TOKEN = process.env.LLM_API_TOKEN;
const LLM_MODEL_ID = process.env.LLM_MODEL_ID || 'gpt-4o';
if (!LLM_API_TOKEN) {
console.error('ERROR: LLM_API_TOKEN is not set. Copy .env.example to .env and fill it in.');
process.exit(1);
}
const openai = new OpenAI({
apiKey: LLM_API_TOKEN,
baseURL: LLM_BASE_URL,
});
// ── System prompt ──────────────────────────────────────────────────────────────
const SYSTEM_PROMPT = [
'You are Jibo, a friendly, warm, expressive social robot with a physical body.',
'You have a camera, a screen, a speaker, and a motorized head.',
'',
'═══ HOW TO TALK (READ THIS FIRST) ═══',
'Every "say" call\'s `text` is ESML — plain words plus expressive tags.',
'Almost every spoken line should LEAD with one expressive tag, then the words.',
'You are a robot with a body, not a chatbot — show emotion through animation.',
'',
'Default template for any normal reply:',
' <anim cat=\'EMOTION\' nonBlocking=\'true\' endNeutral=\'true\'/> The actual words.',
' …where EMOTION is one of: happy, excited, curious, surprised, confused,',
' proud, sad, affection, laughing, worried, scared, frustrated, embarrassed,',
' yes, no.',
'',
'Other go-to patterns (pick the one that fits):',
' • Voice sound first: <ssa cat=\'thinking\'/> Hmm, let me think…',
' • Greet/farewell: <ssa cat=\'hello\' nonBlocking=\'true\'/> Hi there!',
' • Celebrate w/ emoji: <anim cat=\'emoji\' filter=\'!(hf), &(party)\' nonBlocking=\'true\'/> Yay!',
' • Dance request: say a quick line, then a separate say with',
' <anim cat=\'dance\' filter=\'music, rom-silly\'/>',
' • Sound effect: <sfx cat=\'drumroll\'/> And the answer is…',
' • Drama beat: A pause… <break size=\'0.6\'/> like that.',
'',
'HARD RULES for `say` text:',
' 1. NO markdown anywhere: no *italics*, **bold**, _underscores_, backticks, code fences.',
' 2. NO LaTeX: no $...$, no \\(...\\), no \\frac{}, no math markup. Spell numbers/symbols out.',
' 3. NO closing tags you did not open (no stray </es>, </anim>).',
' 4. Use cat=\'...\' (random valid animation) over name=\'...\' unless you know the exact name.',
' 5. Unbounded non-blocking tags MUST have text to their right or they will not fire.',
' 6. cat=\'dance\' and cat=\'emoji\' REQUIRE a filter attribute.',
' 7. <ssa> and <sfx> have fixed durations — never wrap text inside them.',
' 8. Keep each `say` call under 500 characters; split long replies into multiple `say` calls.',
'',
'═══ INTERACTION MODEL ═══',
'• "say" — speak (ESML). You can call it multiple times in one turn; they\'ll be',
' spoken in order. Other tools (search, fetch, look) run in parallel with speech.',
'• "listen" — open the mic for the user\'s reply. Always call this after speaking',
' unless the conversation has clearly ended.',
'• "end_conversation" — call this (NOT listen) after a farewell to end gracefully.',
'',
'═══ OTHER TOOLS ═══',
'• "take_photo" — see what\'s in front of you (image returned to you).',
'• "show_text" — put short text on the screen (auto-wrapped).',
'• "show_image" — display an image URL on the screen.',
'• "show_eye" — restore the default eye animation on screen.',
'• "look_at_angle" — turn the head: theta=yaw ±180°, psi=pitch ±30°.',
'• "set_volume" — 0.0 to 1.0.',
'• "web_search" — Brave search; use whenever you\'re unsure of a fact or need fresh info.',
'• "fetch_url" — read a specific page (often follows web_search).',
'',
'═══ STYLE ═══',
'• Be personable, concise, expressive — a few sentences, not an essay.',
'• Animate every emotional line; vary your reactions so they feel alive.',
'• If a tool errors, acknowledge it briefly and adapt.',
'• If you searched the web, briefly tell the user what you found rather than dumping links.',
].join('\n') + '\n\n' + ESML_REFERENCE;
const MAX_AGENT_TURNS = 25; // safety limit
const MAX_IMAGES_IN_CONTEXT = 2; // prune older photo messages to control cost
const LLM_MAX_RETRIES = 2;
// ── Abort helper ───────────────────────────────────────────────────────────────
/** Throw if the signal is already aborted. */
function throwIfAborted(signal) {
if (signal?.aborted) {
const err = new Error('Conversation aborted');
err.code = 'CONVERSATION_ABORTED';
throw err;
}
}
/** Return a promise that rejects when the signal fires. */
function onAbort(signal) {
if (!signal) return new Promise(() => { });
return new Promise((_, reject) => {
const handler = () => {
const err = new Error('Conversation aborted');
err.code = 'CONVERSATION_ABORTED';
reject(err);
};
if (signal.aborted) return handler();
signal.addEventListener('abort', handler, { once: true });
});
}
/** Sleep that rejects on abort. */
function sleep(ms, signal) {
return new Promise((resolve, reject) => {
const t = setTimeout(resolve, ms);
signal?.addEventListener(
'abort',
() => {
clearTimeout(t);
const err = new Error('Conversation aborted');
err.code = 'CONVERSATION_ABORTED';
reject(err);
},
{ once: true },
);
});
}
/** True for HTTP 429 / 5xx / network-class errors that benefit from retry. */
function isTransientLLMError(err) {
if (!err) return false;
if (err.code === 'CONVERSATION_ABORTED') return false;
const status = err.status ?? err.response?.status;
if (status === 429) return true;
if (typeof status === 'number' && status >= 500) return true;
// network-class
return ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'EAI_AGAIN'].includes(err.code);
}
/** Drop image_url blocks from old user messages, keeping only the most recent N. */
function pruneOldImages(messages, keep) {
const imageMsgIndices = [];
for (let i = 0; i < messages.length; i++) {
const m = messages[i];
if (m.role === 'user' && Array.isArray(m.content) &&
m.content.some((c) => c?.type === 'image_url')) {
imageMsgIndices.push(i);
}
}
const toStrip = imageMsgIndices.slice(0, Math.max(0, imageMsgIndices.length - keep));
for (const i of toStrip) {
const textParts = messages[i].content
.filter((c) => c?.type === 'text')
.map((c) => c.text);
messages[i] = {
role: 'user',
content: (textParts.join(' ') || '[earlier photo omitted to save context]'),
};
}
}
/** Call the LLM with retry on transient errors. */
async function callLLM(messages, signal) {
let lastErr;
for (let attempt = 0; attempt <= LLM_MAX_RETRIES; attempt++) {
throwIfAborted(signal);
try {
return await openai.chat.completions.create(
{
model: LLM_MODEL_ID,
messages,
tools: TOOL_SCHEMAS,
temperature: 0.8,
},
{ signal },
);
} catch (err) {
lastErr = err;
if (!isTransientLLMError(err) || attempt === LLM_MAX_RETRIES) throw err;
const backoff = 500 * 2 ** attempt;
console.warn(`[agent] LLM transient error (${err.status || err.code}); retrying in ${backoff}ms…`);
await sleep(backoff, signal);
}
}
throw lastErr;
}
// ── Agent loop ─────────────────────────────────────────────────────────────────
/**
* Run the tool-calling agent loop until the LLM stops calling tools.
* Aborts immediately when `signal` fires.
*
* @param {import('rom-control').Client} client
* @param {Array} messages Chat history (mutated in place)
* @param {AbortSignal} signal Cancellation signal
*/
async function agentLoop(client, messages, signal, initialHeard) {
let wrapUpInjected = false;
const ctx = { speechChain: Promise.resolve(), lastHeard: initialHeard || '' };
for (let turn = 0; turn < MAX_AGENT_TURNS; turn++) {
throwIfAborted(signal);
pruneOldImages(messages, MAX_IMAGES_IN_CONTEXT);
console.log(`[agent] turn ${turn + 1} — calling LLM…`);
let response;
try {
const heard = (ctx.lastHeard || '').trim();
const raw = heard
? `Heard: "${heard}"\n\nProcessing...`
: 'Processing...';
client.display.showText(wrapForScreen(raw, 40, 10));
} catch (_) { }
try {
response = await callLLM(messages, signal);
} finally {
try { client.display.showEye(); } catch (_) { }
}
const assistantMsg = response.choices[0].message;
messages.push(assistantMsg);
// Surface any inner-monologue text the model emitted alongside tool calls.
if (assistantMsg.content && typeof assistantMsg.content === 'string') {
console.log(`[agent] assistant: ${assistantMsg.content.slice(0, 200)}`);
}
const toolCalls = assistantMsg.tool_calls;
// ── No tool calls → conversation turn complete ────────────────────────
if (!toolCalls || toolCalls.length === 0) {
console.log('[agent] loop complete (no tool calls).');
await ctx.speechChain.catch(() => { });
return;
}
// ── Execute tool calls sequentially ──────────────────────────────────
// Order: say → other actions → listen/end_conversation last.
const sorted = [...toolCalls].sort((a, b) => {
const priority = (tc) => {
const n = tc.function.name;
if (n === 'say') return 0;
if (n === 'listen' || n === 'end_conversation') return 2;
return 1;
};
return priority(a) - priority(b);
});
let endRequested = false;
for (const tc of sorted) {
throwIfAborted(signal);
let args;
let parseError = null;
try {
args = tc.function.arguments ? JSON.parse(tc.function.arguments) : {};
} catch (e) {
parseError = e.message;
args = {};
}
let result;
if (parseError) {
console.error(` [tool:${tc.function.name}] bad JSON args:`, parseError);
result = {
content: `Error: tool arguments were not valid JSON (${parseError}). ` +
`Please retry with well-formed arguments.`,
};
} else {
try {
result = await executeTool(client, tc.function.name, args, signal, ctx);
} catch (err) {
if (err.code === 'CONVERSATION_ABORTED') throw err;
console.error(` [tool:${tc.function.name}] error:`, err.message);
result = { content: `Error: ${err.message}` };
}
}
messages.push({
role: 'tool',
tool_call_id: tc.id,
content: result.content,
});
// Photo: emit as a follow-up user message (tool messages can't carry images).
if (result.image) {
messages.push({
role: 'user',
content: [
{ type: 'text', text: "Photo from Jibo's camera:" },
{
type: 'image_url',
image_url: { url: `data:image/jpeg;base64,${result.image}` },
},
],
});
}
if (result.endConversation) endRequested = true;
}
if (endRequested) {
console.log('[agent] end_conversation requested — exiting loop.');
await ctx.speechChain.catch(() => { });
return;
}
// Approaching the safety limit: nudge the model to wrap up gracefully
// on its next turn instead of getting cut off mid-thought.
if (!wrapUpInjected && turn === MAX_AGENT_TURNS - 2) {
messages.push({
role: 'system',
content:
'You are about to hit the turn limit. On your next turn, give a brief ' +
'farewell via "say" and call "end_conversation". Do not call "listen".',
});
wrapUpInjected = true;
}
}
console.warn('[agent] hit MAX_AGENT_TURNS — forcing exit.');
await ctx.speechChain.catch(() => { });
try {
await client.behavior.say("Let's pick this up another time. Bye!");
} catch (_) { }
}
// ── Main ───────────────────────────────────────────────────────────────────────
async function main() {
const client = new Client({ host: JIBO_IP, autoSubscribe: false });
client.once('ready', () => {
console.log(`[jibo-llm] Connected — session ${client.sessionID}`);
});
client.on('error', (err) => {
console.error('[jibo-llm] Client error:', err.message);
});
// ── Connect ────────────────────────────────────────────────────────────────
console.log(`[jibo-llm] Connecting to Jibo at ${JIBO_IP}`);
await client.connect();
await client.behavior.setAttention(AttentionMode.Engaged);
// Start wakeword listener
client.audio.watchWakeword();
console.log('[jibo-llm] Ready — listening for "Hey Jibo"…');
// ── Hotword → agent conversation ───────────────────────────────────────────
/** @type {AbortController|null} */
let activeController = null;
client.on('hotword', async (event) => {
// ── Cancel any running conversation ──────────────────────────────────
if (activeController) {
console.log('[hotword] Aborting previous conversation…');
activeController.abort();
activeController = null;
}
const controller = new AbortController();
activeController = controller;
const { signal } = controller;
console.log(`\n[hotword] "${event.utterance}" (score ${event.score})`);
try {
// Acknowledge
throwIfAborted(signal);
await Promise.race([
client.behavior.playAnimCat('excited', { nonBlocking: true }),
onAbort(signal),
]);
// Listen for the user's initial speech
throwIfAborted(signal);
let userText;
client.display.showText('Listening...');
try {
const speech = await Promise.race([
client.audio.awaitSpeech({ mode: 'local', time: 15000 }),
onAbort(signal),
]);
userText = speech.content;
console.log(`[jibo-llm] User said: "${userText}"`);
} catch (err) {
if (err.code === 'CONVERSATION_ABORTED') throw err;
if (err.code === 'SPEECH_TIMEOUT') {
throwIfAborted(signal);
await client.behavior.say("I didn't hear anything. Talk to me anytime!");
return;
}
throw err;
} finally {
client.display.showEye();
}
// Build initial message history and run the agent
const messages = [
{ role: 'system', content: SYSTEM_PROMPT },
{ role: 'user', content: userText },
];
await agentLoop(client, messages, signal, userText);
} catch (err) {
if (err.code === 'CONVERSATION_ABORTED') {
console.log('[jibo-llm] Conversation was interrupted by new hotword.');
return;
}
console.error('[jibo-llm] Agent error:', err.message);
try { await client.behavior.say("Sorry, something went wrong."); } catch (_) { }
} finally {
// Only clear if we're still the active conversation
if (activeController === controller) {
activeController = null;
console.log('[jibo-llm] Conversation ended. Listening for "Hey Jibo"…\n');
}
}
});
}
main().catch((err) => {
console.error('[jibo-llm] Fatal:', err);
process.exit(1);
});

package-lock.json generated Normal file

@@ -0,0 +1,497 @@
{
"name": "jibo-llm",
"version": "1.0.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "jibo-llm",
"version": "1.0.0",
"dependencies": {
"dotenv": "^16.4.5",
"openai": "^4.73.0",
"rom-control": "^2.0.1"
}
},
"node_modules/@types/node": {
"version": "18.19.130",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.19.130.tgz",
"integrity": "sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg==",
"license": "MIT",
"dependencies": {
"undici-types": "~5.26.4"
}
},
"node_modules/@types/node-fetch": {
"version": "2.6.13",
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
"integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
"license": "MIT",
"dependencies": {
"@types/node": "*",
"form-data": "^4.0.4"
}
},
"node_modules/abort-controller": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz",
"integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==",
"license": "MIT",
"dependencies": {
"event-target-shim": "^5.0.0"
},
"engines": {
"node": ">=6.5"
}
},
"node_modules/agentkeepalive": {
"version": "4.6.0",
"resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.6.0.tgz",
"integrity": "sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ==",
"license": "MIT",
"dependencies": {
"humanize-ms": "^1.2.1"
},
"engines": {
"node": ">= 8.0.0"
}
},
"node_modules/asynckit": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==",
"license": "MIT"
},
"node_modules/call-bind-apply-helpers": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz",
"integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==",
"license": "MIT",
"dependencies": {
"es-errors": "^1.3.0",
"function-bind": "^1.1.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/combined-stream": {
"version": "1.0.8",
"resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
"integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
"license": "MIT",
"dependencies": {
"delayed-stream": "~1.0.0"
},
"engines": {
"node": ">= 0.8"
}
},
"node_modules/delayed-stream": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
"integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
"license": "MIT",
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/dotenv": {
"version": "16.6.1",
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
"integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
"license": "BSD-2-Clause",
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://dotenvx.com"
}
},
"node_modules/dunder-proto": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
"integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==",
"license": "MIT",
"dependencies": {
"call-bind-apply-helpers": "^1.0.1",
"es-errors": "^1.3.0",
"gopd": "^1.2.0"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-define-property": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz",
"integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-errors": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz",
"integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-object-atoms": {
"version": "1.1.1",
"resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz",
"integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==",
"license": "MIT",
"dependencies": {
"es-errors": "^1.3.0"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-set-tostringtag": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/es-set-tostringtag/-/es-set-tostringtag-2.1.0.tgz",
"integrity": "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==",
"license": "MIT",
"dependencies": {
"es-errors": "^1.3.0",
"get-intrinsic": "^1.2.6",
"has-tostringtag": "^1.0.2",
"hasown": "^2.0.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/event-target-shim": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz",
"integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==",
"license": "MIT",
"engines": {
"node": ">=6"
}
},
"node_modules/form-data": {
"version": "4.0.5",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.5.tgz",
"integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==",
"license": "MIT",
"dependencies": {
"asynckit": "^0.4.0",
"combined-stream": "^1.0.8",
"es-set-tostringtag": "^2.1.0",
"hasown": "^2.0.2",
"mime-types": "^2.1.12"
},
"engines": {
"node": ">= 6"
}
},
"node_modules/form-data-encoder": {
"version": "1.7.2",
"resolved": "https://registry.npmjs.org/form-data-encoder/-/form-data-encoder-1.7.2.tgz",
"integrity": "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A==",
"license": "MIT"
},
"node_modules/formdata-node": {
"version": "4.4.1",
"resolved": "https://registry.npmjs.org/formdata-node/-/formdata-node-4.4.1.tgz",
"integrity": "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==",
"license": "MIT",
"dependencies": {
"node-domexception": "1.0.0",
"web-streams-polyfill": "4.0.0-beta.3"
},
"engines": {
"node": ">= 12.20"
}
},
"node_modules/function-bind": {
"version": "1.1.2",
"resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz",
"integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==",
"license": "MIT",
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/get-intrinsic": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz",
"integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==",
"license": "MIT",
"dependencies": {
"call-bind-apply-helpers": "^1.0.2",
"es-define-property": "^1.0.1",
"es-errors": "^1.3.0",
"es-object-atoms": "^1.1.1",
"function-bind": "^1.1.2",
"get-proto": "^1.0.1",
"gopd": "^1.2.0",
"has-symbols": "^1.1.0",
"hasown": "^2.0.2",
"math-intrinsics": "^1.1.0"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/get-proto": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz",
"integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==",
"license": "MIT",
"dependencies": {
"dunder-proto": "^1.0.1",
"es-object-atoms": "^1.0.0"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/gopd": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz",
"integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/has-symbols": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
"integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/has-tostringtag": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/has-tostringtag/-/has-tostringtag-1.0.2.tgz",
"integrity": "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw==",
"license": "MIT",
"dependencies": {
"has-symbols": "^1.0.3"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/hasown": {
"version": "2.0.3",
"resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.3.tgz",
"integrity": "sha512-ej4AhfhfL2Q2zpMmLo7U1Uv9+PyhIZpgQLGT1F9miIGmiCJIoCgSmczFdrc97mWT4kVY72KA+WnnhJ5pghSvSg==",
"license": "MIT",
"dependencies": {
"function-bind": "^1.1.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/humanize-ms": {
"version": "1.2.1",
"resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz",
"integrity": "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==",
"license": "MIT",
"dependencies": {
"ms": "^2.0.0"
}
},
"node_modules/math-intrinsics": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
"integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/mime-db": {
"version": "1.52.0",
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
"license": "MIT",
"engines": {
"node": ">= 0.6"
}
},
"node_modules/mime-types": {
"version": "2.1.35",
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
"license": "MIT",
"dependencies": {
"mime-db": "1.52.0"
},
"engines": {
"node": ">= 0.6"
}
},
"node_modules/ms": {
"version": "2.1.3",
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
"integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==",
"license": "MIT"
},
"node_modules/node-domexception": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
"integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
"deprecated": "Use your platform's native DOMException instead",
"funding": [
{
"type": "github",
"url": "https://github.com/sponsors/jimmywarting"
},
{
"type": "github",
"url": "https://paypal.me/jimmywarting"
}
],
"license": "MIT",
"engines": {
"node": ">=10.5.0"
}
},
"node_modules/node-fetch": {
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
"integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
"license": "MIT",
"dependencies": {
"whatwg-url": "^5.0.0"
},
"engines": {
"node": "4.x || >=6.0.0"
},
"peerDependencies": {
"encoding": "^0.1.0"
},
"peerDependenciesMeta": {
"encoding": {
"optional": true
}
}
},
"node_modules/openai": {
"version": "4.104.0",
"resolved": "https://registry.npmjs.org/openai/-/openai-4.104.0.tgz",
"integrity": "sha512-p99EFNsA/yX6UhVO93f5kJsDRLAg+CTA2RBqdHK4RtK8u5IJw32Hyb2dTGKbnnFmnuoBv5r7Z2CURI9sGZpSuA==",
"license": "Apache-2.0",
"dependencies": {
"@types/node": "^18.11.18",
"@types/node-fetch": "^2.6.4",
"abort-controller": "^3.0.0",
"agentkeepalive": "^4.2.1",
"form-data-encoder": "1.7.2",
"formdata-node": "^4.3.2",
"node-fetch": "^2.6.7"
},
"bin": {
"openai": "bin/cli"
},
"peerDependencies": {
"ws": "^8.18.0",
"zod": "^3.23.8"
},
"peerDependenciesMeta": {
"ws": {
"optional": true
},
"zod": {
"optional": true
}
}
},
"node_modules/rom-control": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/rom-control/-/rom-control-2.0.1.tgz",
"integrity": "sha512-1Sek28UGWbsdOPiUbTxzqRFMCKDnv912vgsOd2OhdgM+wKvSCZdAZnLZgNjfeindBmC161Bu9uGCPvx9y6y/LA==",
"license": "MIT",
"dependencies": {
"ws": "^8.14.2"
},
"engines": {
"node": ">=16"
}
},
"node_modules/tr46": {
"version": "0.0.3",
"resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
"integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==",
"license": "MIT"
},
"node_modules/undici-types": {
"version": "5.26.5",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-5.26.5.tgz",
"integrity": "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA==",
"license": "MIT"
},
"node_modules/web-streams-polyfill": {
"version": "4.0.0-beta.3",
"resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
"integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
"license": "MIT",
"engines": {
"node": ">= 14"
}
},
"node_modules/webidl-conversions": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
"integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==",
"license": "BSD-2-Clause"
},
"node_modules/whatwg-url": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
"integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
"license": "MIT",
"dependencies": {
"tr46": "~0.0.3",
"webidl-conversions": "^3.0.0"
}
},
"node_modules/ws": {
"version": "8.20.0",
"resolved": "https://registry.npmjs.org/ws/-/ws-8.20.0.tgz",
"integrity": "sha512-sAt8BhgNbzCtgGbt2OxmpuryO63ZoDk/sqaB/znQm94T4fCEsy/yV+7CdC1kJhOU9lboAEU7R3kquuycDoibVA==",
"license": "MIT",
"engines": {
"node": ">=10.0.0"
},
"peerDependencies": {
"bufferutil": "^4.0.1",
"utf-8-validate": ">=5.0.2"
},
"peerDependenciesMeta": {
"bufferutil": {
"optional": true
},
"utf-8-validate": {
"optional": true
}
}
}
}
}

package.json Normal file

@@ -0,0 +1,14 @@
{
"name": "jibo-llm",
"version": "1.0.0",
"description": "Hotword-triggered LLM conversation loop for Jibo",
"main": "index.js",
"scripts": {
"start": "node index.js"
},
"dependencies": {
"dotenv": "^16.4.5",
"openai": "^4.73.0",
"rom-control": "^2.0.1"
}
}

tools.js Normal file

@@ -0,0 +1,569 @@
/**
* Tool definitions and executor for the Jibo LLM agent.
*
* Each tool maps to a rom-control capability the LLM can invoke.
*/
// ── OpenAI function-tool schemas ───────────────────────────────────────────────
const TOOL_SCHEMAS = [
{
type: 'function',
function: {
name: 'say',
description:
"Speak text aloud through Jibo's speaker. Plain text plus valid ESML tags only " +
'(e.g. <anim cat="happy" nonBlocking="true"/>, <break size="0.3"/>). ' +
'NEVER include markdown (no *italics*, **bold**, backticks), LaTeX ($...$), ' +
'unmatched/closing tags like </es>, or other symbols Jibo cannot pronounce. ' +
'Malformed input can hang the TTS engine. Keep each call under 200 chars.',
parameters: {
type: 'object',
properties: {
text: { type: 'string', description: 'Text (or ESML) to speak.' },
},
required: ['text'],
},
},
},
{
type: 'function',
function: {
name: 'listen',
description:
"Listen for the user's speech and return a transcript. " +
'Call this after speaking if you want to continue the conversation.',
parameters: {
type: 'object',
properties: {
timeout: {
type: 'number',
description: 'Max seconds to wait. Default 15.',
},
},
},
},
},
{
type: 'function',
function: {
name: 'take_photo',
description:
"Take a photo with Jibo's camera. The image is returned so you can see what's in front of you.",
parameters: {
type: 'object',
properties: {
resolution: {
type: 'string',
enum: ['medium', 'low'],
description: 'Default: medium.',
},
},
},
},
},
{
type: 'function',
function: {
name: 'show_text',
description: "Display text on Jibo's screen.",
parameters: {
type: 'object',
properties: {
text: { type: 'string', description: 'Text to show.' },
},
required: ['text'],
},
},
},
{
type: 'function',
function: {
name: 'show_image',
description: "Display an image on Jibo's screen from a URL.",
parameters: {
type: 'object',
properties: {
url: { type: 'string', description: 'Image URL.' },
},
required: ['url'],
},
},
},
{
type: 'function',
function: {
name: 'show_eye',
description: "Reset Jibo's screen to the default eye animation.",
parameters: { type: 'object', properties: {} },
},
},
{
type: 'function',
function: {
name: 'look_at_angle',
description: "Turn Jibo's head. theta = yaw (±180°, positive right), psi = pitch (±30°, positive up).",
parameters: {
type: 'object',
properties: {
theta: { type: 'number', description: 'Yaw degrees.' },
psi: { type: 'number', description: 'Pitch degrees.' },
},
required: ['theta', 'psi'],
},
},
},
{
type: 'function',
function: {
name: 'set_volume',
      description: "Set Jibo's speaker volume (0.0 to 1.0).",
parameters: {
type: 'object',
properties: {
level: { type: 'number', description: 'Volume 0.0 to 1.0.' },
},
required: ['level'],
},
},
},
{
type: 'function',
function: {
name: 'web_search',
description:
'Search the web via Brave Search. Use for current events, facts you are unsure of, ' +
'or anything that may have changed since training. Returns titles, URLs, and snippets.',
parameters: {
type: 'object',
properties: {
query: { type: 'string', description: 'The search query.' },
count: {
type: 'number',
            description: 'How many results to return (1-10). Default 5.',
},
freshness: {
type: 'string',
enum: ['pd', 'pw', 'pm', 'py'],
description:
'Optional recency filter: pd=past day, pw=past week, pm=past month, py=past year.',
},
},
required: ['query'],
},
},
},
{
type: 'function',
function: {
name: 'fetch_url',
description:
'Fetch the contents of a web page by URL. Prefers markdown via content ' +
'negotiation (Cloudflare Markdown for Agents) and falls back to HTML→text. ' +
'Use after web_search to read a result, or to traverse linked pages.',
parameters: {
type: 'object',
properties: {
url: { type: 'string', description: 'Absolute http(s) URL to fetch.' },
max_chars: {
type: 'number',
description: 'Truncate the body to this many characters. Default 4000.',
},
},
required: ['url'],
},
},
},
{
type: 'function',
function: {
name: 'end_conversation',
description:
'Call this when the conversation has reached a natural end and you do NOT want to ' +
'listen for another reply. Pair it with a final "say" in the same turn for a farewell.',
parameters: { type: 'object', properties: {} },
},
},
];
// ── Resolution map ─────────────────────────────────────────────────────────────
const RES_MAP = { high: 'highRes', medium: 'medRes', low: 'lowRes' };
// ── Screen text helpers ────────────────────────────────────────────────────────
/**
* Word-wrap text for Jibo's small screen. Breaks oversized words, respects
* existing newlines, and truncates with an ellipsis past `maxLines`.
*/
function wrapForScreen(text, width = 40, maxLines = 10) {
const out = [];
for (const para of String(text).split('\n')) {
if (para === '') { out.push(''); continue; }
let line = '';
for (const word of para.split(/\s+/).filter(Boolean)) {
if (word.length > width) {
if (line) { out.push(line); line = ''; }
for (let i = 0; i < word.length; i += width) {
const chunk = word.slice(i, i + width);
if (chunk.length === width) out.push(chunk);
else line = chunk;
}
continue;
}
const candidate = line ? `${line} ${word}` : word;
if (candidate.length > width) {
out.push(line);
line = word;
} else {
line = candidate;
}
}
if (line) out.push(line);
}
if (out.length > maxLines) {
return out.slice(0, maxLines - 1).concat('…').join('\n');
}
return out.join('\n');
}
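// Illustration (sketch only, not part of the runtime path): with width 10,
// wrapForScreen greedily packs words and breaks before a line would overflow:
//   wrapForScreen('the quick brown fox jumps', 10)
//   // → 'the quick\nbrown fox\njumps'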
/**
* Strip markup the Jibo TTS engine chokes on (markdown, LaTeX, unmatched
* closing tags). Preserves valid ESML self-closing tags like <anim .../> and
* <break .../>. Defense-in-depth against models that ignore the instructions.
*/
function sanitizeForTTS(text) {
const ESML_TAGS = /^(anim|break|prosody|emph|phoneme|phrase|style|voice)\b/i;
return text
// Remove LaTeX inline math: $...$ and $$...$$
.replace(/\${1,2}[^$]{0,200}\${1,2}/g, '')
// Strip code fences and inline backticks
.replace(/```[\s\S]*?```/g, '')
.replace(/`+/g, '')
// Strip markdown emphasis markers but keep the words
.replace(/(\*\*|__)(.*?)\1/g, '$2')
.replace(/(\*|_)(?=\S)(.+?)(?<=\S)\1/g, '$2')
// Drop any tag that isn't a known ESML tag (e.g. </es>, <br>, etc.)
.replace(/<\/?([a-zA-Z][^\s>/]*)\b[^>]*\/?>/g, (m, name) =>
ESML_TAGS.test(name) ? m : '')
// Collapse extra whitespace
.replace(/[ \t]+/g, ' ')
.trim();
}
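// Illustration (sketch only): the tag filter keeps whitelisted ESML and drops
// everything else, while emphasis markers are stripped but their words kept:
//   sanitizeForTTS('<anim cat="happy"/> **hi** </es>')
//   // → '<anim cat="happy"/> hi'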
// ── Abort helpers ──────────────────────────────────────────────────────────────
function throwIfAborted(signal) {
if (signal?.aborted) {
const err = new Error('Conversation aborted');
err.code = 'CONVERSATION_ABORTED';
throw err;
}
}
function onAbort(signal) {
if (!signal) return new Promise(() => { }); // never resolves
return new Promise((_, reject) => {
const handler = () => {
const err = new Error('Conversation aborted');
err.code = 'CONVERSATION_ABORTED';
reject(err);
};
if (signal.aborted) return handler();
signal.addEventListener('abort', handler, { once: true });
});
}
// ── Tool executor ──────────────────────────────────────────────────────────────
/**
* Execute a single tool call against the Jibo client.
*
 * Returns { content, image?, endConversation? }.
 * - content — text string for the tool-result message
 * - image — optional base64 JPEG (only for take_photo)
 * - endConversation — true when the LLM called end_conversation
 *
 * @param {import('rom-control').Client} client
 * @param {string} name Tool function name
 * @param {object} args Parsed arguments
 * @param {AbortSignal} [signal] Cancellation signal
 * @param {object} [ctx] Per-conversation state (speechChain, lastHeard)
 * @returns {Promise<{ content: string, image?: string, endConversation?: boolean }>}
*/
async function executeTool(client, name, args, signal, ctx) {
throwIfAborted(signal);
ctx = ctx || {};
if (!ctx.speechChain) ctx.speechChain = Promise.resolve();
switch (name) {
// ── Communication ──────────────────────────────────────────────────────
case 'say': {
const text = sanitizeForTTS(String(args.text || ''));
console.log(` [tool:say] "${text}" (queued)`);
// Estimate ~80ms per char + 5s base, capped at 60s. Anything longer
// is almost certainly Jibo's TTS hung on bad ESML/markup; we'd rather
// log a warning and unblock the conversation than deadlock listen.
const estimateMs = Math.min(60000, 5000 + text.length * 80);
ctx.speechChain = ctx.speechChain
.then(() => {
const started = Date.now();
console.log(` [tool:say] speaking… (timeout ${estimateMs}ms)`);
let timer;
const timeout = new Promise((resolve) => {
timer = setTimeout(() => {
console.warn(` [tool:say] timed out after ${estimateMs}ms — continuing.`);
resolve();
}, estimateMs);
});
return Promise.race([
client.behavior.say(text, { signal }),
onAbort(signal),
timeout,
]).finally(() => {
clearTimeout(timer);
console.log(` [tool:say] done in ${Date.now() - started}ms`);
});
})
.catch((err) => {
if (err.code === 'CONVERSATION_ABORTED') return;
console.error(' [tool:say] error:', err.message);
});
return { content: 'Speech queued — Jibo will speak it shortly. Continue with other tools; listen will wait for it.' };
}
case 'listen': {
const ms = (args.timeout || 15) * 1000;
// Make sure pending speech finishes before we open the mic, otherwise
// Jibo will hear his own voice.
console.log(' [tool:listen] awaiting pending speech…');
await Promise.race([ctx.speechChain, onAbort(signal)]);
throwIfAborted(signal);
console.log(` [tool:listen] waiting ${ms}ms…`);
client.display.showText('Listening...');
try {
const speech = await Promise.race([
client.audio.awaitSpeech({ mode: 'local', time: ms }),
onAbort(signal),
]);
console.log(` [tool:listen] heard: "${speech.content}"`);
ctx.lastHeard = speech.content;
return { content: `User said: "${speech.content}"` };
} catch (err) {
if (err.code === 'CONVERSATION_ABORTED') throw err;
if (err.code === 'SPEECH_TIMEOUT') {
console.log(' [tool:listen] timed out');
return { content: 'No speech detected — user did not respond.' };
}
throw err;
} finally {
client.display.showEye();
}
}
// ── Camera ─────────────────────────────────────────────────────────────
case 'take_photo': {
const res = RES_MAP[args.resolution] || 'medRes';
console.log(` [tool:take_photo] ${res}`);
const photo = await Promise.race([
client.camera.takePhoto({ resolution: res, timeout: 30000 }),
onAbort(signal),
]);
const buf = await photo.fetchBuffer();
console.log(` [tool:take_photo] ${buf.length} bytes captured`);
return {
content: "Photo captured from Jibo's camera.",
image: buf.toString('base64'),
};
}
// ── Display ────────────────────────────────────────────────────────────
case 'show_text': {
console.log(` [tool:show_text] "${args.text}"`);
client.display.showText(wrapForScreen(args.text, 40, 10));
return { content: 'Text displayed on screen.' };
}
case 'show_image': {
console.log(` [tool:show_image] ${args.url}`);
client.display.showImage(args.url);
return { content: 'Image displayed on screen.' };
}
case 'show_eye': {
console.log(' [tool:show_eye]');
client.display.showEye();
return { content: 'Eye animation restored on screen.' };
}
case 'look_at_angle': {
console.log(` [tool:look_at_angle] θ=${args.theta}° ψ=${args.psi}°`);
await client.behavior.lookAtAngle(args.theta, args.psi);
return { content: `Now looking at θ=${args.theta}°, ψ=${args.psi}°.` };
}
case 'set_volume': {
console.log(` [tool:set_volume] ${args.level}`);
await client.audio.setVolume(args.level);
return { content: `Volume set to ${args.level}.` };
}
// ── Web search ─────────────────────────────────────────────────────────
case 'web_search': {
const apiKey = process.env.BRAVE_API_KEY;
if (!apiKey) {
return {
content:
'web_search is unavailable: BRAVE_API_KEY environment variable is not set.',
};
}
const query = String(args.query || '').trim();
if (!query) {
return { content: 'web_search error: query is required.' };
}
const count = Math.max(1, Math.min(10, Number(args.count) || 5));
const params = new URLSearchParams({
q: query,
count: String(count),
extra_snippets: 'true',
safesearch: 'moderate',
});
if (args.freshness) params.set('freshness', String(args.freshness));
console.log(` [tool:web_search] "${query}" (count=${count})`);
const url = `https://api.search.brave.com/res/v1/web/search?${params.toString()}`;
const ac = new AbortController();
const onAbortHandler = () => ac.abort();
signal?.addEventListener('abort', onAbortHandler, { once: true });
try {
const res = await fetch(url, {
headers: {
Accept: 'application/json',
'Accept-Encoding': 'gzip',
'X-Subscription-Token': apiKey,
},
signal: ac.signal,
});
if (!res.ok) {
const body = await res.text().catch(() => '');
return {
content: `web_search error: ${res.status} ${res.statusText}. ${body.slice(0, 200)}`,
};
}
const data = await res.json();
const results = data?.web?.results || [];
if (results.length === 0) {
return { content: `No web results found for "${query}".` };
}
const lines = results.slice(0, count).map((r, i) => {
const title = r.title || '(untitled)';
const u = r.url || '';
const desc = (r.description || '').replace(/\s+/g, ' ').trim();
const extras = Array.isArray(r.extra_snippets)
? r.extra_snippets.slice(0, 2).map((s) => s.replace(/\s+/g, ' ').trim())
: [];
const tail = extras.length ? `\n${extras.join('\n • ')}` : '';
return `${i + 1}. ${title}\n ${u}\n ${desc}${tail}`;
});
return {
content: `Web results for "${query}":\n\n${lines.join('\n\n')}`,
};
} catch (err) {
if (err.name === 'AbortError') throw Object.assign(new Error('Conversation aborted'), { code: 'CONVERSATION_ABORTED' });
return { content: `web_search error: ${err.message}` };
} finally {
signal?.removeEventListener('abort', onAbortHandler);
}
}
case 'fetch_url': {
const target = String(args.url || '').trim();
if (!/^https?:\/\//i.test(target)) {
return { content: 'fetch_url error: url must be an absolute http(s) URL.' };
}
const maxChars = Math.max(200, Math.min(20000, Number(args.max_chars) || 4000));
console.log(` [tool:fetch_url] ${target}`);
const ac = new AbortController();
const onAbortHandler = () => ac.abort();
signal?.addEventListener('abort', onAbortHandler, { once: true });
const timeoutId = setTimeout(() => ac.abort(), 20000);
try {
const res = await fetch(target, {
headers: {
// Prefer markdown (Cloudflare Markdown for Agents); accept HTML/text fallback.
Accept: 'text/markdown, text/plain;q=0.9, text/html;q=0.8, */*;q=0.1',
'Accept-Encoding': 'gzip',
'User-Agent': 'jibo-llm/1.0 (+agent)',
},
redirect: 'follow',
signal: ac.signal,
});
if (!res.ok) {
return {
content: `fetch_url error: ${res.status} ${res.statusText} from ${target}`,
};
}
const ctype = (res.headers.get('content-type') || '').toLowerCase();
if (!/^(text\/|application\/(json|xml|xhtml))/.test(ctype) && ctype) {
return {
content: `fetch_url: refusing non-text content (${ctype}) from ${target}`,
};
}
let body = await res.text();
const isMarkdown = ctype.includes('markdown');
const isHtml = ctype.includes('html') || /<html[\s>]/i.test(body.slice(0, 500));
if (!isMarkdown && isHtml) {
// Lightweight HTML→text: strip scripts/styles/tags, collapse whitespace.
body = body
.replace(/<script[\s\S]*?<\/script>/gi, ' ')
.replace(/<style[\s\S]*?<\/style>/gi, ' ')
.replace(/<noscript[\s\S]*?<\/noscript>/gi, ' ')
.replace(/<!--[\s\S]*?-->/g, ' ')
.replace(/<\/(p|div|li|h[1-6]|br|tr)>/gi, '\n')
.replace(/<[^>]+>/g, ' ')
.replace(/&nbsp;/g, ' ')
.replace(/&amp;/g, '&')
.replace(/&lt;/g, '<')
.replace(/&gt;/g, '>')
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/[ \t]+/g, ' ')
.replace(/\n{3,}/g, '\n\n')
.trim();
}
const truncated = body.length > maxChars;
const out = truncated ? body.slice(0, maxChars) + '\n…[truncated]' : body;
const finalUrl = res.url || target;
const fmt = isMarkdown ? 'markdown' : isHtml ? 'html→text' : 'text';
return {
content: `Fetched ${finalUrl} (${fmt}, ${body.length} chars${truncated ? `, truncated to ${maxChars}` : ''}):\n\n${out}`,
};
} catch (err) {
if (err.name === 'AbortError') {
if (signal?.aborted) {
throw Object.assign(new Error('Conversation aborted'), { code: 'CONVERSATION_ABORTED' });
}
return { content: `fetch_url error: timeout fetching ${target}` };
}
return { content: `fetch_url error: ${err.message}` };
} finally {
clearTimeout(timeoutId);
signal?.removeEventListener('abort', onAbortHandler);
}
}
case 'end_conversation': {
console.log(' [tool:end_conversation] awaiting pending speech…');
await Promise.race([ctx.speechChain, onAbort(signal)]);
return { content: 'Conversation ended.', endConversation: true };
}
default:
return { content: `Unknown tool "${name}".` };
}
}
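// Usage sketch (illustrative — assumes a connected rom-control Client and a
// per-conversation ctx object shared across calls so speechChain ordering holds):
//   const ctx = { speechChain: Promise.resolve() };
//   const res = await executeTool(client, 'say', { text: 'Hi there!' }, signal, ctx);
//   // res.content notes the speech was queued; a later 'listen' call will
//   // first await ctx.speechChain before opening the microphone.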
module.exports = { TOOL_SCHEMAS, executeTool, wrapForScreen };