Files
Re-Commander/README.md
pasketti bba2301ba6 Update README and add screenshot
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 02:49:12 -04:00

218 lines
8.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Re-Commander
A local web-based control interface for the Jibo social robot. Re-Commander connects directly to Jibo's on-device ROM (Robot Operating Module) over your local network, giving you a browser UI to control head movement, speech, listening, display, camera, and an LLM voice-AI loop — all without any cloud dependency.
![Re-Commander UI](screenshot.png)
---
## Requirements
- **Node.js** 18 or later
- **Jibo robot** on the same local network, running in **`int-developer` mode** (see below)
- A modern browser (Chrome, Firefox, Edge)
---
## Robot prerequisite — `int-developer` mode
> **This is required. Re-Commander will not work without it.**
Jibo must be placed into `int-developer` mode before the server can open a WebSocket session with it. In this mode the robot exposes its ROM WebSocket API on port 8160 and its local ASR service on port 8088.
---
## Installation
```bash
git clone https://github.com/youruser/re-commander.git
cd re-commander
npm install
```
---
## Configuration
### 1. Robot IP address
Open `server.js` and set `JIBO_HOST` to your robot's local IP:
```js
const JIBO_HOST = '192.168.1.217'; // ← change this
const JIBO_PORT = 8160; // leave as-is
```
Find your robot's IP in your router's device list, or check the Jibo app under **Settings → Wi-Fi**.
### 2. Environment variables (optional)
Create a `.env` file in the project root to configure the LLM integration and server port:
```env
# Port the web UI is served on (default: 3000)
PORT=3000
# OpenAI-compatible LLM endpoint (default: local Ollama)
LLM_ENDPOINT=http://localhost:11434/v1/chat/completions
# Model name passed to the endpoint
LLM_MODEL=google/gemma-4-e4b
# API key — set if your endpoint requires one (e.g. OpenAI, Anthropic proxy)
LLM_API_KEY=sk-...
```
If no `.env` is present the server defaults to port 3000 and a local Ollama instance. The LLM system prompt is embedded in `server.js` and pre-configured to make Jibo respond in ESML (Embodied Speech Markup Language) with animations and sound effects.
---
## Running
```bash
npm start
```
Then open `http://localhost:3000` in your browser.
The server immediately begins connecting to the robot. The status indicator in the top-left of the UI turns green once a session is established. If the robot is unreachable the server retries every 3 seconds automatically.
---
## UI overview
The interface is divided into three panels.
### Left panel — Controls
**Head Navigation**
- Arrow pad (↑ ↓ ← →): moves Jibo's head. Each press sends one step command and waits for the robot to acknowledge before sending the next, so there is no command queuing or drift.
- Clicking anywhere on the camera feed makes Jibo look at that point in the scene.
- The **Track** checkbox keeps the robot tracking that point continuously.
**Say**
- Type any text (plain or ESML markup) and press **▶ Say**.
- **✕ Stop** cancels mid-speech.
**Listen**
- Triggers Jibo's on-device ASR (port 8088, no cloud required).
- Configure the no-speech and max-speech timeouts before pressing **🎙 Listen**.
- The transcribed result appears below the buttons.
- **Auto-listen on hotword**: when checked, Re-Commander listens automatically every time the wake word ("Hey Jibo") is detected.
**Attention Mode**
- Sets Jibo's attention state: `OFF`, `IDLE`, `DISENGAGE`, `ENGAGED`, `SPEAKING`, `FIXATED`, `ATTRACTABLE`, `COMMAND`.
**Volume**
- Drag the slider and press **Set Volume** (0 100%).
**Voice AI**
- Enables a continuous listen → LLM → speak loop.
- The LLM receives the transcribed speech and returns ESML; Jibo speaks the reply with matching animations.
- Configure the endpoint, model, and system prompt in `.env` or via the UI fields.
### Center panel — Camera & photos
- The camera feed displays the live MJPEG stream from Jibo when video is active, or the most recent photo otherwise.
- Click anywhere on the feed to make Jibo look at that point.
- The **Photo Strip** below the feed shows all photos taken this session. Click any thumbnail to open it full-size.
### Right panel — Tabs
**Camera tab**
- **Start / Stop Video**: starts or stops the live MJPEG stream from Jibo's right camera.
- **Take Photo**: captures a still from the selected camera at the selected resolution. Photos are automatically saved to the `photos/` directory in the project folder and served from `/photos/<filename>`.
- **Subscriptions**: toggle Entity (person detection), Motion, and Head Touch event streams on/off.
**Display tab**
- **Show Eye**: displays Jibo's animated eye graphic on his screen.
- **Play Animation**: select and play any of Jibo's built-in eye animations (blinks, expressions, emoji, transitions, and more) from a curated dropdown.
- **Show Text / Show Image**: display a text string or a remote image URL on Jibo's screen.
**Entities tab**
- Live list of people Jibo's vision system has detected, with entity ID, confidence, and screen coordinates.
- Head Touch display shows which pads on Jibo's head are currently being touched.
**Log tab**
- Real-time event log of every message received from the robot (LookAt events, touch events, ASR results, errors, etc.).
---
## Keyboard shortcuts
| Key | Action |
|-----|--------|
| `↑` `↓` `←` `→` | Move Jibo's head |
| `Space` | Center Jibo's head |
Arrow keys are ignored when a text input is focused.
---
## Photos
Every photo taken is saved to `<project-root>/photos/` as `photo_<timestamp>.jpg`. The directory is created automatically on startup. Photos are served at `http://localhost:3000/photos/<filename>` and persist across server restarts.
---
## LLM / Voice AI integration
Re-Commander proxies LLM requests server-side so your API key never touches the browser. Any OpenAI-compatible endpoint works:
| Provider | `LLM_ENDPOINT` | Notes |
|----------|---------------|-------|
| Local Ollama | `http://localhost:11434/v1/chat/completions` | Default; no key needed |
| OpenAI | `https://api.openai.com/v1/chat/completions` | Set `LLM_API_KEY` |
| Anthropic (via proxy) | your proxy URL | Set `LLM_API_KEY` |
| Any OpenAI-compatible | any URL | Set `LLM_API_KEY` if required |
The built-in system prompt instructs the model to respond exclusively in ESML — Jibo's markup language that simultaneously drives speech, body animations, screen graphics, and audio effects. You can override it by setting `LLM_SYSTEM_PROMPT` in `.env`.
---
## Architecture
```
Browser (app.js)
│ WebSocket /ws REST /api/*
▼ ▼
server.js (Node/Express)
├─ JiboClient ──── WebSocket ──► Jibo ROM :8160
│ └─ WakewordWatcher ─── WebSocket ──► Jibo ASR :8088
└─ /photos (static file serving)
```
- The server maintains a persistent WebSocket to the robot and reconnects automatically.
- A heartbeat (`GetConfig` every 9 s) keeps the session alive past Jibo's 10 s inactivity timeout.
- The wakeword watcher maintains a separate persistent connection to the always-on ASR task and forwards hotphrase events to the browser.
- All robot events are broadcast to every connected browser tab over the `/ws` WebSocket.
---
## Troubleshooting
**"Connecting…" never turns green**
- Confirm the robot is in `int-developer` mode and on the same network.
- Check that `JIBO_HOST` in `server.js` matches the robot's IP.
- Try `curl http://<robot-ip>:8160/request` from the machine running the server.
**Listen / ASR does nothing**
- The local ASR service runs on port 8088. Confirm the robot is in `int-developer` mode (it exposes the ASR service only in that mode).
**LLM responses don't work**
- Check `LLM_ENDPOINT` and `LLM_MODEL` in `.env`.
- For local Ollama, make sure the model is pulled: `ollama pull llama3`.
- For cloud endpoints, verify `LLM_API_KEY` is set correctly.
**Photos are not appearing**
- The `photos/` directory is created automatically. Check the server console for `[photo] saved:` log lines.
- If the robot disconnects immediately after taking a photo, the fetch from port 8160 may time out — reconnect and try again.
---
## License
MIT