pasketti 11d72f1e75 Initial release — Re-Commander v1.0

Local web-based control interface for the Jibo social robot via the ROM
WebSocket API (port 8160) and on-device ASR (port 8088). Features head
navigation via click-to-look and arrow keys, speech/listen/Voice-AI loop,
display control, camera/photo capture, and entity tracking — no cloud
dependency required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-19 02:40:41 -04:00

public
.env.example
.gitignore
package-lock.json
package.json
README.md
server.js

README.md

Re-Commander

A local web-based control interface for the Jibo social robot. Re-Commander connects directly to Jibo's on-device ROM (Robot Operating Module) over your local network, giving you a browser UI to control head movement, speech, listening, display, camera, and an LLM voice-AI loop — all without any cloud dependency.

Requirements

Node.js 18 or later
Jibo robot on the same local network, running in int-developer mode (see below)
A modern browser (Chrome, Firefox, Edge)

Robot prerequisite — `int-developer` mode

This is required. Re-Commander will not work without it.

Jibo must be placed into int-developer mode before the server can open a WebSocket session with it. In this mode the robot exposes its ROM WebSocket API on port 8160 and its local ASR service on port 8088.

To enable it:

1. On your phone, open the Jibo app and connect to your robot.
2. Navigate to Settings → Developer Options and enable Developer Mode.
3. On the robot itself (or via the app), switch the operating mode to int-developer.
4. The robot will reboot. Once it is back up and on Wi-Fi, its ROM API will be accessible at <robot-ip>:8160.

If you are unsure whether your robot is in the right mode, try curl http://<robot-ip>:8160/request — if you get a JSON response (even an error), the port is open.

Installation

git clone https://github.com/youruser/re-commander.git
cd re-commander
npm install

Configuration

1. Robot IP address

Open server.js and set JIBO_HOST to your robot's local IP:

const JIBO_HOST = '192.168.1.217';   // ← change this
const JIBO_PORT = 8160;              // leave as-is

Find your robot's IP in your router's device list, or check the Jibo app under Settings → Wi-Fi.

2. Environment variables (optional)

Create a .env file in the project root to configure the LLM integration and server port:

# Port the web UI is served on (default: 3000)
PORT=3000

# OpenAI-compatible LLM endpoint (default: local Ollama)
LLM_ENDPOINT=http://localhost:11434/v1/chat/completions

# Model name passed to the endpoint
LLM_MODEL=llama3

# API key — set if your endpoint requires one (e.g. OpenAI, Anthropic proxy)
LLM_API_KEY=sk-...

If no .env file is present, the server defaults to port 3000 and a local Ollama instance. The LLM system prompt is embedded in server.js and preconfigured to make Jibo respond in ESML (Embodied Speech Markup Language) with animations and sound effects; it can be overridden with LLM_SYSTEM_PROMPT (see LLM / Voice AI integration).
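The fallback behavior described above could be resolved along these lines. This is a minimal sketch, not the code in server.js: `loadConfig` and the returned field names are hypothetical, and the `llama3` fallback for the model is an assumption taken from the example values rather than a documented default. How the .env file is read into the environment is not shown in this README, so the function simply takes an object shaped like `process.env`.

```javascript
// Hypothetical helper: resolve settings from an env-like object,
// falling back to the defaults this README describes.
function loadConfig(env = process.env) {
  return {
    // Number('') and Number(undefined) are falsy/NaN, so both fall back to 3000.
    port: Number(env.PORT) || 3000,
    llmEndpoint: env.LLM_ENDPOINT || 'http://localhost:11434/v1/chat/completions',
    llmModel: env.LLM_MODEL || 'llama3', // assumed default, taken from the example
    llmApiKey: env.LLM_API_KEY || '',    // empty string means "send no key"
  };
}

console.log(loadConfig({ PORT: '8080' }).port);
```

Passing the environment in as an argument keeps the function easy to exercise without touching the real process environment.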

Running

npm start

Then open http://localhost:3000 in your browser.

The server begins connecting to the robot immediately. The status indicator in the top-left of the UI turns green once a session is established. If the robot is unreachable, the server retries automatically every 3 seconds.
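A fixed-delay retry like the one described above can be sketched as follows. The helper name `keepConnected`, its callback contract, and the injectable `schedule` parameter are all assumptions made for illustration; the actual reconnect logic in server.js may be structured differently.

```javascript
// Hypothetical helper: call `open(onDown)` and, each time the attempt fails
// or the connection later drops (signalled by `open` calling `onDown` once),
// schedule another attempt after `delayMs`.
function keepConnected(open, delayMs = 3000, schedule = setTimeout) {
  let stopped = false;

  function attempt() {
    if (stopped) return;
    open(() => {
      if (!stopped) schedule(attempt, delayMs);
    });
  }

  attempt();
  // The returned function cancels any further retries.
  return () => { stopped = true; };
}

// Possible wiring with the `ws` package (an assumption, as is the URL shape):
//   const WebSocket = require('ws');
//   keepConnected((onDown) => {
//     const sock = new WebSocket('ws://<robot-ip>:8160');
//     sock.on('error', () => {});   // 'close' follows 'error' in ws
//     sock.once('close', onDown);
//   });
```

A constant delay keeps the behavior predictable on a local network; exponential backoff would be an alternative if the robot is expected to be offline for long periods.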

UI overview

The interface is divided into three panels.

Left panel — Controls

Head Navigation

Arrow pad (↑ ↓ ← →): moves Jibo's head. Each press sends one step command and waits for the robot to acknowledge before sending the next, so there is no command queuing or drift.
Clicking anywhere on the camera feed makes Jibo look at that point in the scene.
The Track checkbox keeps the robot tracking that point continuously.
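One way to get the "one step at a time, no queuing" behavior described for the arrow pad is an in-flight flag that drops presses while a step is still unacknowledged. The sketch below assumes a hypothetical `sendStep(direction)` function that returns a promise resolving on the robot's acknowledgement; it is not taken from server.js or app.js.

```javascript
// Hypothetical helper: wrap a promise-returning `sendStep` so that only one
// step is outstanding at a time. Presses that arrive while a step is in
// flight are dropped instead of queued.
function makeStepper(sendStep) {
  let inFlight = false;

  return async function step(direction) {
    if (inFlight) return false;   // previous step not yet acknowledged
    inFlight = true;
    try {
      await sendStep(direction);  // resolves when the robot acknowledges
      return true;
    } finally {
      inFlight = false;           // release even if the send failed
    }
  };
}
```

Dropping rather than queuing means that holding a key down cannot build up a backlog of moves that keeps playing out after the key is released.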

Say

Type any text (plain or ESML markup) and press ▶ Say.
✕ Stop cancels mid-speech.

Listen

Triggers Jibo's on-device ASR (port 8088, no cloud required).
Configure the no-speech and max-speech timeouts before pressing 🎙 Listen.
The transcribed result appears below the buttons.
Auto-listen on hotword: when checked, Re-Commander listens automatically every time the wake word ("Hey Jibo") is detected.

Attention Mode

Sets Jibo's attention state: OFF, IDLE, DISENGAGE, ENGAGED, SPEAKING, FIXATED, ATTRACTABLE, COMMAND.

Volume

Drag the slider and press Set Volume (0 – 100%).

Voice AI

Enables a continuous listen → LLM → speak loop.
The LLM receives the transcribed speech and returns ESML; Jibo speaks the reply with matching animations.
Configure the endpoint, model, and system prompt in .env or via the UI fields.
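A single turn of the listen → LLM → speak loop can be sketched as below. The three injected functions (`listen`, `askLlm`, `say`) are hypothetical stand-ins for the robot and LLM calls, and skipping the LLM call when nothing was transcribed is an assumption rather than documented behavior.

```javascript
// Hypothetical sketch of one Voice AI turn. Each dependency returns a promise:
//   listen()      -> transcribed text (or an empty string on silence)
//   askLlm(text)  -> ESML reply from the model
//   say(esml)     -> resolves when Jibo has finished speaking
async function runVoiceTurn({ listen, askLlm, say }) {
  const heard = await listen();
  if (!heard || !heard.trim()) return null; // nothing heard: skip this turn
  const reply = await askLlm(heard);
  await say(reply);
  return reply;
}

// A continuous loop would repeat the turn until switched off, for example:
//   while (voiceAiEnabled) await runVoiceTurn(deps);
```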

Center panel — Camera & photos

The camera feed displays the live MJPEG stream from Jibo when video is active, or the most recent photo otherwise.
Click anywhere on the feed to make Jibo look at that point.
The Photo Strip below the feed shows all photos taken this session. Click any thumbnail to open it full-size.

Right panel — Tabs

Camera tab

Start / Stop Video: starts or stops the live MJPEG stream from Jibo's right camera.
Take Photo: captures a still from the selected camera at the selected resolution. Photos are automatically saved to the photos/ directory in the project folder and served from /photos/<filename>.
Subscriptions: toggle Entity (person detection), Motion, and Head Touch event streams on/off.

Display tab

Show Eye: displays Jibo's animated eye graphic on its screen.
Play Animation: select and play any of Jibo's built-in eye animations (blinks, expressions, emoji, transitions, and more) from a curated dropdown.
Show Text / Show Image: display a text string or a remote image URL on Jibo's screen.

Entities tab

Live list of people Jibo's vision system has detected, with entity ID, confidence, and screen coordinates.
Head Touch display shows which pads on Jibo's head are currently being touched.

Log tab

Real-time event log of every message received from the robot (LookAt events, touch events, ASR results, errors, etc.).

Keyboard shortcuts

| Key | Action |
| --- | --- |
| `↑` `↓` `←` `→` | Move Jibo's head |
| `Space` | Center Jibo's head |

Arrow keys are ignored when a text input is focused.
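The shortcut handling above can be sketched as a small pure function plus a `keydown` listener. The names `actionForKey` and `KEY_ACTIONS` are hypothetical, and ignoring `Space` while a text field is focused is an assumption (the README only states this for the arrow keys).

```javascript
// Map of KeyboardEvent.key values to head actions (action names are illustrative).
const KEY_ACTIONS = new Map([
  ['ArrowUp', 'up'],
  ['ArrowDown', 'down'],
  ['ArrowLeft', 'left'],
  ['ArrowRight', 'right'],
  [' ', 'center'], // KeyboardEvent.key for the space bar is a single space
]);

// Returns the action for a key press, or null if the key is unmapped or the
// user is typing in a form field.
function actionForKey(key, focusedTagName) {
  const tag = String(focusedTagName || '').toUpperCase();
  if (tag === 'INPUT' || tag === 'TEXTAREA' || tag === 'SELECT') return null;
  return KEY_ACTIONS.get(key) ?? null;
}

// Browser wiring (sketch):
//   document.addEventListener('keydown', (e) => {
//     const action = actionForKey(e.key, document.activeElement?.tagName);
//     if (action) { e.preventDefault(); /* send the action to the server */ }
//   });
```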

Photos

Every photo taken is saved to <project-root>/photos/ as photo_<timestamp>.jpg. The directory is created automatically on startup. Photos are served at http://localhost:3000/photos/<filename> and persist across server restarts.
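The storage scheme above could look roughly like this. The helper names are hypothetical, and using epoch milliseconds for `<timestamp>` is an assumption, since the README does not specify the timestamp format.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Create <root>/photos if it does not exist yet and return its path.
function ensurePhotoDir(root) {
  const dir = path.join(root, 'photos');
  fs.mkdirSync(dir, { recursive: true }); // no error if it already exists
  return dir;
}

// Build a file name of the form photo_<timestamp>.jpg
// (assumption: the timestamp is epoch milliseconds).
function photoFileName(date = new Date()) {
  return `photo_${date.getTime()}.jpg`;
}

// With Express, the directory could then be served at /photos:
//   app.use('/photos', express.static(ensurePhotoDir(__dirname)));
```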

LLM / Voice AI integration

Re-Commander proxies LLM requests server-side so your API key never touches the browser. Any OpenAI-compatible endpoint works:

| Provider | `LLM_ENDPOINT` | Notes |
| --- | --- | --- |
| Local Ollama | `http://localhost:11434/v1/chat/completions` | Default; no key needed |
| OpenAI | `https://api.openai.com/v1/chat/completions` | Set `LLM_API_KEY` |
| Anthropic (via proxy) | your proxy URL | Set `LLM_API_KEY` |
| Any OpenAI-compatible | any URL | Set `LLM_API_KEY` if required |

The built-in system prompt instructs the model to respond exclusively in ESML — Jibo's markup language that simultaneously drives speech, body animations, screen graphics, and audio effects. You can override it by setting LLM_SYSTEM_PROMPT in .env.
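A server-side proxy call to an OpenAI-compatible chat endpoint can be sketched as follows. The request and response shapes follow the OpenAI chat completions format; the function names and the `config` object are hypothetical and not taken from server.js. The global `fetch` used here is available in Node.js 18 and later.

```javascript
// Build the URL and fetch options for a chat completion request.
function buildChatRequest({ endpoint, model, apiKey, systemPrompt }, userText) {
  const headers = { 'Content-Type': 'application/json' };
  // Only send a key when one is configured (local Ollama needs none).
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return {
    url: endpoint,
    options: {
      method: 'POST',
      headers,
      body: JSON.stringify({
        model,
        messages: [
          { role: 'system', content: systemPrompt },
          { role: 'user', content: userText },
        ],
      }),
    },
  };
}

// Send the request from the server so the key never reaches the browser.
async function askLlm(config, userText) {
  const { url, options } = buildChatRequest(config, userText);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`LLM endpoint returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // the model's reply text
}
```

Because the browser only talks to the local server, the `Authorization` header is added and sent entirely on the server side.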

Architecture

Browser (app.js)
    │  WebSocket /ws          REST /api/*
    ▼                          ▼
server.js (Node/Express)
    │
    ├─ JiboClient ──── WebSocket ──► Jibo ROM :8160
    │       └─ WakewordWatcher ─── WebSocket ──► Jibo ASR :8088
    │
    └─ /photos  (static file serving)

The server maintains a persistent WebSocket to the robot and reconnects automatically.
A heartbeat (GetConfig every 9 s) keeps the session alive past Jibo's 10 s inactivity timeout.
The wakeword watcher maintains a separate persistent connection to the always-on ASR task and forwards hotphrase events to the browser.
All robot events are broadcast to every connected browser tab over the /ws WebSocket.
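The heartbeat and the fan-out to browser tabs can be sketched as below. Both helpers are hypothetical: the wire format of the GetConfig keep-alive is not shown in this README, so it is left to a caller-supplied function, and `clients` is assumed to be a set of WebSocket-like objects (such as `wss.clients` from the `ws` package).

```javascript
const HEARTBEAT_MS = 9000; // just under the 10 s inactivity timeout
const WS_OPEN = 1;         // readyState value of an open WebSocket

// Call `sendKeepAlive` on a fixed interval; returns a function that stops it.
function startHeartbeat(sendKeepAlive, intervalMs = HEARTBEAT_MS) {
  const timer = setInterval(sendKeepAlive, intervalMs);
  return () => clearInterval(timer);
}

// Send one robot event to every open client; returns how many received it.
function broadcast(clients, event) {
  const payload = JSON.stringify(event);
  let delivered = 0;
  for (const client of clients) {
    if (client.readyState === WS_OPEN) {
      client.send(payload);
      delivered += 1;
    }
  }
  return delivered;
}
```

Serializing the event once and skipping sockets that are not open avoids repeated work and send errors on tabs that are closing.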

Troubleshooting

"Connecting…" never turns green

Confirm the robot is in int-developer mode and on the same network.
Check that JIBO_HOST in server.js matches the robot's IP.
Try curl http://<robot-ip>:8160/request from the machine running the server.

Listen / ASR does nothing

The local ASR service runs on port 8088. Confirm the robot is in int-developer mode (it exposes the ASR service only in that mode).

LLM responses don't work

Check LLM_ENDPOINT and LLM_MODEL in .env.
For local Ollama, make sure the model is pulled: ollama pull llama3.
For cloud endpoints, verify LLM_API_KEY is set correctly.

Photos are not appearing

The photos/ directory is created automatically. Check the server console for [photo] saved: log lines.
If the robot disconnects immediately after taking a photo, the fetch from port 8160 may time out — reconnect and try again.

License

MIT
