README.md

# Re-Commander

A local web-based control interface for the Jibo social robot. Re-Commander connects directly to Jibo's on-device ROM (Robot Operating Module) over your local network, giving you a browser UI to control head movement, speech, listening, display, camera, and an LLM voice-AI loop — all without any cloud dependency.

![Re-Commander UI](screenshot.png)

---

## Requirements

- **Node.js** 18 or later
- **Jibo robot** on the same local network, running in **`int-developer` mode** (see below)
- A modern browser (Chrome, Firefox, Edge)

---

## Robot prerequisite — `int-developer` mode

> **This is required. Re-Commander will not work without it.**

Jibo must be placed into `int-developer` mode before the server can open a WebSocket session with it. In this mode the robot exposes its ROM WebSocket API on port 8160 and its local ASR service on port 8088.

---

## Installation

```bash
git clone https://github.com/youruser/re-commander.git
cd re-commander
npm install
```

---

## Configuration

### 1. Robot IP address

Open `server.js` and set `JIBO_HOST` to your robot's local IP:

```js
const JIBO_HOST = '192.168.1.217';   // ← change this
const JIBO_PORT = 8160;              // leave as-is
```

Find your robot's IP in your router's device list, or check the Jibo app under **Settings → Wi-Fi**.

### 2. Environment variables (optional)

Create a `.env` file in the project root to configure the LLM integration and server port:

```env
# Port the web UI is served on (default: 3000)
PORT=3000

# OpenAI-compatible LLM endpoint (default: local Ollama)
LLM_ENDPOINT=http://localhost:11434/v1/chat/completions

# Model name passed to the endpoint
LLM_MODEL=google/gemma-4-e4b

# API key — set if your endpoint requires one (e.g. OpenAI, Anthropic proxy)
LLM_API_KEY=sk-...
```

If no `.env` is present the server defaults to port 3000 and a local Ollama instance. The LLM system prompt is embedded in `server.js` and pre-configured to make Jibo respond in ESML (Embodied Speech Markup Language) with animations and sound effects.

---

## Running

```bash
npm start
```

Then open `http://localhost:3000` in your browser.

The server immediately begins connecting to the robot. The status indicator in the top-left of the UI turns green once a session is established. If the robot is unreachable the server retries every 3 seconds automatically.

---

## UI overview

The interface is divided into three panels.

### Left panel — Controls

**Head Navigation**
- Arrow pad (↑ ↓ ← →): moves Jibo's head. Each press sends one step command and waits for the robot to acknowledge before sending the next, so there is no command queuing or drift.
- Clicking anywhere on the camera feed makes Jibo look at that point in the scene.
- The **Track** checkbox keeps the robot tracking that point continuously.

**Say**
- Type any text (plain or ESML markup) and press **▶ Say**.
- **✕ Stop** cancels mid-speech.

**Listen**
- Triggers Jibo's on-device ASR (port 8088, no cloud required).
- Configure the no-speech and max-speech timeouts before pressing **🎙 Listen**.
- The transcribed result appears below the buttons.
- **Auto-listen on hotword**: when checked, Re-Commander listens automatically every time the wake word ("Hey Jibo") is detected.

**Attention Mode**
- Sets Jibo's attention state: `OFF`, `IDLE`, `DISENGAGE`, `ENGAGED`, `SPEAKING`, `FIXATED`, `ATTRACTABLE`, `COMMAND`.

**Volume**
- Drag the slider and press **Set Volume** (0 – 100%).

**Voice AI**
- Enables a continuous listen → LLM → speak loop.
- The LLM receives the transcribed speech and returns ESML; Jibo speaks the reply with matching animations.
- Configure the endpoint, model, and system prompt in `.env` or via the UI fields.

### Center panel — Camera & photos

- The camera feed displays the live MJPEG stream from Jibo when video is active, or the most recent photo otherwise.
- Click anywhere on the feed to make Jibo look at that point.
- The **Photo Strip** below the feed shows all photos taken this session. Click any thumbnail to open it full-size.

### Right panel — Tabs

**Camera tab**
- **Start / Stop Video**: starts or stops the live MJPEG stream from Jibo's right camera.
- **Take Photo**: captures a still from the selected camera at the selected resolution. Photos are automatically saved to the `photos/` directory in the project folder and served from `/photos/<filename>`.
- **Subscriptions**: toggle Entity (person detection), Motion, and Head Touch event streams on/off.

**Display tab**
- **Show Eye**: displays Jibo's animated eye graphic on his screen.
- **Play Animation**: select and play any of Jibo's built-in eye animations (blinks, expressions, emoji, transitions, and more) from a curated dropdown.
- **Show Text / Show Image**: display a text string or a remote image URL on Jibo's screen.

**Entities tab**
- Live list of people Jibo's vision system has detected, with entity ID, confidence, and screen coordinates.
- Head Touch display shows which pads on Jibo's head are currently being touched.

**Log tab**
- Real-time event log of every message received from the robot (LookAt events, touch events, ASR results, errors, etc.).

---

## Keyboard shortcuts

| Key | Action |
|-----|--------|
| `↑` `↓` `←` `→` | Move Jibo's head |
| `Space` | Center Jibo's head |

Arrow keys are ignored when a text input is focused.

---

## Photos

Every photo taken is saved to `<project-root>/photos/` as `photo_<timestamp>.jpg`. The directory is created automatically on startup. Photos are served at `http://localhost:3000/photos/<filename>` and persist across server restarts.

---

## LLM / Voice AI integration

Re-Commander proxies LLM requests server-side so your API key never touches the browser. Any OpenAI-compatible endpoint works:

| Provider | `LLM_ENDPOINT` | Notes |
|----------|---------------|-------|
| Local Ollama | `http://localhost:11434/v1/chat/completions` | Default; no key needed |
| OpenAI | `https://api.openai.com/v1/chat/completions` | Set `LLM_API_KEY` |
| Anthropic (via proxy) | your proxy URL | Set `LLM_API_KEY` |
| Any OpenAI-compatible | any URL | Set `LLM_API_KEY` if required |

The built-in system prompt instructs the model to respond exclusively in ESML — Jibo's markup language that simultaneously drives speech, body animations, screen graphics, and audio effects. You can override it by setting `LLM_SYSTEM_PROMPT` in `.env`.

---

## Architecture

```
Browser (app.js)
    │  WebSocket /ws          REST /api/*
    ▼                          ▼
server.js (Node/Express)
    │
    ├─ JiboClient ──── WebSocket ──► Jibo ROM :8160
    │       └─ WakewordWatcher ─── WebSocket ──► Jibo ASR :8088
    │
    └─ /photos  (static file serving)
```

- The server maintains a persistent WebSocket to the robot and reconnects automatically.
- A heartbeat (`GetConfig` every 9 s) keeps the session alive past Jibo's 10 s inactivity timeout.
- The wakeword watcher maintains a separate persistent connection to the always-on ASR task and forwards hotphrase events to the browser.
- All robot events are broadcast to every connected browser tab over the `/ws` WebSocket.

---

## Troubleshooting

**"Connecting…" never turns green**
- Confirm the robot is in `int-developer` mode and on the same network.
- Check that `JIBO_HOST` in `server.js` matches the robot's IP.
- Try `curl http://<robot-ip>:8160/request` from the machine running the server.

**Listen / ASR does nothing**
- The local ASR service runs on port 8088. Confirm the robot is in `int-developer` mode (it exposes the ASR service only in that mode).

**LLM responses don't work**
- Check `LLM_ENDPOINT` and `LLM_MODEL` in `.env`.
- For local Ollama, make sure the model is pulled: `ollama pull llama3`.
- For cloud endpoints, verify `LLM_API_KEY` is set correctly.

**Photos are not appearing**
- The `photos/` directory is created automatically. Check the server console for `[photo] saved:` log lines.
- If the robot disconnects immediately after taking a photo, the fetch from port 8160 may time out — reconnect and try again.

---

## License

MIT
-												Initial release — Re-Commander v1.0

Local web-based control interface for the Jibo social robot via the ROM
WebSocket API (port 8160) and on-device ASR (port 8088). Features head
navigation via click-to-look and arrow keys, speech/listen/Voice-AI loop,
display control, camera/photo capture, and entity tracking — no cloud
dependency required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

											
										
										
											2026-04-19 02:40:41 -04:00
+								# Re-Commander
 								A local web-based control interface for the Jibo social robot. Re-Commander connects directly to Jibo's on-device ROM (Robot Operating Module) over your local network, giving you a browser UI to control head movement, speech, listening, display, camera, and an LLM voice-AI loop — all without any cloud dependency.
-												Update README and add screenshot

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

											
										
										
											2026-04-19 02:49:12 -04:00
+								![Re-Commander UI](screenshot.png)
-												Initial release — Re-Commander v1.0

Local web-based control interface for the Jibo social robot via the ROM
WebSocket API (port 8160) and on-device ASR (port 8088). Features head
navigation via click-to-look and arrow keys, speech/listen/Voice-AI loop,
display control, camera/photo capture, and entity tracking — no cloud
dependency required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

											
										
										
											2026-04-19 02:40:41 -04:00
 								---
 								## Requirements
 								- **Node.js** 18 or later
 								- **Jibo robot** on the same local network, running in **`int-developer` mode** (see below)
 								- A modern browser (Chrome, Firefox, Edge)
 								---
 								## Robot prerequisite — `int-developer` mode
 								> **This is required. Re-Commander will not work without it.**
 								Jibo must be placed into `int-developer` mode before the server can open a WebSocket session with it. In this mode the robot exposes its ROM WebSocket API on port 8160 and its local ASR service on port 8088.
 								---
 								## Installation
 								```bash
 								git clone https://github.com/youruser/re-commander.git
 								cd re-commander
 								npm install
 								```
 								---
 								## Configuration
 								### 1. Robot IP address
 								Open `server.js` and set `JIBO_HOST` to your robot's local IP:
 								```js
 								const JIBO_HOST = '192.168.1.217';   // ← change this
 								const JIBO_PORT = 8160;              // leave as-is
 								```
 								Find your robot's IP in your router's device list, or check the Jibo app under **Settings → Wi-Fi**.
 								### 2. Environment variables (optional)
 								Create a `.env` file in the project root to configure the LLM integration and server port:
 								```env
 								# Port the web UI is served on (default: 3000)
 								PORT=3000
 								# OpenAI-compatible LLM endpoint (default: local Ollama)
 								LLM_ENDPOINT=http://localhost:11434/v1/chat/completions
 								# Model name passed to the endpoint
-												Update README and add screenshot

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

											
										
										
											2026-04-19 02:49:12 -04:00
+								LLM_MODEL=google/gemma-4-e4b
-												Initial release — Re-Commander v1.0

Local web-based control interface for the Jibo social robot via the ROM
WebSocket API (port 8160) and on-device ASR (port 8088). Features head
navigation via click-to-look and arrow keys, speech/listen/Voice-AI loop,
display control, camera/photo capture, and entity tracking — no cloud
dependency required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

											
										
										
											2026-04-19 02:40:41 -04:00
 								# API key — set if your endpoint requires one (e.g. OpenAI, Anthropic proxy)
 								LLM_API_KEY=sk-...
 								```
 								If no `.env` is present the server defaults to port 3000 and a local Ollama instance. The LLM system prompt is embedded in `server.js` and pre-configured to make Jibo respond in ESML (Embodied Speech Markup Language) with animations and sound effects.
 								---
 								## Running
 								```bash
 								npm start
 								```
 								Then open `http://localhost:3000` in your browser.
 								The server immediately begins connecting to the robot. The status indicator in the top-left of the UI turns green once a session is established. If the robot is unreachable the server retries every 3 seconds automatically.
 								---
 								## UI overview
 								The interface is divided into three panels.
 								### Left panel — Controls
 								**Head Navigation**
 								- Arrow pad (↑ ↓ ← →): moves Jibo's head. Each press sends one step command and waits for the robot to acknowledge before sending the next, so there is no command queuing or drift.
 								- Clicking anywhere on the camera feed makes Jibo look at that point in the scene.
 								- The **Track** checkbox keeps the robot tracking that point continuously.
 								**Say**
 								- Type any text (plain or ESML markup) and press **▶ Say**.
 								- **✕ Stop** cancels mid-speech.
 								**Listen**
 								- Triggers Jibo's on-device ASR (port 8088, no cloud required).
 								- Configure the no-speech and max-speech timeouts before pressing **🎙 Listen**.
 								- The transcribed result appears below the buttons.
 								- **Auto-listen on hotword**: when checked, Re-Commander listens automatically every time the wake word ("Hey Jibo") is detected.
 								**Attention Mode**
 								- Sets Jibo's attention state: `OFF`, `IDLE`, `DISENGAGE`, `ENGAGED`, `SPEAKING`, `FIXATED`, `ATTRACTABLE`, `COMMAND`.
 								**Volume**
 								- Drag the slider and press **Set Volume** (0 – 100%).
 								**Voice AI**
 								- Enables a continuous listen → LLM → speak loop.
 								- The LLM receives the transcribed speech and returns ESML; Jibo speaks the reply with matching animations.
 								- Configure the endpoint, model, and system prompt in `.env` or via the UI fields.
 								### Center panel — Camera & photos
 								- The camera feed displays the live MJPEG stream from Jibo when video is active, or the most recent photo otherwise.
 								- Click anywhere on the feed to make Jibo look at that point.
 								- The **Photo Strip** below the feed shows all photos taken this session. Click any thumbnail to open it full-size.
 								### Right panel — Tabs
 								**Camera tab**
 								- **Start / Stop Video**: starts or stops the live MJPEG stream from Jibo's right camera.
 								- **Take Photo**: captures a still from the selected camera at the selected resolution. Photos are automatically saved to the `photos/` directory in the project folder and served from `/photos/<filename>`.
 								- **Subscriptions**: toggle Entity (person detection), Motion, and Head Touch event streams on/off.
 								**Display tab**
 								- **Show Eye**: displays Jibo's animated eye graphic on his screen.
 								- **Play Animation**: select and play any of Jibo's built-in eye animations (blinks, expressions, emoji, transitions, and more) from a curated dropdown.
 								- **Show Text / Show Image**: display a text string or a remote image URL on Jibo's screen.
 								**Entities tab**
 								- Live list of people Jibo's vision system has detected, with entity ID, confidence, and screen coordinates.
 								- Head Touch display shows which pads on Jibo's head are currently being touched.
 								**Log tab**
 								- Real-time event log of every message received from the robot (LookAt events, touch events, ASR results, errors, etc.).
 								---
 								## Keyboard shortcuts
 								| Key | Action |
 								|-----|--------|
 								| `↑` `↓` `←` `→` | Move Jibo's head |
 								| `Space` | Center Jibo's head |
 								Arrow keys are ignored when a text input is focused.
 								---
 								## Photos
 								Every photo taken is saved to `<project-root>/photos/` as `photo_<timestamp>.jpg`. The directory is created automatically on startup. Photos are served at `http://localhost:3000/photos/<filename>` and persist across server restarts.
 								---
 								## LLM / Voice AI integration
 								Re-Commander proxies LLM requests server-side so your API key never touches the browser. Any OpenAI-compatible endpoint works:
 								| Provider | `LLM_ENDPOINT` | Notes |
 								|----------|---------------|-------|
 								| Local Ollama | `http://localhost:11434/v1/chat/completions` | Default; no key needed |
 								| OpenAI | `https://api.openai.com/v1/chat/completions` | Set `LLM_API_KEY` |
 								| Anthropic (via proxy) | your proxy URL | Set `LLM_API_KEY` |
 								| Any OpenAI-compatible | any URL | Set `LLM_API_KEY` if required |
 								The built-in system prompt instructs the model to respond exclusively in ESML — Jibo's markup language that simultaneously drives speech, body animations, screen graphics, and audio effects. You can override it by setting `LLM_SYSTEM_PROMPT` in `.env`.
 								---
 								## Architecture
 								```
 								Browser (app.js)
 								    │  WebSocket /ws          REST /api/*
 								    ▼                          ▼
 								server.js (Node/Express)
 								    │
 								    ├─ JiboClient ──── WebSocket ──► Jibo ROM :8160
 								    │       └─ WakewordWatcher ─── WebSocket ──► Jibo ASR :8088
 								    │
 								    └─ /photos  (static file serving)
 								```
 								- The server maintains a persistent WebSocket to the robot and reconnects automatically.
 								- A heartbeat (`GetConfig` every 9 s) keeps the session alive past Jibo's 10 s inactivity timeout.
 								- The wakeword watcher maintains a separate persistent connection to the always-on ASR task and forwards hotphrase events to the browser.
 								- All robot events are broadcast to every connected browser tab over the `/ws` WebSocket.
 								---
 								## Troubleshooting
 								**"Connecting…" never turns green**
 								- Confirm the robot is in `int-developer` mode and on the same network.
 								- Check that `JIBO_HOST` in `server.js` matches the robot's IP.
 								- Try `curl http://<robot-ip>:8160/request` from the machine running the server.
 								**Listen / ASR does nothing**
 								- The local ASR service runs on port 8088. Confirm the robot is in `int-developer` mode (it exposes the ASR service only in that mode).
 								**LLM responses don't work**
 								- Check `LLM_ENDPOINT` and `LLM_MODEL` in `.env`.
 								- For local Ollama, make sure the model is pulled: `ollama pull llama3`.
 								- For cloud endpoints, verify `LLM_API_KEY` is set correctly.
 								**Photos are not appearing**
 								- The `photos/` directory is created automatically. Check the server console for `[photo] saved:` log lines.
 								- If the robot disconnects immediately after taking a photo, the fetch from port 8160 may time out — reconnect and try again.
 								---
 								## License
 								MIT