# rom-control
Discord.js-style OOP client for the Jibo ROM WebSocket API (port 8160).
**Requires:** Node.js ≥ 16, `ws` ^8.14.2
```js
// CommonJS
const { Client, AttentionMode } = require('rom-control');
// ESM
import { Client, AttentionMode } from 'rom-control';
```
---
## Quick Start
```js
const { Client, AttentionMode } = require('rom-control');

(async () => {
  const client = new Client({ host: '192.168.1.217' });

  client.once('ready', () => {
    console.log('Connected, session:', client.sessionID);
  });
  client.on('trackCreate', async (track) => {
    await track.lookAt();
  });

  await client.connect();
  await client.behavior.setAttention(AttentionMode.Engaged);
  await client.behavior.say("Hello! I'm Jibo.");

  try {
    const speech = await client.audio.awaitSpeech({ mode: 'local', time: 15000 });
    await client.behavior.say(`You said: ${speech.content}`);
  } catch {
    // SPEECH_TIMEOUT — no input detected
  }

  client.destroy();
})();
```
---
## `new Client(options?)`
| Option | Type | Default | Description |
|---|---|---|---|
| `host` | `string` | `'192.168.1.217'` | Robot IP address |
| `port` | `number` | `8160` | ROM WebSocket port |
| `appId` | `string` | `'ImmaLittleTeapot'` | ACO app identifier |
| `autoReconnect` | `boolean` | `true` | Reconnect on disconnect |
| `reconnectDelay` | `number` | `3000` | ms between reconnect attempts |
| `heartbeatInterval` | `number` | `9000` | ms between GetConfig keepalives |
| `autoHeartbeat` | `boolean` | `true` | Send GetConfig keepalives automatically |
| `autoSubscribe` | `boolean` | `true` | Subscribe Entity/Motion/HeadTouch/ScreenGesture on connect |
### Instance properties
| Property | Type | Description |
|---|---|---|
| `connected` | `boolean` | WebSocket is open and session established |
| `sessionID` | `string` | Current ROM session ID (`''` when disconnected) |
| `currentAngles` | `[number, number]` | Last `[theta_deg, psi_deg]` sent via `lookAtAngle` |
| `videoStreamActive` | `boolean` | A video stream command is currently active |
| `tracks` | `Map<number, Track>` | Live map of tracked entities keyed by EntityID |
| `behavior` | `BehaviorManager` | Attention, speech, head motion, animations |
| `audio` | `AudioManager` | Microphone input, volume, wakeword |
| `camera` | `CameraManager` | Still photos and video streaming |
| `display` | `DisplayManager` | Screen output |
| `assets` | `AssetManager` | Remote asset caching |
---
## Lifecycle
### `client.connect()` → `Promise<void>`
Posts the ACO `/request` to unlock ROM commands, opens the WebSocket, and waits for the session to be established. Emits `'ready'` on success.
```js
await client.connect();
```
### `client.disconnect()`
Closes the WebSocket and stops auto-reconnect. The instance can be reconnected with `connect()` again.
### `client.destroy()`
Disconnects, stops the wakeword listener, and removes all event listeners. The instance cannot be reused.
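Since `connect()` rejects when the robot is unreachable, callers that disable `autoReconnect` need their own retry loop. A minimal sketch, assuming you want exponential backoff instead of the library's fixed `reconnectDelay` — `nextDelay` is a hypothetical helper, not part of rom-control:

```javascript
// Hypothetical backoff helper (not part of rom-control):
// attempt 0 → base, attempt 1 → 2×base, capped at 30 s.
function nextDelay(attempt, base = 3000) {
  return Math.min(base * 2 ** attempt, 30000);
}

async function run() {
  // required lazily so the helper above stands alone
  const { Client } = require('rom-control');
  const client = new Client({ host: '192.168.1.217', autoReconnect: false });

  for (let attempt = 0; ; attempt++) {
    try {
      await client.connect();
      break; // 'ready' has fired; session is live
    } catch (err) {
      console.error('connect failed:', err.message);
      await new Promise((resolve) => setTimeout(resolve, nextDelay(attempt)));
    }
  }

  // ... use the client ...
  client.destroy(); // final teardown; the instance cannot be reused
}
```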
---
## Events
All events are emitted on the `Client` instance (extends `EventEmitter`).
| Event | Args | Description |
|---|---|---|
| `'ready'` | — | Connected and session established |
| `'disconnect'` | — | WebSocket closed; `client.tracks` is cleared |
| `'error'` | `err: Error` | Connection or protocol error |
| `'trackCreate'` | `track: Track` | Entity first detected |
| `'trackUpdate'` | `oldTrack: Track, newTrack: Track` | Entity position updated |
| `'trackDelete'` | `track: Track` | Entity lost |
| `'motionDetected'` | `motion: Motion` | Motion detected |
| `'headTouch'` | `event: HeadTouchEvent` | Head pad touched |
| `'gesture'` | `event: GestureEvent` | Screen tapped or swiped |
| `'hotword'` | `event: HotwordEvent` | "Hey Jibo" detected |
```js
client.on('trackCreate', async (track) => {
  console.log('Saw entity', track.id, 'at', track.screenCoords);
  await track.lookAt();
});
client.on('headTouch', (event) => {
  console.log('Touched pads:', event.activePads);
});
client.on('gesture', (event) => {
  if (event.isTap) console.log('Tapped at', event.coordinate);
  if (event.isSwipe) console.log('Swiped', event.direction);
});
```
---
## `client.behavior` — BehaviorManager
Controls Jibo's persona: attention, speech, head motion, and animations. All methods return Promises that resolve when the action physically completes.
### `client.behavior.setAttention(mode)` → `Promise<void>`
Set Jibo's engagement mode.
```js
await client.behavior.setAttention(AttentionMode.Engaged);
```
See [AttentionMode](#attentionmode) for all values.
---
### `client.behavior.say(text, options?)` → `Promise<void>`
Speak text or ESML. Automatically sanitizes input and chunks long text. Resolves when the robot physically finishes speaking.
| Option | Type | Default | Description |
|---|---|---|---|
| `maxChunkLen` | `number` | `450` | Max chars per ROM `Say` command |
| `maxTotal` | `number` | `3000` | Max total chars; excess trimmed with `…` |
| `chunkDelay` | `number` | `600` | ms pause between chunks |
| `signal` | `AbortSignal` | `null` | Cancel mid-speech |
```js
// Simple
await client.behavior.say("Hello! I'm Jibo.");
// With ESML tags
await client.behavior.say("<anim cat='excited' nonBlocking='true'/> Great to meet you!");
// Cancellable
const controller = new AbortController();
setTimeout(() => controller.abort(), 3000);
await client.behavior.say(longText, { signal: controller.signal });
```
Throws `{ code: 'SAY_TIMEOUT' }` if the robot stops responding mid-speech and it wasn't cancelled via signal.
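A sketch of handling that rejection, assuming you want to retry once before giving up. `isSayTimeout` and `sayWithRetry` are local helpers for illustration, not part of rom-control:

```javascript
// Local helper (not part of rom-control): classify a say() rejection.
function isSayTimeout(err) {
  return Boolean(err) && err.code === 'SAY_TIMEOUT';
}

async function sayWithRetry(client, text, retries = 1) {
  for (let i = 0; i <= retries; i++) {
    try {
      await client.behavior.say(text);
      return true;
    } catch (err) {
      // cancellation and other errors propagate to the caller
      if (!isSayTimeout(err)) throw err;
      console.warn('Robot went quiet mid-speech, retrying...');
    }
  }
  return false; // exhausted retries
}
```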
---
### `client.behavior.lookAtAngle(theta, psi, options?)` → `Promise<void>`
Look at an angle in degrees. Resolves when `onLookAtAchieved` fires. If a look is already in-flight, the new angle is queued and the promise resolves immediately.
- **theta** — yaw, degrees. Positive = right. Clamped ±180°.
- **psi** — pitch, degrees. Positive = up. Clamped ±30°.
| Option | Type | Default |
|---|---|---|
| `track` | `boolean` | `false` |
| `timeout` | `number` | `5000` ms |
```js
await client.behavior.lookAtAngle(30, 10); // look right and slightly up
await client.behavior.lookAtAngle(0, 0); // center
```
### `client.behavior.nudge(dTheta, dPsi)` → `Promise<void>`
Nudge head by a relative delta from the current position.
```js
await client.behavior.nudge(15, 0); // rotate 15° right
```
### `client.behavior.lookAtScreen(x, y)` → `Promise<void>`
Look at a pixel coordinate on the camera image (640×480).
### `client.behavior.lookAtPosition(x, y, z)` → `Promise<void>`
Look at a world-relative 3D position in millimetres.
### `client.behavior.lookAtEntity(entityId, track?)` → `Promise<void>`
Look at a tracked entity by ID. `track` defaults to `true`. When tracking, the promise resolves after the first `onLookAtAchieved` rather than waiting for tracking to end.
```js
client.on('trackCreate', async (track) => {
  await track.lookAt(); // shorthand — calls this internally
});
```
### `client.behavior.lookAt(target, options?)` → `Promise<void>`
Raw LookAt for advanced use. `target` is the ROM `LookAtTarget` shape:
| Shape | Description |
|---|---|
| `{ Angle: [theta_rad, psi_rad] }` | Radians |
| `{ ScreenCoords: [x, y] }` | Camera pixels |
| `{ Position: [x, y, z] }` | World mm |
| `{ Entity: id }` | Entity ID |
Options: `track` (boolean, default `false`), `levelHead` (boolean, default `false`), `timeout` (ms, default `5000`).
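Note the unit mismatch: the raw `Angle` shape is in radians, while `lookAtAngle()` takes degrees. A sketch of driving the raw form with a degree-to-radian conversion (`rawLook` is an illustrative wrapper, not a library method):

```javascript
// Degrees → radians, since the raw LookAtTarget Angle shape is in radians.
const degToRad = (deg) => (deg * Math.PI) / 180;

async function rawLook(client) {
  // Equivalent to client.behavior.lookAtAngle(30, 10), as a raw target
  await client.behavior.lookAt(
    { Angle: [degToRad(30), degToRad(10)] },
    { track: false, levelHead: true, timeout: 5000 }
  );
}
```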
---
### `client.behavior.playAnim(animName)` → `Promise<void>`
Play a named animation. Resolves when the animation finishes.
```js
await client.behavior.playAnim('pleased_01');
```
### `client.behavior.playAnimCat(cat, options?)` → `Promise<void>`
Play an animation by emotional category.
| Option | Type | Default | Description |
|---|---|---|---|
| `filter` | `string\|null` | `null` | e.g. `'music, rom-upbeat'` |
| `nonBlocking` | `boolean` | `false` | If true, resolves immediately |
```js
await client.behavior.playAnimCat('excited');
await client.behavior.playAnimCat('dance', { filter: 'music, rom-upbeat' });
client.behavior.playAnimCat('happy', { nonBlocking: true }); // fire and forget
```
---
## `client.audio` — AudioManager
### `client.audio.awaitSpeech(options?)` → `Promise<SpeechResult>`
Listen for speech and resolve with the transcript. Rejects with `{ code: 'SPEECH_TIMEOUT' }` if no speech is detected within `time`.
| Option | Type | Default | Description |
|---|---|---|---|
| `mode` | `'local' \| 'cloud'` | `'local'` | `'local'` uses on-robot ASR (no cloud required) |
| `time` | `number` | `15000` | Max ms to wait for speech |
| `noSpeechTime` | `number` | `5000` | Max ms of silence before giving up (cloud mode) |
| `languageCode` | `string` | `'en-US'` | Language tag for recognition |
```js
try {
  const speech = await client.audio.awaitSpeech({ mode: 'local', time: 10000 });
  console.log(speech.content);      // transcript string
  console.log(speech.languageCode); // 'en-US'
} catch (err) {
  if (err.code === 'SPEECH_TIMEOUT') console.log('Nothing heard.');
}
```
### `client.audio.setVolume(level)` → `Promise<void>`
Set audio mixer volume. `level` is `0.0`–`1.0`.
### `client.audio.watchWakeword(asrPort?)`
Connect to the always-on resident ASR wakeword stream (port 8088). Fires `'hotword'` events on the `Client` with a `HotwordEvent` payload. No-op if already watching.
```js
client.audio.watchWakeword();
client.on('hotword', (event) => {
  console.log(`"${event.utterance}" — score ${event.score}`);
});
```
### `client.audio.stopWakeword()`
Stop the wakeword listener.
---
## `client.camera` — CameraManager
### `client.camera.takePhoto(options?)` → `Promise<Photo>`
Take a still photo. Resolves with a `Photo` object once the robot signals ready.
| Option | Type | Default | Values |
|---|---|---|---|
| `camera` | `string` | `'Right'` | `Camera.Left`, `Camera.Right` |
| `resolution` | `string` | `'HighRes'` | `Resolution.*` values |
| `distortion` | `boolean` | `false` | |
| `timeout` | `number` | `15000` | ms |
```js
const fs = require('fs');
const { Resolution } = require('rom-control');

const photo = await client.camera.takePhoto({ resolution: Resolution.MedRes });
const buffer = await photo.fetchBuffer();
fs.writeFileSync('shot.jpg', buffer);
```
### `client.camera.startVideo(options?)` → `Promise<VideoStream>`
Start a video stream. Resolves with a `VideoStream` once the robot signals ready.
| Option | Type | Default |
|---|---|---|
| `type` | `string` | `VideoType.Normal` (`'NORMAL'`) |
| `timeout` | `number` | `10000` ms |
```js
const fs = require('fs');

const stream = await client.camera.startVideo();
console.log(stream.uri); // Jibo URI for the MJPEG stream
await stream.pipe(fs.createWriteStream('out.mjpeg'));
stream.stop();
```
### `client.camera.stopVideo()`
Stop the active video stream.
---
## `client.display` — DisplayManager
All display methods are fire-and-forget (no await needed).
### `client.display.showEye(name?)`
Show Jibo's eye animation. Default: `'default'`.
### `client.display.showText(text, name?)`
Display text on Jibo's screen. `name` is the view slot name (default `'view'`).
### `client.display.showImage(src, name?)`
Display an image on Jibo's screen. `src` is a URL.
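Because these calls are fire-and-forget, a common pattern is to show a message briefly and then restore the eye. A sketch under assumptions: `flashMessage` and `truncateForScreen` are hypothetical helpers, and the 80-character limit is an illustrative guess, not a documented screen constraint:

```javascript
// Hypothetical helper — the real on-screen character limit is not
// documented; 80 here is an assumption for illustration.
function truncateForScreen(text, max = 80) {
  return text.length <= max ? text : text.slice(0, max - 1) + '…';
}

function flashMessage(client, text, ms = 5000) {
  client.display.showText(truncateForScreen(text), 'notice');
  setTimeout(() => client.display.showEye(), ms); // restore the default eye
}
```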
---
## `client.assets` — AssetManager
### `client.assets.fetch(uri, name, timeout?)` → `Promise<void>`
Download a remote file and cache it on the robot under `name`. Rejects with `{ code: 'ASSET_FAILED' }` on error or `{ code: 'ASSET_TIMEOUT' }` after `timeout` ms (default 30000).
```js
await client.assets.fetch('https://example.com/sound.mp3', 'mysound');
```
### `client.assets.unload(name)`
Remove a cached asset from the robot by name.
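Of the two rejection codes, only `ASSET_TIMEOUT` is usually worth retrying; `ASSET_FAILED` typically means a bad URL or unsupported file. A sketch of that policy — `isRetryableAssetError` and `fetchAssetWithRetry` are local helpers, not part of rom-control:

```javascript
// Local helper (not part of rom-control): only timeouts are retryable.
function isRetryableAssetError(err) {
  return Boolean(err) && err.code === 'ASSET_TIMEOUT';
}

async function fetchAssetWithRetry(client, uri, name, attempts = 2) {
  for (let i = 0; i < attempts; i++) {
    try {
      await client.assets.fetch(uri, name, 30000);
      return; // cached on the robot under `name`
    } catch (err) {
      if (!isRetryableAssetError(err) || i === attempts - 1) throw err;
    }
  }
}
```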
---
## Structures
Rich objects emitted by events or returned from manager methods. All have a `_client` back-reference for calling methods.
### `Track`
Emitted by `'trackCreate'`, `'trackUpdate'`, `'trackDelete'` and stored in `client.tracks`.
| Property | Type | Description |
|---|---|---|
| `id` | `number` | ROM EntityID |
| `screenCoords` | `{ x, y } \| null` | Position on camera image |
| `worldCoords` | `{ x, y, z } \| null` | 3D world position in mm |
```js
track.lookAt(track = true) // → Promise<void>
```
### `SpeechResult`
Resolved by `client.audio.awaitSpeech()`.
| Property | Type |
|---|---|
| `content` | `string` — transcript |
| `languageCode` | `string` — e.g. `'en-US'` |
### `Photo`
Resolved by `client.camera.takePhoto()`.
| Property | Type |
|---|---|
| `uri` | `string` — Jibo-internal URI |
| `name` | `string` |
| `angleTarget` | `object \| null` |
| `positionTarget` | `object \| null` |
```js
photo.fetchBuffer() // → Promise<Buffer>
photo.pipe(writableStream) // → Promise<void>
```
### `VideoStream`
Resolved by `client.camera.startVideo()`.
| Property | Type |
|---|---|
| `uri` | `string` — Jibo-internal URI for the MJPEG stream |
| `active` | `boolean` |
```js
stream.pipe(writableStream) // → Promise<void>
stream.stop()
```
### `Motion`
Emitted by `'motionDetected'`.
| Property | Type |
|---|---|
| `zones` | `MotionZone[]` |
Each `MotionZone`: `{ screenCoords: {x,y}|null, worldCoords: {x,y,z}|null, intensity: number|null }`.
### `HeadTouchEvent`
Emitted by `'headTouch'`.
| Property | Type |
|---|---|
| `pads` | `boolean[]` — all 6 pads in order |
| `activePads` | `string[]` — names of currently-pressed pads |
Pad order / names: `frontLeft`, `middleLeft`, `backLeft`, `frontRight`, `middleRight`, `backRight`.
```js
event.isTouched('frontLeft') // → boolean
```
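The `activePads` list can be reconstructed from the raw `pads` array using the documented order. A pure sketch mirroring that mapping (`activePadNames` is illustrative; the array literal matches the `HEAD_TOUCH_PADS` constant):

```javascript
// Pad order as documented (same as the HEAD_TOUCH_PADS constant).
const PAD_NAMES = ['frontLeft', 'middleLeft', 'backLeft',
                   'frontRight', 'middleRight', 'backRight'];

// Mirror of event.activePads: map the 6-element boolean array to names.
function activePadNames(pads) {
  return PAD_NAMES.filter((_, i) => pads[i]);
}
```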
### `GestureEvent`
Emitted by `'gesture'`.
| Property | Type |
|---|---|
| `type` | `'Tap' \| 'Swipe'` |
| `coordinate` | `{ x, y } \| null` — tap position |
| `direction` | `'Up' \| 'Down' \| 'Left' \| 'Right' \| null` — swipe direction |
| `isTap` | `boolean` |
| `isSwipe` | `boolean` |
### `HotwordEvent`
Emitted by `'hotword'`.
| Property | Type |
|---|---|
| `utterance` | `string` — e.g. `'hey jibo'` |
| `score` | `number` |
| `timestamp` | `string` — ISO 8601 |
---
## Constants
```js
const {
  AttentionMode,   // Off, Idle, Disengage, Engaged, Speaking, Fixated, Attractable, Menu, Command
  Camera,          // Left, Right
  Resolution,      // HighRes, MedRes, LowRes, MicroRes
  VideoType,       // Normal ('NORMAL'), Debug ('DEBUG')
  GestureType,     // Tap, SwipeDown, SwipeUp, SwipeRight, SwipeLeft
  HEAD_TOUCH_PADS  // ['frontLeft','middleLeft','backLeft','frontRight','middleRight','backRight']
} = require('rom-control');
```
### AttentionMode
| Value | Description |
|---|---|
| `Off` | Disengage all attention systems |
| `Idle` | Low-power idle |
| `Disengage` | Stop tracking, return to neutral |
| `Engaged` | Actively track and engage |
| `Speaking` | Speaking mode |
| `Fixated` | Lock gaze on current target |
| `Attractable` | Look toward movement and sound |
| `Menu` | Menu interaction mode |
| `Command` | Command input mode |
---
## ESML Utilities
Exported for callers that build raw ESML strings. Used internally by `client.behavior.say()`.
### `sanitizeEsml(text)` → `string`
Strip characters rejected by Jibo's ROM parser: emoji, bare `&`, non-ASCII, markdown formatting, `<ssa>` tags, and newlines. Preserves valid ESML tags (`<anim>`, `<break>`, `<style>`, etc.).
### `chunkEsml(text, maxLen?)` → `string[]`
Split sanitized ESML into chunks of at most `maxLen` chars (default 450), cutting at sentence boundaries then word boundaries. Every chunk is guaranteed to contain at least one XML tag (required by Jibo's TTS parser).
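A sketch of composing the two utilities before feeding chunks to the raw `Say` path — `prepareEsml` and `allWithinLimit` are illustrative wrappers, not library exports:

```javascript
// Pure invariant check used below: every chunk fits the length budget.
function allWithinLimit(chunks, maxLen) {
  return chunks.every((chunk) => chunk.length <= maxLen);
}

function prepareEsml(text, maxLen = 450) {
  // required lazily so the helper above stands alone
  const { sanitizeEsml, chunkEsml } = require('rom-control');
  const chunks = chunkEsml(sanitizeEsml(text), maxLen);
  if (!allWithinLimit(chunks, maxLen)) {
    throw new Error('chunker produced an oversized chunk');
  }
  return chunks;
}
```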
---
## Using `client._conn` (advanced)
`client._conn` is the internal `RomConnection` instance — the raw txId-based layer. It is intentionally not part of the public API, but is accessible when you need capabilities the managers don't cover:
- **Raw event firehose** — `client._conn.on('event', (txId, body) => ...)` receives every robot message unfiltered, useful for forwarding events to a UI.
- **Cancel by txId** — `client._conn.cancel(txId)` when you hold a txId from a fire-and-forget call.
- **Fire-and-forget with txId** — `client._conn.listenLocalASR()`, `client._conn.takePhoto()`, etc. when you need the txId to correlate async events arriving through a separate channel (e.g. a WebSocket broadcast to a browser).
- **Low-level LookAt** — `client._conn.lookAt(target, trackFlag)` for tracking screen coordinates.
```js
// Example: forward all raw events to connected browser clients
client._conn.on('event', (txId, body) => {
  broadcast({ type: 'jiboEvent', txId, body });
});

// Example: fire listen and return txId to a REST caller for WebSocket correlation
const txId = client._conn.listenLocalASR(5000, 10000);
res.json({ txId });
```
---
## Complete Example
```js
const { Client, AttentionMode, Resolution } = require('rom-control');

async function main() {
  const client = new Client({ host: '192.168.1.217' });

  client.once('ready', () => {
    console.log('Connected, session:', client.sessionID);
  });

  // Track entities in client.tracks automatically
  client.on('trackCreate', async (track) => {
    console.log('Saw person at', track.screenCoords);
    await track.lookAt();
  });

  client.on('headTouch', (event) => {
    if (event.isTouched('frontLeft')) {
      client.behavior.say('Ouch, that tickles!');
    }
  });

  client.on('hotword', () => {
    client.behavior.playAnimCat('excited', { nonBlocking: true });
  });

  await client.connect();

  // Greet
  await client.behavior.setAttention(AttentionMode.Engaged);
  await client.behavior.say("<anim cat='happy' nonBlocking='true'/> Hello, I'm Jibo!");

  // Take a photo
  const photo = await client.camera.takePhoto({ resolution: Resolution.HighRes });
  const buf = await photo.fetchBuffer();
  require('fs').writeFileSync('jibo-shot.jpg', buf);

  // Listen for a response
  try {
    const speech = await client.audio.awaitSpeech({ mode: 'local', time: 12000 });
    await client.behavior.say(`You said: ${speech.content}`);
  } catch {
    await client.behavior.say("I didn't catch that.");
  }

  // Watch for wakeword in the background
  client.audio.watchWakeword();

  // Clean up after 60 seconds
  setTimeout(() => client.destroy(), 60_000);
}

main().catch(console.error);
```