# Protocol Inventory

## Purpose

This document tracks the currently observed cloud surface area for Jibo and helps keep the .NET port aligned with real behavior captured by the Node prototype.

It is not a claim that the current Node server covers all Jibo endpoints or behaviors. It reflects only the portions mapped so far.

Confidence levels:

- `high`: observed in code and currently represented in the .NET scaffold
- `medium`: observed in the Node oracle and documented, but not fully ported yet
- `low`: expected or inferred, needs more robot validation

## Known Hosts

| Host | Purpose | Confidence | Notes |
| --- | --- | --- | --- |
| `api.jibo.com` | HTTPS API target for `X-Amz-Target` operations | high | Main request dispatch path in the Node prototype |
| `api-socket.jibo.com` | token-authenticated WebSocket path | medium | Node accepts tokenized connections and intentionally sends no greeting |
| `neo-hub.jibo.com` | listen and proactive WebSocket traffic | medium | Path-driven split between listen and `/v1/proactive` |
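
As a rough illustration of the host and path split in the table above, the following sketch classifies an incoming connection. The hostnames and the `/v1/proactive` path come from the table; the function name `classifyEndpoint` and its return labels are hypothetical and are not taken from `open-jibo-link.js` or the .NET scaffold.

```javascript
// Hypothetical classifier: only the three hostnames and "/v1/proactive"
// are grounded in the inventory; the labels returned here are illustrative.
function classifyEndpoint(host, path) {
  const h = String(host).toLowerCase().split(":")[0]; // drop any port suffix
  const p = String(path).split("?")[0];               // drop any query string

  if (h === "api.jibo.com") return "http-api";
  if (h === "api-socket.jibo.com") return "api-socket";
  if (h === "neo-hub.jibo.com") {
    // Path-driven split: the proactive path is known, anything else is
    // treated as a listen path in this sketch.
    return p === "/v1/proactive" ? "neo-hub-proactive" : "neo-hub-listen";
  }
  return "unknown";
}
```

Treating every non-proactive Neo-Hub path as a listen path is a simplification for the sketch; the real listen path set still needs to be confirmed against captures.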

## Region Configuration

Current robot findings suggest the preferred OpenJibo bootstrap path is to inject a new region configuration rather than treat host overrides as the only integration seam.

Confirmed or strongly observed files:

- `/etc/jibo-jetstream-service.json`
- `/var/jibo/credentials.json`
- `/etc/jibo-ssm/*.json`
- `/skills/jibo/Jibo/Skills/@be/be/node_modules/language-subtag-registry/data/json/registry.json`
- `/skills/jibo/Jibo/Skills/oobe-config/config.json`

The first two are the clearest current OpenJibo injection points. The others should remain on the audit list while endpoint and behavior mapping continues.

## HTTP Dispatch Families

Observed from `open-jibo-link.js`:

| Service family | Example operations | Confidence | Current .NET status |
| --- | --- | --- | --- |
| `Account_*` | `CreateHubToken`, `CreateAccessToken`, `Login`, `Get` | high | initial dispatch implemented |
| `Notification_*` | `NewRobotToken` | high | initial dispatch implemented |
| `Loop_*` | `List`, `ListLoops` | medium | initial dispatch implemented |
| `Robot_*` | `GetRobot`, `UpdateRobot` | medium | initial dispatch implemented |
| `Update_*` | `ListUpdates`, `ListUpdatesFrom`, `GetUpdateFrom`, `CreateUpdate`, `RemoveUpdate` | medium | list/get scaffolding implemented |
| `Media_20160725` | `List`, `Get`, `Create`, `Remove` | medium | implemented in current parity scaffold |
| `Log_*` | `PutEvents`, `PutEventsAsync`, `PutBinaryAsync`, `PutAsrBinary` | medium | async upload metadata and placeholder upload endpoints implemented |
| `Key_*` | `ShouldCreate`, `CreateSymmetricKey`, `GetRequest` | medium | implemented in current parity scaffold |
| `Person_*` | `ListHolidays` | low | implemented in current parity scaffold |
| `Backup_*` | `List` | low | implemented in current parity scaffold |
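
To make the family/operation split concrete, here is a minimal sketch of parsing an `X-Amz-Target` value. It assumes the AWS-style `<Service>_<Version>.<Operation>` layout (for example `Media_20160725.List`); the exact header format Jibo sends has not been restated in this inventory, so treat the separator handling as an assumption. `parseAmzTarget` and `dispatch` are hypothetical names, not functions from the Node prototype.

```javascript
// Assumption: targets look like "<Service>_<Version>.<Operation>".
function parseAmzTarget(target) {
  if (typeof target !== "string") return null;
  const dot = target.indexOf(".");
  if (dot <= 0 || dot === target.length - 1) return null;

  const prefix = target.slice(0, dot);     // e.g. "Media_20160725"
  const operation = target.slice(dot + 1); // e.g. "List"
  const underscore = prefix.lastIndexOf("_");
  const service = underscore > 0 ? prefix.slice(0, underscore) : prefix;
  const version = underscore > 0 ? prefix.slice(underscore + 1) : null;
  return { service, version, operation };
}

// Hypothetical dispatcher keyed by "Service.Operation", ignoring the version.
function dispatch(target, handlers) {
  const parsed = parseAmzTarget(target);
  if (!parsed) return { status: 400, body: { error: "unparseable target" } };

  const key = `${parsed.service}.${parsed.operation}`;
  if (!Object.prototype.hasOwnProperty.call(handlers, key)) {
    return { status: 404, body: { error: `no handler for ${key}` } };
  }
  return { status: 200, body: handlers[key](parsed) };
}
```

Keying handlers by service and operation while ignoring the version suffix mirrors the `Account_*` style wildcard used in the table; whether different versions need different handlers is still open.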

## WebSocket Flows

| Host/path | Flow | Confidence | Current .NET status |
| --- | --- | --- | --- |
| `api-socket.jibo.com/{token}` | token-authenticated socket for API-side signaling | medium | stub endpoint implemented |
| `neo-hub.jibo.com/{listen-path}` | listen turn flow with JSON and binary audio traffic | medium | fixture-backed synthetic turn flow implemented for `LISTEN`, `CONTEXT`, `CLIENT_NLU`, `CLIENT_ASR`, `EOS`, and first chat/joke skill responses |
| `neo-hub.jibo.com/v1/proactive` | proactive connection flow | medium | stub endpoint implemented |

### Current WebSocket Parity Slice

The current .NET pass covers only a narrow, explicitly synthetic subset of observed Neo-Hub behavior:

- token/session tracking across websocket turns
- explicit per-turn state tracking for transID, rules, context, buffered audio, and finalize attempts
- buffered audio accounting and turn-pending state
- auto-finalize triggering for raw audio once `LISTEN`, `CONTEXT`, and minimum buffered-audio thresholds are present
- `LISTEN` message handling with synthetic `LISTEN` result payload shaping
- `CONTEXT` capture for turn/session state
- `CLIENT_NLU` turn completion using remembered listen/session metadata
- `CLIENT_ASR` turn completion, including a synthetic STT seam for buffered-audio replay
- `EOS` emission after completed turns
- delayed `SKILL_ACTION` emission after `EOS` on completed turn flows to better match the Node oracle timing
- first richer vertical slice for joke/chat `SKILL_ACTION` playback
- fixture-backed joke-turn payload fidelity for `CLIENT_ASR -> LISTEN -> EOS -> delayed SKILL_ACTION`, including Node-like `EOS` envelope fields and the currently observed joke `SKILL_ACTION` metadata shape

This does not yet mean parity for:

- real binary audio buffering and finalization
- real STT provider integration and external ASR lifecycle timing
- early-EOS behavior
- multi-step skill lifecycles beyond the current synthetic playback response
- broad `SKILL_ACTION` payload coverage outside the currently observed joke/chat playback slice
- broader interaction, animation, or ESML command families
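
The per-turn state and auto-finalize condition listed in this subsection could be sketched roughly as follows. The tracked fields (transID, rules, context, buffered audio, finalize attempts) come from the list above; the field names, the helper names, and the `MIN_BUFFERED_AUDIO_BYTES` value are placeholders chosen for illustration, not values read from the .NET code or the Node oracle.

```javascript
// Placeholder threshold: the real minimum buffered-audio value is not
// stated in this document.
const MIN_BUFFERED_AUDIO_BYTES = 8000;

function createTurnState(transID) {
  return {
    transID,
    rules: [],
    context: null,
    sawListen: false,
    bufferedAudioBytes: 0,
    finalizeAttempts: 0,
    completed: false,
  };
}

function onListen(turn, rules) {
  turn.sawListen = true;
  turn.rules = rules.slice(); // remember rules for the later LISTEN result
}

function onContext(turn, context) {
  turn.context = context;
}

function onAudioFrame(turn, frame) {
  turn.bufferedAudioBytes += frame.length; // accounting only, no decoding
}

// Auto-finalize fires only once LISTEN, CONTEXT, and enough audio are present.
function shouldAutoFinalize(turn, minBytes = MIN_BUFFERED_AUDIO_BYTES) {
  return (
    !turn.completed &&
    turn.sawListen &&
    turn.context !== null &&
    turn.bufferedAudioBytes >= minBytes
  );
}
```

Keeping the readiness check as a pure function of turn state makes it easy to replay fixture traffic against it without a live socket.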

### Successful Joke Turn: What Is Grounded Now

The highest-confidence websocket vertical slice after the starter parity pass is now:

- inbound `CLIENT_ASR` carrying `"tell me a joke"`
- outbound synthetic `LISTEN` result with joke intent and remembered rules
- outbound `EOS` carrying `ts`, `msgID`, `transID`, and an empty `data` object
- outbound `SKILL_ACTION` about 75 ms later
- joke `SKILL_ACTION` payload shape aligned with the Node oracle for:
  - `data.skill.id = "@be/joke"`
  - `data.action.config.jcp.type = "SLIM"`
  - `data.action.config.jcp.config.play.meta.prompt_id = "RUNTIME_PROMPT"`
  - `data.action.config.jcp.config.play.meta.prompt_sub_category = "AN"`
  - `data.action.config.jcp.config.play.meta.mim_id = "runtime-joke"`
  - `data.action.config.jcp.config.play.meta.mim_type = "announcement"`

What remains intentionally unclaimed for that slice:

- whether the joke payload is complete beyond those fields
- whether other successful skills use the same payload shape
- whether additional websocket messages appear in other successful skill paths
- whether any timing gaps besides the observed 75 ms `EOS -> SKILL_ACTION` delay matter
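
For reference, the grounded joke-turn fields can be assembled into a sketch like the one below. Only the `EOS` fields (`ts`, `msgID`, `transID`, empty `data`), the six `SKILL_ACTION` paths listed above, and the roughly 75 ms delay are taken from this document. The `type` discriminator, the millisecond `ts` format, the reuse of the envelope fields on `SKILL_ACTION`, and all function names are assumptions made for illustration.

```javascript
// Assumption: messages carry a "type" field and ts is epoch milliseconds.
function buildEos(transID, msgID, now = Date.now()) {
  return { type: "EOS", ts: now, msgID, transID, data: {} };
}

// Only the fields currently grounded in the inventory are populated;
// the real payload may carry more.
function buildJokeSkillAction(transID, msgID, now = Date.now()) {
  return {
    type: "SKILL_ACTION",
    ts: now,
    msgID,
    transID,
    data: {
      skill: { id: "@be/joke" },
      action: {
        config: {
          jcp: {
            type: "SLIM",
            config: {
              play: {
                meta: {
                  prompt_id: "RUNTIME_PROMPT",
                  prompt_sub_category: "AN",
                  mim_id: "runtime-joke",
                  mim_type: "announcement",
                },
              },
            },
          },
        },
      },
    },
  };
}

// Returns a send plan instead of touching a socket, so timing is testable.
function planJokeTurnTail(transID, nextMsgID, now = Date.now()) {
  const skillDelayMs = 75; // observed EOS -> SKILL_ACTION gap
  return [
    { delayMs: 0, message: buildEos(transID, nextMsgID(), now) },
    {
      delayMs: skillDelayMs,
      message: buildJokeSkillAction(transID, nextMsgID(), now + skillDelayMs),
    },
  ];
}
```

A caller would walk the plan and schedule each message with its own timer; returning data rather than sending directly keeps the sketch independent of any particular WebSocket library.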

### Latest Live Capture Additions From April 16, 2026

The newest repo-root websocket capture at [captures/websocket/20260416.events.ndjson](/C:/Projects/JiboExperiments/captures/websocket/20260416.events.ndjson) adds more grounded websocket discovery without implying broad protocol coverage.

Observed `CLIENT_ASR` transcript-bearing turns now include:

- `tell me a joke`
- `do a dance`
- `surprise me`
- `personal report`
- `tell me about the weather`
- `tell me about my calendar`
- `what does my commute look like`
- `tell me about the news`

Observed menu-driven `CLIENT_NLU` intents now include:

- `loadMenu`
- `askForTime`
- `askForDate`
- `start`
- `timerValue`
- `set`
- `alarmValue`

Observed entity/rule shapes from those menu flows include:

- `askForTime` with `entities.domain = "clock"` and `rules = ["clock/clock_menu"]`
- `askForDate` with the same `clock` menu rule family
- `timerValue` with timer duration entities
- `alarmValue` with alarm time entities such as `ampm` and `time`

Current `.NET` parity for that new slice is still intentionally partial:

- menu-side `CLIENT_NLU` replies now preserve the observed inbound intent/rules/entities in the synthetic outbound `LISTEN` payload
- `askForTime` and `askForDate` are now fixture-backed as mapped menu intents
- `do a dance` is now recognized as a distinct chat/dance intent in the current synthetic path

Still unknown:

- whether `surprise me`, `personal report`, weather, calendar, commute, and news should map to richer skill-specific websocket payloads
- whether menu-side clock/timer/alarm flows require additional websocket messages beyond the currently observed `LISTEN` and `EOS`
- how much of those flows is actually completed robot-side versus merely acknowledged by the cloud
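
The menu-side behavior noted in this subsection, where a `CLIENT_NLU` reply preserves the inbound intent, rules, and entities in the synthetic `LISTEN` payload, might look something like the sketch below. The `askForTime` values (`entities.domain = "clock"`, `rules = ["clock/clock_menu"]`) are taken from the capture notes; the message envelope, the `data.nlu` nesting, and the function name are hypothetical and should be checked against the fixtures before being relied on.

```javascript
// Hypothetical shapes: the inbound message is assumed to expose
// data.intent, data.rules, and data.entities.
function buildListenFromClientNlu(clientNlu, transID) {
  const { intent = null, rules = [], entities = {} } = clientNlu.data ?? {};
  return {
    type: "LISTEN", // assumed discriminator
    transID,
    data: {
      nlu: {
        intent,
        rules: [...rules],         // copy so the inbound message is not mutated
        entities: { ...entities }, // shallow copy of the observed entities
      },
    },
  };
}

// Example input modeled on the observed askForTime menu flow.
const askForTimeExample = {
  type: "CLIENT_NLU",
  data: {
    intent: "askForTime",
    rules: ["clock/clock_menu"],
    entities: { domain: "clock" },
  },
};
```

Passing `askForTimeExample` through `buildListenFromClientNlu` yields a `LISTEN` message that echoes the same intent, rule, and entity values, which is the only behavior the current parity notes claim.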

### Buffered Audio / ASR Direction

The `.NET` hosted implementation now has two STT lanes:

- existing synthetic transcript-hint replay for fixture-driven parity work
- a new opt-in local buffered-audio path that preserves websocket Ogg/Opus frames and can invoke external `ffmpeg` plus `whisper.cpp`

That local tool-based path is intentionally experimental and disabled by default. Its purpose is to let us iterate on real buffered-audio decoding in `.NET` without changing the stable cloud-first architecture or claiming production ASR parity yet.
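
As a rough picture of what the opt-in tool-based lane involves, the sketch below builds the two external command lines without running them. It assumes the preserved audio has already been written to an Ogg/Opus file, that `whisper.cpp` wants 16 kHz mono 16-bit WAV input, and that its CLI binary is named `whisper-cli` (older builds call it `main`). The file paths, the function name, and the default binary name are illustrative, and this is not the command set the .NET implementation is confirmed to use.

```javascript
// Builds argv arrays only; nothing is spawned here.
function buildDecodeCommands(oggPath, wavPath, modelPath, whisperBin = "whisper-cli") {
  return [
    {
      // Convert preserved Ogg/Opus audio to 16 kHz mono PCM WAV.
      cmd: "ffmpeg",
      args: ["-y", "-i", oggPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath],
    },
    {
      // Transcribe the WAV; -nt suppresses timestamps in the printed text.
      cmd: whisperBin,
      args: ["-m", modelPath, "-f", wavPath, "-nt"],
    },
  ];
}
```

Returning argument arrays rather than a shell string avoids quoting problems if the paths contain spaces, and lets the caller decide how to spawn and time out each process.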

Future provider options still under consideration:

- local decode/transcribe in `.NET` using preserved websocket audio plus external tools
- Azure Speech as a hosted STT option for the long-term cloud path
- direct managed Opus decode later if a library proves stable enough for the hosted deployment target

Current raw-audio fallback behavior remains explicitly synthetic:

- when a buffered-audio turn can be resolved through the synthetic transcript-hint seam, `.NET` now auto-finalizes and emits `LISTEN` + `EOS` + `SKILL_ACTION`
- when the turn crosses the finalize threshold without a usable transcript, `.NET` now emits a fallback `LISTEN` + `EOS` + generic `SKILL_ACTION` rather than leaving the robot hanging on an unfinished turn
- that fallback is a compatibility measure inspired by the Node oracle, not a claim of real ASR understanding
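
The two raw-audio outcomes described above can be summarized as a small decision function. The emitted message sequences follow the bullets directly; the function name, the input shape, and the `kind`/`skillAction` labels are hypothetical.

```javascript
// Decides what to emit for a buffered-audio turn.
// transcript: string or null from the transcript-hint seam (assumed input)
// crossedThreshold: whether the finalize threshold has been reached
function resolveBufferedTurn({ transcript, crossedThreshold }) {
  const hasTranscript = typeof transcript === "string" && transcript.trim() !== "";

  if (hasTranscript) {
    // Resolved through the synthetic transcript-hint seam.
    return { kind: "resolved", emit: ["LISTEN", "EOS", "SKILL_ACTION"], skillAction: "mapped" };
  }
  if (crossedThreshold) {
    // Compatibility fallback: finish the turn rather than leave it open.
    return { kind: "fallback", emit: ["LISTEN", "EOS", "SKILL_ACTION"], skillAction: "generic" };
  }
  // Not enough information yet; keep buffering.
  return { kind: "pending", emit: [], skillAction: null };
}
```

Both completed outcomes emit the same three message types; the only difference in this sketch is whether the `SKILL_ACTION` is mapped from a transcript or is the generic fallback.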

### Internal ASR Direction

The current .NET websocket layer now separates:

- robot-facing websocket compatibility
- long-lived cloud session state
- per-turn websocket state
- transcript resolution / STT selection
- turn-to-response mapping

That separation is intentional. The synthetic STT path currently exists only to support fixture-driven replay while parity work continues. It should be treated as an internal compatibility seam, not as the final production ASR design.

## Upload Paths

| Path | Purpose | Confidence | Current .NET status |
| --- | --- | --- | --- |
| `/upload/asr-binary` | async audio/log upload target | medium | placeholder endpoint accepted |
| `/upload/log-events` | async log upload target | medium | placeholder endpoint accepted |
| `/upload/log-binary` | async binary upload target | medium | placeholder endpoint accepted |

## First Live .NET Capture Findings

The first real `.NET` robot run has confirmed only an early startup slice so far:

- `api.jibo.com` startup HTTP requests are reaching the `.NET` cloud
- `Notification.NewRobotToken` is active in the robot startup sequence
- `api-socket.jibo.com/{token}` is being accepted live

The first live run has not yet shown full startup parity with the working Node server. In particular, the successful Node run continues into additional health/log cadence after token issuance and socket acceptance, while the current `.NET` run has not yet reproduced that full progression consistently.

## First Core Revive Slice

The first .NET hosted milestone should fully support:

- `Account.CreateHubToken`
- `Notification.NewRobotToken`
- `Loop.List` and `Loop.ListLoops`
- `Robot.GetRobot`
- `Update.ListUpdates`, `Update.ListUpdatesFrom`, `Update.GetUpdateFrom`
- root probe and health checks
- basic listen/proactive WebSocket acceptance
- normalized turn and reply mapping for simple chat

## Known Beyond Current Node Coverage

The platform scope is broader than the endpoints currently modeled in `open-jibo-link.js`. Known areas that still need mapping include:

- broader skill launch and lifecycle behavior
- interactivity command families beyond the joke starter path
- richer animation and expression control
- ESML and embodied speech features
- additional service families and region-specific endpoint behavior
- startup and configuration differences across Jibo software variants

Useful external references for future mapping:

- [Speak-Tweak Docs](https://hri2024.jibo.media.mit.edu/Speak-Tweak-Docs)
- [ESML PDF](https://hri2024.jibo.media.mit.edu/attachments/SDK-SDK---ESML-121023-203758.pdf)

## Fixture Source

Sanitized fixtures live under [src/Jibo.Cloud/node/fixtures](/OpenJibo/src/Jibo.Cloud/node/fixtures) and should be expanded as real traffic is captured.