dev plan and backlog updates, one bug fix

Jacob Dubin
2026-04-25 23:06:49 -05:00
parent 773e768898
commit 527ddb1bfc
3 changed files with 521 additions and 592 deletions

@@ -2,243 +2,159 @@
## Summary
This document is the working implementation plan after the initial hosted-cloud scaffold.
This document is the current working plan for the OpenJibo hosted cloud.
It is intentionally broader than the current Node server. The Node server is a protocol oracle and discovery tool, not the complete map of Jibo.
The production lane is the `.NET` cloud in `src/Jibo.Cloud/dotnet`. The Node server remains the protocol oracle, capture harness, and fast reverse-engineering lab, but it is no longer the long-term hosted architecture.
Day-to-day feature sequencing now lives in [feature-backlog.md](/C:/Projects/JiboExperiments/OpenJibo/docs/feature-backlog.md).
Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md). This file tracks release shape, current code truth, evidence sources, and the boundary between `1.0.18` closeout work and `1.0.19` follow-up work.
Cloud release hygiene:
## Current Release Snapshot
- keep a visible OpenJibo Cloud version string
- expose it through diagnostics such as `/health` and the spoken `cloud version` command
- bump the shared version constant whenever we deploy a meaningful hosted-cloud change
- Current OpenJibo Cloud release constant: `1.0.18`
- Source of truth: [OpenJiboCloudBuildInfo.cs](../src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/OpenJiboCloudBuildInfo.cs)
- Spoken diagnostic: `Open Jibo Cloud version 1 dot 0 dot 18.`
- HTTP diagnostic: `/health` returns the same version
- Startup diagnostic: the API logs the same version on boot
- .NET target framework: `net10.0` across the cloud projects and cloud test project
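The spoken diagnostic above follows mechanically from the version constant. A minimal Python sketch of that mapping, assuming each dot is read aloud as the word "dot"; `spoken_cloud_version` is a hypothetical helper for illustration, not the `.NET` code in `OpenJiboCloudBuildInfo.cs`:

```python
def spoken_cloud_version(version: str) -> str:
    """Turn a dotted version such as '1.0.18' into the spoken diagnostic line."""
    parts = version.split(".")
    # Reject anything that is not purely dot-separated digits.
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"unexpected version format: {version!r}")
    return f"Open Jibo Cloud version {' dot '.join(parts)}."
```

Deriving the spoken line from the same constant that `/health` reports keeps the three diagnostics from drifting apart.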
## Current Scope
Release `1.0.18` is now in feature-hardening. Its main bug-fix theme is alarm and photo/gallery behavior on stock OS `1.9`, with a few small feature slices added while the test loop is warm.
- stable .NET cloud scaffold
- Azure-oriented architecture and data ownership
- normalized runtime contracts for cloud-to-runtime handoff
- bootstrap documentation for region injection and targeted device patching
- starter endpoint coverage for account, notification, robot, loop, update, uploads, and core WebSocket acceptance
- starter xUnit coverage for the .NET application layer
## Release Rhythm
## Next Implementation Scope
This is the working pattern for each hosted-cloud release:
- expand HTTP `X-Amz-Target` coverage from observed traffic and fixtures
- grow WebSocket compatibility from stub acceptance into realistic turn orchestration
- keep websocket parity fixture-driven, starting with exact sequencing and payload-shape fidelity for the successful joke vertical slice before claiming broader skill coverage
- replace in-memory state with Azure SQL-backed persistence
- add structured fixture replay tests
- harden region/bootstrap docs by software version
1. Pick a narrow source-backed feature or compatibility slice.
2. Confirm the stock payload shape from captures, Pegasus, the JiboOS reference tree, or live logs.
3. Implement the smallest `.NET` path that can be tested honestly.
4. Add focused tests around routing, websocket payload shape, and state behavior.
5. Run the stock robot live test, collect captures, and record the result before moving on.
6. Keep regressions and bug fixes in the current release; roll larger follow-up work into the next version.
## Discovery Scope
For `1.0.18`, the remaining release work should stay small: finish one or two feature slices, run the live regression pass, and only patch bugs found in that pass before calling the version complete. `1.0.19` should then reopen the broader feature queue.
We still need to map more than the current Node server expresses. Priority discovery areas:
## Current Code Truth
- all hostnames and service prefixes observed in real startup and turn traffic
- skill launch and skill lifecycle flows
- interactivity command families beyond the current joke flow
- richer embodied speech and animation behaviors
- upload, logging, backup, and key-sharing flows
- per-version configuration differences and region handling
The hosted `.NET` cloud is a modular monolith:
## Current WebSocket Discovery Focus
```text
Jibo.Cloud.Api -> Jibo.Cloud.Application -> Jibo.Cloud.Domain -> Jibo.Cloud.Infrastructure
```
The next fixture-driven websocket work should continue to separate three buckets:
Current API and protocol scope:
- discovered behavior
Grounded by the Node oracle, sanitized fixtures, and live captures
- implemented parity
Only the narrow slices currently replayed and tested in `.NET`
- future hypotheses
Ideas to investigate later, but not behaviors to silently bake into the hosted cloud
- HTTP `X-Amz-Target` dispatch through `JiboCloudProtocolService`
- `/health` diagnostics
- WebSocket acceptance for `api-socket.jibo.com`, `neo-hub.jibo.com` listen, and `neo-hub.jibo.com/v1/proactive`
- token/session issuance for account, hub, and robot startup flows
- starter account, notification, loop, media, key, person, backup, robot, update, and upload/log handling
- media lookup through `/media/{path}`
- no placeholder no-op update from `GetUpdateFrom` when no staged update exists
Right now the strongest implemented vertical slice beyond basic listen completion is the successful joke turn:
Current websocket scope:
- `CLIENT_ASR` transcript-carrying turn completion
- synthetic `LISTEN` result shaping
- `EOS`
- delayed joke `SKILL_ACTION`
- long-lived cloud session state separated from per-turn websocket state
- `LISTEN`, `CONTEXT`, `CLIENT_NLU`, `CLIENT_ASR`, and binary-audio handling
- pending listen setup packets kept pending instead of finalized as turns
- buffered Ogg/Opus audio preservation per turn
- synthetic transcript hint support for fixture-driven parity
- opt-in local `ffmpeg` plus `whisper.cpp` STT path for discovery
- auto-finalize thresholds for buffered audio after a real listen phase
- late-audio ignore windows after completed turns
- no-input local completion for constrained prompts
- unknown inbound websocket types dropped silently instead of being answered with OpenJibo-only events that stock OS does not recognize
- file telemetry and fixture export for HTTP, websocket, and turn captures
That should remain the model for future websocket work: capture first, fixture second, parity third.
Current state and persistence scope:
The latest live captures also support a second discovery track:
- `InMemoryCloudStateStore` remains the runtime store
- a local JSON persistence bridge is enabled by default at `App_Data/cloud-state.json`
- persisted state currently covers staged updates, media metadata, and backup metadata
- this is a bridge toward Azure SQL and Blob Storage, not the final hosted storage architecture
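A local JSON bridge of this kind can be sketched in a few lines. The Python below is an illustration only: the section names (`stagedUpdates`, `mediaMetadata`, `backupMetadata`) and both helper functions are assumptions, since the real `cloud-state.json` schema and the `.NET` store are not shown in this plan.

```python
import json
import os
import tempfile

# Hypothetical section names; the real cloud-state.json layout may differ.
SECTIONS = ("stagedUpdates", "mediaMetadata", "backupMetadata")

def load_state(path):
    """Return persisted state, or empty sections when no file exists yet."""
    state = {name: {} for name in SECTIONS}
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as handle:
            stored = json.load(handle)
        for name in SECTIONS:
            state[name] = stored.get(name, {})
    return state

def save_state(path, state):
    """Write to a temp file first, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({name: state.get(name, {}) for name in SECTIONS}, handle, indent=2)
        os.replace(temp_path, path)  # readers never see a half-written file
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Writing through a temp file and `os.replace` means a crash mid-save leaves the previous state file intact, which matters when the same file has to survive store recreation.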
- menu-driven `CLIENT_NLU` parity for clock, timer, and alarm flows
- richer transcript-bearing `CLIENT_ASR` discovery beyond jokes
- buffered-audio preservation for eventual real ASR in `.NET`
## Implemented In Current `1.0.18` Source
Near-term ASR work should stay staged:
The following behavior is present in source and covered by focused tests:
1. preserve and replay the websocket audio payloads honestly
2. validate a local tool-based decode/transcribe loop in `.NET`
3. compare that against Azure-hosted STT before choosing a default production path
- `cloud version` speech and `/health` version reporting share `OpenJiboCloudBuildInfo.Version`
- apostrophes are no longer escaped to `&apos;` in spoken ESML, while `&`, `<`, `>`, and `"` remain escaped
- radio voice launch supports `open the radio` and genre launch such as `play country music`, using local `@be/radio` `menu` payloads, `SKILL_REDIRECT`, and silent completion
- news has a first Nimbus-shaped cloud path using `match.cloudSkill = news` and a `news` `SKILL_ACTION` with synthetic briefing content
- stock-shaped clock handoffs cover time, date, day, clock open, timer/alarm menu, timer/alarm value, timer/alarm clarification, and timer/alarm delete
- alarm parsing covers forms such as `7:30 am`, `830`, `8 30`, `10-25`, `10:25 pm`, and `10 25 p m`
- ambiguous alarm times can prefer the next local occurrence when the robot context includes `runtime.location.iso`
- `CLIENT_NLU intent=set` with only `domain=alarm` stays on the local clock clarification path instead of defaulting to a fabricated time
- `CLIENT_NLU intent=cancel` on `clock/alarm_timer_query_menu` can reuse the last active clock domain
- photo flows route `open photo gallery` to `@be/gallery`, `snap a picture` to `@be/create/createOnePhoto`, and `open photobooth` to `@be/create/createSomePhotos`
- passive gallery/create context does not reopen a stale cloud turn
- media metadata persists across store recreation and `/media/{path}` can serve the current text-body placeholder payload
- constrained yes/no handling covers `create/is_it_a_keeper`, `shared/yes_no`, `settings/download_now_later`, `surprises-date/offer_date_fact`, `surprises-ota/want_to_download_now`, and `$YESNO` hints
- outbound constrained yes/no responses strip unrelated `globals/*` rules so stock OS stays local
- no-input fallback for constrained yes/no prompts emits local `LISTEN`/`EOS` instead of relaunching generic Nimbus speech
- Word of the Day launch, spoken guesses, structured `CLIENT_NLU` guesses, hint-order guesses, fuzzy hint matching, right-word cleanup, and late audio cleanup are covered in the websocket layer
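The ESML escaping rule in the list above is easy to get subtly wrong, so a small sketch may help. This Python version assumes a plain text-to-markup escape step; `escape_esml_text` is a hypothetical name and not the `.NET` implementation:

```python
def escape_esml_text(text: str) -> str:
    """Escape text for spoken ESML while leaving apostrophes untouched."""
    # Escape '&' first so the entities produced below are not double-escaped.
    escaped = text.replace("&", "&amp;")
    escaped = escaped.replace("<", "&lt;").replace(">", "&gt;")
    escaped = escaped.replace('"', "&quot;")
    # Apostrophes are intentionally not rewritten to &apos;.
    return escaped
```

For example, `escape_esml_text("it's <5 & \"ok\"")` keeps the apostrophe as typed and escapes the other four characters.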
That keeps Node as the reverse-engineering oracle while letting the long-term `.NET` cloud gain real STT seams without pretending they are finished.
## Reference Sources
## Latest Capture Findings
Use these sources as evidence, not as code to copy blindly:
The latest live test round tightened up three priorities:
- OpenJibo Node oracle: [open-jibo-link.js](../src/Jibo.Cloud/node/open-jibo-link.js)
- Current hosted `.NET` cloud: [src/Jibo.Cloud/dotnet](../src/Jibo.Cloud/dotnet)
- Live captures and robot logs: `C:\Projects\JiboExperiments\artifact-output`
- Original Pegasus cloud source: `C:\Users\JacobDubin\Downloads\jibo\jibo copy\pegasus`
- Original SDK and skill source snapshot: `C:\Users\JacobDubin\Downloads\jibo\jibo copy\sdk`
- JiboOS reference tree: `C:\Projects\JiboOS`
- JiboOS `V3.1` skill snapshot: `C:\Projects\JiboOS\V3.1\build\opt\jibo\Jibo\Skills\@be`
- yes/no turns need explicit constrained follow-up handling instead of generic chat routing
- skill invocation still depends too much on narrow phrase matching and is vulnerable to STT drift
- local buffered-audio STT in `.NET` is useful for discovery, but it is not yet stable enough to be the default live-test assumption
The Pegasus tree is especially useful for cloud service intent: `packages/hub` documents `/v1/listen`, `/nlu`, and `/asr`; `packages/lasso` documents credential and provider aggregation; `packages/history` and the architecture materials are useful for future memory and proactivity work.
Evidence from the latest `2026-04-18` captures:
The JiboOS trees are especially useful for local skill ownership and payload shape: `@be/clock`, `@be/gallery`, `@be/create`, `@be/radio`, `@be/nimbus`, `@be/settings`, `@be/surprises*`, `@be/restore`, `@be/who-am-i`, and `@be/idle`.
- several buffered-audio turns never produced a usable transcript because the local `whisper.cpp` path was missing or the temporary normalized Ogg file was rejected by `ffmpeg`
- some recognized phrases fell into placeholder provider replies because the intent was recognized but the feature path behind it is still a stub
- short yes/no responses need the same session-aware treatment already prototyped in Node, especially for create-flow style follow-ups
When sources disagree, prefer the newest live stock-OS capture for runtime behavior, then stock robot source for local ownership, then Pegasus for original cloud intent, then Node for known working compatibility behavior.
Evidence from the latest word-of-the-day capture round:
## `1.0.18` Closeout Gates
- yes/no photo confirmation improved and now completes through the constrained follow-up path
- `CLIENT_NLU` menu navigation is surfacing richer `destination` entities such as `snapshot`, `fun`, and `word-of-the-day`
- word-of-the-day guesses can arrive as structured `CLIENT_NLU` turns with `intent=guess`, `rules=["word-of-the-day/puzzle"]`, and `entities.guess=<word>`
- those structured turns should be treated as first-class cloud inputs even when no free-form transcript is present
Before calling `1.0.18` complete, prove or explicitly defer these:
Evidence from the continued `2026-04-18` word-of-the-day and time captures:
- Run the focused `.NET` cloud test suite after the last feature slice.
- Confirm the running robot build reports cloud version `1.0.18`.
- Regression test alarm flows: set with explicit time, set with compact/spoken time, clarify missing time, cancel alarm, and local cleanup prompts.
- Regression test photo/gallery flows: open gallery, answer the stock `shared/yes_no` prompt, hand into create, take one photo, and avoid blue-ring stale turns.
- Live-test radio launch: `open the radio` and `play country music`.
- Live-test first news path: `tell me the news` should use the Nimbus cloud-skill lane instead of generic chat.
- Recheck constrained yes/no prompts for update/backup/share/gallery without leaking global rules.
- Recheck that stock OS no longer logs OpenJibo-only websocket events such as synthetic pending/context/ack packets from the current build.
- Treat remaining `ffmpeg` / `whisper.cpp` transcript failures as STT work unless the capture proves a separate turn-routing regression.
## Known Gaps
These are not blockers for calling `1.0.18` complete unless the live test shows a regression in a current release path:
- spoken "start word of the day" style requests should route into the same word-of-the-day launch path as the menu destination
- spoken puzzle answers like `pastoral` should be treated as valid guesses whenever the active listen rules show `word-of-the-day/puzzle`
- spoken numeric line picks like `two` should resolve through the active word-of-the-day hint order instead of being treated as generic chat
- after a successful word-of-the-day completion, late empty same-turn audio should be ignored instead of generating a stale blank-audio follow-up
- post-game hotphrase blank-audio turns should be treated as cleanup noise, not a new cloud conversation turn
- clock replies should use the user-facing hour format without a leading zero
Evidence from the smaller `2026-04-18/19` hotphrase and word-of-the-day verification bundle:
- hotphrase silence can still auto-finalize into a generic `heyJibo` fallback, which sounds confused on-robot compared with a dedicated greeting path
- voice-triggered `loadMenu + destination=word-of-the-day` reaches Nimbus successfully, but Nimbus still expects a follow-up cloud skill response and times out if launch stops at `LISTEN` + `EOS`
- the newer `jibo test 2` bundle shows voice launch now reaches Nimbus and receives a cloud response, but a generic `SLIM/RUNTIME_PROMPT` just says "starting word of the day" instead of performing the menu-style redirect the on-screen path uses
- the `jibo test 3` bundle confirms Nimbus rejects `REDIRECT` in that cloud-skill slot, so the better next experiment is to hint the on-robot target skill directly on the synthetic `LISTEN` result and skip Nimbus `SKILL_ACTION` entirely for word-of-the-day launch
- the same bundle also shows `word-of-the-day/right_word` cleanup turns need a short ignore window for trailing audio or the robot can stay stuck in a blue-ring listening state
- the `jibo test 4` bundle exposed a broader websocket issue: inbound robot `LISTEN` setup packets were still being routed through turn finalization instead of just priming pending state, which can corrupt menu and word-of-the-day flows by treating setup turns like resolved intents
- the `jibo test 5` bundle suggests the remaining WOD launch and post-win cleanup bugs share the same root cause: we were leaving the robot-side `cloudSkillResponse` promise unresolved on `word_of_the_day`, `word_of_the_day_guess`, and `word-of-the-day/right_word`. The latest `.NET` pass now emits a completion-only silent `SKILL_ACTION` for those paths instead of stopping at `LISTEN` + `EOS` or going fully silent
- the `jibo test 6` bundle plus the attached `@be` source snapshot refine that diagnosis: Nimbus does accept the silent completion response, but treats it as a normal `SLIM/RUNTIME_PROMPT` instead of a skill redirect, while the successful on-robot path is built around `menu + domain=word-of-the-day` skill switching through `SkillSwitchScheduler`
- the attached `be-framework.js` adds one more strong clue: the Be relaunch hook reads `skillData.nlu.skill`, so synthetic cloud launch turns for word-of-the-day should carry the explicit target skill name in the outbound NLU payload instead of expecting the robot to infer it from `intent/domain` alone
- the `JiboOs/V3.1` Nimbus source confirms the hotphrase/global launch path still routes through `@be/nimbus` and waits on `listenResult.cloudSkillResponse`. Nimbus only supports a narrow set of cloud JCP behaviors and does not use cloud `REDIRECT` to jump into local skills. By contrast, the post-win `word-of-the-day/right_word` turn is a local `Optional-Response`, so the cleaner robot-side closeout is to synthesize an immediate empty `LISTEN + EOS` no-response result rather than replying with only `SKILL_ACTION`
- the same `jibo test 6` capture also shows the blue-ring cleanup loop was partly self-inflicted in `.NET`: after `word-of-the-day/right_word` we stopped the active turn, but later stray binary audio on the same transID could still re-arm buffering even without a fresh `LISTEN`. The next pass now requires a real listen phase before post-turn audio can reopen buffered completion
- the local buffered-audio seam is still producing repeated `whisper.cpp returned no transcript` and `ffmpeg ... Codec not found` failures, so lightweight waveform or energy screening is worth considering once the core launch flow is stable
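The "real listen phase" rule from the findings above can be expressed as a small gate keyed by transID. This is a Python sketch under assumptions: `TurnAudioGate` and its method names are hypothetical, and the actual `.NET` turn state carries more than this.

```python
class TurnAudioGate:
    """Decides whether binary audio for a transID may be buffered."""

    def __init__(self):
        # transIDs that have seen a LISTEN and not yet completed a turn.
        self._listening = set()

    def on_listen(self, trans_id):
        """A real LISTEN opens (or reopens) buffering for this transID."""
        self._listening.add(trans_id)

    def on_turn_completed(self, trans_id):
        """Completing a turn closes buffering until the next LISTEN."""
        self._listening.discard(trans_id)

    def accepts_audio(self, trans_id):
        """Stray audio without an open listen phase is ignored."""
        return trans_id in self._listening
```

With this shape, trailing audio after `word-of-the-day/right_word` cannot re-arm buffering on its own; only a fresh `LISTEN` on that transID can.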
Near-term interaction work should now prioritize:
1. preserve and interpret yes/no turn constraints from observed listen rules
2. broaden phrase-to-intent matching for the small set of known working skills before moving to larger NLU ambitions
3. keep synthetic transcript hints as the most reliable parity path when captures already provide them
4. continue evaluating whether local preprocessing is worth further investment or whether managed STT should replace it for the next serious testing phase
5. start separating laptop-local capture storage from the eventual hosted retention/export path so group testing does not depend on repo-local zip handling
## Capture Storage Direction
Repo-local NDJSON plus zipped capture bundles are still good enough for current reverse-engineering and single-operator testing.
For hosted group testing, the next direction should be:
1. keep local file sinks for dev and laptop workflows
2. add a cleaner export/archive boundary so noteworthy sessions can be promoted without copying raw capture trees around manually
3. plan for hosted durable storage separately from the runtime node that is serving live robot traffic
4. keep fixture generation and sanitized replay artifacts as the stable handoff format between local testing and hosted debugging
## Working Cloud Framework
The current evidence in captures, fixtures, and Node behavior supports three main cloud interaction paths:
1. local Jibo behavior observed by the cloud
The robot or its local skill stack already interpreted the turn and the cloud mainly tracks, acknowledges, or lightly completes it.
2. local Jibo behavior overridden or redirected by the cloud
The robot reports the turn state, but the cloud chooses a different synthetic reply path.
3. raw audio interpreted by the cloud
The robot sends buffered audio and the cloud performs transcript resolution before sending back `LISTEN`, `EOS`, and ESML-driven playback.
Those are the right primary buckets for now. Additional side channels may still emerge later, especially around proactive traffic, direct skill/service sockets, or future on-device OS changes, but they should be treated as extensions to this model until captures prove otherwise.
Latest stock-OS WOD findings:
- `word-of-the-day/right_word` closeout should not emit a synthetic `match`; otherwise Jetstream promotes it into `globalTurnResult` and Global Service relaunches Nimbus a few seconds later with a `Cloud Skill Response Timeout`.
- Voice `play word of the day` hotphrase launch still enters Global Service first, so a synthetic `LISTEN` result alone is not enough. The next-most-correct transport hint is a direct `SKILL_REDIRECT` event aimed at `@be/word-of-the-day`, alongside the menu-shaped `LISTEN` payload.
- Stock OS also keeps the original hotphrase/global launch cloud response promise alive even after the redirect succeeds, so voice WOD launch needs an explicit silent `SKILL_ACTION` completion on the same transID to avoid later cloud-response culling and an interrupted game state.
- Auto-dismissing `word-of-the-day/right_word` with a no-input `LISTEN`/`EOS` stops the listening ring, but it does not close the WOD UI by itself. Pairing that no-input closeout with an explicit redirect back to `@be/idle` is the current cleanest approximation.
- OTA/update yes-no prompts can advertise `$YESNO` only through ASR hints rather than `listenRules`, so short denials like `no` need to be recognized from `listenAsrHints` too.
- Spoken WOD guesses should preferentially snap to the closest offered hint when Whisper lands very close to one of the menu words, since near-misses like `haglet` for `aglet` are common in live testing.
- The stock robot still misroutes constrained local turns if the cloud echoes `globals/*` rules back on the reply. For spoken WOD guesses and settings/update `no`, we should only return the local rule (`word-of-the-day/puzzle`, `settings/download_now_later`, etc.) so Global Service does not relaunch Nimbus.
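Two of the findings above, returning only local rules and snapping near-miss guesses to an offered hint, can be sketched together. This Python is illustrative only: the function names, the `0.8` similarity cutoff, and the `globals/example` rule name used in the usage note are assumptions, and the `.NET` matcher may use a different similarity measure than `difflib`.

```python
from difflib import get_close_matches

def local_rules_only(listen_rules):
    """Drop globals/* rules so the reply carries only the local rule."""
    return [rule for rule in listen_rules if not rule.startswith("globals/")]

def snap_guess_to_hint(guess, hints, cutoff=0.8):
    """Return the closest offered hint, or the guess unchanged if none is close."""
    normalized = guess.strip().lower()
    # Map lowercase forms back to the hint text as offered.
    by_lower = {hint.lower(): hint for hint in hints}
    matches = get_close_matches(normalized, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else guess
```

For instance, `local_rules_only(["word-of-the-day/puzzle", "globals/example"])` keeps only the puzzle rule, and `snap_guess_to_hint("haglet", ["aglet", "pastoral"])` resolves to `aglet`. A guess that is not close to any hint is passed through unchanged rather than forced onto a menu word.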
Latest radio discovery findings:
- `@be/radio` is a true local skill, not a cloud placeholder.
- Its `open(result, refresh, previousSkillName)` path treats `result.nlu.intent === "menu"` as a `play` launch.
- `result.nlu.entities.station` is the genre selector, and `Country` is a real supported station key from the robot's `genres.json`.
- The smallest stock-shaped cloud handoff for voice launch is therefore a local `SKILL_REDIRECT` to `@be/radio` with `nlu.intent = "menu"`, optional `entities.station`, and a silent completion to settle the hotphrase cloud response.
Latest news discovery findings:
- Nimbus explicitly treats `match.cloudSkill === "news"` like the GQA path and waits on `cloudSkillResponse`.
- The first OpenJibo news pass should therefore use a real cloud-skill shape, not a generic placeholder chat reply.
- For now, the content can stay synthetic while the protocol is grounded: `match.cloudSkill = "news"` plus a supported `SLIM` announcement response is enough to validate the robot path before provider-backed headlines arrive later.
Latest clock discovery findings:
- `@be/clock` is a real local skill with `clock`, `timer`, and `alarm` domains.
- Menu launches use `intent = "menu"` with `entities.domain` set to the target sub-area.
- The `jibo test 15` bundle shows stock OS 1.9 rejecting our older top-level `timerValue` launch with `found no matching transition`, so the safer cloud contract is a stock-style `start` intent with the timer/alarm entities attached.
- The same bundle also shows local follow-up rules like `clock/timer_set_value`, so bare replies such as `five minutes` or `ten twenty five` need to be parsed when the robot is already collecting a timer/alarm value.
- The newest `.NET` pass now routes `open the clock` into the direct `askForTime` clock-view path, moves plain time/date/day questions onto stock-shaped local `@be/clock` handoffs, and keeps malformed timer/alarm requests on a clarification reply path instead of generic chat echo.
- The `jibo test 17` bundle shows two remaining clock realities on stock OS 1.9: some alarm misses are genuine STT loss before the cloud ever sees the minutes, and empty cleanup turns like `clock/alarm_timer_okay` must stay local instead of degrading into `heyJibo`/Nimbus.
- When the robot context includes a usable local `runtime.location.iso`, ambiguous alarm times now prefer the next real local occurrence rather than defaulting blindly.
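The bare-reply parsing and next-occurrence preference described above can be sketched as follows. This Python is a hedged illustration, not the `.NET` parser: the function names are hypothetical, the accepted forms are limited to the compact and spoken shapes named in this plan, and `now` is assumed to be a naive local datetime (how it is derived from `runtime.location.iso` is outside this sketch).

```python
import re
from datetime import datetime, timedelta

# Hour, optional two-digit minutes (with ':', '-', space, or no separator), optional am/pm.
_TIME_PATTERN = re.compile(r"^(\d{1,2})(?:[:\-\s]?(\d{2}))?\s*(am|pm)?$")

def parse_alarm_time(text):
    """Return (hour, minute, meridiem) or None when no time can be read."""
    # Collapse spoken forms such as 'p m' into 'pm' before matching.
    cleaned = re.sub(r"\b([ap])\s*m\b", r"\1m", text.strip().lower())
    match = _TIME_PATTERN.match(cleaned)
    if not match:
        return None  # caller should stay on the clarification path
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = match.group(3)
    min_hour, max_hour = (1, 12) if meridiem else (0, 23)
    if not (min_hour <= hour <= max_hour and 0 <= minute <= 59):
        return None
    return hour, minute, meridiem

def next_alarm_occurrence(now, hour, minute, meridiem):
    """Pick the soonest future datetime matching a possibly ambiguous time."""
    if meridiem == "am":
        hours = [hour % 12]
    elif meridiem == "pm":
        hours = [hour % 12 + 12]
    elif 1 <= hour <= 12:
        hours = [hour % 12, hour % 12 + 12]  # ambiguous: try both halves of the day
    else:
        hours = [hour]
    candidates = []
    for candidate_hour in hours:
        candidate = now.replace(hour=candidate_hour, minute=minute, second=0, microsecond=0)
        if candidate <= now:
            candidate += timedelta(days=1)
        candidates.append(candidate)
    return min(candidates)
```

Returning `None` for an unreadable time, instead of a default, matches the rule that collapsed transcripts should lead to clarification and not a fabricated alarm. Naive datetimes ignore daylight-saving transitions, which a production path would need to handle.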
Latest photo discovery findings:
- `@be/gallery` is the local gallery browser and opens from `intent = "menu"`.
- `snapshot` and `photobooth` are not gallery submodes; stock main-menu logic remaps them into `@be/create` with `createOnePhoto` and `createSomePhotos`.
- The newest `.NET` pass keeps that routing, adds local-file persistence for media metadata, and serves stored media URLs back through `/media/{path}` as a first hosted-gallery slice.
- The remaining gap is binary fidelity: the current HTTP capture path stores request bodies as text, which is enough to preserve metadata and a placeholder payload, but may still be too lossy for perfect thumbnails/original fetches.
- The `jibo test 17` gallery blue-ring report is at least partly tangled up with the gallery-empty path: stock `@be/gallery` says `there's nothing in the gallery yet. want to take a picture now?`, so lingering mic state there is not purely a launch-routing issue.
- The `jibo test 18` bundle shows the more direct failure mode: short local replies like `yes` can stall if buffered-audio auto-finalize waits too long, and the old `OPENJIBO_AUDIO_RECEIVED` compatibility event only added robot-side warning noise while the ring stayed blue.
Latest update and state findings:
- unstaged update queries should not fabricate placeholder no-op manifests, because stock settings logic can treat any returned object like a pending update
- the hosted `.NET` cloud now persists update, media, and backup state to a local state file by default, which is a better bridge toward Azure SQL / Blob Storage than the old process-memory-only behavior
- The `jibo test 17` session also includes a real on-robot backup announcement and temporary settings connectivity turbulence, so not all sluggishness from that run should be attributed to the newer cloud protocol changes.
## Speech, Animation, And ESML
The current joke flow is only a small foothold into Jibo expressiveness.
Future work should map:
- direct speech modifiers
- animation selection and filtering
- embodied speech behaviors
- ESML and SSML subsets
- interactions between speech, visuals, and timing
Useful external references:
- [Speak-Tweak Docs](https://hri2024.jibo.media.mit.edu/Speak-Tweak-Docs)
- [ESML PDF](https://hri2024.jibo.media.mit.edu/attachments/SDK-SDK---ESML-121023-203758.pdf)
## Future Scope
- full endpoint inventory beyond the current Node mapping
- OTA-driven recovery
- paid hosted plans or donation-supported hosting
- deeper on-device bridge and OS modernization
- more capable skill/runtime integration
- possible LLM or tool-use patterns inspired by workshop-era experimentation
## Latest Notes
- The `jibo test 19` bundle confirmed that gallery follow-up confirmation uses the stock local `shared/yes_no` rule family, not just the create/settings/surprise yes-no families. Spoken `yes` was heard correctly, but the reply leaked the global rules back to the robot, which sent the turn into Nimbus instead of keeping it local.
- The same bundle also confirmed some `OPENJIBO_AUDIO_RECEIVED` noise was still coming from an older running build, because the current `.NET` source no longer emits that synthetic websocket event. When a live session still shows it, operator workflow should treat that as a rebuild/restart sanity-check clue before assuming a new regression.
- Spoken `cancel alarm` should map into stock `@be/clock` `delete` semantics, not generic chat. The current cloud path now mirrors that local intent so voice cancel can follow the same lane as the robot's clock skill.
- The `jibo test 20` bundle suggests gallery itself is mostly okay in the latest pass, but clock and protocol polish still matter: stock `CLIENT_NLU intent="set"` with only `domain="alarm"` should stay on the local clarification path instead of defaulting the cloud payload to `7:00`, and stock `CLIENT_NLU intent="cancel"` on `clock/alarm_timer_query_menu` should reuse the last active clock domain so delete actually lands on alarm/timer instead of generic chat.
- The same `jibo test 20` robot logs also showed `OPENJIBO_TURN_PENDING` and `OPENJIBO_CONTEXT_ACK` are just unknown-event noise on stock OS 1.9, so the compatibility layer now keeps that turn state internally and stops sending those synthetic websocket event types to the robot.
- The `jibo test 21` bundle confirms the first gallery path is healthy enough to open `@be/gallery`, ask the stock `shared/yes_no` follow-up, and hand into `@be/create` for a photo. The remaining alarm pain in that round was mostly the transcript collapsing to `set an alarm for suddenly` / `set an alarm for...`, which means the right fix is to keep `alarm_clarify` local by handing straight into `@be/clock` with `intent="set"` and `domain="alarm"` instead of asking the clarification through Nimbus-only cloud speech.
- The same bundle also showed stock OS 1.9 still logs fallback `OPENJIBO_ACK` packets as unknown-event noise, so the websocket compatibility layer now drops unrecognized inbound message types silently instead of replying with a synthetic ack the robot does not understand.
- That bundle still contains real `ffmpeg` / `whisper.cpp` failures in the buffered-audio STT seam, and it also includes a genuine `jibo-server-service` broken-pipe / server-connection-lost episode, so not every freeze in that round should be blamed on cloud turn routing alone.
## MCP-Like Ideas
Recent MIT workshop materials suggest experimentation around modern AI tooling for Jibo, including an MCP-oriented idea. We should treat that as inspiration for future OpenJibo directions, not as a present dependency or supported integration.
- local `whisper.cpp` STT remains a discovery seam, not production ASR
- media upload/body handling is not binary-safe enough for final gallery originals and thumbnails
- state persistence is local JSON, not Azure SQL / Blob Storage
- update, backup, and restore are not end-to-end proven
- news content is synthetic
- weather, calendar, commute, personal report, identity, memory, and proactivity are still mostly discovery or placeholder content paths
- volume, stop, robot age, and command-versus-question personality routing are not implemented yet
## `1.0.19` Direction
After `1.0.18` is tested and tagged, `1.0.19` should move back into feature work:
- one lightweight device-control feature, most likely stop or volume
- end-to-end update/backup/restore proof
- STT reliability improvements, including noise screening and a managed STT comparison
- provider-backed first content path, likely news or weather
- hosted capture/export boundary for group testing
- continued Pegasus/JiboOS-backed mapping for proactivity, memory/history, Lasso-style aggregation, and identity
## Azure Direction
The target hosted footprint remains:
- Azure App Service for HTTP and WebSocket traffic
- Azure SQL for accounts, devices, sessions, host mappings, updates, media metadata, and provisioning records
- Azure Blob Storage for media bodies, upload artifacts, update payloads, and curated capture bundles
- Azure Key Vault for secrets and certificates
- Application Insights for diagnostics and live-test observability
Local JSON persistence is only a stepping stone. Do not design new feature slices as if local file state were the final hosted store.

@@ -2,440 +2,454 @@
## Purpose
This backlog turns the current discovery work into a concrete implementation queue for the hosted `.NET` cloud.
This backlog turns discovery into implementation slices for the hosted `.NET` cloud.
Use it as the source of truth for the next feature slice instead of continuing the same investigation in chat each time.
## How To Use This Backlog
1. Pick one slice.
2. Confirm the target payload shape from captures and robot source.
3. Implement the smallest working parity path in `.NET`.
4. Test it live on stock OS `1.9`.
5. Update this file with results, regressions, and next guesses before moving on.
Use it as the working queue when picking the next feature or bug-fix slice. The release pattern is: implement a narrow slice, test it on stock OS `1.9`, update this file with what happened, then either close the release or roll the next larger idea forward.
Status key:
- `implemented`: present in current source and covered by focused tests
- `polish`: implemented enough to test, but still needs live proof or small cleanup
- `ready`: grounded enough to implement now
- `discovery`: more robot-source or capture work needed first
- `polish`: behavior exists but needs cleanup
- `discovery`: more Pegasus, JiboOS, capture, or log work needed first
- `blocked`: waiting on infrastructure, provider choice, or a risky unknown
Parallel tags:
Tags:
- `protocol`: websocket / turn-shape work
- `content`: provider or cloud content work
- `docs`: runbook / operator guidance
- `stt`: transcript reliability work
- `protocol`: websocket, HTTP, or stock payload shape
- `content`: provider data or response content
- `docs`: operator docs, runbooks, or capture process
- `stt`: transcript reliability
- `storage`: persistence, media, backups, or hosted export
## Immediate Queue
## Current `1.0.18` Snapshot
Current cloud version: `1.0.18`
Runtime truth:
- hosted `.NET` projects and cloud tests target `net10.0`
- version source of truth is [OpenJiboCloudBuildInfo.cs](../src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/OpenJiboCloudBuildInfo.cs)
- `/health`, startup logging, and spoken `cloud version` are aligned with that constant
Current release theme:
- alarm and photo/gallery quirks have received the main bug-fix attention
- Word of the Day cleanup, constrained yes/no routing, unknown websocket event suppression, and local state persistence are already in the current code
- radio, ESML apostrophe cleanup, and first news are implemented in source/tests and need live confidence before the version is called complete
## Immediate `1.0.18` Queue
### 1. Radio Resume And Genre Launch
- Status: `polish`
- Tags: `protocol`
- Why now: the code path is implemented and test-backed, and it is a low-risk local-skill expansion after Word of the Day.
- Current code:
- `open the radio` maps to `@be/radio` with `intent = menu`
- `play country music` maps to `@be/radio` with `entities.station = Country`
- websocket output includes `LISTEN`, `EOS`, local `SKILL_REDIRECT`, and silent completion
- Evidence:
- JiboOS `@be/radio` treats `menu` as a play launch and reads `result.nlu.entities.station`
- `Country` is a supported station key in the inspected genre metadata
- Exit criteria:
- live `open the radio` resumes or opens radio without generic chat speech
- live `play country music` opens a country station
- no new stock-OS unknown-event noise appears in the radio launch path
- Next action:
- run this in the `1.0.18` live regression pass and capture both websocket payloads and robot logs
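A minimal sketch of the radio phrase routing described under "Current code", written in Python for illustration only (the real path lives in the `.NET` cloud). The function name, the regular expressions, and the return shape are hypothetical; only `@be/radio`, `intent = menu`, and the `Country` station key come from the notes above.

```python
import re
from typing import Optional

# Hypothetical genre table: only "Country" is confirmed as a station key above.
GENRE_TO_STATION = {"country": "Country"}


def route_radio(transcript: str) -> Optional[dict]:
    """Return a local @be/radio launch description, or None for non-radio phrases."""
    text = transcript.lower().strip()

    # Genre launch: menu intent plus a station entity.
    genre = re.fullmatch(r"play (?:some )?(\w+) music", text)
    if genre and genre.group(1) in GENRE_TO_STATION:
        return {
            "skill": "@be/radio",
            "intent": "menu",
            "entities": {"station": GENRE_TO_STATION[genre.group(1)]},
        }

    # Resume/open: menu intent with no station, so the skill can resume the last one.
    if re.fullmatch(r"(?:open|play|turn on) (?:the )?radio", text):
        return {"skill": "@be/radio", "intent": "menu", "entities": {}}

    return None
```

Returning `None` for everything else keeps unrelated phrases out of the radio path, so they fall through to whatever routing already handles them.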
### 2. News Through Nimbus
- Status: `polish`
- Tags: `protocol`, `content`
- Why now: the first Nimbus-compatible cloud path is implemented and test-backed; content can stay synthetic for `1.0.18`.
- Current code:
- `tell me the news` maps to `IntentName = news`
- outbound listen match includes `cloudSkill = news`
- `SKILL_ACTION` uses skill id `news` and `mim_id = runtime-news`
- Evidence:
- JiboOS Nimbus checks `match.cloudSkill === "news"` and waits for a cloud response
- Exit criteria:
- live `tell me the news` reaches a non-placeholder Nimbus path
- the robot behavior feels like a cloud skill response, not generic chat playback
- Next action:
- live-test the first pass; provider-backed headlines can wait for `1.0.19`
### 3. Backup / OTA / Share Yes-No Reliability
- Status: `polish`
- Tags: `protocol`, `stt`
- Why now: constrained yes/no behavior affects daily-use prompts and was tangled with the alarm/photo/gallery work.
- Current code:
- yes/no detection reads `listenRules`, `clientRules`, and `$YESNO` hints
- covered prompt families include `settings/download_now_later`, `surprises-ota/want_to_download_now`, `surprises-date/offer_date_fact`, `shared/yes_no`, and `create/is_it_a_keeper`
- outbound replies strip global rules and keep the local rule
- no-input fallback for constrained prompts emits local `LISTEN`/`EOS`
- Exit criteria:
- spoken `yes` and `no` work on update, backup, share/offer, and gallery/create prompts
- empty or missed short replies retry locally instead of relaunching Nimbus or generic chat
- Next action:
- include these prompt families in the `1.0.18` live regression pass
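The constrained yes/no handling above can be sketched roughly as follows. This is an illustrative Python sketch, not the `.NET` implementation: the function names and the assumption that rules and hints arrive as plain string lists are hypothetical, and `globals/example` is a made-up rule name used only in the usage lines. The prompt families and the `$YESNO` hint come from the list above.

```python
# Prompt families listed in this backlog item.
YESNO_RULES = {
    "settings/download_now_later",
    "surprises-ota/want_to_download_now",
    "surprises-date/offer_date_fact",
    "shared/yes_no",
    "create/is_it_a_keeper",
}


def is_constrained_yes_no(listen_rules, client_rules, asr_hints):
    """True when the active prompt expects a yes/no reply."""
    rules = set(listen_rules) | set(client_rules)
    return bool(rules & YESNO_RULES) or "$YESNO" in asr_hints


def local_rules_only(rules):
    """Drop globals/* rules so the reply stays attached to the local prompt."""
    return [rule for rule in rules if not rule.startswith("globals/")]
```

For example, `local_rules_only(["globals/example", "shared/yes_no"])` keeps only `shared/yes_no`, which mirrors the "strip global rules and keep the local rule" behavior.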
### 4. Alarm And Photo Gallery Release Regression
- Status: `ready`
- Tags: `protocol`, `stt`
- Why now: this is the main bug-fix theme for `1.0.18`.
- Current code:
- alarm values parse explicit, compact, spaced, hyphenated, and local-context ambiguous times
- missing alarm times stay in local `@be/clock` clarification
- alarm cancel can reuse the last active clock domain
- gallery opens as `@be/gallery`; snapshot and photobooth open through `@be/create`
- passive gallery/create context no longer reopens stale cloud turns
- Exit criteria:
- gallery opens, offers to take a picture if empty, accepts `yes`, and hands into create
- alarm set, clarify, and cancel flows behave locally without blue-ring stale turns
- failures caused by collapsed STT transcripts are logged as STT issues rather than misdiagnosed as payload bugs
- Next action:
- run a stock OS `1.9` regression bundle before declaring `1.0.18` complete
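As a rough illustration of the alarm value parsing listed under "Current code", the sketch below handles explicit (`7:30 am`), compact (`830`), spaced (`8 30`), and hyphenated (`8-30`) forms. It is a Python sketch under stated assumptions, not the `.NET` parser: the function name, regular expression, and return tuple are hypothetical, and local-context disambiguation of times without am/pm is deliberately left to the caller.

```python
import re
from typing import Optional, Tuple

# hour, optional two-digit minute (":", space, "-" or nothing between), optional am/pm
_TIME = re.compile(
    r"(?<!\d)(\d{1,2})(?:[: -]?(\d{2}))?(?!\d)(?:\s*([ap])\.?m\b\.?)?",
    re.IGNORECASE,
)


def parse_alarm_time(transcript: str) -> Optional[Tuple[int, int, Optional[str]]]:
    """Return (hour, minute, meridiem) or None when no usable time is present."""
    match = _TIME.search(transcript)
    if match is None:
        return None  # no time given: caller should stay in local clarification

    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = match.group(3).lower() + "m" if match.group(3) else None

    # With am/pm the hour must be 1-12; without it, accept 0-23 and let the
    # caller resolve the ambiguity from local context.
    max_hour = 12 if meridiem else 23
    if hour > max_hour or minute > 59 or (meridiem and hour == 0):
        return None
    return hour, minute, meridiem
```

A `None` meridiem signals an ambiguous time, which is one way to keep the "ambiguous times" case separate from the "missing time" case.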
### 5. Optional Small Feature Before `1.0.18` Freeze
- Status: `ready`
- Tags: `protocol`
- Why now: `@be/radio` is a real local skill and is the clearest low-risk expansion after Word of the Day.
- User goals:
- `open the radio` should resume the current or last station
- `play country music` should open a country station on iHeartRadio
- Current evidence:
- [index.js](C:/Projects/JiboOs/V3.1/build/opt/jibo/Jibo/Skills/@be/be/node_modules/@be/radio/index.js) resumes from `lastStation`
- the same file treats `menu` as a `play` launch and reads `result.nlu.entities.station`
- the same file confirms `menu + no station` is the clean resume path and `menu + station=Country` becomes a direct genre launch
- Implementation notes:
- add phrase routing for radio open/resume and genre launch
- inspect radio genre and station metadata before locking the outbound entity values
- prefer the same payload shape the menu path uses instead of a generic cloud speech reply
- Exit criteria:
- voice `open the radio` launches radio successfully
- voice `play country music` launches a country station
- no fallback cloud placeholder reply is spoken on success
- Why now: the user wants one or two features before `1.0.18` is called complete, but the release should not take on a risky subsystem.
- Preferred candidates:
- Stop command
- Volume up / volume down voice control
- How old are you / robot age persona
- Guidance:
- pick only one if the live regression pass finds bugs
- pick at most two if the current bug-fix paths stay stable
- keep the implementation source-backed and easy to revert or defer
### 2. ESML Apostrophe Encoding Bug
## Implemented In Current Source
- Status: `ready`
### ESML Apostrophe Encoding Bug
- Status: `implemented`
- Tags: `polish`
- Why now: this is a small, high-confidence speech quality bug affecting many paths.
- Current evidence:
- [ResponsePlanToSocketMessagesMapper.cs](C:/Projects/JiboExperiments/OpenJibo/src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/ResponsePlanToSocketMessagesMapper.cs) currently escapes `'` to `&apos;`
- the robot is pronouncing the encoded form instead of treating it as natural text
- Implementation notes:
- stop encoding apostrophes in spoken ESML text unless a capture proves a narrower escaping rule is needed
- keep escaping for `&`, `<`, and `>`
- Exit criteria:
- contractions and possessives sound natural again in live speech
- Result:
- apostrophes remain natural in spoken ESML
- `&`, `<`, `>`, and `"` are still escaped
- covered by `ResponsePlanMapper_EscapesSpeechWithoutEncodingApostrophes`
- Follow-up:
- none unless a live capture proves another ESML escaping edge case
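The escaping rule in this result can be illustrated with a short Python sketch. The real mapper is the `.NET` `ResponsePlanToSocketMessagesMapper`; the function name here is hypothetical, and the use of the standard XML entities for `&`, `<`, `>`, and `"` is an assumption.

```python
def escape_esml_speech(text: str) -> str:
    """Escape markup-significant characters but leave apostrophes untouched."""
    # '&' is replaced first so the entities produced below are not re-escaped.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )
```

Contractions such as `what's` pass through unchanged, which is the behavior the robot needs to pronounce them naturally.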
### 3. Backup / OTA Yes-No Reliability
### Radio First Pass
- Status: `ready`
- Tags: `protocol`, `stt`
- Why now: the update and backup prompts are real daily-use system flows and still feel fragile.
- Current evidence:
- `surprises-ota` is a real robot-side skill family in [index.js](C:/Projects/JiboOs/V3.1/build/opt/jibo/Jibo/Skills/@be/be/node_modules/@be/surprises-ota/index.js)
- we already improved constrained yes-no routing, but live tests still show some turns collapse into empty transcript or generic speech
- Implementation notes:
- keep local rules only on constrained replies
- improve empty-turn retry behavior for settings and OTA prompts
- capture whether stock OS uses a different yes-no prompt shape in backup versus update flows
- investigate why the current cloud wiring appears to make the robot think updates are constantly available
- Exit criteria:
- spoken `yes` and `no` reliably work on backup and update prompts
- empty or missed turns retry locally without relaunching Nimbus
### 4. Proactive Share / Offer Yes-No Reliability
- Status: `ready`
- Tags: `protocol`, `stt`
- Why now: the latest capture bundle shows a second yes-no family where the robot asks whether it can share something, and spoken `yes` is still being handled like unconstrained speech instead of a reply to the active prompt.
- Current evidence:
- the attached `jibo test 13` session includes both examples in one bundle:
- a proactive or share-style prompt where spoken `yes` was treated as generic speech
- a later update prompt where spoken `no` was accepted correctly
- the share prompt uses `surprises-date/offer_date_fact` with `$YESNO`, and the failing reply leaked `globals/*` rules back into a Nimbus relaunch
- Implementation notes:
- compare the active listen rules, ASR hints, and local skill ownership for the share-style prompt versus OTA prompts
- make constrained yes-no detection cover this prompt family without regressing the already-working update `no` path
- prefer local retry or local completion behavior over falling back into generic chat or Nimbus
- Exit criteria:
- spoken `yes` and `no` work on share / offer prompts with the same reliability as the OTA path
- constrained yes-no handling is generalized by prompt family instead of hard-coded only for updates
## Near-Term Queue
### 5. News Through Nimbus / Personal Report
- Status: `ready`
- Tags: `protocol`, `content`
- Why now: Nimbus already exposes a `news` cloud hook, so this is the next best cloud-first skill after radio.
- Current evidence:
- [ProcessCloud.ts](C:/Projects/JiboOs/V3.1/build/opt/jibo/Jibo/Skills/@be/be/node_modules/@be/nimbus/src/states/ProcessCloud.ts) checks for `cloudSkill === 'news'`
- Nimbus analytics and assets also reference `personal-report`
- Implementation notes:
- decide whether the first pass is a simple headline summary or a closer personal-report style payload
- confirm whether stock OS expects `news` as a dedicated cloud skill or under the broader personal-report family
- Latest progress:
- first pass should use Nimbus's supported cloud path by setting `match.cloudSkill = news` and returning a supported `SLIM` announcement
- provider-backed headlines can follow later under the `Lasso / Knowledge And Event Aggregation` track
- Exit criteria:
- `tell me the news` reaches a non-placeholder live path
- robot behavior feels Nimbus-native rather than generic chat playback
### 6. Clock Family Audit
- Status: `in_progress`
- Status: `implemented`
- Tags: `protocol`
- Why now: clock, date, timer, and alarm menu hooks are already visible in captures and the robot repo has a real `@be/clock` skill.
- Current evidence:
- [protocol-inventory.md](C:/Projects/JiboExperiments/OpenJibo/docs/protocol-inventory.md) already tracks menu intents for `askForTime`, `askForDate`, `timerValue`, and `alarmValue`
- `@be/clock` exists in the robot skill inventory
- `JiboOs` shows `@be/clock` branches on `entities.domain = clock | timer | alarm`, uses `intent = menu` for menu launches, and has distinct local value-collection rules such as `clock/timer_set_value`
- [artifact-output/jibo-test-15](C:/Projects/JiboExperiments/artifact-output/jibo-test-15) shows stock OS 1.9 rejecting our older `timerValue` top-level launch with `found no matching transition`, which points to a stock-style `start` flow plus local follow-up value rules instead
- Implementation notes:
- compare our custom time/date path against actual menu payloads
- keep direct clock/date/day local, but treat timer and alarm as a two-part flow: stock start intent plus bare follow-up parsing on `clock/*_set_value`
- decide whether timer and alarm should stay robot-local with cloud acknowledgement, or whether cloud needs to shape the launch and follow-up turns
- Progress so far:
- voice `open the clock` now routes to the direct local `askForTime` clock-view path instead of the broader clock menu
- voice `what time is it`, `what's today's date`, and `what day is it` now use stock-shaped local `@be/clock` handoffs instead of custom cloud-only speech
- voice `set a timer for five minutes`, `set an alarm for 7:30 am`, `set an alarm for 830`, and `set an alarm for 8 30` now emit direct `timerValue` / `alarmValue` payloads with the entities the local skill expects
- partial timer/alarm requests such as `set a timer` and `set an alarm` now stay on a controlled clarification reply path instead of drifting into Nimbus/chat echo
- Exit criteria:
- time/date behavior stays correct
- timer and alarm launch or set correctly from both menu and voice where applicable
- Result:
- phrase routing and websocket redirect/completion are implemented for radio resume/open and genre launch
- Follow-up:
- live validation remains in the immediate queue
### 7. Photo Family Audit
### News First Pass
- Status: `in_progress`
- Tags: `protocol`, `docs`
- Why now: photo confirmation improved already, and the robot skill inventory includes `gallery`.
- Current evidence:
- `@be/gallery` exists in the robot skill inventory
- current captures already show `snapshot` and related menu destinations
- `JiboOs` shows `@be/gallery` opens from `intent = menu`, while `snapshot` and `photobooth` actually map into `@be/create` with `createOnePhoto` and `createSomePhotos`
- Implementation notes:
- separate three flows:
- snap a picture
- photo gallery
- photobooth
- document whether each one is local-only, cloud-assisted, or upload-backed
- Progress so far:
- voice `open photo gallery` now launches local `@be/gallery` with a stock-shaped `menu` handoff
- voice `snap a picture` now launches local `@be/create` with `createOnePhoto`
- voice `open photobooth` now launches local `@be/create` with `createSomePhotos`
- media and update metadata now persist to a local state file in the hosted `.NET` path, so gallery and staged update state are no longer strictly process-memory-only
- `Media.Create` now retains uploaded metadata plus a best-effort raw body placeholder and serves the same media URL back through `/media/{path}`
- Open questions:
- whether stock Jibo treats captured media as a short-lived local cache until cloud upload completes
- what binary upload path and metadata are needed so gallery content persists instead of aging out locally
- whether hosted OpenJibo should store originals, thumbnails, or both
- whether the current lossy HTTP body capture is enough for stock gallery thumbnails, or whether we need a binary-safe upload persistence path next
- Exit criteria:
- known photo menu and voice phrases map to the correct local path
- capture storage expectations are documented for laptop versus hosted testing
### 8. Update, Backup, And Restore End-To-End Proof
- Status: `ready`
- Tags: `protocol`, `docs`
- Why now: prompt routing is only part of the lifecycle; we still need to prove a realistic maintenance and recovery story.
- Current evidence:
- `@be/settings` contains update flows and explicit `jibo.kb.loop.hasKeyBackup(...)` checks for key-backup state
- `@be/restore` is a dedicated local skill that waits for a UGC key, runs `jibo.systemManager.restore(...)`, and reboots on completion or failure
- live behavior suggests the current cloud may be advertising updates too eagerly, leaving the robot thinking updates are always pending
- Implementation notes:
- inspect how OpenJibo advertises update manifests so the robot does not repeatedly think an update exists when nothing meaningful is pending
- prove one successful backup path, one successful update delivery path, and one successful restore path
- document the operator steps, risk boundaries, and recovery expectations before broader rollout
- Latest progress:
- unstaged update queries no longer fabricate a placeholder no-op manifest, which should reduce the phantom `always has updates` behavior during normal operation
- real staged updates can still be created explicitly through the protocol layer when we are ready to prove end-to-end delivery
- Exit criteria:
- no phantom "always has updates" behavior in normal operation
- one controlled update can be delivered successfully
- one controlled backup can be taken successfully
- restore behavior is understood and documented well enough to recover a test robot intentionally
## Discovery Queue
### 9. Weather As Cloud Report Plus Local Presentation
- Status: `discovery`
- Status: `implemented`
- Tags: `protocol`, `content`
- Why later: there is strong evidence for weather assets under Nimbus, but not for a standalone local skill package.
- Current evidence:
- Nimbus assets include personal-report weather content
- no standalone `@be/weather` package is present in the inspected Be skill inventory
- Questions to answer:
- is weather a dedicated cloud skill, a personal-report branch, or both
- what payload shape triggers the local animation / embodiment layer
- whether the first pass should be cloud speech only or forecast plus presentation metadata
- Result:
- Nimbus-shaped `news` cloud-skill lane is implemented with synthetic briefing content
- Follow-up:
- live validation remains in the immediate queue
- provider-backed headlines belong in `1.0.19` or later
### 10. Proactivity Selector And Surprise Offers
### Clock / Alarm Family
- Status: `discovery`
- Tags: `protocol`, `content`, `docs`
- Why later: the original architecture and recent proactive captures suggest proactivity is a first-class cloud subsystem, not just ordinary chat that starts itself.
- Current evidence:
- the attached original Jibo architecture diagram shows a cloud-side `Proactivity Selector`, `Proactivity Catalog`, and robot-side proactive trigger plumbing
- [jibo test 13.txt](C:/Projects/JiboExperiments/artifact-output/jibo-test-13/jibo%20test%2013.txt) and its websocket artifacts show a proactive-style `I have something to share with you` offer and later proactive `TRIGGER` traffic
- `@be/surprises`, `@be/surprises-date`, and `@be/surprises-ota` already exist as local robot-side building blocks
- Questions to answer:
- what minimum cloud-side selector we need for stock-OS-compatible surprise offers
- how proactive `TRIGGER` traffic should map into a hosted OpenJibo proactivity service
- whether `surprises-date/offer_date_fact` should be the first end-to-end proactive offer we intentionally support
- Implementation notes:
- model proactivity as its own orchestrator separate from ordinary conversational turn routing
- include offer, constrained yes/no, fulfillment, and dismissal behavior in the design
- preserve the artifact linkage to the original architecture diagram and `jibo-test-13`
- Status: `implemented`
- Tags: `protocol`
- Result:
- time/date/day and clock open route through local `@be/clock`
- timer/alarm menu, value, clarify, and delete are implemented
- compact and spoken alarm parsing has focused tests
- client NLU alarm clarify/cancel cases from `jibo test 20` and `jibo test 21` are reflected in source
- Follow-up:
- live regression remains in the immediate queue
### 11. Surprises Routing
### Photo / Gallery / Create Family
- Status: `discovery`
- Tags: `protocol`, `content`
- Why later: `@be/surprises` is a router, not a single experience, so we should not wire this blindly.
- Current evidence:
- [SurpriseSkill.ts](C:/Projects/JiboOs/V3.1/build/opt/jibo/Jibo/Skills/@be/be/node_modules/@be/surprises/src/SurpriseSkill.ts) selects among surprise categories
- `surprises-date` and `surprises-ota` show category-specific branches already exist
- Questions to answer:
- should `surprise me` enter the top-level surprise router
- which categories still depend on cloud services versus fully local logic
- whether stock OS `1.9` differs materially from the `3.1` source snapshot here
- Status: `implemented`
- Tags: `protocol`, `storage`
- Result:
- gallery, snapshot, and photobooth voice paths route to the correct local skills
- media metadata persists locally
- `/media/{path}` serves the current text-body placeholder payload
- Follow-up:
- live regression remains in the immediate queue
- binary-safe media storage remains future work
### 12. History / Memory Layer
### Word Of The Day Cleanup
- Status: `discovery`
- Tags: `content`, `docs`
- Why later: the original architecture explicitly calls out `History`, and that likely maps to the kind of durable personal memory we want for names, preferences, and remembered facts.
- Current evidence:
- the attached original Jibo architecture diagram includes a dedicated `History` component in cloud storage
- stock Jibo behavior historically included awareness of names, birthdays, holidays, and special dates
- Questions to answer:
- what data belongs in memory versus account/profile versus skill-specific storage
- how much of the original behavior was robot-local versus cloud-backed
- what the first safe OpenJibo memory slice should be
- Implementation notes:
- plan for person identity, preferred name, birthday, relationship facts, and notable dates
- keep the first design privacy-aware and easy to host
- treat this as shared infrastructure that other skills can consume rather than a standalone feature
- Status: `implemented`
- Tags: `protocol`
- Result:
- voice launch uses menu-shaped local payload plus redirect/completion
- structured and spoken guesses complete correctly
- line-number guesses use hint order
- close hint matching handles near misses
- `right_word` cleanup can close on no-input and redirect to `@be/idle`
- late same-turn audio is ignored during cleanup
- Follow-up:
- keep this in regression coverage because it shares turn-state machinery with gallery and alarm flows
### 13. Lasso / Knowledge And Event Aggregation
### Unknown OpenJibo Event Noise
- Status: `discovery`
- Tags: `content`
- Why later: the original architecture diagram suggests `Lasso` sits between the hub and outside data sources, which likely explains how Jibo knew about news, calendar items, holidays, and other structured world events.
- Current evidence:
- the attached original Jibo architecture diagram shows `Lasso` connected to 3rd-party data such as AP News, Dark Sky, GCalendar, Wolfram, and other external sources
- stock Jibo behavior historically covered holidays, birthdays, special events, and topical knowledge
- Questions to answer:
- whether `Lasso` should be recreated as a single aggregation service or as several focused providers behind a shared interface
- which parts are needed for news, weather, calendar, commute, astrology/date facts, and holidays
- what subset is practical for a hosted OpenJibo v1
- Implementation notes:
- treat holidays and special dates as first-class backlog scope here
- use this item to drive future provider work for news, weather, calendar, commute, and event awareness
- Status: `implemented`
- Tags: `protocol`
- Result:
- current websocket service drops unknown inbound message types silently
- synthetic `OPENJIBO_TURN_PENDING`, `OPENJIBO_CONTEXT_ACK`, and fallback `OPENJIBO_ACK` should no longer be emitted by current source
- Follow-up:
- if live logs show those event types, first verify the deployed process is actually the current build
### 14. Personal Report, Calendar, And Commute
### Update Phantom Manifest Fix
- Status: `discovery`
- Tags: `protocol`, `content`
- Why later: these are already stubbed in `.NET`, but the robot-side ownership still needs clearer mapping.
- Current evidence:
- current `.NET` placeholders live in [InMemoryJiboExperienceContentRepository.cs](C:/Projects/JiboExperiments/OpenJibo/src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Content/InMemoryJiboExperienceContentRepository.cs)
- Nimbus has personal-report hooks, but the exact cloud contract still needs confirmation
- Questions to answer:
- should calendar and commute be independent feature paths or sections inside personal report
- what minimum provider data shape lets Jibo present these naturally
### 15. Who Am I / Identity Management
- Status: `implemented`
- Tags: `protocol`, `storage`
- Result:
- `GetUpdateFrom` returns an empty object when no update is staged
- staged updates can still be created explicitly
- Follow-up:
- end-to-end update delivery and restore proof remains future work
- Status: `discovery`
- Tags: `protocol`, `content`, `docs`
- Why later: there is a real local `@be/who-am-i` skill, which likely covers user identification, name capture, and enrollment cues that matter for a modern identity layer.
- Current evidence:
- `@be/who-am-i` exists in the stock skill inventory
- the skill source references `jibo.kb.loop`, loop owner / loop member lookup, enrollment state, hypothesis views, and a `Who Am I_ Collect Name` flow
- Questions to answer:
- whether `who am I` is primarily recognition, enrollment, or profile correction
- how name, face, and voice enrollment were originally split between robot-local state and cloud services
- what the minimum hosted-cloud contract is to make identity feel native again
- Implementation notes:
- tie this work back to the broader `History / Memory Layer`
- capture whether the first useful slice is recognition-only, rename-only, or full enrollment support
## Near-Term `1.0.19` Queue
### 16. Onboarding, Loop Management, And Fresh Start
### 6. Stop Command
- Status: `discovery`
- Tags: `protocol`, `docs`
- Why later: stock Jibo onboarding and household management were app-driven, and a hosted OpenJibo path will need a replacement for adding/removing people and setting ownership cleanly.
- Current evidence:
- `@be/first-contact`, `@be/introductions`, `@be/tutorial`, and `@be/restore` all exist in the stock skill inventory
- `@be/who-am-i` and `@be/chitchat` both reference `jibo.kb.loop`, loop owner, and loop members
- `@be/restore` and `@be/settings` show explicit wipe / restore / reboot behavior, which suggests there is a meaningful "fresh start" lifecycle to support
- Questions to answer:
- how a new owner or household should be provisioned without the original mobile app
- how to add, remove, and re-enroll loop members safely
- whether the right replacement is a lightweight web app, an operator-only admin flow, or both
- Implementation notes:
- include ownership transfer, fresh start, and post-restore re-onboarding in scope
- figure out what minimum loop-management UI or API a hosted OpenJibo v1 needs
### 17. Stop Command
- Status: `ready`
- Tags: `protocol`
- Why later: any command can already interrupt Jibo, but a dedicated `stop` command would still be nice to have.
- Current evidence:
- `@be/idle` exists in the stock skill inventory, so there is at least a natural local resting target
- Questions to answer:
- Is there evidence in the original source for a dedicated stop skill or a stop-word phrase?
- User goals:
- `stop`
- `stop that`
- `never mind`
- Evidence:
- `@be/idle` exists and is already used as a cleanup redirect target
- Questions:
- whether stock source has a dedicated stop/cancel intent beyond idle redirect
- whether stop should interrupt active local skills or only cloud speech paths in the first pass
- Exit criteria:
- a spoken stop command settles the robot locally without a generic chat reply
### 7. Volume Up / Volume Down Voice Control
### 18. Volume Up / Volume Down Voice Control
- Status: `ready`
- Tags: `protocol`
- Why later: this is a simple, high-value device-control command that should feel native once the local payload shape is confirmed.
- User goals:
- `turn it up`
- `turn it down`
- `increase the volume`
- `decrease the volume`
- Current evidence:
- stock Jibo exposes volume control through the robot UX, so there should be an existing local path or service contract we can mirror
- this belongs with the other lightweight voice device controls rather than generic cloud chat
- Implementation notes:
- inspect the stock `@be` inventory and captures for volume-related intents, rules, or settings hooks
- prefer a local robot control payload over synthetic cloud speech
- decide whether first pass should support relative changes only, or also absolute requests like `set volume to 5`
- Evidence:
- stock Jibo exposes volume control through robot UX, so there should be a local control or settings path to mirror
- Questions:
- exact local payload shape for relative volume changes
- whether first pass should support absolute values such as `set volume to 5`
- Exit criteria:
- voice increase and decrease commands adjust the robot volume reliably
- the behavior feels local and immediate, not like a chat reply
- relative voice volume commands adjust volume without generic cloud speech
### 19. How Old Are You / Robot Age Persona
- Status: `discovery`
- Tags: `protocol`, `content`
- Why later: this is a strong personality/detail feature, but it may depend on first-power-up metadata or a stock persona path we have not mapped yet.
- User goals:
- `how old are you`
- age replies that sound like stock Jibo, including first-boot date and zodiac/personality flavor when available
- Current evidence:
- observed stock-style response from a YouTube transcript:
- `I was first powered up on January 31st, 2018, which makes me five days old. I'm an Aquarius.`
- this suggests the answer may be based on a stored first-powered-up date, not just a fixed build timestamp
- Implementation notes:
- inspect the stock `@be` inventory and captures for age, birthday, zodiac, or first-contact metadata hooks
- decide whether the first OpenJibo slice should:
- use stored robot first-boot / first-cloud-seen metadata
- compute age dynamically from that date
- optionally add zodiac flavor from the same date
- if no stock path is found, provide a cloud-powered fallback that still sounds native
- Exit criteria:
- `how old are you` returns a stable, personality-consistent answer
- the answer is grounded in stored robot lifecycle data instead of a hard-coded line
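The "compute age dynamically" and "zodiac flavor" notes could look something like the Python sketch below. Everything here is hypothetical illustration: the function names are invented, the sign boundaries are commonly cited Western zodiac dates (sources differ by a day at some boundaries), and where the first-boot date is stored is still an open question for this item.

```python
from datetime import date

# (month, day) on which each sign starts, in calendar order.
_ZODIAC_STARTS = [
    ((1, 20), "Aquarius"), ((2, 19), "Pisces"), ((3, 21), "Aries"),
    ((4, 20), "Taurus"), ((5, 21), "Gemini"), ((6, 21), "Cancer"),
    ((7, 23), "Leo"), ((8, 23), "Virgo"), ((9, 23), "Libra"),
    ((10, 23), "Scorpio"), ((11, 22), "Sagittarius"), ((12, 22), "Capricorn"),
]


def zodiac_sign(day: date) -> str:
    """Return the sign whose start date is the latest one on or before `day`."""
    sign = "Capricorn"  # Jan 1-19 belongs to the sign that started Dec 22
    for start, name in _ZODIAC_STARTS:
        if (day.month, day.day) >= start:
            sign = name
    return sign


def robot_age_days(first_boot: date, today: date) -> int:
    """Age in whole days since the stored first-boot date."""
    return (today - first_boot).days
```

With a first-boot date of January 31st, 2018, `zodiac_sign` gives Aquarius, matching the observed stock-style reply; a query five days later would give an age of five days.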
### 20. Command Vs Question Reply Style
- Status: `ready`
- Tags: `content`, `polish`
- Why later: Jibo historically responded differently when you commanded a skill versus when you asked about liking or wanting to do that skill, and that conversational nuance is part of what made him feel smart.
- User goals:
- `dance` or `do a dance` should sound like a willing action reply, then perform the skill
- `do you like to dance` should sound like an answer to the question first, not the same canned command reply
- Current evidence:
- observed behavior from stock Jibo:
- command-style `dance` -> something like `I like to dance` then dance
- question-style `do you like to dance?` -> something like `You bet I do`
- current OpenJibo skill replies are mostly canned by skill, without distinguishing question intent versus imperative intent
- Implementation notes:
- evolve simple reply collections into structured variants such as:
- `commandReplies`
- `questionReplies`
- optional `confirmationReplies`
- add a lightweight classifier for imperative versus question tone before reaching for a full LLM
- start with `dance`, then reuse the pattern for other expressive skills where stock Jibo clearly answered differently depending on phrasing
- keep the first version rule-based and cheap so it still works well before a future LLM-backed layer exists
- Exit criteria:
- at least one skill family such as `dance` gives distinct replies for command versus question forms
- the approach is reusable for other skill reply families without a large rewrite
## Support Tracks
### 21. Hosted Capture And Storage Plan
### 8. Update, Backup, And Restore End-To-End Proof
- Status: `ready`
- Tags: `docs`
- Why now: repo-local zip bundles are fine for solo testing but not for group rollout.
- Implementation notes:
- define a clean boundary between local capture sinks and hosted archival/export
- document how group testers should submit sessions without touching repo paths directly
- Tags: `protocol`, `storage`, `docs`
- Why next:
- prompt routing is improved, but lifecycle proof is still missing
- Current evidence:
- `@be/settings` contains update and backup flows
- `@be/restore` waits for a UGC key, runs restore, and reboots
- no-op update fabrication has been removed from `.NET`
- Exit criteria:
- no phantom "always has updates" behavior
- one controlled update can be staged and delivered
- one controlled backup can be taken
- restore behavior is documented well enough to recover a test robot intentionally
### 22. STT Upgrade And Noise Screening
### 9. STT Upgrade And Noise Screening
- Status: `ready`
- Tags: `stt`
- Why now: feature work is moving again, but missed short replies still block otherwise-correct flows.
- Why next:
- feature paths are now often correct when a transcript exists, but short replies and low-quality audio still block otherwise-correct flows
- Current evidence:
- local buffered STT still fails on some turns with `ffmpeg` / `whisper.cpp` issues
- low-energy or background-noise turns are still being sent down paths that should probably short-circuit earlier
- live captures still show `ffmpeg` and `whisper.cpp` failures
- yes/no and alarm flows are especially sensitive to short or collapsed transcripts
- Implementation notes:
- evaluate lightweight waveform or energy gating before transcription
- compare a managed STT provider against the current local toolchain
- add lightweight waveform or energy screening before transcription
- compare managed STT against the local toolchain
- keep synthetic transcript hints for fixture replay
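
The energy-screening idea in these notes could be sketched as follows. This is a minimal illustration, not the OpenJibo implementation: it assumes the buffered audio has already been decoded to 16-bit little-endian mono PCM, and the `AudioEnergyGate` type, its method names, and the `0.01` threshold are all hypothetical placeholders that would need tuning against real captures.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not part of the OpenJibo codebase.
public static class AudioEnergyGate
{
    // Root-mean-square amplitude of 16-bit little-endian PCM, normalized to the 0..1 range.
    public static double ComputeRms(ReadOnlySpan<byte> pcm16)
    {
        int sampleCount = pcm16.Length / 2; // a trailing odd byte is ignored
        if (sampleCount == 0)
        {
            return 0.0;
        }

        double sumSquares = 0.0;
        for (int i = 0; i < sampleCount; i++)
        {
            short sample = BinaryPrimitives.ReadInt16LittleEndian(pcm16.Slice(2 * i, 2));
            double normalized = sample / 32768.0;
            sumSquares += normalized * normalized;
        }

        return Math.Sqrt(sumSquares / sampleCount);
    }

    // Returns false for turns that are too quiet to be worth sending to STT.
    // The default threshold is an assumed starting point, not a measured value.
    public static bool ShouldTranscribe(ReadOnlySpan<byte> pcm16, double minRms = 0.01)
    {
        return ComputeRms(pcm16) >= minRms;
    }
}
```

A gate like this only catches silence and very low-energy turns. Steady background noise can still exceed a simple RMS threshold, so it would complement a better STT path and not replace one.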
## Suggested Order Of Execution
### 10. Hosted Capture And Storage Plan
1. Radio resume and genre launch
2. ESML apostrophe fix
3. Backup / OTA yes-no reliability
4. Proactive share / offer yes-no reliability
5. News
6. Clock family
7. Photo family
8. Update, backup, and restore proof
9. Weather
10. Proactivity selector and surprise offers
11. Surprises
12. History / memory layer
13. Lasso / knowledge and event aggregation
14. Personal report, calendar, and commute
15. Who Am I / identity management
16. Onboarding / loop management / fresh start
17. Stop command
18. Volume up / volume down voice control
19. How old are you / robot age persona
20. Command vs question reply style
21. Hosted capture/storage and STT improvements as parallel tracks
- Status: `ready`
- Tags: `docs`, `storage`
- Why next:
- repo-local captures work for single-operator testing, but group testing needs a cleaner archival/export boundary
- Implementation notes:
- define local capture sinks versus hosted retention
- decide how testers submit noteworthy sessions
- preserve sanitized fixtures as the durable parity artifact
### 11. Binary-Safe Media Storage
- Status: `ready`
- Tags: `storage`, `protocol`
- Why next:
- the first gallery bridge stores metadata and text-body placeholders, but final gallery support needs originals and thumbnails
- Questions:
- whether stock gallery expects originals, thumbnails, or both
- what upload metadata must survive for gallery refresh
- how to map this cleanly to Blob Storage
## Discovery Queue
### 12. Weather As Cloud Report Plus Local Presentation
- Status: `discovery`
- Tags: `protocol`, `content`
- Evidence:
- Nimbus and Pegasus contain personal-report weather assets and Lasso provider hooks
- no standalone `@be/weather` package has been confirmed in the inspected Be skill inventory
- Questions:
- whether weather is a dedicated cloud skill, a personal-report branch, or both
- what payload shape triggers local animation and weather presentation
### 13. Provider-Backed News
- Status: `ready`
- Tags: `content`
- Why later:
- first protocol path is implemented, but content is synthetic
- Questions:
- which source should provide headlines for hosted OpenJibo
- whether news belongs under a broader Lasso-style aggregation service
- how to keep content short and Jibo-native
### 14. Proactivity Selector And Surprise Offers
- Status: `discovery`
- Tags: `protocol`, `content`, `docs`
- Evidence:
- original architecture materials show cloud-side `Proactivity Selector`, `Proactivity Catalog`, and robot-side proactive trigger plumbing
- live captures include a proactive-style `I have something to share with you` offer and later proactive `TRIGGER` traffic
- `@be/surprises`, `@be/surprises-date`, and `@be/surprises-ota` exist as local robot-side building blocks
- Questions:
- minimum hosted selector for stock-OS-compatible surprise offers
- how proactive `TRIGGER` traffic maps into OpenJibo
- whether `surprises-date/offer_date_fact` should be the first intentional proactive offer
### 15. Surprises Routing
- Status: `discovery`
- Tags: `protocol`, `content`
- Evidence:
- `@be/surprises` is a router rather than one experience
- `surprises-date` and `surprises-ota` show category-specific branches
- Questions:
- whether `surprise me` should enter the top-level surprise router
- which categories depend on cloud services
- whether stock OS `1.9` differs from the `V3.1` source snapshot
### 16. History / Memory Layer
- Status: `discovery`
- Tags: `content`, `storage`, `docs`
- Evidence:
- Pegasus includes a `history` package
- original architecture materials call out cloud-side history
- stock behavior historically included names, birthdays, holidays, and personal dates
- Questions:
- what belongs in memory versus account/profile versus skill-specific storage
- first safe OpenJibo memory slice
- privacy and hosted-data boundaries
### 17. Lasso / Knowledge And Event Aggregation
- Status: `discovery`
- Tags: `content`
- Evidence:
- Pegasus `packages/lasso` is a provider credential and data aggregation service
- original architecture connected Lasso to AP News, Dark Sky, Google Calendar, Wolfram, and other providers
- Questions:
- recreate Lasso as one aggregation service or several focused providers
- which parts are needed for news, weather, calendar, commute, holidays, and special dates
### 18. Personal Report, Calendar, And Commute
- Status: `discovery`
- Tags: `protocol`, `content`
- Evidence:
- current `.NET` catalog has placeholder replies
- Nimbus has personal-report hooks and assets
- Questions:
- whether calendar and commute are independent feature paths or personal-report sections
- minimum provider data shape for natural Jibo presentation
### 19. Who Am I / Identity Management
- Status: `discovery`
- Tags: `protocol`, `content`, `storage`
- Evidence:
- `@be/who-am-i` exists
- source references `jibo.kb.loop`, owner/member lookup, enrollment, and name collection
- Questions:
- recognition, enrollment, rename, and profile-correction boundaries
- split between local state and hosted cloud state
- first useful hosted identity slice
### 20. Onboarding, Loop Management, And Fresh Start
- Status: `discovery`
- Tags: `protocol`, `docs`, `storage`
- Evidence:
- `@be/first-contact`, `@be/introductions`, `@be/tutorial`, `@be/restore`, and `@be/who-am-i` exist
- current `.NET` loop/account state is still mostly scaffolded
- Questions:
- how to provision an owner without the original mobile app
- how to add, remove, and re-enroll loop members
- whether the first replacement is operator-only, a lightweight web app, or both
### 21. How Old Are You / Robot Age Persona
- Status: `discovery`
- Tags: `protocol`, `content`
- User goals:
- `how old are you`
- answer from stored first-powered-up or first-cloud-seen metadata
- optional zodiac/personality flavor when available
- Questions:
- where stock Jibo stores first-power-up or birthdate metadata
- whether a stock persona path exists
- whether first OpenJibo pass should use first-cloud-seen metadata if stock data is unavailable
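
If the answer ends up grounded in a stored date, the arithmetic could look like the sketch below. Everything here is an assumption for illustration: the `RobotAgeSketch` type and its methods are hypothetical, where the date comes from (stock first-power-up metadata or first-cloud-seen) is still an open question, and the zodiac boundaries follow one common Western convention that differs by a day between sources.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the OpenJibo codebase.
public static class RobotAgeSketch
{
    // First day of each sign within the calendar year, in chronological order.
    // Dates before January 20 fall through to Capricorn, which started the previous December.
    private static readonly (int Month, int Day, string Sign)[] SignStarts =
    {
        (1, 20, "Aquarius"), (2, 19, "Pisces"), (3, 21, "Aries"), (4, 20, "Taurus"),
        (5, 21, "Gemini"), (6, 21, "Cancer"), (7, 23, "Leo"), (8, 23, "Virgo"),
        (9, 23, "Libra"), (10, 23, "Scorpio"), (11, 22, "Sagittarius"), (12, 22, "Capricorn"),
    };

    public static string ZodiacSign(DateOnly date)
    {
        string sign = "Capricorn";
        foreach (var (month, day, name) in SignStarts)
        {
            // Keep the latest sign whose start date is on or before the given date.
            if (date.Month > month || (date.Month == month && date.Day >= day))
            {
                sign = name;
            }
        }

        return sign;
    }

    // Whole days between the stored date and today, clamped so clock skew never yields a negative age.
    public static int AgeInDays(DateOnly firstPoweredUp, DateOnly today)
    {
        return Math.Max(0, today.DayNumber - firstPoweredUp.DayNumber);
    }
}
```

For a first-powered-up date of January 31, 2018 and an assumed ask date of February 5, 2018, this yields five days and Aquarius. Turning the day count into a spoken phrase (days, months, years) is left out of the sketch.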
### 22. Command Vs Question Reply Style
- Status: `ready`
- Tags: `content`, `polish`
- User goals:
- `dance` should behave like a willing action
- `do you like to dance` should answer the question before or instead of treating it like the same command
- Implementation notes:
- evolve reply collections into command/question variants
- start with dance or another expressive skill
- keep the first version rule-based
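
A rule-based first version could be as small as the sketch below. It is a hedged illustration, not the project's classifier: `UtteranceStyle`, `ReplyStyleSketch`, and the opener list are hypothetical, and the rules assume a plain-text transcript that may or may not carry punctuation.

```csharp
using System;
using System.Linq;

// Hypothetical types for illustration only; not part of the OpenJibo codebase.
public enum UtteranceStyle
{
    Command,
    Question,
}

public static class ReplyStyleSketch
{
    // Openers that usually signal a question about the skill, not a request to run it.
    // "can you" is deliberately left out because "can you dance" often means "please dance".
    private static readonly string[] QuestionOpeners =
    {
        "do you", "are you", "have you", "what", "why", "how", "when", "where", "who",
    };

    public static UtteranceStyle Classify(string transcript)
    {
        string text = transcript.Trim().ToLowerInvariant();

        // STT output often lacks punctuation, so a trailing "?" is only one of two signals.
        if (text.EndsWith('?'))
        {
            return UtteranceStyle.Question;
        }

        bool opensLikeQuestion = QuestionOpeners.Any(opener =>
            text == opener || text.StartsWith(opener + " ", StringComparison.Ordinal));

        return opensLikeQuestion ? UtteranceStyle.Question : UtteranceStyle.Command;
    }
}
```

With this shape, `dance` and `do a dance` classify as commands while `do you like to dance` classifies as a question, so a skill could pick from a command or question reply collection accordingly. Ambiguous phrasings such as `can you dance` would need a product decision before being added to either side.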
## Suggested Order
Before closing `1.0.18`:
1. Radio live validation
2. News live validation
3. Backup / OTA / share yes-no regression
4. Alarm and photo/gallery regression
5. Optional small feature only if the regression pass stays calm
For `1.0.19`:
1. Stop command or volume control
2. Update, backup, and restore proof
3. STT upgrade and noise screening
4. Hosted capture/storage plan
5. Binary-safe media storage
6. Provider-backed news or weather
7. Proactivity, memory/history, Lasso, identity, and onboarding as larger discovery-driven tracks

@@ -59,9 +59,8 @@ public sealed class FileTurnTelemetrySinkTests
new WebSocketMessageEnvelope { Text = """{"type":"CONTEXT","data":{"topic":"conversation"}}""" },
CancellationToken.None);
Assert.Single(replies);
using var payload = System.Text.Json.JsonDocument.Parse(replies[0].Text!);
Assert.Equal("OPENJIBO_TURN_PENDING", payload.RootElement.GetProperty("type").GetString());
Assert.Empty(replies);
Assert.True(session.TurnState.AwaitingTurnCompletion);
Assert.Equal(12000, session.TurnState.BufferedAudioBytes);
Assert.Equal("ffmpeg failed", session.TurnState.LastSttError);