dev plan and backlog updates, one bug fix

2026-04-25 23:06:49 -05:00
parent 773e768898
commit 527ddb1bfc
3 changed files with 521 additions and 592 deletions
--- a/OpenJibo/docs/development-plan.md
+++ b/OpenJibo/docs/development-plan.md
@@ -2,243 +2,159 @@

 ## Summary

-This document is the working implementation plan after the initial hosted-cloud scaffold.
+This document is the current working plan for the OpenJibo hosted cloud.

-It is intentionally broader than the current Node server. The Node server is a protocol oracle and discovery tool, not the complete map of Jibo.
+The production lane is the `.NET` cloud in `src/Jibo.Cloud/dotnet`. The Node server remains the protocol oracle, capture harness, and fast reverse-engineering lab, but it is no longer the long-term hosted architecture.

-Day-to-day feature sequencing now lives in [feature-backlog.md](/C:/Projects/JiboExperiments/OpenJibo/docs/feature-backlog.md).
+Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md). This file tracks release shape, current code truth, evidence sources, and the boundary between `1.0.18` closeout work and `1.0.19` follow-up work.

-Cloud release hygiene:
+## Current Release Snapshot

- keep a visible OpenJibo Cloud version string
- expose it through diagnostics such as `/health` and the spoken `cloud version` command
- bump the shared version constant whenever we deploy a meaningful hosted-cloud change
+- Current OpenJibo Cloud release constant: `1.0.18`
+- Source of truth: [OpenJiboCloudBuildInfo.cs](../src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/OpenJiboCloudBuildInfo.cs)
+- Spoken diagnostic: `Open Jibo Cloud version 1 dot 0 dot 18.`
+- HTTP diagnostic: `/health` returns the same version
+- Startup diagnostic: the API logs the same version on boot
+- .NET target framework: `net10.0` across the cloud projects and cloud test project

-## Current Scope
+Release `1.0.18` is now in feature-hardening. Its main bug-fix theme is alarm and photo/gallery behavior on stock OS `1.9`, with a few small feature slices added while the test loop is warm.

- stable .NET cloud scaffold
- Azure-oriented architecture and data ownership
- normalized runtime contracts for cloud-to-runtime handoff
- bootstrap documentation for region injection and targeted device patching
- starter endpoint coverage for account, notification, robot, loop, update, uploads, and core WebSocket acceptance
- starter xUnit coverage for the .NET application layer
+## Release Rhythm

-## Next Implementation Scope
+This is the working pattern for each hosted-cloud release:

- expand HTTP `X-Amz-Target` coverage from observed traffic and fixtures
- grow WebSocket compatibility from stub acceptance into realistic turn orchestration
- keep websocket parity fixture-driven, starting with exact sequencing and payload-shape fidelity for the successful joke vertical slice before claiming broader skill coverage
- replace in-memory state with Azure SQL-backed persistence
- add structured fixture replay tests
- harden region/bootstrap docs by software version
+1. Pick a narrow source-backed feature or compatibility slice.
+2. Confirm the stock payload shape from captures, Pegasus, the JiboOS reference tree, or live logs.
+3. Implement the smallest `.NET` path that can be tested honestly.
+4. Add focused tests around routing, websocket payload shape, and state behavior.
+5. Run the stock robot live test, collect captures, and record the result before moving on.
+6. Keep regressions and bug fixes in the current release; roll larger follow-up work into the next version.

-## Discovery Scope
+For `1.0.18`, the remaining release work should stay small: finish one or two feature slices, run the live regression pass, and only patch bugs found in that pass before calling the version complete. `1.0.19` should then reopen the broader feature queue.

-We still need to map more than the current Node server expresses. Priority discovery areas:
+## Current Code Truth

- all hostnames and service prefixes observed in real startup and turn traffic
- skill launch and skill lifecycle flows
- interactivity command families beyond the current joke flow
- richer embodied speech and animation behaviors
- upload, logging, backup, and key-sharing flows
- per-version configuration differences and region handling
+The hosted `.NET` cloud is a modular monolith:

-## Current WebSocket Discovery Focus
+```text
+Jibo.Cloud.Api -> Jibo.Cloud.Application -> Jibo.Cloud.Domain -> Jibo.Cloud.Infrastructure
+```

-The next fixture-driven websocket work should continue to separate three buckets:
+Current API and protocol scope:

- discovered behavior
-  Grounded by the Node oracle, sanitized fixtures, and live captures
- implemented parity
-  Only the narrow slices currently replayed and tested in `.NET`
- future hypotheses
-  Ideas to investigate later, but not behaviors to silently bake into the hosted cloud
+- HTTP `X-Amz-Target` dispatch through `JiboCloudProtocolService`
+- `/health` diagnostics
+- WebSocket acceptance for `api-socket.jibo.com`, `neo-hub.jibo.com` listen, and `neo-hub.jibo.com/v1/proactive`
+- token/session issuance for account, hub, and robot startup flows
+- starter account, notification, loop, media, key, person, backup, robot, update, and upload/log handling
+- media lookup through `/media/{path}`
+- no placeholder no-op update from `GetUpdateFrom` when no staged update exists

-Right now the strongest implemented vertical slice beyond basic listen completion is the successful joke turn:
+Current websocket scope:

- `CLIENT_ASR` transcript-carrying turn completion
- synthetic `LISTEN` result shaping
- `EOS`
- delayed joke `SKILL_ACTION`
+- long-lived cloud session state separated from per-turn websocket state
+- `LISTEN`, `CONTEXT`, `CLIENT_NLU`, `CLIENT_ASR`, and binary-audio handling
+- pending listen setup packets kept pending instead of finalized as turns
+- buffered Ogg/Opus audio preservation per turn
+- synthetic transcript hint support for fixture-driven parity
+- opt-in local `ffmpeg` plus `whisper.cpp` STT path for discovery
+- auto-finalize thresholds for buffered audio after a real listen phase
+- late-audio ignore windows after completed turns
+- no-input local completion for constrained prompts
+- unknown inbound websocket types dropped silently instead of echoing stock-OS-unknown OpenJibo events
+- file telemetry and fixture export for HTTP, websocket, and turn captures

-That should remain the model for future websocket work: capture first, fixture second, parity third.
+Current state and persistence scope:

-The latest live captures also support a second discovery track:
+- `InMemoryCloudStateStore` remains the runtime store
+- a local JSON persistence bridge is enabled by default at `App_Data/cloud-state.json`
+- persisted state currently covers staged updates, media metadata, and backup metadata
+- this is a bridge toward Azure SQL and Blob Storage, not the final hosted storage architecture

- menu-driven `CLIENT_NLU` parity for clock, timer, and alarm flows
- richer transcript-bearing `CLIENT_ASR` discovery beyond jokes
- buffered-audio preservation for eventual real ASR in `.NET`
+## Implemented In Current `1.0.18` Source

-Near-term ASR work should stay staged:
+The following behavior is present in source and covered by focused tests:

-1. preserve and replay the websocket audio payloads honestly
-2. validate a local tool-based decode/transcribe loop in `.NET`
-3. compare that against Azure-hosted STT before choosing a default production path
+- `cloud version` speech and `/health` version reporting share `OpenJiboCloudBuildInfo.Version`
+- apostrophes are no longer escaped to `&apos;` in spoken ESML, while `&`, `<`, `>`, and `"` remain escaped
+- radio voice launch supports `open the radio` and genre launch such as `play country music`, using local `@be/radio` `menu` payloads, `SKILL_REDIRECT`, and silent completion
+- news has a first Nimbus-shaped cloud path using `match.cloudSkill = news` and a `news` `SKILL_ACTION` with synthetic briefing content
+- stock-shaped clock handoffs cover time, date, day, clock open, timer/alarm menu, timer/alarm value, timer/alarm clarification, and timer/alarm delete
+- alarm parsing covers forms such as `7:30 am`, `830`, `8 30`, `10-25`, `10:25 pm`, and `10 25 p m`
+- ambiguous alarm times can prefer the next local occurrence when the robot context includes `runtime.location.iso`
+- `CLIENT_NLU intent=set` with only `domain=alarm` stays on the local clock clarification path instead of defaulting to a fabricated time
+- `CLIENT_NLU intent=cancel` on `clock/alarm_timer_query_menu` can reuse the last active clock domain
+- photo flows route `open photo gallery` to `@be/gallery`, `snap a picture` to `@be/create/createOnePhoto`, and `open photobooth` to `@be/create/createSomePhotos`
+- passive gallery/create context does not reopen a stale cloud turn
+- media metadata persists across store recreation and `/media/{path}` can serve the current text-body placeholder payload
+- constrained yes/no handling covers `create/is_it_a_keeper`, `shared/yes_no`, `settings/download_now_later`, `surprises-date/offer_date_fact`, `surprises-ota/want_to_download_now`, and `$YESNO` hints
+- outbound constrained yes/no responses strip unrelated `globals/*` rules so stock OS stays local
+- no-input fallback for constrained yes/no prompts emits local `LISTEN`/`EOS` instead of relaunching generic Nimbus speech
+- Word of the Day launch, spoken guesses, structured `CLIENT_NLU` guesses, hint-order guesses, fuzzy hint matching, right-word cleanup, and late audio cleanup are covered in the websocket layer

-That keeps Node as the reverse-engineering oracle while letting the long-term `.NET` cloud gain real STT seams without pretending they are finished.
+## Reference Sources

-## Latest Capture Findings
+Use these sources as evidence, not as code to copy blindly:

-The latest live test round tightened up three priorities:
+- OpenJibo Node oracle: [open-jibo-link.js](../src/Jibo.Cloud/node/open-jibo-link.js)
+- Current hosted `.NET` cloud: [src/Jibo.Cloud/dotnet](../src/Jibo.Cloud/dotnet)
+- Live captures and robot logs: `C:\Projects\JiboExperiments\artifact-output`
+- Original Pegasus cloud source: `C:\Users\JacobDubin\Downloads\jibo\jibo copy\pegasus`
+- Original SDK and skill source snapshot: `C:\Users\JacobDubin\Downloads\jibo\jibo copy\sdk`
+- JiboOS reference tree: `C:\Projects\JiboOS`
+- JiboOS `V3.1` skill snapshot: `C:\Projects\JiboOS\V3.1\build\opt\jibo\Jibo\Skills\@be`

- yes/no turns need explicit constrained follow-up handling instead of generic chat routing
- skill invocation still depends too much on narrow phrase matching and is vulnerable to STT drift
- local buffered-audio STT in `.NET` is useful for discovery, but it is not yet stable enough to be the default live-test assumption
+The Pegasus tree is especially useful for cloud service intent: `packages/hub` documents `/v1/listen`, `/nlu`, and `/asr`; `packages/lasso` documents credential and provider aggregation; `packages/history` and the architecture materials are useful for future memory and proactivity work.

-Evidence from the latest `2026-04-18` captures:
+The JiboOS trees are especially useful for local skill ownership and payload shape: `@be/clock`, `@be/gallery`, `@be/create`, `@be/radio`, `@be/nimbus`, `@be/settings`, `@be/surprises*`, `@be/restore`, `@be/who-am-i`, and `@be/idle`.

- several buffered-audio turns never produced a usable transcript because the local `whisper.cpp` path was missing or the temporary normalized Ogg file was rejected by `ffmpeg`
- some recognized phrases fell into placeholder provider replies because the intent was recognized but the feature path behind it is still a stub
- short yes/no responses need the same session-aware treatment already prototyped in Node, especially for create-flow style follow-ups
+When sources disagree, prefer the newest live stock-OS capture for runtime behavior, then stock robot source for local ownership, then Pegasus for original cloud intent, then Node for known working compatibility behavior.

-Evidence from the latest word-of-the-day capture round:
+## `1.0.18` Closeout Gates

- yes/no photo confirmation improved and now completes through the constrained follow-up path
- `CLIENT_NLU` menu navigation is surfacing richer `destination` entities such as `snapshot`, `fun`, and `word-of-the-day`
- word-of-the-day guesses can arrive as structured `CLIENT_NLU` turns with `intent=guess`, `rules=["word-of-the-day/puzzle"]`, and `entities.guess=<word>`
- those structured turns should be treated as first-class cloud inputs even when no free-form transcript is present
+Before calling `1.0.18` complete, prove or explicitly defer these:

-Evidence from the continued `2026-04-18` word-of-the-day and time captures:
+- Run the focused `.NET` cloud test suite after the last feature slice.
+- Confirm the running robot build reports cloud version `1.0.18`.
+- Regression test alarm flows: set with explicit time, set with compact/spoken time, clarify missing time, cancel alarm, and local cleanup prompts.
+- Regression test photo/gallery flows: open gallery, answer the stock `shared/yes_no` prompt, hand into create, take one photo, and avoid blue-ring stale turns.
+- Live-test radio launch: `open the radio` and `play country music`.
+- Live-test first news path: `tell me the news` should use the Nimbus cloud-skill lane instead of generic chat.
+- Recheck constrained yes/no prompts for update/backup/share/gallery without leaking global rules.
+- Recheck that stock OS no longer logs OpenJibo-only websocket events such as synthetic pending/context/ack packets from the current build.
+- Treat remaining `ffmpeg` / `whisper.cpp` transcript failures as STT work unless the capture proves a separate turn-routing regression.
+
+## Known Gaps
+
+These are not blockers for calling `1.0.18` complete unless the live test shows a regression in a current release path:

- spoken "start word of the day" style requests should route into the same word-of-the-day launch path as the menu destination
- spoken puzzle answers like `pastoral` should be treated as valid guesses whenever the active listen rules show `word-of-the-day/puzzle`
- spoken numeric line picks like `two` should resolve through the active word-of-the-day hint order instead of being treated as generic chat
- after a successful word-of-the-day completion, late empty same-turn audio should be ignored instead of generating a stale blank-audio follow-up
- post-game hotphrase blank-audio turns should be treated as cleanup noise, not a new cloud conversation turn
- clock replies should use the user-facing hour format without a leading zero
-
-Evidence from the smaller `2026-04-18/19` hotphrase and word-of-the-day verification bundle:
-
- hotphrase silence can still auto-finalize into a generic `heyJibo` fallback, which sounds confused on-robot compared with a dedicated greeting path
- voice-triggered `loadMenu + destination=word-of-the-day` reaches Nimbus successfully, but Nimbus still expects a follow-up cloud skill response and times out if launch stops at `LISTEN` + `EOS`
- the newer `jibo test 2` bundle shows voice launch now reaches Nimbus and receives a cloud response, but a generic `SLIM/RUNTIME_PROMPT` just says "starting word of the day" instead of performing the menu-style redirect the on-screen path uses
- the `jibo test 3` bundle confirms Nimbus rejects `REDIRECT` in that cloud-skill slot, so the better next experiment is to hint the on-robot target skill directly on the synthetic `LISTEN` result and skip Nimbus `SKILL_ACTION` entirely for word-of-the-day launch
- the same bundle also shows `word-of-the-day/right_word` cleanup turns need a short ignore window for trailing audio or the robot can stay stuck in a blue-ring listening state
- the `jibo test 4` bundle exposed a broader websocket issue: inbound robot `LISTEN` setup packets were still being routed through turn finalization instead of just priming pending state, which can corrupt menu and word-of-the-day flows by treating setup turns like resolved intents
- the `jibo test 5` bundle suggests the remaining WOD launch and post-win cleanup bugs share the same root cause: we were leaving the robot-side `cloudSkillResponse` promise unresolved on `word_of_the_day`, `word_of_the_day_guess`, and `word-of-the-day/right_word`, so the latest .NET pass now emits a completion-only silent `SKILL_ACTION` for those paths instead of stopping at `LISTEN` + `EOS` or going fully silent
- the `jibo test 6` bundle plus the attached `@be` source snapshot refine that diagnosis: Nimbus does accept the silent completion response, but treats it as a normal `SLIM/RUNTIME_PROMPT` instead of a skill redirect, while the successful on-robot path is built around `menu + domain=word-of-the-day` skill switching through `SkillSwitchScheduler`
- the attached `be-framework.js` adds one more strong clue: the Be relaunch hook reads `skillData.nlu.skill`, so synthetic cloud launch turns for word-of-the-day should carry the explicit target skill name in the outbound NLU payload instead of expecting the robot to infer it from `intent/domain` alone
- the `JiboOs/V3.1` Nimbus source confirms the hotphrase/global launch path still routes through `@be/nimbus` and waits on `listenResult.cloudSkillResponse`, while Nimbus only supports a narrow set of cloud JCP behaviors and does not use cloud `REDIRECT` to jump into local skills; by contrast, the post-win `word-of-the-day/right_word` turn is a local `Optional-Response`, so the cleaner robot-side closeout is to synthesize an immediate empty `LISTEN + EOS` no-response result rather than replying with only `SKILL_ACTION`
- the same `jibo test 6` capture also shows the blue-ring cleanup loop was partly self-inflicted in `.NET`: after `word-of-the-day/right_word` we stopped the active turn, but later stray binary audio on the same transID could still re-arm buffering even without a fresh `LISTEN`, so the next pass now requires a real listen phase before post-turn audio can reopen buffered completion
- the local buffered-audio seam is still producing repeated `whisper.cpp returned no transcript` and `ffmpeg ... Codec not found` failures, so lightweight waveform or energy screening is worth considering once the core launch flow is stable
-
-Near-term interaction work should now prioritize:
-
-1. preserve and interpret yes/no turn constraints from observed listen rules
-2. broaden phrase-to-intent matching for the small set of known working skills before moving to larger NLU ambitions
-3. keep synthetic transcript hints as the most reliable parity path when captures already provide them
-4. continue evaluating whether local preprocessing is worth further investment or whether managed STT should replace it for the next serious testing phase
-5. start separating laptop-local capture storage from the eventual hosted retention/export path so group testing does not depend on repo-local zip handling
-
-## Capture Storage Direction
-
-Repo-local NDJSON plus zipped capture bundles are still good enough for current reverse-engineering and single-operator testing.
-
-For hosted group testing, the next direction should be:
-
-1. keep local file sinks for dev and laptop workflows
-2. add a cleaner export/archive boundary so noteworthy sessions can be promoted without copying raw capture trees around manually
-3. plan for hosted durable storage separately from the runtime node that is serving live robot traffic
-4. keep fixture generation and sanitized replay artifacts as the stable handoff format between local testing and hosted debugging
-
-## Working Cloud Framework
-
-The current evidence in captures, fixtures, and Node behavior supports three main cloud interaction paths:
-
-1. local Jibo behavior observed by the cloud
-   The robot or its local skill stack already interpreted the turn and the cloud mainly tracks, acknowledges, or lightly completes it.
-2. local Jibo behavior overridden or redirected by the cloud
-   The robot reports the turn state, but the cloud chooses a different synthetic reply path.
-3. raw audio interpreted by the cloud
-   The robot sends buffered audio and the cloud performs transcript resolution before sending back `LISTEN`, `EOS`, and ESML-driven playback.
-
-Those are the right primary buckets for now. Additional side channels may still emerge later, especially around proactive traffic, direct skill/service sockets, or future on-device OS changes, but they should be treated as extensions to this model until captures prove otherwise.
-
-Latest stock-OS WOD findings:
-
- `word-of-the-day/right_word` closeout should not emit a synthetic `match`; otherwise Jetstream promotes it into `globalTurnResult` and Global Service relaunches Nimbus a few seconds later with a `Cloud Skill Response Timeout`.
- Voice `play word of the day` hotphrase launch still enters Global Service first, so a synthetic `LISTEN` result alone is not enough. The next-most-correct transport hint is a direct `SKILL_REDIRECT` event aimed at `@be/word-of-the-day`, alongside the menu-shaped `LISTEN` payload.
- Stock OS also keeps the original hotphrase/global launch cloud response promise alive even after the redirect succeeds, so voice WOD launch needs an explicit silent `SKILL_ACTION` completion on the same transID to avoid later cloud-response culling and an interrupted game state.
- Auto-dismissing `word-of-the-day/right_word` with a no-input `LISTEN`/`EOS` stops the listening ring, but it does not close the WOD UI by itself. Pairing that no-input closeout with an explicit redirect back to `@be/idle` is the current cleanest approximation.
- OTA/update yes-no prompts can advertise `$YESNO` only through ASR hints rather than `listenRules`, so short denials like `no` need to be recognized from `listenAsrHints` too.
- Spoken WOD guesses should preferentially snap to the closest offered hint when Whisper lands very close to one of the menu words, since near-misses like `haglet` for `aglet` are common in live testing.
- The stock robot still misroutes constrained local turns if the cloud echoes `globals/*` rules back on the reply. For spoken WOD guesses and settings/update `no`, we should only return the local rule (`word-of-the-day/puzzle`, `settings/download_now_later`, etc.) so Global Service does not relaunch Nimbus.
-
-Latest radio discovery findings:
-
- `@be/radio` is a true local skill, not a cloud placeholder.
- Its `open(result, refresh, previousSkillName)` path treats `result.nlu.intent === "menu"` as a `play` launch.
- `result.nlu.entities.station` is the genre selector, and `Country` is a real supported station key from the robot's `genres.json`.
- The smallest stock-shaped cloud handoff for voice launch is therefore a local `SKILL_REDIRECT` to `@be/radio` with `nlu.intent = "menu"`, optional `entities.station`, and a silent completion to settle the hotphrase cloud response.
-
-Latest news discovery findings:
-
- Nimbus explicitly treats `match.cloudSkill === "news"` like the GQA path and waits on `cloudSkillResponse`.
- The first OpenJibo news pass should therefore use a real cloud-skill shape, not a generic placeholder chat reply.
- For now, the content can stay synthetic while the protocol is grounded: `match.cloudSkill = "news"` plus a supported `SLIM` announcement response is enough to validate the robot path before provider-backed headlines arrive later.
-
-Latest clock discovery findings:
-
- `@be/clock` is a real local skill with `clock`, `timer`, and `alarm` domains.
- Menu launches use `intent = "menu"` with `entities.domain` set to the target sub-area.
- The `jibo test 15` bundle shows stock OS 1.9 rejecting our older top-level `timerValue` launch with `found no matching transition`, so the safer cloud contract is a stock-style `start` intent with the timer/alarm entities attached.
- The same bundle also shows local follow-up rules like `clock/timer_set_value`, so bare replies such as `five minutes` or `ten twenty five` need to be parsed when the robot is already collecting a timer/alarm value.
- The newest `.NET` pass now routes `open the clock` into the direct `askForTime` clock-view path, moves plain time/date/day questions onto stock-shaped local `@be/clock` handoffs, and keeps malformed timer/alarm requests on a clarification reply path instead of generic chat echo.
- The `jibo test 17` bundle shows two remaining clock realities on stock OS 1.9: some alarm misses are genuine STT loss before the cloud ever sees the minutes, and empty cleanup turns like `clock/alarm_timer_okay` must stay local instead of degrading into `heyJibo`/Nimbus.
- When the robot context includes a usable local `runtime.location.iso`, ambiguous alarm times now prefer the next real local occurrence rather than defaulting blindly.
-
-Latest photo discovery findings:
-
- `@be/gallery` is the local gallery browser and opens from `intent = "menu"`.
- `snapshot` and `photobooth` are not gallery submodes; stock main-menu logic remaps them into `@be/create` with `createOnePhoto` and `createSomePhotos`.
- The newest `.NET` pass keeps that routing, adds local-file persistence for media metadata, and serves stored media URLs back through `/media/{path}` as a first hosted-gallery slice.
- The remaining gap is binary fidelity: the current HTTP capture path stores request bodies as text, which is enough to preserve metadata and a placeholder payload, but may still be too lossy for perfect thumbnails/original fetches.
- The `jibo test 17` gallery blue-ring report is at least partly tangled up with the gallery-empty path: stock `@be/gallery` says `there's nothing in the gallery yet. want to take a picture now?`, so lingering mic state there is not purely a launch-routing issue.
- The `jibo test 18` bundle shows the more direct failure mode: short local replies like `yes` can stall if buffered-audio auto-finalize waits too long, and the old `OPENJIBO_AUDIO_RECEIVED` compatibility event only added robot-side warning noise while the ring stayed blue.
-
-Latest update and state findings:
-
- unstaged update queries should not fabricate placeholder no-op manifests, because stock settings logic can treat any returned object like a pending update
- the hosted `.NET` cloud now persists update/media/backups state to a local state file by default, which is a better bridge toward Azure SQL / Blob storage than the old process-memory-only behavior
- The `jibo test 17` session also includes a real on-robot backup announcement and temporary settings connectivity turbulence, so not all sluggishness from that run should be attributed to the newer cloud protocol changes.
-
-## Speech, Animation, And ESML
-
-The current joke flow is only a small foothold into Jibo expressiveness.
-
-Future work should map:
-
- direct speech modifiers
- animation selection and filtering
- embodied speech behaviors
- ESML and SSML subsets
- interactions between speech, visuals, and timing
-
-Useful external references:
-
- [Speak-Tweak Docs](https://hri2024.jibo.media.mit.edu/Speak-Tweak-Docs)
- [ESML PDF](https://hri2024.jibo.media.mit.edu/attachments/SDK-SDK---ESML-121023-203758.pdf)
-
-## Future Scope
-
- full endpoint inventory beyond the current Node mapping
- OTA-driven recovery
- paid hosted plans or donation-supported hosting
- deeper on-device bridge and OS modernization
- more capable skill/runtime integration
- possible LLM or tool-use patterns inspired by workshop-era experimentation
-
-## Latest Notes
-
- The `jibo test 19` bundle confirmed that gallery follow-up confirmation uses the stock local `shared/yes_no` rule family, not just the create/settings/surprise yes-no families. Spoken `yes` was being heard correctly, but leaking the global rules back into Nimbus instead of staying local.
- The same bundle also confirmed some `OPENJIBO_AUDIO_RECEIVED` noise was still coming from an older running build, because the current `.NET` source no longer emits that synthetic websocket event. When a live session still shows it, operator workflow should treat that as a rebuild/restart sanity-check clue before assuming a new regression.
- Spoken `cancel alarm` should map into stock `@be/clock` `delete` semantics, not generic chat. The current cloud path now mirrors that local intent so voice cancel can follow the same lane as the robot's clock skill.
- The `jibo test 20` bundle suggests gallery itself is mostly okay in the latest pass, but clock and protocol polish still matter: stock `CLIENT_NLU intent="set"` with only `domain="alarm"` should stay on the local clarification path instead of defaulting the cloud payload to `7:00`, and stock `CLIENT_NLU intent="cancel"` on `clock/alarm_timer_query_menu` should reuse the last active clock domain so delete actually lands on alarm/timer instead of generic chat.
-  - The same `jibo test 20` robot logs also showed `OPENJIBO_TURN_PENDING` and `OPENJIBO_CONTEXT_ACK` are just unknown-event noise on stock OS 1.9, so the compatibility layer now keeps that turn state internally and stops sending those synthetic websocket event types to the robot.
- The `jibo test 21` bundle confirms the first gallery path is healthy enough to open `@be/gallery`, ask the stock `shared/yes_no` follow-up, and hand into `@be/create` for a photo; the remaining alarm pain in that round was mostly the transcript collapsing to `set an alarm for suddenly` / `set an alarm for...`, which means the right fix is to keep `alarm_clarify` local by handing straight into `@be/clock` with `intent="set"` and `domain="alarm"` instead of asking the clarification through Nimbus-only cloud speech.
-  - The same bundle also showed stock OS 1.9 still logs fallback `OPENJIBO_ACK` packets as unknown-event noise, so the websocket compatibility layer now drops unrecognized inbound message types silently instead of replying with a synthetic ack the robot does not understand.
- That bundle still contains real `ffmpeg` / `whisper.cpp` failures in the buffered-audio STT seam, and it also includes a genuine `jibo-server-service` broken-pipe / server-connection-lost episode, so not every freeze in that round should be blamed on cloud turn routing alone.
-
-## MCP-Like Ideas
-
-Recent MIT workshop materials suggest experimentation around modern AI tooling for Jibo, including an MCP-oriented idea. We should treat that as inspiration for future OpenJibo directions, not as a present dependency or supported integration.
+- local `whisper.cpp` STT remains a discovery seam, not production ASR
+- media upload/body handling is not binary-safe enough for final gallery originals and thumbnails
+- state persistence is local JSON, not Azure SQL / Blob Storage
+- update, backup, and restore are not end-to-end proven
+- news content is synthetic
+- weather, calendar, commute, personal report, identity, memory, and proactivity are still mostly discovery or placeholder content paths
+- volume, stop, robot age, and command-versus-question personality routing are not implemented yet
+
+## `1.0.19` Direction
+
+After `1.0.18` is tested and tagged, `1.0.19` should move back into feature work:
+
+- one lightweight device-control feature, most likely stop or volume
+- end-to-end update/backup/restore proof
+- STT reliability improvements, including noise screening and a managed STT comparison
+- provider-backed first content path, likely news or weather
+- hosted capture/export boundary for group testing
+- continued Pegasus/JiboOS-backed mapping for proactivity, memory/history, Lasso-style aggregation, and identity
+
+## Azure Direction
+
+The target hosted footprint remains:
+
+- Azure App Service for HTTP and WebSocket traffic
+- Azure SQL for accounts, devices, sessions, host mappings, updates, media metadata, and provisioning records
+- Azure Blob Storage for media bodies, upload artifacts, update payloads, and curated capture bundles
+- Azure Key Vault for secrets and certificates
+- Application Insights for diagnostics and live-test observability
+
+Local JSON persistence is only a stepping stone. Do not design new feature slices as if local file state were the final hosted store.