version 18 test fixes

This commit is contained in:
Jacob Dubin
2026-04-29 09:00:04 -05:00
parent 748f117201
commit c2998593fd
13 changed files with 39586 additions and 22 deletions

View File

@@ -21,14 +21,13 @@ Release `1.0.18` is now in feature-hardening. Its main bug-fix theme is alarm an
## Latest Live Evidence
`jibo test 26` was captured as the next focused regression pass after the Test 25 stop/gallery/settings fixes.
`jibo test 27` was a small focused startup/cloud-version capture after the Test 26 regression pass.
- Good morning worked, and the Test 25 stop fix was live-proven: `Never mind.` now mapped to stock stop instead of generic chat. The Test 25 volume homophone fix was also visible: `Set Volume 2-6.` mapped to `volume_to_value`.
- Backup-in-progress still appeared during the test. Robot logs show the warning came from `@be/surprises-ota` (`hey since i'm doing a backup right now, I might be a little slow`), while the HTTP capture again had no `Backup_*` calls. Keep treating this as robot-local backup scheduler/status or log/upload load until a capture proves hosted backup involvement.
- Photo/gallery mostly worked: `Open Photo Galerum.` launched gallery, empty-gallery `shared/yes_no` `Yes.` handed into create, the photo was taken, keeper `Yes.` saved it, and gallery reopened. The remaining gallery quirk was after cleanup: context-only or post-skill audio tails could keep buffering without a fresh `LISTEN`, producing a long blue ring and later generic fallback.
- Alarm replacement/delete is still the main risky release path. Test 26 showed an existing `9:02 PM` alarm replacement prompt, but later `yes` turns arrived under `clock/alarm_set_value` rather than `clock/alarm_timer_change`, which pushed the robot toward the manual/value screen. The run eventually set a `7:35 PM` alarm, then repeated delete attempts such as `Delete along.`, `So, delete the alarm.`, and `So, delete the along.` fell to generic chat.
- The blue-ring/listen-loop concern is real in the capture. Several transactions buffered binary audio for 15-50 seconds with no useful turn completion, especially when a context arrived without a new `LISTEN` or after a chat fallback kept follow-up open. Current source now blocks that no-`LISTEN` buffering path and clears blank-audio hotphrase state more aggressively.
- No dominant `ffmpeg` / `whisper.cpp` decode failure emerged from Test 26. The remaining failures are mostly robot-local backup/load, short-answer STT quality, and alarm replacement/menu agreement.
- Before the cloud-version test, the robot's local `jibo-server-service` restarted after a broken pipe, then `ssm` raised `Q4-Server_connection_lost` and local `@be/settings` opened the connection-lost error path. The notification connection recovered about 31 seconds later. Treat early-test confusion as suspect if this local-server recovery appears in the same window.
- The cloud-version answer itself proved the running build was `1.0.18`, but the previous source treated `cloud_version` as a follow-up conversation. A fresh hotphrase `LISTEN` then captured speech tail as `Cloudford.`, and generic chat replied `thanks. I heard, Cloudford.`
- Current source now makes `cloud_version` a one-shot diagnostic, uses a longer diagnostic speech-tail ignore window, and ignores no-transcript hotphrase launch `LISTEN` setup packets inside that window. The existing no-`LISTEN` binary guard already ignored same-transID binary tails after finalization, but Test 27 showed it could not stop a brand-new hotphrase listen by itself.
- Backup-in-progress still appears robot-local. In Test 27 the message was selected by local `@be/surprises-ota` after Nimbus/chitchat settled, and the HTTP capture again had no `Backup_*` calls. Keep investigating robot-local scheduler/status, startup reconnect state, CPU/load, and log/upload work before assuming hosted backup API involvement.
- Test 26 remains the broader regression evidence for gallery success, alarm replacement/delete risk, stop/volume live proof, and short-answer STT weakness. Alarm replacement/menu agreement is still the main release risk after the Test 27 cloud-version-tail hardening.
## Release Rhythm
@@ -72,6 +71,8 @@ Current websocket scope:
- local whisper only attempts external decoding when buffered audio contains an Opus identification header
- auto-finalize thresholds for buffered audio after a real listen phase
- late-audio ignore windows after completed turns
- cloud-version diagnostic turns do not keep follow-up open and receive a longer speech-tail ignore window
- no-transcript hotphrase launch `LISTEN` setup packets are ignored while a completed diagnostic/local turn is still in its late-audio cleanup window
- passive local context cleanup for gallery/create/settings contexts after stock local skills take ownership
- no-input local completion for constrained prompts, clock value prompts, gallery preview prompts, and settings volume-control prompts
- active local prompt preservation so `shared/yes_no`, clock, gallery, and settings prompts can still consume transcript-bearing short replies even when the stock skill reports a local context
@@ -92,6 +93,7 @@ Current state and persistence scope:
The following behavior is present in source and covered by focused tests:
- `cloud version` speech and `/health` version reporting share `OpenJiboCloudBuildInfo.Version`
- `cloud version` is a one-shot diagnostic: it speaks the version without opening a follow-up turn, then shields the speech tail from self-listen artifacts such as the Test 27 `Cloudford.` capture
- apostrophes are no longer escaped to `&apos;` in spoken ESML, while `&`, `<`, `>`, and `"` remain escaped
- radio voice launch supports `open the radio` and genre launch such as `play country music`, using local `@be/radio` `menu` payloads, `SKILL_REDIRECT`, and silent completion
- news has a first Nimbus-shaped cloud path using `match.cloudSkill = news` and a `news` `SKILL_ACTION` with synthetic briefing content
@@ -153,7 +155,7 @@ Before calling `1.0.18` complete, prove or explicitly defer these:
- Run the focused `.NET` cloud test suite after the last feature slice.
- Run the current-release live checklist in [regression-test-plan.md](regression-test-plan.md).
- Confirm the running robot build reports cloud version `1.0.18`.
- Confirm the running robot build reports cloud version `1.0.18` without a follow-up `Cloudford` / generic chat tail.
- Regression test alarm flows again after the `jibo test 26` fixes: set with explicit time, set with compact/spoken/comma-separated time, clarify missing time, replace an existing alarm, cancel/delete by voice including `delete the alarm`, cancel out of a value prompt, and verify the menu agrees.
- Regression test timer flows after the Test 25 stale-timer observation: set a 10-second timer, let it fire, reset by gesture only after recording state, and verify a new timer prompt does not see an already-expired timer as still active.
- Regression test photo/gallery flows again after the `jibo test 26` fixes: open gallery, answer the stock `shared/yes_no` prompt with a transcript-bearing `yes`, hand into create, take one photo, keep it, and avoid blue-ring, `I heard you`, or `that's` stale turns after gallery cleanup.
@@ -172,11 +174,11 @@ These are not blockers for calling `1.0.18` complete unless the live test shows
- local `whisper.cpp` STT remains a discovery seam, not production ASR
- media upload/body handling is not binary-safe enough for final gallery originals and thumbnails
- state persistence is local JSON, not Azure SQL / Blob Storage
- update, backup, and restore are not end-to-end proven, and the `jibo test 22` sluggishness appears tied to robot-local backup status/load
- Test 26 still showed repeated backup-in-progress behavior without corresponding `Backup_*` HTTP traffic
- update, backup, and restore are not end-to-end proven, and the `jibo test 22` / Test 26 / Test 27 sluggishness appears tied to robot-local backup status/load or startup reconnect state
- Test 27 again showed backup-in-progress behavior without corresponding `Backup_*` HTTP traffic, immediately after a local `jibo-server-service` reconnect sequence
- deployed-build verification needs to prove that synthetic OpenJibo websocket events are gone from the hosted artifact, not just from source
- news content is synthetic; `jibo test 23` proved the path but not live provider-backed headlines
- alarm replacement yes/no, alarm voice delete/menu agreement, and long blue-ring cleanup still need successful live proof after the Test 26 source fixes
- alarm replacement yes/no, alarm voice delete/menu agreement, and long blue-ring cleanup still need successful live proof after the Test 26 and Test 27 source fixes
- weather, calendar, commute, personal report, identity, memory, and proactivity are still mostly discovery or placeholder content paths
- remaining stop/volume variants still need live stock-OS proof beyond Test 26's `Never mind.` and `Set Volume 2-6.` passes; robot age and command-versus-question personality routing are not implemented yet

View File

@@ -33,6 +33,7 @@ Runtime truth:
- hosted `.NET` projects and cloud tests target `net10.0`
- version source of truth is [OpenJiboCloudBuildInfo.cs](../src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/OpenJiboCloudBuildInfo.cs)
- `/health`, startup logging, and spoken `cloud version` are aligned with that constant
- spoken `cloud version` is now a one-shot diagnostic with speech-tail protection instead of a follow-up chat turn
Current release theme:
@@ -44,6 +45,7 @@ Current release theme:
- `jibo test 24` showed alarm replacement yes/no working, but exposed empty `clock/alarm_set_value` and `gallery/gallery_preview` turns falling into generic `I heard you` fallback speech; it also showed `CLIENT_NLU cancel` inside `clock/alarm_set_value` re-asking for an alarm value instead of closing the prompt
- `jibo test 25` proved a broader regression path but exposed repeated backup-in-progress/update-menu blockage, timer/alarm stale state and delete/menu disagreement, gallery `shared/yes_no` hangs under `@be/gallery`, punctuated `Never mind.` falling through to chat, volume homophone parsing (`Set Volume 2-6.`), and settings volume-control cleanup falling into `I heard you`
- `jibo test 26` live-proved punctuated stop, volume homophone parsing, gallery launch/yes/create/save, and good morning; it still exposed robot-local backup warnings, long blue-ring buffering without a fresh `LISTEN`, alarm replacement drifting into the value/manual screen, and alarm delete phrases/mishears falling to chat
- `jibo test 27` isolated early confusion: local `jibo-server-service` restarted and raised `Q4-Server_connection_lost` before testing; cloud version then self-listened into `Cloudford.` because the previous diagnostic path stayed follow-up eligible; the backup warning again came from local `@be/surprises-ota` with no `Backup_*` HTTP calls
## Immediate `1.0.18` Queue
@@ -103,6 +105,8 @@ Current release theme:
- `jibo test 22` did not show `Backup_*` HTTP traffic during the backup complaint
- `jibo test 25` again showed backup-in-progress/update-menu blockage without `Backup_*` HTTP traffic; observed cloud traffic was log upload, ASR binary upload, and update check traffic
- `jibo test 26` again had the robot announce backup-in-progress from `@be/surprises-ota`, with no `Backup_*` HTTP target in the capture
- `jibo test 27` repeated that pattern in a smaller capture: the only relevant hosted startup traffic was token/update/log style traffic, while the spoken backup warning was selected by local `@be/surprises-ota`
- Test 27 also showed local `jibo-server-service` reconnect and `Q4-Server_connection_lost` before the voice test, so startup health should be checked before blaming backup prompts on hosted cloud behavior
- stock `@be/surprises-ota` drives the backup notification from robot-local `jibo.scheduler.backupStatus`
- original `surprises-ota` tests make backup and OTA notifications contextual-priority prompts, with repeat suppression through last-notification timestamps
- a spoken `take a backup` command currently routes as generic chat and is not the same as proving the local backup scheduler path
@@ -133,6 +137,7 @@ Current release theme:
- passive gallery/create/settings context no longer reopens stale cloud turns
- active local prompts under gallery/settings contexts are preserved so real short replies are not suppressed as passive context
- context-only or post-skill binary audio tails are ignored until a fresh `LISTEN`, preventing no-`LISTEN` blue-ring buffering loops
- fresh no-transcript hotphrase launch `LISTEN` setup packets are ignored during diagnostic speech-tail cleanup, preventing the Test 27 `Cloudford.` self-listen path
- blank-audio hotphrase turns clear pending listen state and install a short late-audio ignore window
- `shared/yes_no` no-input fallback and repeated create keeper cleanup were added after `jibo test 22`
- Latest evidence:
@@ -147,6 +152,7 @@ Current release theme:
- `jibo test 25` showed timer/alarm still needs live follow-up for stale timer state, alarm replacement/PM ambiguity, and voice delete versus robot menu agreement
- `jibo test 26` showed gallery success through empty-gallery yes, create, keep, save, and reopen, but also showed a post-gallery blue-ring/fallback tail now addressed by the no-`LISTEN` binary guard
- `jibo test 26` showed alarm replacement still drifting into value/manual-screen behavior and alarm delete phrases/mishears falling to chat; current source now maps `delete the alarm`, `delete along`, and `delete the along` to local clock delete without keeping follow-up open
- `jibo test 27` showed the no-`LISTEN` guard worked for same-transID binary tails, but a new hotphrase launch `LISTEN` could still capture diagnostic speech tail; current source now blocks that diagnostic-tail shape
- original clock tests confirm cancel inside the alarm value prompt must close without scheduling, existing-alarm `keep` must preserve KB/scheduler state, and existing-alarm `delete` or `cancel` must clear it
- original gallery tests confirm empty-gallery `yes` redirects to `@be/create`, empty-gallery `no` exits, media-load failure exits, and delete confirmation only deletes on a positive `yes`
- Exit criteria:
@@ -264,6 +270,18 @@ Current release theme:
- Follow-up:
- live update/backup/share/gallery/alarm replacement prompts still need another clean pass
### Cloud Version Tail Cleanup
- Status: `implemented`
- Tags: `protocol`
- Result:
- `cloud_version` no longer keeps the generic follow-up mic open
- diagnostic speech receives an eight-second late-audio ignore window
- no-transcript hotphrase launch `LISTEN` setup packets inside that cleanup window are ignored before they can reopen a stale turn
- focused websocket coverage reproduces the Test 27 `Cloudford.` shape: cloud-version speech, tail `LISTEN`, and binary speech tail
- Follow-up:
- live smoke should confirm `cloud version` speaks `1.0.18` and settles without a generic `I heard...` reply
### Word Of The Day Cleanup
- Status: `implemented`
@@ -362,6 +380,7 @@ Current release theme:
- no-op update fabrication has been removed from `.NET`
- Test 25 still showed repeated backup-in-progress/update-menu blockage without `Backup_*` HTTP traffic
- Test 26 repeated the backup-in-progress warning from robot-local `@be/surprises-ota` without `Backup_*` HTTP traffic
- Test 27 repeated the same no-`Backup_*` finding and added evidence of local startup reconnect / `Q4-Server_connection_lost` before the test
- Exit criteria:
- no phantom "always has updates" behavior
- one controlled update can be staged and delivered

View File

@@ -15,7 +15,7 @@ Run this plan:
- after the last code change before calling a release complete
- after any fix that touches websocket turn finalization, local skill redirects, constrained yes/no, or STT
- before moving from `1.0.18` bug-fix closeout into `1.0.19` feature work
- after the Test 26 fixes, run at least the focused alarm/timer, photo/gallery, stop, volume, and blue-ring cleanup sections before deciding whether `1.0.18` is ready to freeze
- after the Test 26 and Test 27 fixes, run at least the focused cloud-version, alarm/timer, photo/gallery, stop, volume, and blue-ring cleanup sections before deciding whether `1.0.18` is ready to freeze
For small feature slices, run the automated `.NET` tests plus the smoke checks and only the live sections that share the same machinery. Before release closeout, run the full current-release suite.
@@ -37,6 +37,7 @@ A release is not ready until these are true or explicitly deferred in [developme
- focused `.NET` cloud tests pass
- running robot reports the expected cloud version by voice and `/health`
- `cloud version` settles without a self-listened `Cloudford` / generic chat tail
- no current-release path emits obsolete OpenJibo-only websocket events such as synthetic pending/context/ack packets
- known working live paths still work: startup, simple chat, radio, basic news, constrained yes/no, alarm, and gallery/create
- any remaining failure is classified as cloud payload, local robot state, STT/audio quality, environment/routing, or deferred feature gap
@@ -57,10 +58,11 @@ Run these first so obvious environment problems do not pollute feature results:
1. Start the `.NET` cloud using the live runbook.
2. Confirm `/health` reports the expected version.
3. Ask `cloud version`; confirm Jibo speaks the same version.
4. Run one simple chat turn.
5. Run one joke turn.
6. Confirm websocket capture is being written before continuing.
3. Confirm the robot is not in a local connection-lost state; if logs show `Q4-Server_connection_lost` or a fresh `jibo-server-service` reconnect, wait for it to clear before scoring voice behavior.
4. Ask `cloud version`; confirm Jibo speaks the same version and does not follow with `Cloudford`, `I heard...`, or another generic tail reply.
5. Run one simple chat turn.
6. Run one joke turn.
7. Confirm websocket capture is being written before continuing.
Stop and fix environment issues if startup, websocket connection, or capture output is not clean.
@@ -93,7 +95,8 @@ Goal: prove constrained yes/no prompts stay local and do not leak global launch
- Observe backup-in-progress behavior separately from explicit voice commands.
- Do not treat a spoken `take a backup` failure as proof of the backup scheduler path; that command is not currently wired as a hosted-cloud voice feature.
- If the update menu reports backup-in-progress, record whether HTTP captures include any `Backup_*` targets; current evidence points to robot-local scheduler/status or log/upload load unless those calls appear.
- If Jibo announces backup-in-progress without update-menu interaction, note the local skill in robot logs; Test 26 showed `@be/surprises-ota`.
- If Jibo announces backup-in-progress without update-menu interaction, note the local skill in robot logs; Tests 26 and 27 showed `@be/surprises-ota`.
- If the warning appears soon after startup or update, check for local `jibo-server-service` restart, notification reconnect, or `Q4-Server_connection_lost` before scoring it as a hosted backup defect.
- Expected: short `yes`/`no` replies map locally, empty replies no-input locally, and backup/download notifications are not repeatedly re-announced once acknowledged.
- Capture check: active rule remains the constrained rule such as `surprises-ota/want_to_download_now`, `settings/download_now_later`, `shared/yes_no`, or another stock prompt rule.
@@ -215,12 +218,13 @@ Capture check:
### Blue-Ring Cleanup
Goal: catch the Test 26 no-`LISTEN` buffering regression quickly.
Goal: catch the Test 26 no-`LISTEN` buffering regression and the Test 27 diagnostic speech-tail regression quickly.
- After any local skill redirect or generic chat reply, wait five to ten seconds before issuing the next phrase.
- If the blue ring remains open, record the active transID and whether the websocket capture shows a new `LISTEN`.
- Expected: binary audio for an existing transID is ignored until a fresh `LISTEN` appears; blank hotphrase turns clear instead of buffering indefinitely.
- Capture check: long-running context-only transactions should not accumulate buffered audio chunks or stay `AwaitingTurnCompletion = true`.
- After `cloud version`, wait five to ten seconds and confirm there is no fresh no-transcript hotphrase launch `LISTEN` that turns speech tail into generic chat.
- Expected: binary audio for an existing transID is ignored until a fresh valid `LISTEN` appears; blank hotphrase turns clear instead of buffering indefinitely; diagnostic speech tails do not reopen launch listens.
- Capture check: long-running context-only transactions should not accumulate buffered audio chunks or stay `AwaitingTurnCompletion = true`; a late ignored diagnostic `LISTEN` may appear as cleanup telemetry but should not set `SawListen` or buffer audio.
## Optional Feature Slice Checks