version 18 test fixes
@@ -21,14 +21,13 @@ Release `1.0.18` is now in feature-hardening. Its main bug-fix theme is alarm an
## Latest Live Evidence
`jibo test 26` was captured as the next focused regression pass after the Test 25 stop/gallery/settings fixes.
`jibo test 27` was a small focused startup/cloud-version capture after the Test 26 regression pass.
- Good morning worked, and the Test 25 stop fix was live-proven: `Never mind.` now mapped to stock stop instead of generic chat. The Test 25 volume homophone fix was also visible: `Set Volume 2-6.` mapped to `volume_to_value`.
- Backup-in-progress still appeared during the test. Robot logs show the warning came from `@be/surprises-ota` (`hey since i'm doing a backup right now, I might be a little slow`), while the HTTP capture again had no `Backup_*` calls. Keep treating this as robot-local backup scheduler/status or log/upload load until a capture proves hosted backup involvement.
- Photo/gallery mostly worked: `Open Photo Galerum.` launched gallery, empty-gallery `shared/yes_no` `Yes.` handed into create, the photo was taken, keeper `Yes.` saved it, and gallery reopened. The remaining gallery quirk was after cleanup: context-only or post-skill audio tails could keep buffering without a fresh `LISTEN`, producing a long blue ring and later generic fallback.
- Alarm replacement/delete is still the main risky release path. Test 26 showed an existing `9:02 PM` alarm replacement prompt, but later `yes` turns arrived under `clock/alarm_set_value` rather than `clock/alarm_timer_change`, which pushed the robot toward the manual/value screen. The run eventually set a `7:35 PM` alarm, then repeated delete attempts such as `Delete along.`, `So, delete the alarm.`, and `So, delete the along.` fell to generic chat.
- The blue-ring/listen-loop concern is real in the capture. Several transactions buffered binary audio for 15-50 seconds with no useful turn completion, especially when a context arrived without a new `LISTEN` or after a chat fallback kept follow-up open. Current source now blocks that no-`LISTEN` buffering path and clears blank-audio hotphrase state more aggressively.
- No dominant `ffmpeg` / `whisper.cpp` decode failure emerged from Test 26. The remaining failures are mostly robot-local backup/load, short-answer STT quality, and alarm replacement/menu agreement.
- Before the cloud-version test, the robot's local `jibo-server-service` restarted after a broken pipe, then `ssm` raised `Q4-Server_connection_lost` and local `@be/settings` opened the connection-lost error path. The notification connection recovered about 31 seconds later. Treat early-test confusion as suspect if this local-server recovery appears in the same window.
- The cloud-version answer itself proved the running build was `1.0.18`, but the previous source treated `cloud_version` as a follow-up conversation. A fresh hotphrase `LISTEN` then captured speech tail as `Cloudford.`, and generic chat replied `thanks. I heard, Cloudford.`
- Current source now makes `cloud_version` a one-shot diagnostic, uses a longer diagnostic speech-tail ignore window, and ignores no-transcript hotphrase launch `LISTEN` setup packets inside that window. The existing no-`LISTEN` binary guard already ignored same-transID binary tails after finalization, but Test 27 showed it could not stop a brand-new hotphrase listen by itself.
- Backup-in-progress still appears robot-local. In Test 27 the message was selected by local `@be/surprises-ota` after Nimbus/chitchat settled, and the HTTP capture again had no `Backup_*` calls. Keep investigating robot-local scheduler/status, startup reconnect state, CPU/load, and log/upload work before assuming hosted backup API involvement.
- Test 26 remains the broader regression evidence for gallery success, alarm replacement/delete risk, stop/volume live proof, and short-answer STT weakness. Alarm replacement/menu agreement is still the main release risk after the Test 27 cloud-version-tail hardening.
## Release Rhythm
@@ -72,6 +71,8 @@ Current websocket scope:
- local whisper only attempts external decoding when buffered audio contains an Opus identification header
- auto-finalize thresholds for buffered audio after a real listen phase
- late-audio ignore windows after completed turns
- cloud-version diagnostic turns do not keep follow-up open and receive a longer speech-tail ignore window
- no-transcript hotphrase launch `LISTEN` setup packets are ignored while a completed diagnostic/local turn is still in its late-audio cleanup window
- passive local context cleanup for gallery/create/settings contexts after stock local skills take ownership
- no-input local completion for constrained prompts, clock value prompts, gallery preview prompts, and settings volume-control prompts
- active local prompt preservation so `shared/yes_no`, clock, gallery, and settings prompts can still consume transcript-bearing short replies even when the stock skill reports a local context
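The Opus-header gate in the list above can be illustrated with a minimal Python sketch. It is not the project's `.NET` code. It assumes the check is a plain search for the `OpusHead` magic signature that RFC 7845 defines for the Opus identification header; the function names are hypothetical.

```python
# Magic signature that opens the Opus identification header (RFC 7845).
OPUS_ID_MAGIC = b"OpusHead"


def has_opus_identification_header(buffered_audio: bytes) -> bool:
    """Return True when the buffered bytes contain an Opus identification header."""
    return OPUS_ID_MAGIC in buffered_audio


def should_attempt_external_decode(chunks: list[bytes]) -> bool:
    """Hypothetical gate: only hand audio to an external decoder when the header is present.

    Chunks are joined first so a signature split across two websocket
    frames is still detected.
    """
    return has_opus_identification_header(b"".join(chunks))
```

Joining before searching is a deliberate choice in this sketch: binary frames have no guaranteed alignment with the header, so a per-chunk search could miss it.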
@@ -92,6 +93,7 @@ Current state and persistence scope:
The following behavior is present in source and covered by focused tests:
- `cloud version` speech and `/health` version reporting share `OpenJiboCloudBuildInfo.Version`
- `cloud version` is a one-shot diagnostic: it speaks the version without opening a follow-up turn, then shields the speech tail from self-listen artifacts such as the Test 27 `Cloudford.` capture
- apostrophes are no longer entity-escaped in spoken ESML, while `&`, `<`, `>`, and `"` remain escaped
- radio voice launch supports `open the radio` and genre launch such as `play country music`, using local `@be/radio` `menu` payloads, `SKILL_REDIRECT`, and silent completion
- news has a first Nimbus-shaped cloud path using `match.cloudSkill = news` and a `news` `SKILL_ACTION` with synthetic briefing content
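The escaping rule for spoken ESML listed above can be sketched in Python. This is illustrative and not the project's `.NET` implementation. The function name is hypothetical, and the replacement entities are assumed to be the standard XML ones.

```python
# Characters that stay escaped in spoken text; the apostrophe is deliberately absent.
_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_spoken_text(text: str) -> str:
    """Escape markup-significant characters while leaving apostrophes untouched.

    Mapping character by character avoids double-escaping the ampersands
    that the replacements themselves introduce.
    """
    return "".join(_ESCAPES.get(ch, ch) for ch in text)
```

For example, `escape_spoken_text("it's <5 & done")` leaves `it's` intact while escaping `<` and `&`.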
@@ -153,7 +155,7 @@ Before calling `1.0.18` complete, prove or explicitly defer these:
- Run the focused `.NET` cloud test suite after the last feature slice.
- Run the current-release live checklist in [regression-test-plan.md](regression-test-plan.md).
- Confirm the running robot build reports cloud version `1.0.18`.
- Confirm the running robot build reports cloud version `1.0.18` without a follow-up `Cloudford` / generic chat tail.
- Regression test alarm flows again after the `jibo test 26` fixes: set with explicit time, set with compact/spoken/comma-separated time, clarify missing time, replace an existing alarm, cancel/delete by voice including `delete the alarm`, cancel out of a value prompt, and verify the menu agrees.
- Regression test timer flows after the Test 25 stale-timer observation: set a 10-second timer, let it fire, reset by gesture only after recording state, and verify a new timer prompt does not see an already-expired timer as still active.
- Regression test photo/gallery flows again after the `jibo test 26` fixes: open gallery, answer the stock `shared/yes_no` prompt with a transcript-bearing `yes`, hand into create, take one photo, keep it, and avoid blue-ring, `I heard you`, or `that's` stale turns after gallery cleanup.
@@ -172,11 +174,11 @@ These are not blockers for calling `1.0.18` complete unless the live test shows
- local `whisper.cpp` STT remains a discovery seam, not production ASR
- media upload/body handling is not binary-safe enough for final gallery originals and thumbnails
- state persistence is local JSON, not Azure SQL / Blob Storage
- update, backup, and restore are not end-to-end proven, and the `jibo test 22` sluggishness appears tied to robot-local backup status/load
- Test 26 still showed repeated backup-in-progress behavior without corresponding `Backup_*` HTTP traffic
- update, backup, and restore are not end-to-end proven, and the `jibo test 22` / Test 26 / Test 27 sluggishness appears tied to robot-local backup status/load or startup reconnect state
- Test 27 again showed backup-in-progress behavior without corresponding `Backup_*` HTTP traffic, immediately after a local `jibo-server-service` reconnect sequence
- deployed-build verification needs to prove that synthetic OpenJibo websocket events are gone from the hosted artifact, not just from source
- news content is synthetic; `jibo test 23` proved the path but not live provider-backed headlines
- alarm replacement yes/no, alarm voice delete/menu agreement, and long blue-ring cleanup still need successful live proof after the Test 26 source fixes
- alarm replacement yes/no, alarm voice delete/menu agreement, and long blue-ring cleanup still need successful live proof after the Test 26 and Test 27 source fixes
- weather, calendar, commute, personal report, identity, memory, and proactivity are still mostly discovery or placeholder content paths
- remaining stop/volume variants still need live stock-OS proof beyond Test 26's `Never mind.` and `Set Volume 2-6.` passes; robot age and command-versus-question personality routing are not implemented yet
@@ -33,6 +33,7 @@ Runtime truth:
- hosted `.NET` projects and cloud tests target `net10.0`
- version source of truth is [OpenJiboCloudBuildInfo.cs](../src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/OpenJiboCloudBuildInfo.cs)
- `/health`, startup logging, and spoken `cloud version` are aligned with that constant
- spoken `cloud version` is now a one-shot diagnostic with speech-tail protection instead of a follow-up chat turn

Current release theme:
@@ -44,6 +45,7 @@ Current release theme:
- `jibo test 24` showed alarm replacement yes/no working, but exposed empty `clock/alarm_set_value` and `gallery/gallery_preview` turns falling into generic `I heard you` fallback speech; it also showed `CLIENT_NLU cancel` inside `clock/alarm_set_value` re-asking for an alarm value instead of closing the prompt
- `jibo test 25` proved a broader regression path but exposed repeated backup-in-progress/update-menu blockage, timer/alarm stale state and delete/menu disagreement, gallery `shared/yes_no` hangs under `@be/gallery`, punctuated `Never mind.` falling through to chat, volume homophone parsing (`Set Volume 2-6.`), and settings volume-control cleanup falling into `I heard you`
- `jibo test 26` live-proved punctuated stop, volume homophone parsing, gallery launch/yes/create/save, and good morning; it still exposed robot-local backup warnings, long blue-ring buffering without a fresh `LISTEN`, alarm replacement drifting into the value/manual screen, and alarm delete phrases/mishears falling to chat
- `jibo test 27` isolated early confusion: local `jibo-server-service` restarted and raised `Q4-Server_connection_lost` before testing; cloud version then self-listened into `Cloudford.` because the previous diagnostic path stayed follow-up eligible; the backup warning again came from local `@be/surprises-ota` with no `Backup_*` HTTP calls
## Immediate `1.0.18` Queue
@@ -103,6 +105,8 @@ Current release theme:
- `jibo test 22` did not show `Backup_*` HTTP traffic during the backup complaint
- `jibo test 25` again showed backup-in-progress/update-menu blockage without `Backup_*` HTTP traffic; observed cloud traffic was log upload, ASR binary upload, and update check traffic
- `jibo test 26` again had the robot announce backup-in-progress from `@be/surprises-ota`, with no `Backup_*` HTTP target in the capture
- `jibo test 27` repeated that pattern in a smaller capture: the only relevant hosted startup traffic was token/update/log style traffic, while the spoken backup warning was selected by local `@be/surprises-ota`
- Test 27 also showed local `jibo-server-service` reconnect and `Q4-Server_connection_lost` before the voice test, so startup health should be checked before blaming backup prompts on hosted cloud behavior
- stock `@be/surprises-ota` drives the backup notification from robot-local `jibo.scheduler.backupStatus`
- original `surprises-ota` tests make backup and OTA notifications contextual-priority prompts, with repeat suppression through last-notification timestamps
- a spoken `take a backup` command currently routes as generic chat and is not the same as proving the local backup scheduler path
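The repeat suppression through last-notification timestamps mentioned above can be pictured with a small Python sketch. It is an assumption about the general shape only: the class name, the notification kinds, and the interval are hypothetical, and the stock `@be/surprises-ota` skill may store and compare timestamps differently.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationGate:
    """Suppress repeats of a notification kind within a minimum interval."""

    min_interval_seconds: float  # hypothetical value; the real interval is not stated in these notes
    _last_announced: dict[str, float] = field(default_factory=dict, init=False)

    def should_announce(self, kind: str, now: float) -> bool:
        """Return True and record the timestamp when the kind may be announced."""
        last = self._last_announced.get(kind)
        if last is not None and now - last < self.min_interval_seconds:
            return False  # announced too recently, stay quiet
        self._last_announced[kind] = now
        return True
```

Each kind (for example backup versus OTA) keeps its own timestamp in this sketch, so suppressing one does not silence the other.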
@@ -133,6 +137,7 @@ Current release theme:
- passive gallery/create/settings context no longer reopens stale cloud turns
- active local prompts under gallery/settings contexts are preserved so real short replies are not suppressed as passive context
- context-only or post-skill binary audio tails are ignored until a fresh `LISTEN`, preventing no-`LISTEN` blue-ring buffering loops
- fresh no-transcript hotphrase launch `LISTEN` setup packets are ignored during diagnostic speech-tail cleanup, preventing the Test 27 `Cloudford.` self-listen path
- blank-audio hotphrase turns clear pending listen state and install a short late-audio ignore window
- `shared/yes_no` no-input fallback and repeated create keeper cleanup were added after `jibo test 22`
- Latest evidence:
@@ -147,6 +152,7 @@ Current release theme:
- `jibo test 25` showed timer/alarm still needs live follow-up for stale timer state, alarm replacement/PM ambiguity, and voice delete versus robot menu agreement
- `jibo test 26` showed gallery success through empty-gallery yes, create, keep, save, and reopen, but also showed a post-gallery blue-ring/fallback tail now addressed by the no-`LISTEN` binary guard
- `jibo test 26` showed alarm replacement still drifting into value/manual-screen behavior and alarm delete phrases/mishears falling to chat; current source now maps `delete the alarm`, `delete along`, and `delete the along` to local clock delete without keeping follow-up open
- `jibo test 27` showed the no-`LISTEN` guard worked for same-transID binary tails, but a new hotphrase launch `LISTEN` could still capture diagnostic speech tail; current source now blocks that diagnostic-tail shape
- original clock tests confirm cancel inside the alarm value prompt must close without scheduling, existing-alarm `keep` must preserve KB/scheduler state, and existing-alarm `delete` or `cancel` must clear it
- original gallery tests confirm empty-gallery `yes` redirects to `@be/create`, empty-gallery `no` exits, media-load failure exits, and delete confirmation only deletes on a positive `yes`
- Exit criteria:
@@ -264,6 +270,18 @@ Current release theme:
- Follow-up:
- live update/backup/share/gallery/alarm replacement prompts still need another clean pass
### Cloud Version Tail Cleanup
- Status: `implemented`
- Tags: `protocol`
- Result:
- `cloud_version` no longer keeps the generic follow-up mic open
- diagnostic speech receives an eight-second late-audio ignore window
- no-transcript hotphrase launch `LISTEN` setup packets inside that cleanup window are ignored before they can reopen a stale turn
- focused websocket coverage reproduces the Test 27 `Cloudford.` shape: cloud-version speech, tail `LISTEN`, and binary speech tail
- Follow-up:
- live smoke should confirm `cloud version` speaks `1.0.18` and settles without a generic `I heard...` reply
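A minimal Python sketch of the one-shot diagnostic behavior described in this section. It is not the project's `.NET` websocket code. The class and method names are hypothetical; only the eight-second window and the "no-transcript hotphrase launch `LISTEN`" condition come from the notes above.

```python
from dataclasses import dataclass
from typing import Optional

# Eight-second late-audio ignore window for diagnostic speech, per the result list above.
DIAGNOSTIC_TAIL_IGNORE_SECONDS = 8.0


@dataclass
class TurnState:
    """Hypothetical per-connection state for late-audio cleanup."""

    ignore_until: float = 0.0
    follow_up_open: bool = False

    def complete_diagnostic_turn(self, now: float) -> None:
        """Finish a one-shot diagnostic: no follow-up mic, start the ignore window."""
        self.follow_up_open = False
        self.ignore_until = now + DIAGNOSTIC_TAIL_IGNORE_SECONDS

    def should_ignore_listen(
        self, now: float, transcript: Optional[str], is_hotphrase_launch: bool
    ) -> bool:
        """Ignore only no-transcript hotphrase launch LISTEN packets inside the window."""
        in_window = now < self.ignore_until
        has_transcript = bool(transcript and transcript.strip())
        return in_window and is_hotphrase_launch and not has_transcript
```

Transcript-bearing listens are never ignored in this sketch, so a real user request that lands inside the window still opens a turn.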
### Word Of The Day Cleanup
- Status: `implemented`
@@ -362,6 +380,7 @@ Current release theme:
- no-op update fabrication has been removed from `.NET`
- Test 25 still showed repeated backup-in-progress/update-menu blockage without `Backup_*` HTTP traffic
- Test 26 repeated the backup-in-progress warning from robot-local `@be/surprises-ota` without `Backup_*` HTTP traffic
- Test 27 repeated the same no-`Backup_*` finding and added evidence of local startup reconnect / `Q4-Server_connection_lost` before the test
- Exit criteria:
- no phantom "always has updates" behavior
- one controlled update can be staged and delivered
@@ -15,7 +15,7 @@ Run this plan:
- after the last code change before calling a release complete
- after any fix that touches websocket turn finalization, local skill redirects, constrained yes/no, or STT
- before moving from `1.0.18` bug-fix closeout into `1.0.19` feature work
- after the Test 26 fixes, run at least the focused alarm/timer, photo/gallery, stop, volume, and blue-ring cleanup sections before deciding whether `1.0.18` is ready to freeze
- after the Test 26 and Test 27 fixes, run at least the focused cloud-version, alarm/timer, photo/gallery, stop, volume, and blue-ring cleanup sections before deciding whether `1.0.18` is ready to freeze

For small feature slices, run the automated `.NET` tests plus the smoke checks and only the live sections that share the same machinery. Before release closeout, run the full current-release suite.
@@ -37,6 +37,7 @@ A release is not ready until these are true or explicitly deferred in [developme
- focused `.NET` cloud tests pass
- running robot reports the expected cloud version by voice and `/health`
- `cloud version` settles without a self-listened `Cloudford` / generic chat tail
- no current-release path emits obsolete OpenJibo-only websocket events such as synthetic pending/context/ack packets
- known working live paths still work: startup, simple chat, radio, basic news, constrained yes/no, alarm, and gallery/create
- any remaining failure is classified as cloud payload, local robot state, STT/audio quality, environment/routing, or deferred feature gap
@@ -57,10 +58,11 @@ Run these first so obvious environment problems do not pollute feature results:
1. Start the `.NET` cloud using the live runbook.
2. Confirm `/health` reports the expected version.
3. Ask `cloud version`; confirm Jibo speaks the same version.
4. Run one simple chat turn.
5. Run one joke turn.
6. Confirm websocket capture is being written before continuing.
3. Confirm the robot is not in a local connection-lost state; if logs show `Q4-Server_connection_lost` or a fresh `jibo-server-service` reconnect, wait for it to clear before scoring voice behavior.
4. Ask `cloud version`; confirm Jibo speaks the same version and does not follow with `Cloudford`, `I heard...`, or another generic tail reply.
5. Run one simple chat turn.
6. Run one joke turn.
7. Confirm websocket capture is being written before continuing.

Stop and fix environment issues if startup, websocket connection, or capture output is not clean.
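The `/health` version check in the smoke steps can be scripted. The Python sketch below is a hedged example: it assumes the endpoint returns a JSON object with a string field named `version`, which these notes do not confirm, and the base URL is whatever the live runbook uses.

```python
import json
from typing import Optional
from urllib.request import urlopen

EXPECTED_VERSION = "1.0.18"


def version_from_health_body(body: str, field: str = "version") -> Optional[str]:
    """Extract the version string from a /health response body.

    The JSON shape and the field name are assumptions for illustration.
    """
    payload = json.loads(body)
    value = payload.get(field) if isinstance(payload, dict) else None
    return value if isinstance(value, str) else None


def health_reports_expected(
    base_url: str, expected: str = EXPECTED_VERSION, timeout: float = 5.0
) -> bool:
    """Fetch <base_url>/health and compare the reported version with the expected one."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return version_from_health_body(body) == expected
```

Keeping the parsing separate from the network call lets the comparison be tested against saved response bodies without a running cloud instance.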
@@ -93,7 +95,8 @@ Goal: prove constrained yes/no prompts stay local and do not leak global launch
- Observe backup-in-progress behavior separately from explicit voice commands.
- Do not treat a spoken `take a backup` failure as proof of the backup scheduler path; that command is not currently wired as a hosted-cloud voice feature.
- If the update menu reports backup-in-progress, record whether HTTP captures include any `Backup_*` targets; current evidence points to robot-local scheduler/status or log/upload load unless those calls appear.
- If Jibo announces backup-in-progress without update-menu interaction, note the local skill in robot logs; Test 26 showed `@be/surprises-ota`.
- If Jibo announces backup-in-progress without update-menu interaction, note the local skill in robot logs; Tests 26 and 27 showed `@be/surprises-ota`.
- If the warning appears soon after startup or update, check for local `jibo-server-service` restart, notification reconnect, or `Q4-Server_connection_lost` before scoring it as a hosted backup defect.
- Expected: short `yes`/`no` replies map locally, empty replies complete locally as no-input, and backup/download notifications are not repeatedly re-announced once acknowledged.
- Capture check: active rule remains the constrained rule such as `surprises-ota/want_to_download_now`, `settings/download_now_later`, `shared/yes_no`, or another stock prompt rule.
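Recording whether a capture includes any `Backup_*` targets can be reduced to a small filter. The Python sketch below assumes the HTTP capture has already been reduced to a list of target-name strings; that shape, the function names, and the classification labels are hypothetical.

```python
def backup_targets(http_targets: list[str]) -> list[str]:
    """Return the captured HTTP targets whose name starts with the Backup_ prefix."""
    return [target for target in http_targets if target.startswith("Backup_")]


def classify_backup_warning(http_targets: list[str]) -> str:
    """Label a backup-in-progress warning according to the capture evidence.

    Without any Backup_* call in the capture, the warning is treated as
    robot-local until another capture proves hosted involvement.
    """
    if backup_targets(http_targets):
        return "hosted-backup-traffic-observed"
    return "robot-local-suspected"
```

The target names in any example input are placeholders; real captures should be checked against the actual target strings they contain.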
@@ -215,12 +218,13 @@ Capture check:
### Blue-Ring Cleanup
Goal: catch the Test 26 no-`LISTEN` buffering regression quickly.
Goal: catch the Test 26 no-`LISTEN` buffering regression and the Test 27 diagnostic speech-tail regression quickly.
- After any local skill redirect or generic chat reply, wait five to ten seconds before issuing the next phrase.
- If the blue ring remains open, record the active transID and whether the websocket capture shows a new `LISTEN`.
- Expected: binary audio for an existing transID is ignored until a fresh `LISTEN` appears; blank hotphrase turns clear instead of buffering indefinitely.
- Capture check: long-running context-only transactions should not accumulate buffered audio chunks or stay `AwaitingTurnCompletion = true`.
- After `cloud version`, wait five to ten seconds and confirm there is no fresh no-transcript hotphrase launch `LISTEN` that turns speech tail into generic chat.
- Expected: binary audio for an existing transID is ignored until a fresh valid `LISTEN` appears; blank hotphrase turns clear instead of buffering indefinitely; diagnostic speech tails do not reopen launch listens.
- Capture check: long-running context-only transactions should not accumulate buffered audio chunks or stay `AwaitingTurnCompletion = true`; a late ignored diagnostic `LISTEN` may appear as cleanup telemetry but should not set `SawListen` or buffer audio.
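The capture check for no-`LISTEN` buffering can be approximated offline. The Python sketch below assumes the websocket capture has been reduced to an ordered list of events, each with a transaction ID and a kind such as `LISTEN` or `BINARY`; that event shape and the key names are hypothetical, not the capture's real format.

```python
from collections import defaultdict


def binary_without_listen(events: list[dict]) -> dict[str, int]:
    """Count binary audio frames that arrived for a transID before any LISTEN.

    A non-empty result points at transactions that may have buffered audio
    without a fresh LISTEN, the Test 26 blue-ring shape.
    """
    saw_listen: set[str] = set()
    orphan_frames: dict[str, int] = defaultdict(int)
    for event in events:
        trans_id = event["trans_id"]
        if event["kind"] == "LISTEN":
            saw_listen.add(trans_id)
        elif event["kind"] == "BINARY" and trans_id not in saw_listen:
            orphan_frames[trans_id] += 1
    return dict(orphan_frames)
```

An empty dictionary is the expected result for a clean capture in this sketch; any entry names a transID worth inspecting by hand.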
## Optional Feature Slice Checks