Update regression docs for test 32 alarm and Word of the Day

Jacob Dubin
2026-05-03 13:16:55 -05:00
parent 21c5c7d681
commit 2ec4902189
12 changed files with 47315 additions and 0 deletions

@@ -32,6 +32,7 @@ Release `1.0.18` is now in feature-hardening. Its main bug-fix theme is alarm an
- Current source now speaks the diagnostic as `Cloud version ...` without saying `Jibo`, while keeping the one-shot and late-listen cleanup guards.
- Test 30 showed `cloud version` speaking cleanly with no interruption. The backup warning later appeared after opening the gallery from the menu: the gallery asked the empty-gallery photo question, then stock BE opened `@be/surprises`, selected `@be/surprises-ota`, and spoke the local backup announcement. The captured HTTP traffic still did not show hosted `Backup_*` calls.
- Test 31 sharpened the remaining alarm/backup picture: the startup capture includes a legacy `Backup_20170222.List` request before any voice turn, the alarm set path still collapsed `7:11 AM` into `7:00 PM` / `setting alarm for seven`, and the robot answered the later clock `No` with `that's fine` before opening `@be/surprises` and eventually getting stuck in a blue-ring listen loop until reset.
- Test 32 shows the alarm set path is better, but two cleanup gaps remain in the newer-code window: the alarm flow can still leave a listen open at the end, and the proactive Word of the Day yes/no branch can miss a short `Yes` and bounce into a mock/echo response. The delete-alarm retry case also still asks whether to set an alarm again, then mishandles the follow-up yes/no reply.
- Test 30 showed that the alarm value reply `638` arrived at 6:38:13 AM local time. Stock clock parsed it as `6:38 PM`, and our cloud response then added a delayed `@be/clock` relaunch on top of the active local clock value flow, causing the duplicate existing-alarm replacement prompt. Current source now suppresses the extra clock relaunch for local clock follow-up rules.
- Backup-in-progress still appears robot-local in the user-facing voice flow. Tests 27, 28, 29, and 30 had no matching `Backup_*` HTTP calls during the voice prompt itself. Keep investigating robot-local scheduler/status, startup reconnect state, CPU/load, and log/upload work if backup status itself remains sluggish after surprise suppression.
- Test 26 remains the broader regression evidence for gallery success, alarm replacement/delete risk, stop/volume live proof, and short-answer STT weakness. Alarm replacement/menu agreement is still a live release risk, but Test 30 identified and patched one duplicate-handoff cause.
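
The Test 30 duplicate replacement prompt came from stacking a delayed `@be/clock` relaunch on top of a clock flow the robot was already running locally. The sketch below illustrates the suppression rule under stated assumptions: the `CloudReply` shape, the action dictionaries, and the rule names in `LOCAL_CLOCK_FOLLOWUP_RULES` are all hypothetical and are not taken from the actual source.

```python
from dataclasses import dataclass, field

# Hypothetical names for local clock follow-up rules; the real rule
# identifiers in the source may differ.
LOCAL_CLOCK_FOLLOWUP_RULES = {
    "clock/alarm_value",
    "clock/alarm_replace_yes_no",
    "clock/alarm_delete_yes_no",
}


@dataclass
class CloudReply:
    """Hypothetical shape of the cloud's answer to one voice turn."""
    speech: str
    actions: list = field(default_factory=list)


def is_clock_relaunch(action: dict) -> bool:
    """True for a (hypothetical) action that relaunches the clock skill."""
    return action.get("type") == "relaunch" and action.get("skill") == "@be/clock"


def finalize_reply(reply: CloudReply, active_rule: str) -> CloudReply:
    """Return a copy of reply without @be/clock relaunches when the robot
    is already inside a local clock follow-up rule, so the local flow
    consumes the answer once instead of being restarted on top of itself."""
    if active_rule not in LOCAL_CLOCK_FOLLOWUP_RULES:
        return reply
    kept = [a for a in reply.actions if not is_clock_relaunch(a)]
    return CloudReply(reply.speech, kept)


if __name__ == "__main__":
    reply = CloudReply("", [{"type": "relaunch", "skill": "@be/clock", "delayMs": 1500}])
    print(finalize_reply(reply, "clock/alarm_value").actions)  # []
    print(len(finalize_reply(reply, "chat/generic").actions))  # 1
```

Returning a copy rather than mutating the input keeps the original reply available for capture logging.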

@@ -50,6 +50,7 @@ Current release theme:
- `jibo test 29` confirmed `skipSurprises = true` was reaching stock BE and no backup announcement repeated in the focused run, but the cloud-version answer was still interrupted because the spoken diagnostic included `Jibo` and triggered local hotphrase barge-in during Nimbus TTS
- `jibo test 30` confirmed cloud-version now speaks cleanly; it still exposed a local gallery-to-`@be/surprises-ota` backup announcement, a missing visible voice listen for the empty-gallery prompt, and a duplicate alarm clock relaunch after `638` was parsed locally as `6:38 PM`
- `jibo test 31` showed the remaining alarm/backup wrinkle in full: startup logged a legacy `Backup_20170222.List` request before the first voice turn, `7:11 AM` collapsed into `7:00 PM` / `setting alarm for seven`, and the robot answered the later clock `No` with `that's fine` before opening `@be/surprises` and ending in a blue-ring listen loop until reset
- `jibo test 32` suggests the alarm set path is improving, but the remaining regression surface is now sharper: an alarm can still leave the listen open at the end, the proactive Word of the Day `Yes` branch can miss its yes/no slot and echo back, and delete-alarm retry still falls into a second `set one?` question with a broken follow-up reply
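
Tests 29 and 30 tie the cloud-version interruption to the word `Jibo` appearing in spoken text. As a safety net on top of the `Cloud version ...` wording, a guard could strip hotphrase words from any diagnostic TTS. This is a minimal sketch; the regex, `strip_hotphrase`, and `cloud_version_speech` are hypothetical, and the assumption that the local detector fires on `Jibo` / `Hey Jibo` comes only from the test notes above.

```python
import re

# Assumption: the robot's local hotphrase detector fires on "Jibo"
# (including "Hey Jibo"), so spoken text containing it risks barge-in.
HOTPHRASE_PATTERN = re.compile(r"\b(?:hey\s+)?jibo\b[,:]?\s*", re.IGNORECASE)


def strip_hotphrase(text: str) -> str:
    """Remove hotphrase words from text the robot will speak aloud."""
    cleaned = HOTPHRASE_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def cloud_version_speech(version: str) -> str:
    """Build the diagnostic line in the `Cloud version ...` form."""
    return strip_hotphrase(f"Cloud version {version}")


if __name__ == "__main__":
    print(cloud_version_speech("1.0.18"))                      # Cloud version 1.0.18
    print(strip_hotphrase("Hey Jibo, cloud version 1.0.18"))   # cloud version 1.0.18
```
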
## Immediate `1.0.18` Queue

@@ -117,6 +117,7 @@ Test these paths:
- value-prompt cancel: `set an alarm`, then say `cancel`
- voice delete: `delete my alarm` or `cancel alarm`
- voice delete variants from Test 26: `delete the alarm`, `delete alarm`, and, if ASR mishears it, record whether `delete along` maps to local clock delete
- repeat delete: after clearing an alarm, issue `delete alarm` again and verify the prompt/answer path if the robot asks whether to set one
- no-input cleanup: allow one value prompt to miss or time out when practical
- timer sanity: `set a timer for 10 seconds`, let it fire or record the exact remaining state, then verify a second timer request does not report a stale already-running timer
- STT sanity: if a short alarm time collapses to a shorter transcript such as `seven`, capture that as STT loss; Test 31's `7:11 AM` attempt collapsed to `7:00 PM`
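
To keep STT loss separate from parse or payload bugs when reviewing alarm-value turns, one option is to compare the digits the tester said with the digits the transcript kept. The sketch below is hypothetical triage tooling, not part of the robot or cloud code: the number-word table and the category names `stt_loss` / `parse_or_payload` are assumptions.

```python
import re

# Small number-word table covering spoken clock times; an assumption for
# this sketch, not the robot's actual ASR normalization.
UNITS = {
    "oh": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


def spoken_digits(text: str) -> str:
    """Reduce text to the digits it spells out:
    '7:11 AM' -> '711', 'six thirty eight' -> '638', 'seven' -> '7'."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.isdigit():
            out.append(tok)
        elif tok in TENS:
            value = TENS[tok]
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if 0 < UNITS.get(nxt, 0) < 10:
                value += UNITS[nxt]
                i += 1  # the unit word was folded into this number
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        i += 1
    return "".join(out)


def triage_alarm_turn(expected: str, transcript: str, parsed: str) -> str:
    """Classify one alarm-value turn as 'ok', 'stt_loss' (the transcript
    already lost digits the tester said), or 'parse_or_payload' (the
    transcript kept the digits but the parsed alarm is still wrong)."""
    def norm(s: str) -> str:
        return " ".join(s.upper().split())

    if norm(parsed) == norm(expected):
        return "ok"
    if spoken_digits(transcript) != spoken_digits(expected):
        return "stt_loss"
    return "parse_or_payload"


if __name__ == "__main__":
    print(triage_alarm_turn("7:11 AM", "seven", "7:00 PM"))         # stt_loss
    print(triage_alarm_turn("7:11 AM", "seven eleven", "7:11 PM"))  # parse_or_payload
```

Only digits are compared, so a transcript that drops `AM`/`PM` but keeps the digits is reported as `parse_or_payload`; treat that category as "check the transcript's meridiem too".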
@@ -172,6 +173,24 @@ Capture check:
- post-gallery binary audio does not continue buffering unless a fresh `LISTEN` appears
- when gallery is empty and asks whether to take a picture, verify whether a local `shared/yes_no` or equivalent `LISTEN` appears and whether the blue ring visually opens for voice input
### Word Of The Day

Goal: prove proactive prompts consume short yes/no answers cleanly instead of echoing them back as generic dialog.

- Let the robot proactively launch Word of the Day when it chooses to do so.
- Answer the proactive prompt with a short `yes` and, if available, a short `no`.
- If the robot echoes or mocks the answer instead of consuming it, record the exact transcript and the active rule.

Expected:

- proactive Word of the Day uses the constrained yes/no path and consumes short confirmation answers
- the robot does not stay in a stray listen state after the proactive prompt resolves

Capture check:

- proactive yes/no should present a constrained rule rather than a generic chat rule
- the answer should finalize cleanly without falling back to an unrelated surprise or mock response
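
The expected behavior above amounts to: under a constrained yes/no rule, a short answer either maps to `yes`/`no` or stays inside the rule, and never falls through to generic chat. A minimal sketch follows; the accepted phrasings, the use of `shared/yes_no` as the Word of the Day rule, the re-prompt-with-listen choice, and the return dictionaries are all assumptions for illustration.

```python
import re
from typing import Optional

# Assumed phrasings accepted by a constrained yes/no rule; the robot's
# real grammar may differ.
YES = {"yes", "yeah", "yep", "sure", "ok", "okay", "yes please"}
NO = {"no", "nope", "no thanks", "not now"}


def consume_yes_no(transcript: str) -> Optional[str]:
    """Map a short answer to 'yes' or 'no'; None means it did not fit."""
    text = " ".join(transcript.lower().split())
    text = re.sub(r"[^a-z ]", "", text).strip()
    if text in YES:
        return "yes"
    if text in NO:
        return "no"
    return None


def handle_proactive_answer(active_rule: str, transcript: str) -> dict:
    """Resolve an answer to a proactive prompt. Under the (assumed)
    constrained rule, a matched answer closes the listen; an unmatched one
    re-prompts inside the same rule instead of being echoed as chat."""
    if active_rule == "shared/yes_no":
        answer = consume_yes_no(transcript)
        if answer is not None:
            return {"intent": answer, "listen": False}
        return {"intent": "reprompt", "listen": True}
    return {"intent": "chat", "listen": False}


if __name__ == "__main__":
    print(handle_proactive_answer("shared/yes_no", "Yes."))
    print(handle_proactive_answer("shared/yes_no", "No, thanks"))
```
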
### STT And Audio Quality
Goal: avoid misclassifying transcript failures as payload regressions.
@@ -232,6 +251,7 @@ Goal: catch the Test 26 no-`LISTEN` buffering regression, the Test 27 diagnostic
- Confirm ordinary hosted replies and local redirects carry `match.skipSurprises = true`.
- Expected: binary audio for an existing transID is ignored until a fresh valid `LISTEN` appears; blank hotphrase turns clear instead of buffering indefinitely; diagnostic speech tails do not reopen launch listens; settled turns do not open `@be/surprises` / `@be/surprises-ota`; a delete/replacement `No` should not strand the robot in a blue-ring listen loop.
- Expected: a proactive yes/no prompt such as Word of the Day should consume `yes`/`no` without echoing the answer back or leaving the robot listening.
- Capture check: long-running context-only transactions should not accumulate buffered audio chunks or stay `AwaitingTurnCompletion = true`; a late ignored diagnostic `LISTEN` may appear as cleanup telemetry but should not set `SawListen` or buffer audio; normal cloud/local completions should not be followed by a BE surprise router request.
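
The audio-gating expectations above can be read as a small per-transaction state machine. The sketch below assumes hypothetical `Transaction` and `TurnGate` types whose fields mirror the `SawListen` and `AwaitingTurnCompletion` telemetry names; it illustrates the intended rules (buffer only inside a fresh valid `LISTEN`, clear on turn completion including blank hotphrase turns, ignore late diagnostic `LISTEN`s) rather than the actual server code.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical per-transID turn state mirroring the telemetry names."""
    trans_id: str
    saw_listen: bool = False                 # SawListen
    awaiting_turn_completion: bool = False   # AwaitingTurnCompletion
    buffered_chunks: list = field(default_factory=list)


class TurnGate:
    def __init__(self) -> None:
        self.transactions: dict[str, Transaction] = {}

    def on_listen(self, trans_id: str, diagnostic_tail: bool = False) -> None:
        """Open a listen; a late diagnostic LISTEN is telemetry only."""
        tx = self.transactions.setdefault(trans_id, Transaction(trans_id))
        if diagnostic_tail:
            return  # must not set SawListen or start buffering
        tx.saw_listen = True
        tx.awaiting_turn_completion = True
        tx.buffered_chunks.clear()

    def on_audio(self, trans_id: str, chunk: bytes) -> bool:
        """Buffer audio only while a valid LISTEN is open; return whether kept."""
        tx = self.transactions.get(trans_id)
        if tx is None or not (tx.saw_listen and tx.awaiting_turn_completion):
            return False
        tx.buffered_chunks.append(chunk)
        return True

    def on_turn_complete(self, trans_id: str) -> None:
        """Settle a turn, including blank hotphrase turns with no transcript:
        close the listen and drop buffered audio so nothing accumulates."""
        tx = self.transactions.get(trans_id)
        if tx is None:
            return
        tx.saw_listen = False
        tx.awaiting_turn_completion = False
        tx.buffered_chunks.clear()


if __name__ == "__main__":
    gate = TurnGate()
    gate.on_listen("t1")
    print(gate.on_audio("t1", b"a"))  # True
    gate.on_turn_complete("t1")
    print(gate.on_audio("t1", b"b"))  # False until a fresh LISTEN
```
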
## Optional Feature Slice Checks