# Regression Test Plan
## Purpose
This plan is the repeatable live regression checklist for OpenJibo Cloud releases.
Use [live-jibo-test-runbook.md](live-jibo-test-runbook.md) for the environment setup and capture mechanics. Use this file for what to test once the robot is connected and the hosted `.NET` cloud is running.
The goal is to reduce trial-and-error cycles: every live pass should prove the release theme, keep prior working paths warm, and produce enough evidence to separate payload bugs, local robot behavior, and STT quality issues.
## When To Run
Run this plan:
- after the last code change before calling a release complete
- after any fix that touches websocket turn finalization, local skill redirects, constrained yes/no, or STT
- before moving from `1.0.18` bug-fix closeout into `1.0.19` feature work
- after the Test 26 and Test 27 fixes, run at least the focused cloud-version, alarm/timer, photo/gallery, stop, volume, and blue-ring cleanup sections before deciding whether `1.0.18` is ready to freeze
For small feature slices, run the automated `.NET` tests plus the smoke checks and only the live sections that share the same machinery. Before release closeout, run the full current-release suite.
## Required Evidence
For each live pass, keep these artifacts together under a named test folder such as `artifact-output/jibo-test-N`:
- `.NET` console logs
- websocket captures and fixture exports
- HTTP captures when startup, update, backup, media, or upload paths are involved
- robot runtime logs pulled after the session
- operator notes with exact phrases attempted and visible robot/menu state
Record failures with the observed transcript, active listen rules, emitted websocket response shape, and whether the robot menu state agreed with the cloud response.
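
The fields named above can be kept as one structured note per failed turn, stored next to the capture files. A minimal sketch, assuming Python is available on the operator machine; the `FailureRecord` class, its field names, and the sample values are hypothetical and not part of OpenJibo:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FailureRecord:
    """One failed live turn. All field names here are hypothetical."""

    phrase_attempted: str
    observed_transcript: str
    active_listen_rules: List[str] = field(default_factory=list)
    # Message types in the order they were emitted, e.g. ["LISTEN", "EOS"].
    response_shape: List[str] = field(default_factory=list)
    # None means the operator did not check the robot menu.
    menu_agreed_with_cloud: Optional[bool] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Sample values for illustration only; not taken from a real capture.
record = FailureRecord(
    phrase_attempted="set an alarm for 7:43 AM",
    observed_transcript="seven",
    active_listen_rules=["clock/alarm_set_value"],
    response_shape=["LISTEN", "EOS"],
    menu_agreed_with_cloud=False,
)
print(record.to_json())
```

Writing one such file per failure into the test folder keeps the operator notes comparable across passes.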
## Release Gates
A release is not ready until these are true or explicitly deferred in [development-plan.md](development-plan.md):
- focused `.NET` cloud tests pass
- the running robot reports the expected cloud version both by voice and at `/health`
- `cloud version` uses `Cloud version ...` wording and settles without self-hotphrase interruption, a self-listened `Cloudford`, or a generic chat tail
- no current-release path emits obsolete OpenJibo-only websocket events such as synthetic pending/context/ack packets
- known working live paths still work: startup, simple chat, radio, basic news, constrained yes/no, alarm, and gallery/create
- any remaining failure is classified as cloud payload, local robot state, STT/audio quality, environment/routing, or deferred feature gap
## Automated Baseline
Run before the live session:
```powershell
dotnet test tests\Jibo.Cloud.Tests\Jibo.Cloud.Tests.csproj --no-restore --nologo -v minimal
```
Expected result for the current baseline: all tests pass.
## Live Smoke Checks
Run these first so obvious environment problems do not pollute feature results:
1. Start the `.NET` cloud using the live runbook.
2. Confirm `/health` reports the expected version.
3. Confirm the robot is not in a local connection-lost state; if logs show `Q4-Server_connection_lost` or a fresh `jibo-server-service` reconnect, wait for it to clear before scoring voice behavior.
4. Ask `cloud version`; confirm Jibo speaks the same version using `Cloud version ...` wording and does not stop itself, follow with `Cloudford`, `I heard...`, a local `@be/surprises` handoff, or another generic tail reply.
5. Run one simple chat turn.
6. Run one joke turn.
7. Confirm websocket capture is being written before continuing.
Stop and fix environment issues if startup, websocket connection, or capture output is not clean.
## Current `1.0.18` Regression Suite
### Radio
Goal: keep the local radio redirect path proven.
- Say `open the radio`.
- Say `play country music`.
- Expected: Jibo opens or resumes the radio locally, and the country phrase carries a `Country` station entity.
- Capture check: websocket output should be local `SKILL_REDIRECT` plus silent completion, not generic chat speech.
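
The capture check can be automated once a turn's outgoing messages are exported. A minimal sketch, assuming each exported message is a JSON object with a `type` field and that cloud speech arrives as `SKILL_ACTION`; the export layout and the helper name are assumptions, so adapt them to the real fixture format:

```python
def is_silent_local_redirect(messages):
    """Return True when a turn redirected locally without cloud speech.

    `messages` is assumed to be a list of dicts with a "type" key.
    """
    types = [message.get("type") for message in messages]
    return "SKILL_REDIRECT" in types and "SKILL_ACTION" not in types


# Hand-written sample turns, not real captures.
radio_turn = [{"type": "LISTEN"}, {"type": "SKILL_REDIRECT"}, {"type": "EOS"}]
chat_turn = [{"type": "LISTEN"}, {"type": "SKILL_ACTION"}, {"type": "EOS"}]

print(is_silent_local_redirect(radio_turn))
print(is_silent_local_redirect(chat_turn))
```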
### News
Goal: keep the Nimbus-shaped cloud skill path proven.
- Say `tell me the news`.
- Expected: Jibo plays the current synthetic quick brief.
- Capture check: `LISTEN` match includes `cloudSkill = news`, followed by a `news` `SKILL_ACTION`.
- Current limitation: provider-backed and category-expanded headlines are deferred unless selected as the optional feature slice.
### Backup, OTA, And Share Yes/No
Goal: prove constrained yes/no prompts stay local and do not leak global launch rules.
- Trigger the update menu path when available and answer one short `yes` or `no` prompt.
- Exercise any available share/date/offer yes-no prompt and answer both `yes` and `no` across runs when practical.
- Observe backup-in-progress behavior separately from explicit voice commands.
- Do not treat a spoken `take a backup` failure as proof of the backup scheduler path; that command is not currently wired as a hosted-cloud voice feature.
- If the update menu reports backup-in-progress, record whether HTTP captures include any `Backup_*` targets; current evidence points to robot-local scheduler/status or log/upload load unless those calls appear.
- If Jibo announces backup-in-progress without update-menu interaction, note the local skill in robot logs; Tests 26 and 27 showed `@be/surprises-ota`, Test 28 showed the preceding `@be/surprises` router opening after Nimbus, and Test 30 showed gallery settling into `@be/surprises` -> `@be/surprises-ota`.
- Test 31 added a startup `Backup_20170222.List` request before the first voice turn, so if the warning returns, capture that startup backup-status traffic alongside the later surprise handoff.
- If the warning appears soon after startup or update, check for local `jibo-server-service` restart, notification reconnect, or `Q4-Server_connection_lost` before scoring it as a hosted backup defect.
- After cloud-version and generic Nimbus/chat turns, verify the outgoing `LISTEN` match includes `skipSurprises = true`.
- Expected: short `yes`/`no` replies map locally, empty replies resolve as local no-input, and backup/download notifications are not repeatedly re-announced once acknowledged.
- Capture check: active rule remains the constrained rule such as `surprises-ota/want_to_download_now`, `settings/download_now_later`, `shared/yes_no`, or another stock prompt rule; ordinary Nimbus/cloud/local turns should not transition into `@be/surprises` after completion.
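
The `skipSurprises` check is easy to miss by eye in a long capture. A minimal sketch that lists outgoing `LISTEN` messages lacking the flag, assuming each message is a dict with a `type` field and a `match` object; that layout and the helper name are assumptions, not the confirmed capture format:

```python
def listens_missing_skip_surprises(messages):
    """Return the indexes of LISTEN messages without match.skipSurprises = true."""
    missing = []
    for index, message in enumerate(messages):
        if message.get("type") != "LISTEN":
            continue
        match = message.get("match") or {}
        if match.get("skipSurprises") is not True:
            missing.append(index)
    return missing


# Hand-written sample, not a real capture.
sample = [
    {"type": "LISTEN", "match": {"skipSurprises": True}},
    {"type": "EOS"},
    {"type": "LISTEN", "match": {}},
]
print(listens_missing_skip_surprises(sample))
```

An empty result means every outgoing `LISTEN` in the exported turn carried the flag.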
### Alarm
Goal: prove the clock skill behaves locally and menu state agrees after the `jibo test 24` fixes.
Start from a known state. If an alarm already exists, record it and clear it through the menu or a controlled voice delete before beginning.
Test these paths:
- explicit set: `set an alarm for 7:43 AM`, adjusted to a near-future time during the actual run
- compact set: `set alarm for 743`, adjusted to a near-future time during the actual run
- clarification: `set an alarm`, then answer the value prompt with a short time such as `7 44` or `7, 44`
- replacement: with an alarm already set, set a different alarm and answer the replacement prompt; verify whether the answer kept or replaced the old alarm
- value-prompt cancel: `set an alarm`, then say `cancel`
- voice delete: `delete my alarm` or `cancel alarm`
- voice delete variants from Test 26: `delete the alarm`, `delete alarm`, and, if ASR mishears it, record whether `delete along` maps to local clock delete
- repeat delete: after clearing an alarm, issue `delete alarm` again and verify the prompt/answer path if the robot asks whether to set one
- no-input cleanup: allow one value prompt to miss or time out when practical
- timer sanity: `set a timer for 10 seconds`, let it fire or record the exact remaining state, then verify a second timer request does not report a stale already-running timer
- STT sanity: if a short alarm time collapses to a shorter transcript such as `seven`, capture that as STT loss; Test 31's `7:11 AM` attempt collapsed to `7:00 PM`
Expected:
- successful set paths appear in the robot alarm menu and fire at the expected time
- replacement prompt answer changes or preserves the alarm consistently with the robot's question
- `cancel` inside the value prompt closes without scheduling
- voice delete clears the robot menu state
- local clock delete/cancel settles without generic chat speech, an open follow-up blue ring, or an unexpected `@be/surprises` handoff
- timer state agrees with what just happened on the robot; a reset gesture should not leave a phantom active timer in the next prompt
- empty value-prompt turns complete locally instead of producing generic `I heard you` speech
Capture check:
- clock payloads use local `@be/clock` handoff with alarm entities when a value exists
- missing values stay in local clock clarification
- `CLIENT_NLU cancel` under `clock/alarm_set_value` or `clock/timer_set_value` maps to local clock `cancel`
- no-input under `clock/alarm_set_value` or `clock/timer_set_value` returns local `LISTEN`/`EOS` only
- value replies under `clock/alarm_set_value` or `clock/timer_set_value` also return local `LISTEN`/`EOS` only; a delayed `@be/clock` relaunch after the local clock skill consumes the reply is a regression
- after a delete/replacement `No`, the robot should not remain in a continuous listen loop or open `@be/surprises` unless the stock OS explicitly takes that route
### Photo Gallery And Create
Goal: prove gallery/create no longer leaves stale listening state after yes/no or preview prompts.
Test these paths:
- `open photo gallery`
- if gallery is empty, answer `yes` to the offer to take a picture
- if the robot hears `open photogal` or another close gallery alias, verify it still launches gallery
- take one photo and answer the keeper prompt with `yes`
- repeat a gallery empty prompt or create keeper prompt with a missed/empty answer when practical
- if using disposable test photos, test delete confirmation once with `no` and once with `yes`
Expected:
- empty gallery `yes` redirects to `@be/create`
- empty gallery `no` exits cleanly when tested
- keeper `yes` completes and Jibo settles without a stale blue ring
- after gallery settles, context-only tails do not produce delayed generic replies such as `that's` or `I didn't hear you`
- transcript-bearing `yes` under gallery `shared/yes_no` is consumed even when the robot reports `@be/gallery` context
- empty `shared/yes_no`, `create/is_it_a_keeper`, and `gallery/gallery_preview` turns resolve as local no-input instead of producing generic `I heard you` speech
- delete confirmation only deletes on a positive `yes`
Capture check:
- gallery launch redirects to `@be/gallery`
- create photo redirects to `@be/create/createOnePhoto`
- local no-input replies keep the active constrained rule and strip unrelated global launch rules
- active `shared/yes_no` is not suppressed merely because the current context is `@be/gallery`
- post-gallery binary audio does not continue buffering unless a fresh `LISTEN` appears
- when gallery is empty and asks whether to take a picture, verify whether a local `shared/yes_no` or equivalent `LISTEN` appears and whether the blue ring visually opens for voice input
### Word Of The Day
Goal: prove proactive prompts consume short yes/no answers cleanly instead of echoing them back as generic dialog.
- Let the robot proactively launch Word of the Day when it chooses to do so.
- Answer the proactive prompt with a short `yes` and, if available, a short `no`.
- If the robot echoes or mocks the answer instead of consuming it, record the exact transcript and the active rule.
Expected:
- proactive Word of the Day uses the constrained yes/no path and consumes short confirmation answers
- the robot does not stay in a stray listen state after the proactive prompt resolves
Capture check:
- proactive yes/no should present a constrained rule rather than a generic chat rule
- the answer should finalize cleanly without falling back to an unrelated surprise or mock response
### STT And Audio Quality
Goal: avoid misclassifying transcript failures as payload regressions.
For every failed voice turn, record:
- phrase attempted
- transcript observed in websocket capture
- active listen rule
- whether the transcript was empty, collapsed, or semantically wrong
- whether local `ffmpeg` or `whisper.cpp` logged an error
Expected:
- `ffmpeg` failures should not be the dominant failure mode for non-Opus buffered audio
- short replies such as `yes`, `no`, `cancel`, and short alarm times should either map correctly or be classified as STT misses with evidence
- when chasing a flaky `$YESNO` reply, look for the new turn telemetry categories `binary_audio_received`, `binary_audio_ignored`, `yes_no_turn_received`, `yes_no_turn_resolved`, and `yes_no_no_input`; the useful question is whether the short reply still had `AwaitingTurnCompletion = true`, active listen rules, and buffered audio when it hit the finalizer
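
When chasing a flaky `$YESNO` reply, tallying the telemetry categories gives a quick first read before opening individual turns. A minimal sketch, assuming telemetry events have already been parsed into dicts with a `category` field; that parsed shape, the helper names, and the rule that every received yes/no turn should end as either resolved or no-input are all assumptions:

```python
from collections import Counter

YES_NO_CATEGORIES = (
    "binary_audio_received",
    "binary_audio_ignored",
    "yes_no_turn_received",
    "yes_no_turn_resolved",
    "yes_no_no_input",
)


def tally_yes_no_telemetry(events):
    """Count each yes/no telemetry category, reporting zero for absent ones."""
    counts = Counter(event.get("category") for event in events)
    return {name: counts.get(name, 0) for name in YES_NO_CATEGORIES}


def unsettled_yes_no_turns(tally):
    """Received turns that neither resolved nor ended as no-input (assumed rule)."""
    settled = tally["yes_no_turn_resolved"] + tally["yes_no_no_input"]
    return max(0, tally["yes_no_turn_received"] - settled)


# Hand-written sample events, not real telemetry.
events = [
    {"category": "yes_no_turn_received"},
    {"category": "yes_no_turn_resolved"},
    {"category": "yes_no_turn_received"},
    {"category": "binary_audio_ignored"},
]
tally = tally_yes_no_telemetry(events)
print(tally, unsettled_yes_no_turns(tally))
```

A non-zero unsettled count points at the turns worth inspecting for `AwaitingTurnCompletion`, active listen rules, and buffered audio.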
### Stop And Volume
Goal: prove the added lightweight device-control slice before closing `1.0.18`.
Test these phrases:
- `stop`
- `stop that`
- `never mind`
- `never mind.` or any punctuated transcript form observed in the capture
- `turn it up`
- `turn it down`
- `set volume to six`
- `set volume to 6`
- `show volume controls`
Expected:
- stop commands settle the robot locally without generic chat speech
- `turn it up` and `turn it down` adjust volume or at least produce the stock local volume event/log
- `set volume to six` sets or attempts to set the local volume level to `6`
- `show volume controls` opens the settings volume panel
- after `show volume controls`, the robot settles without a trailing `I heard you`
Capture check:
- stop emits `nlu.intent = stop`, `nlu.domain = global_commands`, then redirects to `@be/idle`
- punctuated `Never mind.` still maps to global stop, not generic chat
- relative volume emits `nlu.intent = volumeUp` or `volumeDown`, `nlu.domain = global_commands`, and `entities.volumeLevel = null`, with no `SKILL_ACTION` cloud speech
- absolute volume emits `nlu.intent = volumeToValue` and `entities.volumeLevel` matching the requested value, including the observed `Set Volume 2-6.` homophone shape, with no `SKILL_ACTION` cloud speech
- `show volume controls` redirects to `@be/settings` with `nlu.intent = volumeQuery`
- passive `@be/settings` / `settings/volume_control` audio tails complete locally and do not reopen Nimbus fallback speech
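
The relative and absolute volume checks can be expressed as one small validator. A minimal sketch, assuming the emitted payload has been parsed into a dict shaped like `{"nlu": {...}, "entities": {...}}`; that nesting and the function name are assumptions, so adjust the lookups to the real capture layout:

```python
def check_volume_payload(payload, requested_level=None):
    """Return a list of problems; an empty list means the check passed.

    Pass `requested_level` for absolute requests such as `set volume to 6`
    and leave it as None for `turn it up` / `turn it down`.
    """
    nlu = payload.get("nlu") or {}
    level = (payload.get("entities") or {}).get("volumeLevel")
    intent = nlu.get("intent")
    problems = []
    if requested_level is None:
        if intent not in ("volumeUp", "volumeDown"):
            problems.append(f"unexpected relative intent: {intent!r}")
        if nlu.get("domain") != "global_commands":
            problems.append(f"unexpected domain: {nlu.get('domain')!r}")
        if level is not None:
            problems.append(f"relative volume carried a level: {level!r}")
    else:
        if intent != "volumeToValue":
            problems.append(f"unexpected absolute intent: {intent!r}")
        # Compare as strings because the capture may hold "6" or 6.
        if str(level) != str(requested_level):
            problems.append(f"level {level!r} does not match {requested_level!r}")
    return problems


# Hand-written sample payloads, not real captures.
relative = {
    "nlu": {"intent": "volumeUp", "domain": "global_commands"},
    "entities": {"volumeLevel": None},
}
absolute = {"nlu": {"intent": "volumeToValue"}, "entities": {"volumeLevel": "6"}}
print(check_volume_payload(relative), check_volume_payload(absolute, 6))
```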
### Blue-Ring Cleanup
Goal: catch the Test 26 no-`LISTEN` buffering regression, the Test 27 diagnostic speech-tail regression, and the Test 28 unsuppressed end-of-skill surprise handoff quickly.
- After any local skill redirect or generic chat reply, wait five to ten seconds before issuing the next phrase.
- If the blue ring remains open, record the active transID and whether the websocket capture shows a new `LISTEN`.
- After `cloud version`, wait five to ten seconds and confirm there is no fresh no-transcript hotphrase launch `LISTEN` that turns the speech tail into generic chat.
- Confirm ordinary hosted replies and local redirects carry `match.skipSurprises = true`.
- Expected: binary audio for an existing transID is ignored until a fresh valid `LISTEN` appears; blank hotphrase turns clear instead of buffering indefinitely; diagnostic speech tails do not reopen launch listens; settled turns do not open `@be/surprises` / `@be/surprises-ota`; a delete/replacement `No` should not strand the robot in a blue-ring listen loop.
- Expected: a proactive yes/no prompt such as Word of the Day should consume `yes`/`no` without echoing the answer back or leaving the robot listening.
- Capture check: long-running context-only transactions should not accumulate buffered audio chunks or stay `AwaitingTurnCompletion = true`; a late ignored diagnostic `LISTEN` may appear as cleanup telemetry but should not set `SawListen` or buffer audio; normal cloud/local completions should not be followed by a BE surprise router request.
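
The expected buffering rule can be reasoned about with a toy model. The sketch below is not the hosted cloud's code: the `TurnState` class and its methods are hypothetical, and only the `SawListen` and `AwaitingTurnCompletion` names echo the flags mentioned above. It models a single transID that buffers binary audio only between a fresh `LISTEN` and turn completion:

```python
class TurnState:
    """Toy model of one transID's buffering state (hypothetical)."""

    def __init__(self):
        self.saw_listen = False
        self.awaiting_turn_completion = False
        self.buffered_chunks = []
        self.ignored_chunks = 0

    def on_listen(self):
        # A fresh valid LISTEN opens the turn and starts a clean buffer.
        self.saw_listen = True
        self.awaiting_turn_completion = True
        self.buffered_chunks = []

    def on_binary_audio(self, chunk):
        # Audio is buffered only while a LISTEN-opened turn is still pending.
        if self.saw_listen and self.awaiting_turn_completion:
            self.buffered_chunks.append(chunk)
            return True
        self.ignored_chunks += 1
        return False

    def on_turn_complete(self):
        # Completion clears the flags so later audio tails are ignored.
        self.saw_listen = False
        self.awaiting_turn_completion = False
        self.buffered_chunks = []


state = TurnState()
state.on_binary_audio(b"tail")   # ignored: no LISTEN yet
state.on_listen()
state.on_binary_audio(b"reply")  # buffered
state.on_turn_complete()
state.on_binary_audio(b"tail")   # ignored again after completion
print(state.ignored_chunks, len(state.buffered_chunks))
```

Comparing a capture against this model helps decide whether a lingering blue ring came with buffered audio or with audio that was correctly ignored.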
## Optional Feature Slice Checks
When a new feature is added before a release closes:
- add two or three exact phrases to this section before live testing
- capture one successful path and one near-miss phrase if the feature is voice-routed
- keep the test narrow enough that a failure can be fixed or deferred without reopening the whole release
For the current candidate list, add cases here when implemented:
- robot age/persona: `how old are you`
## After The Run
After each session:
1. Summarize pass/fail by section.
2. Mark each failure as cloud payload, local robot state, STT/audio, environment, or deferred gap.
3. Import any high-value websocket fixture.
4. Update [development-plan.md](development-plan.md) with latest live evidence.
5. Update [feature-backlog.md](feature-backlog.md) with what remains in the current release versus what moves to the next release.
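
Steps 1 and 2 can be tallied with a short script so that no failure is left unclassified. A minimal sketch; the `FailureClass` enum, the `summarize` helper, and the tuple layout are hypothetical, while the class labels mirror the categories listed in step 2:

```python
from collections import Counter
from enum import Enum


class FailureClass(Enum):
    CLOUD_PAYLOAD = "cloud payload"
    LOCAL_ROBOT_STATE = "local robot state"
    STT_AUDIO = "STT/audio"
    ENVIRONMENT = "environment"
    DEFERRED_GAP = "deferred gap"


def summarize(results):
    """Tally (section, passed, failure_class_or_None) tuples.

    Raises ValueError when a failed check has no classification.
    """
    by_section = {}
    classes = Counter()
    for section, passed, failure_class in results:
        tally = by_section.setdefault(section, {"pass": 0, "fail": 0})
        tally["pass" if passed else "fail"] += 1
        if not passed:
            if failure_class is None:
                raise ValueError(f"unclassified failure in {section!r}")
            classes[failure_class] += 1
    return by_section, classes


# Sample session results for illustration only.
session = [
    ("Radio", True, None),
    ("Alarm", False, FailureClass.STT_AUDIO),
    ("Alarm", True, None),
]
by_section, classes = summarize(session)
print(by_section)
print({cls.value: count for cls, count in classes.items()})
```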