add regression test runbook

2026-04-26 20:03:30 -05:00
parent eab4efd820
commit df78170aa2
5 changed files with 202 additions and 1 deletions
--- a/OpenJibo/docs/development-plan.md
+++ b/OpenJibo/docs/development-plan.md
@@ -6,7 +6,7 @@ This document is the current working plan for the OpenJibo hosted cloud.

 The production lane is the `.NET` cloud in `src/Jibo.Cloud/dotnet`. The Node server remains the protocol oracle, capture harness, and fast reverse-engineering lab, but it is no longer the long-term hosted architecture.

-Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md). This file tracks release shape, current code truth, evidence sources, and the boundary between `1.0.18` closeout work and `1.0.19` follow-up work.
+Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md). The live closeout checklist is in [regression-test-plan.md](regression-test-plan.md). This file tracks release shape, current code truth, evidence sources, and the boundary between `1.0.18` closeout work and `1.0.19` follow-up work.

 ## Current Release Snapshot

@@ -141,6 +141,7 @@ When sources disagree, prefer the newest live stock-OS capture for runtime behav
 Before calling `1.0.18` complete, prove or explicitly defer these:

 - Run the focused `.NET` cloud test suite after the last feature slice.
+- Run the current-release live checklist in [regression-test-plan.md](regression-test-plan.md).
 - Confirm the running robot build reports cloud version `1.0.18`.
 - Regression test alarm flows again after the `jibo test 24` fixes: set with explicit time, set with compact/spoken/comma-separated time, clarify missing time, replace an existing alarm, cancel/delete by voice, cancel out of a value prompt, and verify the menu agrees.
 - Regression test photo/gallery flows again after the `jibo test 24` fixes: open gallery, answer the stock `shared/yes_no` prompt with a transcript-bearing `yes`, hand into create, take one photo, keep it, and avoid blue-ring or `I heard you` stale turns.
--- a/OpenJibo/docs/feature-backlog.md
+++ b/OpenJibo/docs/feature-backlog.md
@@ -6,6 +6,8 @@ This backlog turns discovery into implementation slices for the hosted `.NET` cl

 Use it as the working queue when picking the next feature or bug-fix slice. The release pattern is: implement a narrow slice, test it on stock OS `1.9`, update this file with what happened, then either close the release or roll the next larger idea forward.

+The live regression checklist for release closeout is [regression-test-plan.md](regression-test-plan.md).
+
 Status key:

 - `implemented`: present in current source and covered by focused tests
@@ -505,6 +507,8 @@ Before closing `1.0.18`:
 4. Alarm and photo/gallery regression
 5. Optional small feature only if the regression pass stays calm

+Use [regression-test-plan.md](regression-test-plan.md) as the detailed checklist for this sequence.
+
 For `1.0.19`:

 1. Stop command or volume control
--- a/OpenJibo/docs/live-jibo-test-runbook.md
+++ b/OpenJibo/docs/live-jibo-test-runbook.md
@@ -6,6 +6,8 @@ Run the first real `Jibo -> .NET OpenJibo cloud` test on the Ubuntu machine usin

 This runbook intentionally avoids introducing Azure, new hostnames, or new robot bootstrap changes during the first live test.

+For release closeout coverage after the robot is connected, use [regression-test-plan.md](regression-test-plan.md).
+
 ## Recommended Approach

 Use the existing Ubuntu networking path and certificate material first.
--- a/OpenJibo/docs/regression-test-plan.md
+++ b/OpenJibo/docs/regression-test-plan.md
@@ -0,0 +1,193 @@
+# Regression Test Plan
+
+## Purpose
+
+This plan is the repeatable live regression checklist for OpenJibo Cloud releases.
+
+Use [live-jibo-test-runbook.md](live-jibo-test-runbook.md) for the environment setup and capture mechanics. Use this file for what to test once the robot is connected and the hosted `.NET` cloud is running.
+
+The goal is to reduce trial-and-error cycles: every live pass should prove the release theme, keep prior working paths warm, and produce enough evidence to separate payload bugs, local robot behavior, and STT quality issues.
+
+## When To Run
+
+Run this plan:
+
+- after the last code change before calling a release complete
+- after any fix that touches websocket turn finalization, local skill redirects, constrained yes/no, or STT
+- before moving from `1.0.18` bug-fix closeout into `1.0.19` feature work
+
+For small feature slices, run the automated `.NET` tests plus the smoke checks and only the live sections that share the same machinery. Before release closeout, run the full current-release suite.
+
+## Required Evidence
+
+For each live pass, keep these artifacts together under a named test folder such as `artifact-output/jibo-test-N`:
+
+- `.NET` console logs
+- websocket captures and fixture exports
+- HTTP captures when startup, update, backup, media, or upload paths are involved
+- robot runtime logs pulled after the session
+- operator notes with exact phrases attempted and visible robot/menu state
+
+Record failures with the observed transcript, active listen rules, emitted websocket response shape, and whether the robot menu state agreed with the cloud response.
+
+## Release Gates
+
+A release is not ready until these are true or explicitly deferred in [development-plan.md](development-plan.md):
+
+- focused `.NET` cloud tests pass
+- the running robot reports the expected cloud version by voice, and `/health` reports the same version
+- no current-release path emits obsolete OpenJibo-only websocket events such as synthetic pending/context/ack packets
+- known working live paths still work: startup, simple chat, radio, basic news, constrained yes/no, alarm, and gallery/create
+- any remaining failure is classified as cloud payload, local robot state, STT/audio quality, environment/routing, or deferred feature gap
+
+## Automated Baseline
+
+Run before the live session:
+
+```powershell
+dotnet test tests\Jibo.Cloud.Tests\Jibo.Cloud.Tests.csproj --no-restore --nologo -v minimal
+```
+
+Expected result for the current baseline: all tests pass.
+
+## Live Smoke Checks
+
+Run these first so obvious environment problems do not pollute feature results:
+
+1. Start the `.NET` cloud using the live runbook.
+2. Confirm `/health` reports the expected version.
+3. Ask `cloud version`; confirm Jibo speaks the same version.
+4. Run one simple chat turn.
+5. Run one joke turn.
+6. Confirm websocket capture is being written before continuing.
+
+Stop and fix environment issues if startup, websocket connection, or capture output is not clean.
+
+## Current `1.0.18` Regression Suite
+
+### Radio
+
+Goal: keep the local radio redirect path proven.
+
+- Say `open the radio`.
+- Say `play country music`.
+- Expected: Jibo opens or resumes the radio locally, and the country phrase carries a `Country` station entity.
+- Capture check: websocket output should be a local `SKILL_REDIRECT` plus silent completion, not generic chat speech.
+
+### News
+
+Goal: keep the Nimbus-shaped cloud skill path proven.
+
+- Say `tell me the news`.
+- Expected: Jibo plays the current synthetic quick brief.
+- Capture check: `LISTEN` match includes `cloudSkill = news`, followed by a `news` `SKILL_ACTION`.
+- Current limitation: provider-backed and category-expanded headlines are deferred unless selected as the optional feature slice.
+
+### Backup, OTA, And Share Yes/No
+
+Goal: prove constrained yes/no prompts stay local and do not leak global launch rules.
+
+- Trigger the update menu path when available and answer one short `yes` or `no` prompt.
+- Exercise any available share/date/offer yes-no prompt and answer both `yes` and `no` across runs when practical.
+- Observe backup-in-progress behavior separately from explicit voice commands.
+- Do not treat a spoken `take a backup` failure as proof of the backup scheduler path; that command is not currently wired as a hosted-cloud voice feature.
+- Expected: short `yes`/`no` replies map locally, empty replies are handled as local no-input, and backup/download notifications are not re-announced once acknowledged.
+- Capture check: active rule remains the constrained rule such as `surprises-ota/want_to_download_now`, `settings/download_now_later`, `shared/yes_no`, or another stock prompt rule.
+
+### Alarm
+
+Goal: prove the clock skill behaves locally and menu state agrees after the `jibo test 24` fixes.
+
+Start from a known state. If an alarm already exists, record it and clear it through the menu or a controlled voice delete before beginning.
+
+Test these paths:
+
+- explicit set: `set an alarm for 7:43 AM`, adjusted to a near-future time during the actual run
+- compact set: `set alarm for 743`, adjusted to a near-future time during the actual run
+- clarification: `set an alarm`, then answer the value prompt with a short time such as `7 44` or `7, 44`
+- replacement: with an alarm already set, set a different alarm and answer the replacement prompt; verify whether the answer kept or replaced the old alarm
+- value-prompt cancel: `set an alarm`, then say `cancel`
+- voice delete: `delete my alarm` or `cancel alarm`
+- no-input cleanup: allow one value prompt to miss or time out when practical
+
+Expected:
+
+- successful set paths appear in the robot alarm menu and fire at the expected time
+- replacement prompt answer changes or preserves the alarm consistently with the robot's question
+- `cancel` inside the value prompt closes without scheduling
+- voice delete clears the robot menu state
+- empty value prompt turns complete locally instead of falling through to generic `I heard you` speech
+
+Capture check:
+
+- clock payloads use local `@be/clock` handoff with alarm entities when a value exists
+- missing values stay in local clock clarification
+- `CLIENT_NLU cancel` under `clock/alarm_set_value` or `clock/timer_set_value` maps to local clock `cancel`
+- no-input under `clock/alarm_set_value` or `clock/timer_set_value` returns local `LISTEN`/`EOS` only
+
+### Photo Gallery And Create
+
+Goal: prove gallery/create no longer leaves stale listening state after yes/no or preview prompts.
+
+Test these paths:
+
+- `open photo gallery`
+- if gallery is empty, answer `yes` to the offer to take a picture
+- take one photo and answer the keeper prompt with `yes`
+- repeat an empty-gallery prompt or a create keeper prompt with a missed/empty answer when practical
+- if using disposable test photos, test delete confirmation once with `no` and once with `yes`
+
+Expected:
+
+- empty gallery `yes` redirects to `@be/create`
+- empty gallery `no` exits cleanly when tested
+- keeper `yes` completes and Jibo settles without a stale blue ring
+- empty `shared/yes_no`, `create/is_it_a_keeper`, and `gallery/gallery_preview` turns are handled as local no-input instead of falling through to generic `I heard you` speech
+- delete confirmation only deletes on a positive `yes`
+
+Capture check:
+
+- gallery launch redirects to `@be/gallery`
+- create photo redirects to `@be/create/createOnePhoto`
+- local no-input replies keep the active constrained rule and strip unrelated global launch rules
+
+### STT And Audio Quality
+
+Goal: avoid misclassifying transcript failures as payload regressions.
+
+For every failed voice turn, record:
+
+- phrase attempted
+- transcript observed in websocket capture
+- active listen rule
+- whether the transcript was empty, collapsed, or semantically wrong
+- whether local `ffmpeg` or `whisper.cpp` logged an error
+
+Expected:
+
+- `ffmpeg` failures should not be the dominant failure mode for non-Opus buffered audio
+- short replies such as `yes`, `no`, `cancel`, and short alarm times should either map correctly or be classified as STT misses with evidence
+
+## Optional Feature Slice Checks
+
+When a new feature is added before a release closes:
+
+- add two or three exact phrases to this section before live testing
+- capture one successful path and one near-miss phrase if the feature is voice-routed
+- keep the test narrow enough that a failure can be fixed or deferred without reopening the whole release
+
+For the current candidate list, add cases here when implemented:
+
+- stop command: `stop`, `stop that`, `never mind`
+- volume: `turn it up`, `turn it down`, `increase the volume`, `decrease the volume`
+- robot age/persona: `how old are you`
+
+## After The Run
+
+After each session:
+
+1. Summarize pass/fail by section.
+2. Mark each failure as cloud payload, local robot state, STT/audio, environment, or deferred gap.
+3. Import any high-value websocket fixture.
+4. Update [development-plan.md](development-plan.md) with latest live evidence.
+5. Update [feature-backlog.md](feature-backlog.md) with what remains in the current release versus what moves to the next release.