add regression test runbook
@@ -6,7 +6,7 @@ This document is the current working plan for the OpenJibo hosted cloud.

The production lane is the `.NET` cloud in `src/Jibo.Cloud/dotnet`. The Node server remains the protocol oracle, capture harness, and fast reverse-engineering lab, but it is no longer the long-term hosted architecture.

Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md). This file tracks release shape, current code truth, evidence sources, and the boundary between `1.0.18` closeout work and `1.0.19` follow-up work.
Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md). Live closeout checks live in [regression-test-plan.md](regression-test-plan.md). This file tracks release shape, current code truth, evidence sources, and the boundary between `1.0.18` closeout work and `1.0.19` follow-up work.

## Current Release Snapshot

@@ -141,6 +141,7 @@ When sources disagree, prefer the newest live stock-OS capture for runtime behav
Before calling `1.0.18` complete, prove or explicitly defer these:

- Run the focused `.NET` cloud test suite after the last feature slice.
- Run the current-release live checklist in [regression-test-plan.md](regression-test-plan.md).
- Confirm the running robot build reports cloud version `1.0.18`.
- Regression test alarm flows again after the `jibo test 24` fixes: set with explicit time, set with compact/spoken/comma-separated time, clarify missing time, replace an existing alarm, cancel/delete by voice, cancel out of a value prompt, and verify the menu agrees.
- Regression test photo/gallery flows again after the `jibo test 24` fixes: open gallery, answer the stock `shared/yes_no` prompt with a transcript-bearing `yes`, hand into create, take one photo, keep it, and avoid blue-ring or `I heard you` stale turns.

@@ -6,6 +6,8 @@ This backlog turns discovery into implementation slices for the hosted `.NET` cl

Use it as the working queue when picking the next feature or bug-fix slice. The release pattern is: implement a narrow slice, test it on stock OS `1.9`, update this file with what happened, then either close the release or roll the next larger idea forward.

The live regression checklist for release closeout is [regression-test-plan.md](regression-test-plan.md).

Status key:

- `implemented`: present in current source and covered by focused tests
@@ -505,6 +507,8 @@ Before closing `1.0.18`:
4. Alarm and photo/gallery regression
5. Optional small feature only if the regression pass stays calm

Use [regression-test-plan.md](regression-test-plan.md) as the detailed checklist for this sequence.

For `1.0.19`:

1. Stop command or volume control

@@ -6,6 +6,8 @@ Run the first real `Jibo -> .NET OpenJibo cloud` test on the Ubuntu machine usin

This runbook intentionally avoids introducing Azure, new hostnames, or new robot bootstrap changes during the first live test.

For release closeout coverage after the robot is connected, use [regression-test-plan.md](regression-test-plan.md).

## Recommended Approach

Use the existing Ubuntu networking path and certificate material first.

193
OpenJibo/docs/regression-test-plan.md
Normal file
@@ -0,0 +1,193 @@
# Regression Test Plan

## Purpose

This plan is the repeatable live regression checklist for OpenJibo Cloud releases.

Use [live-jibo-test-runbook.md](live-jibo-test-runbook.md) for the environment setup and capture mechanics. Use this file for what to test once the robot is connected and the hosted `.NET` cloud is running.

The goal is to reduce trial-and-error cycles: every live pass should prove the release theme, keep prior working paths warm, and produce enough evidence to separate payload bugs, local robot behavior, and STT quality issues.

## When To Run

Run this plan:

- after the last code change before calling a release complete
- after any fix that touches websocket turn finalization, local skill redirects, constrained yes/no, or STT
- before moving from `1.0.18` bug-fix closeout into `1.0.19` feature work

For small feature slices, run the automated `.NET` tests plus the smoke checks and only the live sections that share the same machinery. Before release closeout, run the full current-release suite.

## Required Evidence

For each live pass, keep these artifacts together under a named test folder such as `artifact-output/jibo-test-N`:

- `.NET` console logs
- websocket captures and fixture exports
- HTTP captures when startup, update, backup, media, or upload paths are involved
- robot runtime logs pulled after the session
- operator notes with exact phrases attempted and visible robot/menu state

Record failures with the observed transcript, active listen rules, emitted websocket response shape, and whether the robot menu state agreed with the cloud response.
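
A structured note per failed turn keeps these records comparable across passes. The sketch below is illustrative only: `FailureRecord`, its field names, and the JSON-lines output are assumptions for operator notes, not part of the OpenJibo codebase, and the example values are made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FailureRecord:
    """One failed live turn, holding the facts this plan asks for."""

    phrase: str                # exact phrase the operator said
    transcript: str            # transcript observed in the websocket capture
    listen_rules: List[str]    # listen rules active for the turn
    response_shape: List[str]  # ordered websocket message types emitted
    menu_agreed: bool          # did the robot menu match the cloud response?


def to_json_line(record: FailureRecord) -> str:
    """Serialize a record as one JSON line for the operator notes file."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example values, not taken from a real capture.
example = FailureRecord(
    phrase="set alarm for 743",
    transcript="",
    listen_rules=["clock/alarm_set_value"],
    response_shape=["LISTEN", "EOS"],
    menu_agreed=True,
)
print(to_json_line(example))
```

Appending one line per failure to a notes file inside the test folder keeps the evidence next to the captures it refers to.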

## Release Gates

A release is not ready until these are true or explicitly deferred in [development-plan.md](development-plan.md):

- focused `.NET` cloud tests pass
- the running robot reports the expected cloud version by voice, and `/health` reports the same version
- no current-release path emits obsolete OpenJibo-only websocket events such as synthetic pending/context/ack packets
- known working live paths still work: startup, simple chat, radio, basic news, constrained yes/no, alarm, and gallery/create
- any remaining failure is classified as cloud payload, local robot state, STT/audio quality, environment/routing, or deferred feature gap
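
The classification gate can be checked mechanically before sign-off. This is a minimal sketch; the `FailureClass` enum and the `unclassified` helper are hypothetical names that mirror the five categories above.

```python
from enum import Enum
from typing import Dict, List, Optional


class FailureClass(Enum):
    """The five failure categories named in the release gates."""

    CLOUD_PAYLOAD = "cloud payload"
    LOCAL_ROBOT_STATE = "local robot state"
    STT_AUDIO = "STT/audio quality"
    ENVIRONMENT = "environment/routing"
    DEFERRED_GAP = "deferred feature gap"


def unclassified(failures: Dict[str, Optional[FailureClass]]) -> List[str]:
    """Return the names of failures that still lack a classification."""
    return sorted(name for name, cls in failures.items() if cls is None)
```

An empty result from `unclassified` means every remaining failure has a category; it says nothing about whether the category is correct.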

## Automated Baseline

Run before the live session:

```powershell
dotnet test tests\Jibo.Cloud.Tests\Jibo.Cloud.Tests.csproj --no-restore --nologo -v minimal
```

Expected result for the current baseline: all tests pass.

## Live Smoke Checks

Run these first so obvious environment problems do not pollute feature results:

1. Start the `.NET` cloud using the live runbook.
2. Confirm `/health` reports the expected version.
3. Ask `cloud version`; confirm Jibo speaks the same version.
4. Run one simple chat turn.
5. Run one joke turn.
6. Confirm websocket capture is being written before continuing.

Stop and fix environment issues if startup, websocket connection, or capture output is not clean.
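
Steps 2 and 3 compare against the same expected version, so the `/health` half can be scripted. The sketch assumes `/health` returns a JSON object with a top-level `version` string, which this plan does not confirm; `base_url` is a placeholder for the host used in the live runbook.

```python
import json
from urllib.request import urlopen


def version_matches(health_body: str, expected: str) -> bool:
    """Compare the version in a /health body with the expected release.

    Assumes a JSON object with a top-level "version" string; change the
    key to whatever the running cloud actually returns.
    """
    try:
        payload = json.loads(health_body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("version") == expected


def check_health(base_url: str, expected: str, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/health and compare versions (base_url is a placeholder)."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return version_matches(resp.read().decode("utf-8"), expected)
```

Keeping the comparison separate from the fetch lets the same function check a `/health` body saved in the HTTP captures after the session.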

## Current `1.0.18` Regression Suite

### Radio

Goal: keep the local radio redirect path proven.

- Say `open the radio`.
- Say `play country music`.
- Expected: Jibo opens or resumes the radio locally, and the country phrase carries a `Country` station entity.
- Capture check: websocket output should be local `SKILL_REDIRECT` plus silent completion, not generic chat speech.

### News

Goal: keep the Nimbus-shaped cloud skill path proven.

- Say `tell me the news`.
- Expected: Jibo plays the current synthetic quick brief.
- Capture check: `LISTEN` match includes `cloudSkill = news`, followed by a `news` `SKILL_ACTION`.
- Current limitation: provider-backed and category-expanded headlines are deferred unless selected as the optional feature slice.
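
If a capture is reduced to an ordered list of messages, the capture check above becomes an ordering test. A minimal sketch, assuming each message is a dict with hypothetical `type`, `cloudSkill`, and `skill` keys; the real capture format may differ and would need mapping into this shape first.

```python
def news_turn_ok(messages):
    """Check for a news-matched LISTEN followed later by a news SKILL_ACTION.

    `messages` is an ordered list of dicts with the hypothetical keys
    "type", "cloudSkill", and "skill".
    """
    # Find the first LISTEN whose match names the news cloud skill.
    listen_at = next(
        (
            index
            for index, message in enumerate(messages)
            if message.get("type") == "LISTEN"
            and message.get("cloudSkill") == "news"
        ),
        None,
    )
    if listen_at is None:
        return False
    # The SKILL_ACTION must come after that LISTEN, not before it.
    return any(
        message.get("type") == "SKILL_ACTION" and message.get("skill") == "news"
        for message in messages[listen_at + 1:]
    )
```

The check accepts other messages between the two; tighten it if the capture shows the `SKILL_ACTION` must follow immediately.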

### Backup, OTA, And Share Yes/No

Goal: prove constrained yes/no prompts stay local and do not leak global launch rules.

- Trigger the update menu path when available and answer one short `yes` or `no` prompt.
- Exercise any available share/date/offer yes-no prompt and answer both `yes` and `no` across runs when practical.
- Observe backup-in-progress behavior separately from explicit voice commands.
- Do not treat a spoken `take a backup` failure as proof of the backup scheduler path; that command is not currently wired as a hosted-cloud voice feature.
- Expected: short `yes`/`no` replies map locally, empty replies are handled as local no-input, and backup/download notifications are not repeatedly re-announced once acknowledged.
- Capture check: active rule remains the constrained rule such as `surprises-ota/want_to_download_now`, `settings/download_now_later`, `shared/yes_no`, or another stock prompt rule.
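
The capture check amounts to comparing the rules active before the turn with the rules carried by the reply. A sketch under that assumption; both helpers are hypothetical and take plain lists of rule names pulled from the capture by hand or by script.

```python
def leaked_rules(active_rules, reply_rules):
    """Return rules present in the reply that were not active before the turn."""
    allowed = set(active_rules)
    return [rule for rule in reply_rules if rule not in allowed]


def keeps_constraint(active_rules, reply_rules):
    """True when the reply keeps at least one active rule and leaks none."""
    allowed = set(active_rules)
    kept = any(rule in allowed for rule in reply_rules)
    return kept and not leaked_rules(active_rules, reply_rules)
```

A non-empty `leaked_rules` result is the evidence to attach when a constrained prompt picks up rules it should not have.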

### Alarm

Goal: prove the clock skill behaves locally and menu state agrees after the `jibo test 24` fixes.

Start from a known state. If an alarm already exists, record it and clear it through the menu or a controlled voice delete before beginning.

Test these paths:

- explicit set: `set an alarm for 7:43 AM`, adjusted to a near-future time during the actual run
- compact set: `set alarm for 743`, adjusted to a near-future time during the actual run
- clarification: `set an alarm`, then answer the value prompt with a short time such as `7 44` or `7, 44`
- replacement: with an alarm already set, set a different alarm and answer the replacement prompt; verify whether the answer kept or replaced the old alarm
- value-prompt cancel: `set an alarm`, then say `cancel`
- voice delete: `delete my alarm` or `cancel alarm`
- no-input cleanup: allow one value prompt to miss or time out when practical

Expected:

- successful set paths appear in the robot alarm menu and fire at the expected time
- replacement prompt answer changes or preserves the alarm consistently with the robot's question
- `cancel` inside the value prompt closes without scheduling
- voice delete clears the robot menu state
- empty value prompt turns complete locally instead of producing generic `I heard you` speech

Capture check:

- clock payloads use local `@be/clock` handoff with alarm entities when a value exists
- missing values stay in local clock clarification
- `CLIENT_NLU cancel` under `clock/alarm_set_value` or `clock/timer_set_value` maps to local clock `cancel`
- no-input under `clock/alarm_set_value` or `clock/timer_set_value` returns local `LISTEN`/`EOS` only
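
When adjusting times during a run, it helps to predict what the alarm menu should show for each short answer. The helper below is an operator-side sketch, not the cloud's parser; `parse_short_time` is a hypothetical name, and it only covers digit forms like those listed above, not spelled-out numbers that STT may produce.

```python
import re

# Hour, optional colon or comma, two-digit minute, optional AM/PM.
_TIME = re.compile(
    r"\s*(\d{1,2})\s*[:,]?\s*(\d{2})(?:\s*([AaPp])\.?[Mm]\.?)?\s*"
)


def parse_short_time(text):
    """Parse '7:43 AM', '743', '7 44', or '7, 44' into (hour, minute, meridiem).

    Returns None when the text does not look like a clock time. The
    meridiem is 'AM', 'PM', or None when the speaker did not say one.
    """
    match = _TIME.fullmatch(text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    meridiem = match.group(3).upper() + "M" if match.group(3) else None
    if minute > 59 or hour > 23:
        return None
    if meridiem and not 1 <= hour <= 12:
        return None
    return hour, minute, meridiem
```

For example, `parse_short_time("743")` and `parse_short_time("7, 43")` both return `(7, 43, None)`, so either answer should produce the same menu entry if the turn succeeds.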

### Photo Gallery And Create

Goal: prove gallery/create no longer leaves stale listening state after yes/no or preview prompts.

Test these paths:

- `open photo gallery`
- if gallery is empty, answer `yes` to the offer to take a picture
- take one photo and answer the keeper prompt with `yes`
- repeat a gallery empty prompt or create keeper prompt with a missed/empty answer when practical
- if using disposable test photos, test delete confirmation once with `no` and once with `yes`

Expected:

- empty gallery `yes` redirects to `@be/create`
- empty gallery `no` exits cleanly when tested
- keeper `yes` completes and Jibo settles without a stale blue ring
- empty `shared/yes_no`, `create/is_it_a_keeper`, and `gallery/gallery_preview` turns are handled as local no-input instead of triggering the generic `I heard you` reply
- delete confirmation only deletes on a positive `yes`

Capture check:

- gallery launch redirects to `@be/gallery`
- create photo redirects to `@be/create/createOnePhoto`
- local no-input replies keep the active constrained rule and strip unrelated global launch rules

### STT And Audio Quality

Goal: avoid misclassifying transcript failures as payload regressions.

For every failed voice turn, record:

- phrase attempted
- transcript observed in websocket capture
- active listen rule
- whether the transcript was empty, collapsed, or semantically wrong
- whether local `ffmpeg` or `whisper.cpp` logged an error

Expected:

- `ffmpeg` failures should not be the dominant failure mode for non-Opus buffered audio
- short replies such as `yes`, `no`, `cancel`, and short alarm times should either map correctly or be classified as STT misses with evidence
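
The empty/collapsed/wrong distinction can be applied consistently with a small helper. This sketch assumes `collapsed` means the transcript dropped words but kept the remaining ones in order; that definition is an interpretation, not something this plan specifies, and `classify_transcript` is a hypothetical name.

```python
import re


def _words(text):
    """Lowercase a phrase and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_transcript(attempted, observed):
    """Label a transcript as 'match', 'empty', 'collapsed', or 'wrong'."""
    said, heard = _words(attempted), _words(observed)
    if not heard:
        return "empty"
    if heard == said:
        return "match"
    # Collapsed: every heard word was spoken, in order, but some spoken
    # words are missing. Membership tests on the iterator consume it,
    # which makes this an in-order subsequence check.
    remaining = iter(said)
    if len(heard) < len(said) and all(word in remaining for word in heard):
        return "collapsed"
    return "wrong"
```

A `wrong` label still needs a human look: a substituted word can be an STT miss or a phrase the operator actually said differently.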

## Optional Feature Slice Checks

When a new feature is added before a release closes:

- add two or three exact phrases to this section before live testing
- capture one successful path and one near-miss phrase if the feature is voice-routed
- keep the test narrow enough that a failure can be fixed or deferred without reopening the whole release

For the current candidate list, add cases here when implemented:

- stop command: `stop`, `stop that`, `never mind`
- volume: `turn it up`, `turn it down`, `increase the volume`, `decrease the volume`
- robot age/persona: `how old are you`

## After The Run

After each session:

1. Summarize pass/fail by section.
2. Mark each failure as cloud payload, local robot state, STT/audio, environment, or deferred gap.
3. Import any high-value websocket fixture.
4. Update [development-plan.md](development-plan.md) with latest live evidence.
5. Update [feature-backlog.md](feature-backlog.md) with what remains in the current release versus what moves to the next release.