enhanced skill and yes/no routing

2026-04-18 16:29:27 -05:00
parent faf021eb89
commit 83a9350a9d
13 changed files with 455 additions and 29 deletions
--- a/OpenJibo/docs/development-plan.md
+++ b/OpenJibo/docs/development-plan.md
@@ -69,6 +69,27 @@ Near-term ASR work should stay staged:

 That keeps Node as the reverse-engineering oracle while letting the long-term `.NET` cloud gain real STT seams without pretending they are finished.

+## Latest Capture Findings
+
+The latest live test round tightened up three priorities:
+
+- yes/no turns need explicit constrained follow-up handling instead of generic chat routing
+- skill invocation still depends too much on narrow phrase matching and is vulnerable to STT drift
+- local buffered-audio STT in `.NET` is useful for discovery, but it is not yet stable enough to be the default live-test assumption
+
+Evidence from the latest `2026-04-18` captures:
+
+- several buffered-audio turns never produced a usable transcript, either because the local `whisper.cpp` executable (`whisper-cli`) was not found at its configured path or because `ffmpeg` rejected the temporary normalized Ogg file
+- some phrases were matched to the right intent but still produced placeholder provider replies because the feature path behind that intent is still a stub
+- short yes/no responses need the same session-aware treatment already prototyped in Node, especially for create-flow-style follow-ups
+
+Near-term interaction work should now prioritize:
+
+1. preserve and interpret yes/no turn constraints from observed listen rules
+2. broaden phrase-to-intent matching for the small set of known working skills before moving to larger NLU ambitions
+3. keep synthetic transcript hints as the most reliable parity path when captures already provide them
+4. continue evaluating whether local preprocessing is worth further investment or whether managed STT should replace it for the next serious testing phase
+
 ## Working Cloud Framework

 The current evidence in captures, fixtures, and Node behavior supports three main cloud interaction paths:
--- a/OpenJibo/docs/live-jibo-test-runbook.md
+++ b/OpenJibo/docs/live-jibo-test-runbook.md
@@ -130,6 +130,23 @@ python3 ./scripts/cloud/import-websocket-capture-fixture.py \
 - whether EOS timing matched expectations
 - whether any unexpected message families appeared

+## Latest Test Notes To Carry Forward
+
+The most recent live round showed that startup and some Q-and-A paths are progressing, but audio-turn reliability is still uneven.
+
+Carry these expectations into the next run:
+
+- constrained yes/no replies should be tested intentionally because they need special handling and are easy to miss if STT drifts
+- phrases intended to trigger known skills should be repeated using a small, documented wording set so we can separate routing issues from Whisper errors
+- provider-backed placeholder answers are still expected for weather, commute, calendar, news, and similar routes unless that feature path is explicitly implemented
+
+For STT during live testing:
+
+- prefer runs where `audioTranscriptHint` or other synthetic replay cues are available
+- do not assume local `whisper.cpp` success means the audio pipeline is stable overall
+- if many turns stay pending or `ffmpeg` rejects normalized Ogg files, treat that as a speech-pipeline issue first, not an intent-mapping issue
+- keep the Node server available as the comparison path for yes/no and audio-preprocessing behavior
+
 ## What To Do If The Test Fails

 If the robot does not connect or the first turn fails:
--- a/OpenJibo/docs/prompts/cloud-deploy-and-jibo-rcm-path.md
+++ b/OpenJibo/docs/prompts/cloud-deploy-and-jibo-rcm-path.md
@@ -0,0 +1,54 @@
+# Cloud Deploy And Jibo RCM Path Prompt
+
+Prepare OpenJibo for a lightweight v1 cloud deployment and the cleanest practical Jibo configuration path for group testing.
+
+Current repo context:
+
+- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
+- the current `.NET` cloud is the target runtime
+- the Node server remains a discovery oracle and fallback
+- latest live-test guidance is in:
+  - `docs/live-jibo-test-runbook.md`
+  - `docs/live-jibo-capture.md`
+  - `docs/device-bootstrap.md`
+  - `docs/development-plan.md`
+  - `src/Jibo.Cloud/dotnet/README.md`
+
+What we need from this workstream:
+
+1. define the smallest, cleanest, easiest-to-repeat deployment path for a v1 hosted OpenJibo cloud
+2. define the lightest reliable way to configure Jibo devices to use that cloud, with as few manual error-prone steps as possible
+3. produce scripts and docs that make it realistic for additional revival-group testers to get connected quickly
+
+Important goals:
+
+- prefer a path that is easy for non-experts in the revival group to follow
+- minimize hand-edited device changes and confusing setup steps
+- preserve a clear fallback path when a deployment or routing change fails
+- keep the deployment practical for a small testing cohort first; enterprise polish can come later
+
+Areas to review:
+
+- current API host and routing logic in `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Api/Program.cs`
+- existing scripts under:
+  - `scripts/cloud/`
+  - `scripts/bootstrap/`
+- docs around routing and bootstrap in:
+  - `docs/device-bootstrap.md`
+  - `docs/live-jibo-test-runbook.md`
+  - `docs/live-jibo-capture.md`
+
+Deliverables:
+
+- a concrete v1 deployment recommendation
+- any needed deployment scripts or setup helpers
+- a clean Jibo configuration / routing / RCM procedure with the fewest practical steps
+- validation steps that clearly distinguish cloud issues from robot/network issues
+- doc updates aimed at making group adoption fast and low-risk
+
+Constraints:
+
+- do not over-design for full production scale yet
+- avoid adding multiple competing deployment paths unless there is a strong reason
+- optimize for reliability, repeatability, and low support burden for the next round of testers
+- keep the Node oracle available as a troubleshooting fallback until `.NET` parity is clearly strong enough
--- a/OpenJibo/docs/prompts/stt-upgrade-path.md
+++ b/OpenJibo/docs/prompts/stt-upgrade-path.md
@@ -0,0 +1,47 @@
+# STT Upgrade Path Prompt
+
+Improve the OpenJibo `.NET` speech-to-text path for live robot testing.
+
+Current repo context:
+
+- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
+- current live captures from `2026-04-18` showed that some turns succeeded, but many buffered-audio turns failed before producing a usable transcript
+- the current local `.NET` STT path is in:
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/LocalWhisperCppBufferedAudioSttStrategy.cs`
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/OggOpusAudioNormalizer.cs`
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/WebSocketTurnFinalizationService.cs`
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/DefaultSttStrategySelector.cs`
+- Node remains the oracle for current behavior:
+  - `src/Jibo.Cloud/node/open-jibo-link.js`
+- live test evidence and guidance are documented in:
+  - `docs/development-plan.md`
+  - `docs/live-jibo-test-runbook.md`
+  - `src/Jibo.Cloud/dotnet/README.md`
+
+Observed problems to ground the work:
+
+- one captured run could not find `whisper-cli` at the configured rooted path
+- many buffered-audio turns failed because `ffmpeg` rejected the normalized Ogg output
+- we need a more reliable path for testing than the current, only partially working local `whisper.cpp` chain
+
+Goals:
+
+1. review the current `.NET` STT seam and compare it against the Node preprocessing flow
+2. recommend and implement the best next STT path for testing, preferring reliability and simplicity over novelty
+3. keep the STT integration behind the existing abstractions so we can swap providers later
+4. preserve or improve telemetry so failed turns clearly show whether the problem is decode, tool lookup, provider failure, or unusable transcript quality
+5. update tests and docs to match the chosen direction
+
+Constraints:
+
+- do not remove the synthetic transcript-hint path; it is still valuable for fixture replay and parity
+- do not assume Azure-hosted STT is automatically the answer unless the codebase and testing needs support that choice
+- prefer an implementation that is easy for other revival-group testers to run consistently
+- avoid large speculative architecture changes that are not needed for a near-term v1 testable cloud
+
+Deliverables:
+
+- code changes for the improved STT path
+- tests covering strategy selection, success, and failure handling
+- doc updates with exact setup guidance and a recommendation on whether local `whisper.cpp` remains optional, fallback-only, or deprecated for testing
+- a short summary of the tradeoffs and why the chosen path is the best next step