enhanced skill and yes/no routing

Jacob Dubin
2026-04-18 16:29:27 -05:00
parent faf021eb89
commit 83a9350a9d
13 changed files with 455 additions and 29 deletions

@@ -69,6 +69,27 @@ Near-term ASR work should stay staged:
That keeps Node as the reverse-engineering oracle while letting the long-term `.NET` cloud gain real STT seams without pretending they are finished.
## Latest Capture Findings
The latest live test round tightened up three priorities:
- yes/no turns need explicit constrained follow-up handling instead of generic chat routing
- skill invocation still depends too much on narrow phrase matching and is vulnerable to STT drift
- local buffered-audio STT in `.NET` is useful for discovery, but it is not yet stable enough for live tests to assume it works by default
Evidence from the latest `2026-04-18` captures:
- several buffered-audio turns never produced a usable transcript because the local `whisper.cpp` path was missing or the temporary normalized Ogg file was rejected by `ffmpeg`
- some recognized phrases fell into placeholder provider replies because the intent was recognized but the feature path behind it is still a stub
- short yes/no responses need the same session-aware treatment already prototyped in Node, especially for create-flow style follow-ups
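As a rough illustration of the constrained yes/no handling described above, the following Python sketch routes a turn differently when the session is waiting on a yes/no answer. It is not the Node or `.NET` implementation: the word lists, the `awaiting_yes_no` flag, and the route names are all hypothetical placeholders.

```python
import re
from typing import Optional

# Hypothetical word lists; the real accepted vocabulary should come from captures.
YES_WORDS = {"yes", "yeah", "yep", "yup", "sure", "ok", "okay", "correct"}
NO_WORDS = {"no", "nope", "nah", "negative"}


def interpret_yes_no(transcript: str) -> Optional[bool]:
    """Return True for yes, False for no, None when the reply is unclear."""
    words = re.findall(r"[a-z']+", transcript.lower())
    has_yes = any(w in YES_WORDS for w in words)
    has_no = any(w in NO_WORDS for w in words)
    if has_yes == has_no:  # neither matched, or both did: treat as ambiguous
        return None
    return has_yes


def route_turn(transcript: str, awaiting_yes_no: bool) -> str:
    """Pick a route name; `awaiting_yes_no` stands in for real session state."""
    if awaiting_yes_no:
        answer = interpret_yes_no(transcript)
        if answer is None:
            return "reprompt_yes_no"  # stay in the constrained follow-up
        return "confirm" if answer else "decline"
    return "generic_chat"
```

Keeping the ambiguous case separate means a drifted transcript leads to a reprompt instead of falling through to generic chat.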
Near-term interaction work should now prioritize:
1. preserve and interpret yes/no turn constraints from observed listen rules
2. broaden phrase-to-intent matching for the small set of known working skills before moving to larger NLU ambitions
3. keep synthetic transcript hints as the most reliable parity path when captures already provide them
4. continue evaluating whether local preprocessing is worth further investment or whether managed STT should replace it for the next serious testing phase
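One way to broaden phrase-to-intent matching for a small, known set of skills is tolerant string matching over a documented phrase table. The Python sketch below uses the standard library's `difflib` for that. The phrase table, intent names, and the `0.8` cutoff are assumptions for illustration, not values taken from the repo.

```python
import difflib
from typing import Optional

# Hypothetical phrase table; intent names and wordings are placeholders.
KNOWN_PHRASES = {
    "what time is it": "clock.time",
    "what is the time": "clock.time",
    "tell me a joke": "fun.joke",
    "take a picture": "camera.photo",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(kept.split())


def match_intent(transcript: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the intent for the closest known phrase, or None below the cutoff."""
    candidate = normalize(transcript)
    if candidate in KNOWN_PHRASES:
        return KNOWN_PHRASES[candidate]
    close = difflib.get_close_matches(
        candidate, list(KNOWN_PHRASES), n=1, cutoff=cutoff
    )
    return KNOWN_PHRASES[close[0]] if close else None
```

A transcript such as "tell me a choke" would still resolve to the joke skill under this cutoff, while unrelated utterances return `None` and can fall through to other routing.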
## Working Cloud Framework
The current evidence in captures, fixtures, and Node behavior supports three main cloud interaction paths:

@@ -130,6 +130,23 @@ python3 ./scripts/cloud/import-websocket-capture-fixture.py \
- whether EOS timing matched expectations
- whether any unexpected message families appeared
## Latest Test Notes To Carry Forward
The most recent live round showed that startup and some Q-and-A paths are progressing, but audio-turn reliability is still uneven.
Carry these expectations into the next run:
- constrained yes/no replies should be tested intentionally because they need special handling and are easy to miss if STT drifts
- phrases intended to trigger known skills should be repeated using a small, documented wording set so we can separate routing issues from Whisper errors
- provider-backed placeholder answers are still expected for weather, commute, calendar, news, and similar routes unless that feature path is explicitly implemented
For STT during live testing:
- prefer runs where `audioTranscriptHint` or other synthetic replay cues are available
- do not assume local `whisper.cpp` success means the audio pipeline is stable overall
- if many turns stay pending or `ffmpeg` rejects normalized Ogg files, treat that as a speech-pipeline issue first, not an intent-mapping issue
- keep the Node server available as the comparison path for yes/no and audio-preprocessing behavior
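The triage order above (speech pipeline first, intent mapping second) can be sketched as a small classifier over per-turn records. This is only an illustration: the dictionary keys (`ffmpeg_rejected`, `stt_tool_missing`, `transcript`, `intent`, `placeholder_reply`) are hypothetical and do not reflect an actual capture schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def classify_turn(turn: Mapping[str, object]) -> str:
    """Label one turn. The keys read here are hypothetical, not a capture schema."""
    if turn.get("ffmpeg_rejected") or turn.get("stt_tool_missing"):
        return "speech_pipeline"
    transcript = str(turn.get("transcript") or "").strip()
    if not transcript:
        return "speech_pipeline"  # pending or empty transcript
    if not turn.get("intent"):
        return "intent_mapping"  # transcript exists but nothing matched
    if turn.get("placeholder_reply"):
        return "stub_feature"  # intent recognized, feature path still a stub
    return "ok"


def summarize_run(turns: Iterable[Mapping[str, object]]) -> Counter:
    """Count how many turns fall into each category for one test run."""
    return Counter(classify_turn(t) for t in turns)
```

If most turns in a run land in `speech_pipeline`, that suggests looking at the audio path before adjusting any phrase matching.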
## What To Do If The Test Fails
If the robot does not connect or the first turn fails:

@@ -0,0 +1,54 @@
# Cloud Deploy And Jibo RCM Path Prompt
Prepare OpenJibo for a lightweight v1 cloud deployment and the cleanest practical Jibo configuration path for group testing.
Current repo context:
- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
- the current `.NET` cloud is the target runtime
- the Node server remains a discovery oracle and fallback
- latest live-test guidance is in:
  - `docs/live-jibo-test-runbook.md`
  - `docs/live-jibo-capture.md`
  - `docs/device-bootstrap.md`
  - `docs/development-plan.md`
  - `src/Jibo.Cloud/dotnet/README.md`
What we need from this workstream:
1. define the smallest, cleanest, easiest-to-repeat deployment path for a v1 hosted OpenJibo cloud
2. define the lightest reliable way to configure Jibo devices to use that cloud, with as few manual error-prone steps as possible
3. produce scripts and docs that make it realistic for additional revival-group testers to get connected quickly
Important goals:
- prefer a path that is easy for non-experts in the revival group to follow
- minimize hand-edited device changes and confusing setup steps
- preserve a clear fallback path when a deployment or routing change fails
- keep the deployment practical for a small testing cohort first; enterprise polish can come later
Areas to review:
- current API host and routing logic in `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Api/Program.cs`
- existing scripts under:
  - `scripts/cloud/`
  - `scripts/bootstrap/`
- docs around routing and bootstrap in:
  - `docs/device-bootstrap.md`
  - `docs/live-jibo-test-runbook.md`
  - `docs/live-jibo-capture.md`
Deliverables:
- a concrete v1 deployment recommendation
- any needed deployment scripts or setup helpers
- a clean Jibo configuration / routing / RCM procedure with the fewest practical steps
- validation steps that clearly distinguish cloud issues from robot/network issues
- doc updates aimed at making group adoption fast and low-risk
Constraints:
- do not over-design for full production scale yet
- avoid adding multiple competing deployment paths unless there is a strong reason
- optimize for reliability, repeatability, and low support burden for the next round of testers
- keep the Node oracle available as a troubleshooting fallback until `.NET` parity is clearly strong enough

@@ -0,0 +1,47 @@
# STT Upgrade Path Prompt
Improve the OpenJibo `.NET` speech-to-text path for live robot testing.
Current repo context:
- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
- current live captures from `2026-04-18` showed that some turns succeeded, but many buffered-audio turns failed before producing a usable transcript
- the current local `.NET` STT path is in:
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/LocalWhisperCppBufferedAudioSttStrategy.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/OggOpusAudioNormalizer.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/WebSocketTurnFinalizationService.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/DefaultSttStrategySelector.cs`
- Node remains the oracle for current behavior:
  - `src/Jibo.Cloud/node/open-jibo-link.js`
- live test evidence and guidance are documented in:
  - `docs/development-plan.md`
  - `docs/live-jibo-test-runbook.md`
  - `src/Jibo.Cloud/dotnet/README.md`
Observed problems to ground the work:
- one captured run could not find `whisper-cli` at the configured rooted path
- many buffered-audio turns failed because `ffmpeg` rejected the normalized Ogg output
- we need a more reliable path for testing than the current, only partially working local `whisper.cpp` chain
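The tool-lookup failure above is the kind of problem a preflight check can surface before any audio turn runs. The Python sketch below illustrates the idea only; it is not part of the `.NET` strategy classes, and the function names and the shape of the `tools` mapping are hypothetical.

```python
import os
import shutil
from typing import Dict, Optional


def resolve_tool(name: str, configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable executable path, or None if the tool cannot be found."""
    if configured_path:
        if os.path.isfile(configured_path) and os.access(configured_path, os.X_OK):
            return configured_path
        # A configured path that is wrong should be reported, not silently replaced.
        return None
    return shutil.which(name)  # fall back to a PATH lookup


def preflight(tools: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when it is missing."""
    return {name: resolve_tool(name, path) for name, path in tools.items()}
```

Calling `preflight({"whisper-cli": None, "ffmpeg": None})` at startup and logging any `None` entries would separate tool-lookup failures from decode or provider failures.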
Goals:
1. review the current `.NET` STT seam and compare it against the Node preprocessing flow
2. recommend and implement the best next STT path for testing, preferring reliability and simplicity over novelty
3. keep the STT integration behind the existing abstractions so we can swap providers later
4. preserve or improve telemetry so failed turns clearly show whether the problem is decode, tool lookup, provider failure, or unusable transcript quality
5. update tests and docs to match the chosen direction
Constraints:
- do not remove the synthetic transcript-hint path; it is still valuable for fixture replay and parity
- do not assume Azure-hosted STT is automatically the answer unless the codebase and testing needs support that choice
- prefer an implementation that is easy for other revival-group testers to run consistently
- avoid large speculative architecture changes that are not needed for a near-term v1 testable cloud
Deliverables:
- code changes for the improved STT path
- tests covering strategy selection, success, and failure handling
- doc updates with exact setup guidance and a recommendation on whether local whisper remains optional, fallback-only, or deprecated for testing
- a short summary of the tradeoffs and why the chosen path is the best next step