enhanced skill and yes/no routing
@@ -69,6 +69,27 @@ Near-term ASR work should stay staged:

That keeps Node as the reverse-engineering oracle while letting the long-term `.NET` cloud gain real STT seams without pretending they are finished.

## Latest Capture Findings

The latest live test round tightened up three priorities:

- yes/no turns need explicit constrained follow-up handling instead of generic chat routing
- skill invocation still depends too much on narrow phrase matching and is vulnerable to STT drift
- local buffered-audio STT in `.NET` is useful for discovery, but it is not yet stable enough to be the default live-test assumption

Evidence from the latest `2026-04-18` captures:

- several buffered-audio turns never produced a usable transcript because the local `whisper.cpp` path was missing or the temporary normalized Ogg file was rejected by `ffmpeg`
- some recognized phrases fell into placeholder provider replies because the intent was recognized but the feature path behind it is still a stub
- short yes/no responses need the same session-aware treatment already prototyped in Node, especially for create-flow style follow-ups

Near-term interaction work should now prioritize:

1. preserve and interpret yes/no turn constraints from observed listen rules
2. broaden phrase-to-intent matching for the small set of known working skills before moving to larger NLU ambitions
3. keep synthetic transcript hints as the most reliable parity path when captures already provide them
4. continue evaluating whether local preprocessing is worth further investment or whether managed STT should replace it for the next serious testing phase
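
As a rough illustration of the first priority, the sketch below shows one way a constrained yes/no turn could be interpreted separately from generic chat routing. It is not the Node or `.NET` implementation: the function name `interpret_yes_no`, the `expects_yes_no` flag, and both word lists are hypothetical, and the real accepted vocabulary should come from the observed listen rules.

```python
import re
from typing import Optional

# Hypothetical vocabularies for illustration only; the real accepted phrases
# should be derived from observed listen rules and captures.
YES_WORDS = {"yes", "yeah", "yep", "yup", "sure", "ok", "okay", "correct"}
NO_WORDS = {"no", "nope", "nah", "cancel"}


def interpret_yes_no(transcript: str, expects_yes_no: bool) -> Optional[bool]:
    """Return True/False for a constrained yes/no turn, otherwise None.

    None means either the session is not waiting on a yes/no answer or the
    transcript gave no clear answer, so the caller can fall back to generic
    routing or re-prompt.
    """
    if not expects_yes_no:
        return None
    words = re.findall(r"[a-z']+", transcript.lower())
    has_yes = any(word in YES_WORDS for word in words)
    has_no = any(word in NO_WORDS for word in words)
    if has_yes == has_no:
        # Neither signal, or contradictory signals: do not guess.
        return None
    return has_yes
```

Returning `None` instead of guessing keeps an STT-garbled reply from being silently treated as a confirmation, which matters most in create-flow follow-ups.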

## Working Cloud Framework

The current evidence in captures, fixtures, and Node behavior supports three main cloud interaction paths:

@@ -130,6 +130,23 @@ python3 ./scripts/cloud/import-websocket-capture-fixture.py \
- whether EOS timing matched expectations
- whether any unexpected message families appeared

## Latest Test Notes To Carry Forward

The most recent live round showed that startup and some Q-and-A paths are progressing, but audio-turn reliability is still uneven.

Carry these expectations into the next run:

- constrained yes/no replies should be tested intentionally because they need special handling and are easy to miss if STT drifts
- phrases intended to trigger known skills should be repeated using a small, documented wording set so we can separate routing issues from Whisper errors
- provider-backed placeholder answers are still expected for weather, commute, calendar, news, and similar routes unless that feature path is explicitly implemented
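
One way to make the "small, documented wording set" concrete is a lookup table with tolerant matching, sketched below. The phrases, intent names, and the `0.8` cutoff are all hypothetical placeholders, and the fuzzy matching uses Python's standard `difflib` purely for illustration; it is not how the current routing works.

```python
import difflib
import re
from typing import Optional

# Hypothetical wording set; replace with the phrases actually documented
# for the known working skills.
KNOWN_PHRASES = {
    "what time is it": "clock",
    "what is the time": "clock",
    "tell me a joke": "joke",
    "what is the weather": "weather",
}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so casing and commas do not matter."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def match_intent(transcript: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a transcript to an intent, tolerating small STT drift.

    Returns None when nothing in the wording set is close enough, which
    points at a transcript problem rather than a routing problem.
    """
    candidate = normalize(transcript)
    best = difflib.get_close_matches(candidate, list(KNOWN_PHRASES), n=1, cutoff=cutoff)
    return KNOWN_PHRASES[best[0]] if best else None
```

During a run, logging both the raw transcript and the result of `match_intent` makes it easier to tell whether a miss came from Whisper or from the wording set itself.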

For STT during live testing:

- prefer runs where `audioTranscriptHint` or other synthetic replay cues are available
- do not assume local `whisper.cpp` success means the audio pipeline is stable overall
- if many turns stay pending or `ffmpeg` rejects normalized Ogg files, treat that as a speech-pipeline issue first, not an intent-mapping issue
- keep the Node server available as the comparison path for yes/no and audio-preprocessing behavior
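
The triage order in the list above can be sketched as a small classifier. This is only an illustration under assumed inputs: `TurnIssue`, `triage_turn`, and its three parameters are hypothetical names, not fields from the actual capture or telemetry format.

```python
from enum import Enum
from typing import Optional


class TurnIssue(Enum):
    SPEECH_PIPELINE = "speech-pipeline"
    INTENT_MAPPING = "intent-mapping"
    NONE = "none"


def triage_turn(
    transcript: Optional[str],
    stt_error: Optional[str],
    matched_intent: Optional[str],
) -> TurnIssue:
    """Classify a failed turn, checking the speech pipeline first.

    An STT error (for example a rejected Ogg file) or an empty transcript is
    treated as a speech-pipeline issue before intent mapping is considered.
    """
    if stt_error or not (transcript or "").strip():
        return TurnIssue.SPEECH_PIPELINE
    if matched_intent is None:
        return TurnIssue.INTENT_MAPPING
    return TurnIssue.NONE
```

Checking the speech pipeline first avoids tuning phrase matching against transcripts that were never produced.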

## What To Do If The Test Fails

If the robot does not connect or the first turn fails:

54
OpenJibo/docs/prompts/cloud-deploy-and-jibo-rcm-path.md
Normal file
@@ -0,0 +1,54 @@
# Cloud Deploy And Jibo RCM Path Prompt

Prepare OpenJibo for a lightweight v1 cloud deployment and the cleanest practical Jibo configuration path for group testing.

Current repo context:

- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
- the current `.NET` cloud is the target runtime
- the Node server remains a discovery oracle and fallback
- latest live-test guidance is in:
  - `docs/live-jibo-test-runbook.md`
  - `docs/live-jibo-capture.md`
  - `docs/device-bootstrap.md`
  - `docs/development-plan.md`
  - `src/Jibo.Cloud/dotnet/README.md`

What we need from this workstream:

1. define the smallest, cleanest, easiest-to-repeat deployment path for a v1 hosted OpenJibo cloud
2. define the lightest reliable way to configure Jibo devices to use that cloud, with as few manual error-prone steps as possible
3. produce scripts and docs that make it realistic for additional revival-group testers to get connected quickly

Important goals:

- prefer a path that is easy for non-experts in the revival group to follow
- minimize hand-edited device changes and confusing setup steps
- preserve a clear fallback path when a deployment or routing change fails
- keep the deployment practical for a small testing cohort first; enterprise polish can come later

Areas to review:

- current API host and routing logic in `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Api/Program.cs`
- existing scripts under:
  - `scripts/cloud/`
  - `scripts/bootstrap/`
- docs around routing and bootstrap in:
  - `docs/device-bootstrap.md`
  - `docs/live-jibo-test-runbook.md`
  - `docs/live-jibo-capture.md`

Deliverables:

- a concrete v1 deployment recommendation
- any needed deployment scripts or setup helpers
- a clean Jibo configuration / routing / RCM procedure with the fewest practical steps
- validation steps that clearly distinguish cloud issues from robot/network issues
- doc updates aimed at making group adoption fast and low-risk

Constraints:

- do not over-design for full production scale yet
- avoid adding multiple competing deployment paths unless there is a strong reason
- optimize for reliability, repeatability, and low support burden for the next round of testers
- keep the Node oracle available as a troubleshooting fallback until `.NET` parity is clearly strong enough
47
OpenJibo/docs/prompts/stt-upgrade-path.md
Normal file
@@ -0,0 +1,47 @@
# STT Upgrade Path Prompt

Improve the OpenJibo `.NET` speech-to-text path for live robot testing.

Current repo context:

- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
- current live captures from `2026-04-18` showed that some turns succeeded, but many buffered-audio turns failed before producing a usable transcript
- the current local `.NET` STT path is in:
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/LocalWhisperCppBufferedAudioSttStrategy.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/OggOpusAudioNormalizer.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/WebSocketTurnFinalizationService.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/DefaultSttStrategySelector.cs`
- Node remains the oracle for current behavior:
  - `src/Jibo.Cloud/node/open-jibo-link.js`
- live test evidence and guidance are documented in:
  - `docs/development-plan.md`
  - `docs/live-jibo-test-runbook.md`
  - `src/Jibo.Cloud/dotnet/README.md`

Observed problems to ground the work:

- one captured run could not find `whisper-cli` at the configured rooted path
- many buffered-audio turns failed because `ffmpeg` rejected the normalized Ogg output
- we need a more reliable path for testing than the current partially working local whisper chain
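
The first observed problem, a missing `whisper-cli` at a configured path, is the kind of failure a preflight check could surface before a live run starts. The sketch below is a minimal example using only the Python standard library; `resolve_tool` and `preflight` are hypothetical helper names, and the configured paths passed in are assumptions rather than values from the repo's settings.

```python
import os
import shutil
from typing import Dict, Optional


def resolve_tool(name: str, configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable path for an external tool, or None if it is missing.

    A configured path that does not point at an executable file is reported
    as missing rather than silently replaced by a PATH lookup, so a stale
    setting stays visible.
    """
    if configured_path:
        if os.path.isfile(configured_path) and os.access(configured_path, os.X_OK):
            return configured_path
        return None
    return shutil.which(name)


def preflight(tools: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Resolve every tool up front; a None value marks a missing tool."""
    return {name: resolve_tool(name, path) for name, path in tools.items()}
```

A call such as `preflight({"whisper-cli": None, "ffmpeg": None})` resolves both tools from `PATH`; any `None` in the result is a setup issue to fix before audio turns are attempted.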

Goals:

1. review the current `.NET` STT seam and compare it against the Node preprocessing flow
2. recommend and implement the best next STT path for testing, preferring reliability and simplicity over novelty
3. keep the STT integration behind the existing abstractions so we can swap providers later
4. preserve or improve telemetry so failed turns clearly show whether the problem is decode, tool lookup, provider failure, or unusable transcript quality
5. update tests and docs to match the chosen direction
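
To illustrate goal 3, here is a language-neutral sketch of what keeping providers behind a seam can look like, written in Python rather than C#. It does not mirror `DefaultSttStrategySelector` or any existing type: `Turn`, `SttStrategy`, `HintStrategy`, `StubProviderStrategy`, and `select_strategy` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass
class Turn:
    audio: bytes = b""
    transcript_hint: Optional[str] = None


class SttStrategy(Protocol):
    name: str

    def can_handle(self, turn: Turn) -> bool: ...

    def transcribe(self, turn: Turn) -> Optional[str]: ...


class HintStrategy:
    """Uses a synthetic transcript hint when the capture provides one."""

    name = "synthetic-hint"

    def can_handle(self, turn: Turn) -> bool:
        return bool(turn.transcript_hint)

    def transcribe(self, turn: Turn) -> Optional[str]:
        return turn.transcript_hint


class StubProviderStrategy:
    """Placeholder for a real provider (local or managed); returns nothing."""

    name = "stub-provider"

    def can_handle(self, turn: Turn) -> bool:
        return bool(turn.audio)

    def transcribe(self, turn: Turn) -> Optional[str]:
        return None


def select_strategy(turn: Turn, strategies: Sequence[SttStrategy]) -> Optional[SttStrategy]:
    """Pick the first strategy, in priority order, that can handle the turn."""
    for strategy in strategies:
        if strategy.can_handle(turn):
            return strategy
    return None
```

Listing the hint strategy first keeps the synthetic transcript-hint path intact for fixture replay, and swapping providers later only changes the list passed to `select_strategy`.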

Constraints:

- do not remove the synthetic transcript-hint path; it is still valuable for fixture replay and parity
- do not assume Azure-hosted STT is automatically the answer unless the codebase and testing needs support that choice
- prefer an implementation that is easy for other revival-group testers to run consistently
- avoid large speculative architecture changes that are not needed for a near-term v1 testable cloud

Deliverables:

- code changes for the improved STT path
- tests covering strategy selection, success, and failure handling
- doc updates with exact setup guidance and a recommendation on whether local whisper remains optional, fallback-only, or deprecated for testing
- a short summary of the tradeoffs and why the chosen path is the best next step