JiboExperiments/OpenJibo/docs/prompts/stt-upgrade-path.md
2026-04-18 16:29:27 -05:00


# STT Upgrade Path Prompt
Improve the OpenJibo `.NET` speech-to-text path for live robot testing.
Current repo context:
- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
- live captures from `2026-04-18` showed that some turns succeeded, but many buffered-audio turns failed before producing a usable transcript
- the current local `.NET` STT path is in:
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/LocalWhisperCppBufferedAudioSttStrategy.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/OggOpusAudioNormalizer.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/WebSocketTurnFinalizationService.cs`
  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/DefaultSttStrategySelector.cs`
- Node remains the oracle for current behavior:
  - `src/Jibo.Cloud/node/open-jibo-link.js`
- live test evidence and guidance are documented in:
  - `docs/development-plan.md`
  - `docs/live-jibo-test-runbook.md`
  - `src/Jibo.Cloud/dotnet/README.md`
Observed problems to ground the work:
- one captured run could not find `whisper-cli` at the configured rooted path
- many buffered-audio turns failed because `ffmpeg` rejected the normalized Ogg output
- we need a more reliable path for testing than the current partially working local whisper chain
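
For the `ffmpeg` rejection above, one cheap diagnostic is to inspect the first Ogg page of the normalized output before invoking any external tool. The sketch below is illustrative only: `checkOggOpusHeader` and its result type are hypothetical names, not part of the OpenJibo codebase, and it is written in TypeScript to sit next to the Node reference rather than as the `.NET` implementation. It relies only on the standard Ogg page layout (a 27-byte header starting with `OggS`, with the segment count at byte 26) and on the `OpusHead` signature that opens the first packet of an Ogg Opus stream.

```typescript
// Hypothetical diagnostic helper; none of these names come from the OpenJibo codebase.
type OggCheck =
  | { ok: true }
  | { ok: false; reason: "too_short" | "missing_oggs_magic" | "missing_opus_head" };

// Returns true when `text` (ASCII) appears in `bytes` starting at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (offset + text.length > bytes.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

function checkOggOpusHeader(bytes: Uint8Array): OggCheck {
  // A fixed Ogg page header is 27 bytes long; a segment table follows it.
  if (bytes.length < 27) return { ok: false, reason: "too_short" };

  // Every Ogg page begins with the capture pattern "OggS".
  if (!hasAscii(bytes, 0, "OggS")) return { ok: false, reason: "missing_oggs_magic" };

  // Byte 26 is the number of segment-table entries; the payload starts after the table.
  const segmentCount = bytes[26] ?? 0;
  const payloadStart = 27 + segmentCount;

  // The first packet of an Ogg Opus stream starts with the "OpusHead" signature.
  if (!hasAscii(bytes, payloadStart, "OpusHead")) {
    return { ok: false, reason: "missing_opus_head" };
  }
  return { ok: true };
}
```

A failed check points at the normalizer rather than at `ffmpeg` itself, which is the kind of distinction the telemetry goal below asks for. A passing check only proves the first page looks plausible; it does not validate checksums or later pages.
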
Goals:
1. review the current `.NET` STT seam and compare it against the Node preprocessing flow
2. recommend and implement the best next STT path for testing, preferring reliability and simplicity over novelty
3. keep the STT integration behind the existing abstractions so we can swap providers later
4. preserve or improve telemetry so failed turns clearly show whether the problem is decode, tool lookup, provider failure, or unusable transcript quality
5. update tests and docs to match the chosen direction
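
Goals 3 and 4 can be pictured together as a small strategy seam whose results carry a failure category. The following is a minimal sketch under stated assumptions, not the existing `.NET` abstraction: `SttStrategy`, `SttResult`, `Turn`, and `transcribeWithFallback` are hypothetical names, and `transcribe` is synchronous purely for brevity, whereas a real provider call would be asynchronous. The four failure kinds mirror the categories listed in goal 4.

```typescript
// All names below are hypothetical illustrations, not OpenJibo types.
type SttFailureKind = "decode" | "tool_lookup" | "provider" | "unusable_transcript";

type SttResult =
  | { ok: true; strategy: string; transcript: string }
  | { ok: false; strategy: string; kind: SttFailureKind; detail: string };

interface Turn {
  transcriptHint?: string; // synthetic hint used for fixture replay
  audio?: Uint8Array;      // buffered audio for a live turn
}

interface SttStrategy {
  name: string;
  canHandle(turn: Turn): boolean;
  transcribe(turn: Turn): SttResult;
}

interface SttOutcome {
  transcript: string | null; // null when every applicable strategy failed
  attempts: SttResult[];     // one entry per strategy tried, for telemetry
}

// Tries strategies in order and returns the first usable transcript,
// recording a classified result for every attempt along the way.
function transcribeWithFallback(turn: Turn, strategies: SttStrategy[]): SttOutcome {
  const attempts: SttResult[] = [];
  for (const strategy of strategies) {
    if (!strategy.canHandle(turn)) continue;

    let result: SttResult;
    try {
      result = strategy.transcribe(turn);
    } catch (err) {
      // Assumption: an unclassified exception is reported as a provider failure.
      result = { ok: false, strategy: strategy.name, kind: "provider", detail: String(err) };
    }

    // A blank transcript counts as a failure so it shows up in telemetry.
    if (result.ok && result.transcript.trim() === "") {
      result = {
        ok: false,
        strategy: strategy.name,
        kind: "unusable_transcript",
        detail: "empty transcript",
      };
    }

    attempts.push(result);
    if (result.ok) return { transcript: result.transcript, attempts };
  }
  return { transcript: null, attempts };
}
```

Because every attempt is kept, a failed turn can be logged with the exact chain of reasons, for example a tool lookup failure followed by a provider error. Swapping providers later only means adding or reordering entries in the `strategies` array.
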
Constraints:
- do not remove the synthetic transcript-hint path; it is still valuable for fixture replay and parity
- do not assume Azure-hosted STT is automatically the answer unless the codebase and testing needs support that choice
- prefer an implementation that is easy for other revival-group testers to run consistently
- avoid large speculative architecture changes that are not needed for a near-term v1 testable cloud
Deliverables:
- code changes for the improved STT path
- tests covering strategy selection, success, and failure handling
- doc updates with exact setup guidance and a recommendation on whether local whisper remains optional, fallback-only, or deprecated for testing
- a short summary of the tradeoffs and why the chosen path is the best next step