fixes for next round of testing

2026-04-15 14:33:43 -05:00
parent 3f0c17e424
commit 874e5a1637
8 changed files with 348 additions and 58 deletions
--- a/OpenJibo/docs/protocol-inventory.md
+++ b/OpenJibo/docs/protocol-inventory.md
@@ -66,6 +66,7 @@ The current .NET pass covers only a narrow, explicitly synthetic subset of obser
 - token/session tracking across websocket turns
 - explicit per-turn state tracking for transID, rules, context, buffered audio, and finalize attempts
 - buffered audio accounting and turn-pending state
+- auto-finalize triggering for raw audio once `LISTEN` and `CONTEXT` have been received and the minimum buffered-audio thresholds are met
 - `LISTEN` message handling with synthetic `LISTEN` result payload shaping
 - `CONTEXT` capture for turn/session state
 - `CLIENT_NLU` turn completion using remembered listen/session metadata
@@ -81,6 +82,12 @@ This does not yet mean parity for:
 - multi-step skill lifecycles beyond the current synthetic playback response
 - broader interaction, animation, or ESML command families

+Current raw-audio fallback behavior remains explicitly synthetic:
+
+- when a buffered-audio turn can be resolved through the synthetic transcript-hint seam, .NET now auto-finalizes and emits `LISTEN` + `EOS` + `SKILL_ACTION`
+- when the turn crosses the finalize threshold without a usable transcript, .NET now emits a fallback `LISTEN` + `EOS` + generic `SKILL_ACTION` instead of leaving the robot hanging on an unfinished turn
+- that fallback is a compatibility measure inspired by the Node oracle, not a claim of real ASR understanding
+
 ### Internal ASR Direction

 The current .NET websocket layer now separates:
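The auto-finalize rule and its two outcomes in this diff can be sketched as a small per-turn decision. This is a minimal Python sketch, not the OpenJibo .NET code. `TurnState`, `should_auto_finalize`, `finalize_turn`, `on_audio_chunk`, the `MIN_BUFFERED_AUDIO_BYTES` value, and every payload field are hypothetical. The transcript hint is passed in as an optional string, standing in for the synthetic transcript-hint seam.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the real minimum buffered-audio value is not stated in the source.
MIN_BUFFERED_AUDIO_BYTES = 16_000


@dataclass
class TurnState:
    """Illustrative per-turn bookkeeping; field names are assumptions, not OpenJibo's."""
    trans_id: str
    listen_seen: bool = False
    context_seen: bool = False
    buffered_audio_bytes: int = 0
    finalized: bool = False


def should_auto_finalize(turn: TurnState, min_bytes: int = MIN_BUFFERED_AUDIO_BYTES) -> bool:
    # Finalize only after LISTEN and CONTEXT have arrived and enough audio is buffered,
    # and never finalize the same turn twice.
    return (
        not turn.finalized
        and turn.listen_seen
        and turn.context_seen
        and turn.buffered_audio_bytes >= min_bytes
    )


def finalize_turn(turn: TurnState, transcript_hint: Optional[str]) -> list[tuple[str, dict]]:
    """Return the (message type, hypothetical payload) sequence that closes the turn."""
    turn.finalized = True
    if transcript_hint:
        # Hint resolved: the synthetic LISTEN result carries the hinted text.
        text = transcript_hint
        skill = {"transID": turn.trans_id, "skill": "synthetic-from-hint"}
    else:
        # No usable transcript: still close the turn with a generic skill action.
        text = ""
        skill = {"transID": turn.trans_id, "skill": "generic-fallback"}
    return [
        ("LISTEN", {"transID": turn.trans_id, "text": text}),
        ("EOS", {"transID": turn.trans_id}),
        ("SKILL_ACTION", skill),
    ]


def on_audio_chunk(
    turn: TurnState,
    chunk: bytes,
    transcript_hint: Optional[str] = None,
    min_bytes: int = MIN_BUFFERED_AUDIO_BYTES,
) -> list[tuple[str, dict]]:
    # Account for the new audio, then check whether the turn should auto-finalize.
    turn.buffered_audio_bytes += len(chunk)
    if should_auto_finalize(turn, min_bytes):
        return finalize_turn(turn, transcript_hint)
    return []
```

In this sketch the turn is marked finalized on the first threshold crossing, so later audio for the same turn produces no second `EOS`. The source does not say whether the .NET implementation guards against repeat finalization the same way.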