Add GLSM listener telemetry and stale-listen recovery
@@ -88,6 +88,11 @@ Current websocket scope:
|
||||
- active local prompt preservation so `shared/yes_no`, clock, gallery, and settings prompts can still consume transcript-bearing short replies even when the stock skill reports a local context
|
||||
- binary audio ignored for an existing transID until a fresh `LISTEN` has been seen, preventing context-only or post-speech tails from reopening an endless buffered turn
|
||||
- blank-audio hotphrase turns clear pending listen state and install a short late-audio ignore window
|
||||
- first GLSM-aligned listener telemetry and recovery slice is now in source:
  - derived phase labels (`HJ_LISTENING`, `LISTENING`, `WAIT_LISTEN_FINISHED`, `DISPATCH_DIALOG`, `PROCESS_LISTENER_QUEUE`)
  - `glsm_phase_transition` turn diagnostics
  - websocket turn events with `glsmPhase` snapshots
  - stale pending-listen recovery for long-open no-context/no-audio listens before processing a new hotphrase listen
|
||||
- unknown inbound websocket types dropped silently instead of echoing stock-OS-unknown OpenJibo events
|
||||
- file telemetry and fixture export for HTTP, websocket, and turn captures
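
The `glsm_phase_transition` diagnostic in the list above is recorded only when the derived phase actually changes. A minimal C# sketch of that change-detection step follows. It assumes a plain dictionary standing in for `session.Metadata` and a hypothetical in-memory `emitted` list standing in for the telemetry sink; the trigger strings are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins (assumptions): a dictionary for session metadata, a list for the sink.
var metadata = new Dictionary<string, object?>();
var emitted = new List<(string? Previous, string Next, string Trigger)>();

// Stores the latest phase and records a transition only when the phase changed.
void TrackPhase(string nextPhase, string trigger)
{
    metadata.TryGetValue("glsmPhase", out var raw);
    var previous = raw?.ToString();
    metadata["glsmPhase"] = nextPhase;

    if (string.Equals(previous, nextPhase, StringComparison.OrdinalIgnoreCase))
    {
        return; // same phase as before: no diagnostic
    }

    emitted.Add((previous, nextPhase, trigger));
}

TrackPhase("HJ_LISTENING", "listen");
TrackPhase("HJ_LISTENING", "binary_audio"); // unchanged, nothing recorded
TrackPhase("LISTENING", "context");

Console.WriteLine(emitted.Count); // prints 2
```

Updating the stored phase before the equality check keeps the marker current even when no diagnostic is written.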
|
||||
|
||||
@@ -145,6 +150,7 @@ Use these sources as evidence, not as code to copy blindly:
|
||||
- User-provided original source snapshot: `..\jibo` when extracted locally
|
||||
- Original Pegasus cloud source inside that snapshot: `pegasus`
|
||||
- Original SDK and skill source inside that snapshot: `sdk`
|
||||
- Legacy listener flow reference diagram: `..\jibo\sdk\packages\skills-service-manager\resources\state-diagrams\glsm.png`
|
||||
- JiboOS reference tree: `..\JiboOS`
|
||||
- JiboOS skill snapshot: `..\JiboOS\opt\jibo\Jibo\Skills\@be`
|
||||
|
||||
|
||||
@@ -301,6 +301,20 @@ Current release theme:
|
||||
- Follow-up:
|
||||
- live smoke should confirm `cloud version` speaks `1.0.18`, carries `match.skipSurprises = true`, does not stop itself on the word `Jibo`, and settles without a generic `I heard...` reply or a local surprise handoff
|
||||
|
||||
### GLSM Listener Flow Capture And Recovery
|
||||
|
||||
- Status: `implemented`
|
||||
- Tags: `protocol`, `docs`
|
||||
- Result:
  - the legacy listener state machine source (`sdk ... glsm.png`) is now captured in current planning docs
  - runtime now emits GLSM-aligned phase snapshots (`HJ_LISTENING`, `LISTENING`, `WAIT_LISTEN_FINISHED`, `DISPATCH_DIALOG`, `PROCESS_LISTENER_QUEUE`)
  - turn diagnostics now include `glsm_phase_transition` for phase changes
  - websocket telemetry now records `glsmPhase` on binary/context/turn events
  - stale pending-listen recovery is now in source so a long-open no-context/no-audio listen can be cleared when the next hotphrase listen arrives
- Follow-up:
  - live-capture proof is still required against the recurring blue-ring/stuck-listening sequence
  - deeper GLSM parity (`Interrupt Listeners`, launch/global parse branches) should be tackled after this first capture slice is validated on-device
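
The stale pending-listen check described above can be restated as a pure function. This is a sketch, not the committed method: the turn-state flags are passed as plain parameters, and `now` is injected (an assumption made here so the check can be exercised without waiting). The 9-second threshold matches `StaleListenSetupRecoveryAge` in this commit.

```csharp
using System;

// A listen is stale when it is still awaiting completion, saw LISTEN but no
// CONTEXT and no audio, and has been open for at least the recovery age.
static bool IsStalePendingListen(
    bool awaitingTurnCompletion,
    bool sawListen,
    bool sawContext,
    int bufferedAudioBytes,
    DateTimeOffset? listenOpenedUtc,
    DateTimeOffset now,
    out int staleAgeMs)
{
    staleAgeMs = 0;
    if (!awaitingTurnCompletion || !sawListen || sawContext ||
        bufferedAudioBytes > 0 || listenOpenedUtc is null)
    {
        return false;
    }

    var age = now - listenOpenedUtc.Value;
    if (age < TimeSpan.FromSeconds(9))
    {
        return false;
    }

    staleAgeMs = (int)age.TotalMilliseconds;
    return true;
}

var now = DateTimeOffset.UtcNow;
var stale = IsStalePendingListen(true, true, false, 0, now - TimeSpan.FromSeconds(12), now, out var ageMs);
Console.WriteLine($"{stale} {ageMs}"); // prints "True 12000"
```

The committed `TryRecoverStalePendingListen` also resets the turn state when the check passes; this sketch covers only the decision.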
|
||||
|
||||
### End-Of-Skill Surprise Suppression
|
||||
|
||||
- Status: `implemented`
|
||||
|
||||
@@ -119,7 +119,7 @@ Reference:
|
||||
|
||||
## Next Queued Task (`2026-05-06`)
|
||||
|
||||
-Queued next `1.0.19` implementation task:
+Queued next `1.0.19` implementation task (now started):
|
||||
|
||||
- dialog parsing expansion and ambiguity guardrails
|
||||
|
||||
@@ -129,6 +129,12 @@ Execution focus:
|
||||
- reduce trigger-only captures that drop the rest of the utterance
|
||||
- preserve command-vs-question personality split and local skill payload compatibility
|
||||
- add focused tests for new phrase families and ambiguity boundaries
|
||||
- keep listener-state observability aligned with the legacy GLSM flow while phrase guardrails are added
|
||||
|
||||
First completed guardrail slice under this queue:
|
||||
|
||||
- GLSM listener flow capture + telemetry mapping
|
||||
- stale pending-listen recovery path for long-open no-context/no-audio listens
|
||||
|
||||
## Next Slices
|
||||
|
||||
|
||||
@@ -16,6 +16,7 @@ As-of date: `2026-05-06`
|
||||
|
||||
- Legacy system architecture: `C:\Projects\jibo\pegasus\resources\system_diagram.png`
|
||||
- Legacy generic skill scaffold: `C:\Projects\jibo\pegasus\packages\template-skill\docs\TemplateSkill.png`
|
||||
- Legacy listener state machine: `C:\Projects\jibo\sdk\packages\skills-service-manager\resources\state-diagrams\glsm.png`
|
||||
|
||||
## Template Skill Verdict
|
||||
|
||||
@@ -45,6 +46,30 @@ Conclusion: do not treat template-skill flow as a port target. Treat it as a sha
|
||||
| `Proactivity Catalog` | in-code candidate lists/weights | explicit catalog service with tuned weights and operator controls |
|
||||
| `Audio Logs` | file telemetry sinks in infrastructure telemetry | hosted indexed capture/retention for multi-operator analysis |
|
||||
|
||||
## GLSM Listener Flow Alignment (`2026-05-06`)
|
||||
|
||||
Captured source:
|
||||
|
||||
- `C:\Projects\jibo\sdk\packages\skills-service-manager\resources\state-diagrams\glsm.png`
|
||||
|
||||
First OpenJibo support slice (implemented):
|
||||
|
||||
- explicit derived listener phases are now emitted in cloud diagnostics:
  - `HJ_LISTENING`
  - `LISTENING`
  - `WAIT_LISTEN_FINISHED`
  - `DISPATCH_DIALOG`
  - `PROCESS_LISTENER_QUEUE`
- turn telemetry now records `glsm_phase_transition` with previous/next state and trigger
- websocket telemetry now includes `glsmPhase` on binary, context, and turn-processed events
- stale pending-listen recovery is now implemented:
  - when a pending `LISTEN` stays open long enough with no context/audio, a new hotphrase listen can recover the stuck state before continuing
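
How the five phases are derived from turn state can be condensed into one function. The sketch below restates the branching of `ResolveGlsmPhase` from this commit in an equivalent, shorter form; passing the flags as plain parameters instead of reading them from `CloudSession` is an assumption made to keep it self-contained.

```csharp
using System;

// Derives a GLSM-aligned phase label from listener flags.
static string ResolvePhase(
    bool awaitingTurnCompletion,
    bool sawListen,
    bool sawContext,
    int bufferedAudioBytes,
    bool followUpOpen)
{
    if (!awaitingTurnCompletion)
    {
        // No turn in flight: either a follow-up dialog is open or the queue is idle.
        return followUpOpen ? "DISPATCH_DIALOG" : "PROCESS_LISTENER_QUEUE";
    }

    if (bufferedAudioBytes > 0)
    {
        // Audio has been buffered, so the listen is waiting to finish.
        return "WAIT_LISTEN_FINISHED";
    }

    // No audio yet: LISTEN without CONTEXT is the hotphrase ("Hey Jibo") stage.
    return sawListen && !sawContext ? "HJ_LISTENING" : "LISTENING";
}

Console.WriteLine(ResolvePhase(true, true, false, 0, false)); // prints HJ_LISTENING
```

The stale-listen recovery targets exactly the `HJ_LISTENING` combination (listen seen, no context, no audio) once it has stayed open past `StaleListenSetupRecoveryAge`.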
|
||||
|
||||
Current parity boundary:
|
||||
|
||||
- this slice focuses on listener lifecycle observability plus stuck-listen recovery
|
||||
- deeper explicit parity states from GLSM (`Interrupt Listeners`, `Handle Launch Parse`, `Handle Global Parse`, `Dispatch Dialog` sub-branches) are next candidates once this capture-driven slice is validated live
|
||||
|
||||
## Where We Were
|
||||
|
||||
Legacy cloud design was service-oriented around:
|
||||
|
||||
@@ -25,7 +25,8 @@ public sealed class JiboWebSocketService(
|
||||
var replies = await turnFinalizationService.HandleBinaryAudioAsync(session, envelope, cancellationToken);
|
||||
await telemetrySink.RecordTurnEventAsync(envelope, session, "binary_audio_received", new Dictionary<string, object?>
|
||||
{
|
||||
-["bytes"] = envelope.Binary?.Length ?? 0
+["bytes"] = envelope.Binary?.Length ?? 0,
+["glsmPhase"] = WebSocketTurnFinalizationService.ResolveGlsmPhase(session)
|
||||
}, cancellationToken);
|
||||
return replies;
|
||||
}
|
||||
@@ -33,6 +34,8 @@ public sealed class JiboWebSocketService(
|
||||
var parsedType = ReadMessageType(envelope.Text);
|
||||
session.LastMessageType = parsedType;
|
||||
var containsInlineTurnPayload = parsedType == "LISTEN" && ContainsInlineTurnPayload(envelope.Text);
|
||||
var staleListenRecovered = false;
|
||||
var staleListenAgeMs = 0;
|
||||
if (parsedType == "LISTEN" &&
|
||||
!containsInlineTurnPayload &&
|
||||
WebSocketTurnFinalizationService.ShouldIgnoreLateListenSetup(session, envelope.Text))
|
||||
@@ -57,6 +60,19 @@ public sealed class JiboWebSocketService(
|
||||
return replies;
|
||||
}
|
||||
|
||||
if (parsedType == "LISTEN" &&
|
||||
!containsInlineTurnPayload &&
|
||||
WebSocketTurnFinalizationService.TryRecoverStalePendingListen(session, out staleListenAgeMs))
|
||||
{
|
||||
staleListenRecovered = true;
|
||||
await telemetrySink.RecordTurnEventAsync(envelope, session, "glsm_stale_listen_recovered", new Dictionary<string, object?>
|
||||
{
|
||||
["staleAgeMs"] = staleListenAgeMs,
|
||||
["transID"] = session.TurnState.TransId,
|
||||
["glsmPhase"] = WebSocketTurnFinalizationService.ResolveGlsmPhase(session)
|
||||
}, cancellationToken);
|
||||
}
|
||||
|
||||
WebSocketTurnFinalizationService.ObserveIncomingMessage(session, envelope.Text);
|
||||
|
||||
switch (parsedType)
|
||||
@@ -66,7 +82,8 @@ public sealed class JiboWebSocketService(
|
||||
var replies = await turnFinalizationService.HandleContextAsync(session, envelope, cancellationToken);
|
||||
await telemetrySink.RecordTurnEventAsync(envelope, session, "context_received", new Dictionary<string, object?>
|
||||
{
|
||||
-["transID"] = session.TurnState.TransId
+["transID"] = session.TurnState.TransId,
+["glsmPhase"] = WebSocketTurnFinalizationService.ResolveGlsmPhase(session)
|
||||
}, cancellationToken);
|
||||
return replies;
|
||||
}
|
||||
@@ -80,7 +97,10 @@ public sealed class JiboWebSocketService(
|
||||
["messageType"] = parsedType,
|
||||
["replyCount"] = replies.Count,
|
||||
["transcript"] = session.LastTranscript,
|
||||
-["intent"] = session.LastIntent
+["intent"] = session.LastIntent,
+["glsmPhase"] = WebSocketTurnFinalizationService.ResolveGlsmPhase(session),
+["staleListenRecovered"] = staleListenRecovered,
+["staleListenAgeMs"] = staleListenAgeMs
|
||||
}, cancellationToken);
|
||||
return replies;
|
||||
}
|
||||
@@ -92,7 +112,8 @@ public sealed class JiboWebSocketService(
|
||||
["messageType"] = parsedType,
|
||||
["replyCount"] = replies.Count,
|
||||
["transcript"] = session.LastTranscript,
|
||||
-["intent"] = session.LastIntent
+["intent"] = session.LastIntent,
+["glsmPhase"] = WebSocketTurnFinalizationService.ResolveGlsmPhase(session)
|
||||
}, cancellationToken);
|
||||
return replies;
|
||||
}
|
||||
|
||||
@@ -14,9 +14,11 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
{
|
||||
private const int AutoFinalizeMinBufferedAudioBytes = 15000;
|
||||
private const int AutoFinalizeMinBufferedAudioChunks = 5;
|
||||
private const string GlsmPhaseMetadataKey = "glsmPhase";
|
||||
private static readonly TimeSpan AutoFinalizeMinTurnAge = TimeSpan.FromMilliseconds(1800);
|
||||
private static readonly TimeSpan AutoFinalizeMissingTranscriptFallbackAge = TimeSpan.FromMilliseconds(4200);
|
||||
private static readonly TimeSpan AutoFinalizeContinuationDeferralMaxAge = TimeSpan.FromMilliseconds(3600);
|
||||
private static readonly TimeSpan StaleListenSetupRecoveryAge = TimeSpan.FromSeconds(9);
|
||||
private const int AutoFinalizeContinuationDeferralMaxAttempts = 2;
|
||||
private static readonly HashSet<string> PegasusAffinityContinuationStems = new(StringComparer.Ordinal)
|
||||
{
|
||||
@@ -60,6 +62,8 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
CloudSession session,
|
||||
WebSocketMessageEnvelope envelope,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
try
|
||||
{
|
||||
var turnState = session.TurnState;
|
||||
var ignoreLateAudio = ShouldIgnoreLateAudio(session);
|
||||
@@ -110,11 +114,18 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
|
||||
return [];
|
||||
}
|
||||
finally
|
||||
{
|
||||
await TrackGlsmPhaseAsync(session, envelope, "binary_audio", cancellationToken);
|
||||
}
|
||||
}
|
||||
|
||||
public async Task<IReadOnlyList<WebSocketReply>> HandleContextAsync(
|
||||
CloudSession session,
|
||||
WebSocketMessageEnvelope envelope,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
try
|
||||
{
|
||||
var turnState = session.TurnState;
|
||||
turnState.SawContext = true;
|
||||
@@ -133,8 +144,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
turnState.AwaitingTurnCompletion = false;
|
||||
turnState.IgnoreAdditionalAudioUntilUtc = DateTimeOffset.UtcNow.Add(WebSocketTurnState.DefaultLateAudioIgnoreWindow);
|
||||
ResetBufferedAudio(session);
|
||||
-turnState.SawListen = false;
-turnState.SawContext = false;
+ClearListenTracking(turnState);
|
||||
return [];
|
||||
}
|
||||
|
||||
@@ -145,6 +155,11 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
|
||||
return [];
|
||||
}
|
||||
finally
|
||||
{
|
||||
await TrackGlsmPhaseAsync(session, envelope, "context", cancellationToken);
|
||||
}
|
||||
}
|
||||
|
||||
public async Task<IReadOnlyList<WebSocketReply>> HandleTurnAsync(
|
||||
CloudSession session,
|
||||
@@ -167,8 +182,8 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
session.TurnState.IgnoreAdditionalAudioUntilUtc = DateTimeOffset.UtcNow.Add(WebSocketTurnState.DefaultLateAudioIgnoreWindow);
|
||||
session.FollowUpExpiresUtc = null;
|
||||
ResetBufferedAudio(session);
|
||||
-session.TurnState.SawListen = false;
-session.TurnState.SawContext = false;
+ClearListenTracking(session.TurnState);
+UpdateGlsmPhaseMarker(session);
|
||||
return [.. ResponsePlanToSocketMessagesMapper.MapNoInputAndRedirectToSkill(
|
||||
session.TurnState.TransId ?? session.LastTransId ?? string.Empty,
|
||||
session.TurnState.ListenRules,
|
||||
@@ -181,6 +196,8 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
}
|
||||
|
||||
session.TurnState.AwaitingTurnCompletion = true;
|
||||
session.TurnState.ListenOpenedUtc ??= DateTimeOffset.UtcNow;
|
||||
UpdateGlsmPhaseMarker(session);
|
||||
return [];
|
||||
}
|
||||
|
||||
@@ -275,6 +292,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
string.Equals(type.GetString(), "LISTEN", StringComparison.OrdinalIgnoreCase))
|
||||
{
|
||||
turnState.SawListen = true;
|
||||
turnState.ListenOpenedUtc ??= DateTimeOffset.UtcNow;
|
||||
}
|
||||
|
||||
if (root.TryGetProperty("transID", out var transId) && transId.ValueKind == JsonValueKind.String)
|
||||
@@ -351,6 +369,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
turnState.TransId = transId;
|
||||
turnState.ContextPayload = null;
|
||||
turnState.AudioTranscriptHint = null;
|
||||
turnState.ListenOpenedUtc = null;
|
||||
turnState.LastSttError = null;
|
||||
turnState.LastSttErrorUtc = null;
|
||||
turnState.FirstAudioReceivedUtc = null;
|
||||
@@ -375,6 +394,8 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
string messageType,
|
||||
bool allowFallbackOnMissingTranscript,
|
||||
CancellationToken cancellationToken)
|
||||
{
|
||||
try
|
||||
{
|
||||
var turn = ProtocolToTurnContextMapper.MapListenMessage(envelope, session, messageType);
|
||||
var turnState = session.TurnState;
|
||||
@@ -402,8 +423,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
session.TurnState.IgnoreAdditionalAudioUntilUtc = DateTimeOffset.UtcNow.Add(WebSocketTurnState.DefaultLateAudioIgnoreWindow);
|
||||
session.FollowUpExpiresUtc = null;
|
||||
ResetBufferedAudio(session);
|
||||
-session.TurnState.SawListen = false;
-session.TurnState.SawContext = false;
+ClearListenTracking(session.TurnState);
|
||||
return [];
|
||||
}
|
||||
|
||||
@@ -445,8 +465,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
turnState.IgnoreAdditionalAudioUntilUtc = DateTimeOffset.UtcNow.Add(WebSocketTurnState.DefaultLateAudioIgnoreWindow);
|
||||
session.FollowUpExpiresUtc = null;
|
||||
ResetBufferedAudio(session);
|
||||
-turnState.SawListen = false;
-turnState.SawContext = false;
+ClearListenTracking(turnState);
|
||||
return [.. ResponsePlanToSocketMessagesMapper.MapNoInputAndRedirectToSkill(
|
||||
turnState.TransId ?? session.LastTransId ?? string.Empty,
|
||||
turnState.ListenRules,
|
||||
@@ -483,8 +502,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
var localRule = ReadPrimaryNoInputRule(finalizedTurn);
|
||||
var noInputReplies = BuildLocalNoInputReplies(session, turnState, localRule);
|
||||
ResetBufferedAudio(session);
|
||||
-turnState.SawListen = false;
-turnState.SawContext = false;
+ClearListenTracking(turnState);
|
||||
return noInputReplies;
|
||||
}
|
||||
|
||||
@@ -545,8 +563,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
.Select(map => new WebSocketReply { Text = map.Text, DelayMs = map.DelayMs })
|
||||
.ToArray();
|
||||
ResetBufferedAudio(session);
|
||||
-turnState.SawListen = false;
-turnState.SawContext = false;
+ClearListenTracking(turnState);
|
||||
return fallbackReplies;
|
||||
}
|
||||
case true when
|
||||
@@ -679,10 +696,14 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
}
|
||||
|
||||
ResetBufferedAudio(session);
|
||||
-turnState.SawListen = false;
-turnState.SawContext = false;
+ClearListenTracking(turnState);
|
||||
return replies;
|
||||
}
|
||||
finally
|
||||
{
|
||||
await TrackGlsmPhaseAsync(session, envelope, $"finalize:{messageType}", cancellationToken);
|
||||
}
|
||||
}
|
||||
|
||||
private static bool ShouldAutoFinalize(CloudSession session)
|
||||
{
|
||||
@@ -708,6 +729,58 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
return ShouldIgnoreLateAudio(session) && IsHotphraseLaunchListenSetup(text);
|
||||
}
|
||||
|
||||
public static bool TryRecoverStalePendingListen(CloudSession session, out int staleAgeMs)
|
||||
{
|
||||
staleAgeMs = 0;
|
||||
var turnState = session.TurnState;
|
||||
if (!turnState.AwaitingTurnCompletion ||
|
||||
!turnState.SawListen ||
|
||||
turnState.SawContext ||
|
||||
turnState.BufferedAudioBytes > 0 ||
|
||||
!turnState.ListenOpenedUtc.HasValue)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
var age = DateTimeOffset.UtcNow - turnState.ListenOpenedUtc.Value;
|
||||
if (age < StaleListenSetupRecoveryAge)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
staleAgeMs = (int)age.TotalMilliseconds;
|
||||
turnState.AwaitingTurnCompletion = false;
|
||||
ResetBufferedAudio(session);
|
||||
ClearListenTracking(turnState);
|
||||
turnState.ListenHotphrase = false;
|
||||
turnState.HotphraseEmptyTurnCount = 0;
|
||||
UpdateGlsmPhaseMarker(session);
|
||||
return true;
|
||||
}
|
||||
|
||||
public static string ResolveGlsmPhase(CloudSession session)
|
||||
{
|
||||
var turnState = session.TurnState;
|
||||
if (!turnState.AwaitingTurnCompletion)
|
||||
{
|
||||
return session.FollowUpOpen ? "DISPATCH_DIALOG" : "PROCESS_LISTENER_QUEUE";
|
||||
}
|
||||
|
||||
if (turnState.SawListen && !turnState.SawContext && turnState.BufferedAudioBytes == 0)
|
||||
{
|
||||
return "HJ_LISTENING";
|
||||
}
|
||||
|
||||
if (turnState.SawListen && turnState.SawContext && turnState.BufferedAudioBytes == 0)
|
||||
{
|
||||
return "LISTENING";
|
||||
}
|
||||
|
||||
return turnState.BufferedAudioBytes > 0
|
||||
? "WAIT_LISTEN_FINISHED"
|
||||
: "LISTENING";
|
||||
}
|
||||
|
||||
private static TimeSpan ResolveLateAudioIgnoreWindow(ResponsePlan plan)
|
||||
{
|
||||
return string.Equals(plan.IntentName, "cloud_version", StringComparison.OrdinalIgnoreCase)
|
||||
@@ -1518,6 +1591,53 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
return PegasusAffinityContinuationStems.Contains(normalized);
|
||||
}
|
||||
|
||||
private static void ClearListenTracking(WebSocketTurnState turnState)
|
||||
{
|
||||
turnState.SawListen = false;
|
||||
turnState.SawContext = false;
|
||||
turnState.ListenOpenedUtc = null;
|
||||
}
|
||||
|
||||
private static void UpdateGlsmPhaseMarker(CloudSession session)
|
||||
{
|
||||
session.Metadata[GlsmPhaseMetadataKey] = ResolveGlsmPhase(session);
|
||||
}
|
||||
|
||||
private async Task TrackGlsmPhaseAsync(
|
||||
CloudSession session,
|
||||
WebSocketMessageEnvelope envelope,
|
||||
string trigger,
|
||||
CancellationToken cancellationToken)
|
||||
{
|
||||
var nextPhase = ResolveGlsmPhase(session);
|
||||
var previousPhase = session.Metadata.TryGetValue(GlsmPhaseMetadataKey, out var rawPhase)
|
||||
? rawPhase?.ToString()
|
||||
: null;
|
||||
session.Metadata[GlsmPhaseMetadataKey] = nextPhase;
|
||||
|
||||
if (string.Equals(previousPhase, nextPhase, StringComparison.OrdinalIgnoreCase))
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
try
|
||||
{
|
||||
await sink.RecordTurnDiagnosticAsync("glsm_phase_transition", BuildTurnDiagnosticSnapshot(session, envelope, new Dictionary<string, object?>
|
||||
{
|
||||
["trigger"] = trigger,
|
||||
["previousState"] = previousPhase,
|
||||
["state"] = nextPhase,
|
||||
["listenOpenedUtc"] = session.TurnState.ListenOpenedUtc,
|
||||
["followUpOpen"] = session.FollowUpOpen,
|
||||
["listenRules"] = session.TurnState.ListenRules
|
||||
}), cancellationToken);
|
||||
}
|
||||
catch
|
||||
{
|
||||
// Diagnostics should not interrupt turn handling.
|
||||
}
|
||||
}
|
||||
|
||||
private static Dictionary<string, object?> BuildTurnDiagnosticSnapshot(
|
||||
CloudSession session,
|
||||
WebSocketMessageEnvelope envelope,
|
||||
@@ -1534,6 +1654,7 @@ public sealed partial class WebSocketTurnFinalizationService(
|
||||
details["bufferedAudioChunks"] = session.TurnState.BufferedAudioChunkCount;
|
||||
details["sawListen"] = session.TurnState.SawListen;
|
||||
details["sawContext"] = session.TurnState.SawContext;
|
||||
details["glsmState"] = ResolveGlsmPhase(session);
|
||||
return details;
|
||||
}
|
||||
|
||||
|
||||
@@ -7,6 +7,7 @@ public sealed class WebSocketTurnState
|
||||
|
||||
public string? TransId { get; set; }
|
||||
public string? ContextPayload { get; set; }
|
||||
public DateTimeOffset? ListenOpenedUtc { get; set; }
|
||||
public bool ListenHotphrase { get; set; }
|
||||
public int HotphraseEmptyTurnCount { get; set; }
|
||||
public DateTimeOffset? IgnoreAdditionalAudioUntilUtc { get; set; }
|
||||
|
||||
@@ -101,4 +101,49 @@ public sealed class FileTurnTelemetrySinkTests
|
||||
s => s.RecordTranscriptError(It.IsAny<Exception>(), It.IsAny<string>(), It.IsAny<CancellationToken>()),
|
||||
Times.Once());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task HandleContext_EmitsGlsmPhaseTransitionDiagnostic()
|
||||
{
|
||||
var sink = new Mock<ITurnTelemetrySink>();
|
||||
sink.Setup(s => s.RecordTurnDiagnosticAsync(It.IsAny<string>(), It.IsAny<IReadOnlyDictionary<string, object?>>(), It.IsAny<CancellationToken>()))
|
||||
.Returns(Task.CompletedTask);
|
||||
var turnService = new WebSocketTurnFinalizationService(
|
||||
Mock.Of<IConversationBroker>(),
|
||||
Mock.Of<ISttStrategySelector>(),
|
||||
sink.Object);
|
||||
|
||||
var session = new CloudSession
|
||||
{
|
||||
Token = "glsm-phase-token",
|
||||
TurnState =
|
||||
{
|
||||
TransId = "trans-glsm",
|
||||
AwaitingTurnCompletion = true,
|
||||
SawListen = true,
|
||||
ListenOpenedUtc = DateTimeOffset.UtcNow - TimeSpan.FromSeconds(1)
|
||||
}
|
||||
};
|
||||
session.Metadata["glsmPhase"] = "HJ_LISTENING";
|
||||
|
||||
await turnService.HandleContextAsync(
|
||||
session,
|
||||
new WebSocketMessageEnvelope
|
||||
{
|
||||
HostName = "neo-hub.jibo.com",
|
||||
Path = "/listen",
|
||||
Kind = "neo-hub-listen",
|
||||
Text = """{"type":"CONTEXT","transID":"trans-glsm","data":{"topic":"conversation"}}"""
|
||||
},
|
||||
CancellationToken.None);
|
||||
|
||||
sink.Verify(
|
||||
s => s.RecordTurnDiagnosticAsync(
|
||||
"glsm_phase_transition",
|
||||
It.Is<IReadOnlyDictionary<string, object?>>(details =>
|
||||
details.ContainsKey("state") &&
|
||||
string.Equals(details["state"] == null ? null : details["state"]!.ToString(), "LISTENING", StringComparison.OrdinalIgnoreCase)),
|
||||
It.IsAny<CancellationToken>()),
|
||||
Times.AtLeastOnce());
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2523,6 +2523,47 @@ public sealed class JiboWebSocketServiceTests
|
||||
Assert.Null(session.LastIntent);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task StaleListenSetup_IsRecoveredWhenNextHotphraseListenArrives()
|
||||
{
|
||||
await _service.HandleMessageAsync(new WebSocketMessageEnvelope
|
||||
{
|
||||
HostName = "neo-hub.jibo.com",
|
||||
Path = "/listen",
|
||||
Kind = "neo-hub-listen",
|
||||
Token = "hub-stale-listen-token",
|
||||
Text = """{"type":"LISTEN","transID":"trans-stale-listen","data":{"hotphrase":true,"rules":["launch","globals/global_commands_launch"]}}"""
|
||||
});
|
||||
|
||||
var session = _store.FindSessionByToken("hub-stale-listen-token");
|
||||
Assert.NotNull(session);
|
||||
session.TurnState.ListenOpenedUtc = DateTimeOffset.UtcNow - TimeSpan.FromSeconds(12);
|
||||
session.TurnState.AwaitingTurnCompletion = true;
|
||||
session.TurnState.SawListen = true;
|
||||
session.TurnState.SawContext = false;
|
||||
session.TurnState.BufferedAudioBytes = 0;
|
||||
session.TurnState.BufferedAudioChunkCount = 0;
|
||||
session.TurnState.HotphraseEmptyTurnCount = 2;
|
||||
|
||||
var replies = await _service.HandleMessageAsync(new WebSocketMessageEnvelope
|
||||
{
|
||||
HostName = "neo-hub.jibo.com",
|
||||
Path = "/listen",
|
||||
Kind = "neo-hub-listen",
|
||||
Token = "hub-stale-listen-token",
|
||||
Text = """{"type":"LISTEN","transID":"trans-stale-listen","data":{"hotphrase":true,"rules":["launch","globals/global_commands_launch"]}}"""
|
||||
});
|
||||
|
||||
Assert.Empty(replies);
|
||||
Assert.True(session.TurnState.AwaitingTurnCompletion);
|
||||
Assert.True(session.TurnState.SawListen);
|
||||
Assert.False(session.TurnState.SawContext);
|
||||
Assert.Equal(0, session.TurnState.BufferedAudioBytes);
|
||||
Assert.Equal(0, session.TurnState.BufferedAudioChunkCount);
|
||||
Assert.Equal(0, session.TurnState.HotphraseEmptyTurnCount);
|
||||
Assert.True(session.TurnState.ListenOpenedUtc > DateTimeOffset.UtcNow - TimeSpan.FromSeconds(3));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task BinaryAudio_AfterWordOfDayRightWordListen_IsIgnoredDuringCleanupWindow()
|
||||
{
|
||||
|
||||