version 18 fixes
@@ -19,6 +19,19 @@ Day-to-day feature sequencing lives in [feature-backlog.md](feature-backlog.md).
 
 Release `1.0.18` is now in feature-hardening. Its main bug-fix theme is alarm and photo/gallery behavior on stock OS `1.9`, with a few small feature slices added while the test loop is warm.
 
+## Latest Live Evidence
+
+`jibo test 22` was captured against a robot that spoke `Open Jibo Cloud version 1 dot 0 dot 18.`
+
+- Radio live validation passed.
+- News routing was observed in websocket telemetry from the phrase `So, play the news.`, but the user did not get enough live confidence to call news complete because a backup notification/slowness path was active during the session.
+- The backup notification came from stock `@be/surprises-ota` checking `jibo.scheduler.backupStatus`; no `Backup_*` HTTP operation appeared in the captured cloud traffic. The update-menu block therefore looks more like a robot-local scheduler/backup load issue than a cloud `Backup.List` response issue.
+- The robot log showed high load, a `jibo-server-service` broken pipe, a settings error path for `Q4-Server_connection_lost`, and the stock backup prompt: `hey i'm sorry if I seem a little slow, I can be that way while i'm doing a backup.`
+- Photo/gallery reached the local gallery/create path, but a missed short reply left repeated `create/is_it_a_keeper` listens and the visible blue-ring/listening state.
+- Alarm attempts were dominated by collapsed transcripts such as `set and alarm`, `Set and Alonzo`, and `Set an alarm for...`; one path reached local alarm clarification but did not get a complete value-setting pass.
+- The turn telemetry contained `ffmpeg` failures where local whisper tried to decode buffered Ogg/Opus turns that were not usable by `ffmpeg`.
+- The websocket capture still contained `OPENJIBO_TURN_PENDING`, `OPENJIBO_CONTEXT_ACK`, and proactive `OPENJIBO_ACK` output in the deployed run. The current source has no references to those synthetic OpenJibo events, so the next deployment needs an artifact/build verification pass.
+
 ## Release Rhythm
 
 This is the working pattern for each hosted-cloud release:
@@ -58,6 +71,7 @@ Current websocket scope:
 - buffered Ogg/Opus audio preservation per turn
 - synthetic transcript hint support for fixture-driven parity
 - opt-in local `ffmpeg` plus `whisper.cpp` STT path for discovery
+- local whisper only attempts external decoding when buffered audio contains an Opus identification header
 - auto-finalize thresholds for buffered audio after a real listen phase
 - late-audio ignore windows after completed turns
 - no-input local completion for constrained prompts
@@ -89,7 +103,9 @@ The following behavior is present in source and covered by focused tests:
 - media metadata persists across store recreation and `/media/{path}` can serve the current text-body placeholder payload
 - constrained yes/no handling covers `create/is_it_a_keeper`, `shared/yes_no`, `settings/download_now_later`, `surprises-date/offer_date_fact`, `surprises-ota/want_to_download_now`, and `$YESNO` hints
 - outbound constrained yes/no responses strip unrelated `globals/*` rules so stock OS stays local
-- no-input fallback for constrained yes/no prompts emits local `LISTEN`/`EOS` instead of relaunching generic Nimbus speech
+- no-input fallback for constrained yes/no prompts emits local `LISTEN`/`EOS` instead of relaunching generic Nimbus speech, including `shared/yes_no` after STT failure
+- repeated empty `create/is_it_a_keeper` replies redirect to `@be/idle` after the second miss so the photo/create flow can settle instead of leaving a stale listening state
+- local whisper skips buffered audio turns that do not contain `OpusHead`, preventing a known `ffmpeg` failure path from becoming the noisy failure mode
 - Word of the Day launch, spoken guesses, structured `CLIENT_NLU` guesses, hint-order guesses, fuzzy hint matching, right-word cleanup, and late audio cleanup are covered in the websocket layer
 
 ## Reference Sources
@@ -116,12 +132,13 @@ Before calling `1.0.18` complete, prove or explicitly defer these:
 
 - Run the focused `.NET` cloud test suite after the last feature slice.
 - Confirm the running robot build reports cloud version `1.0.18`.
-- Regression test alarm flows: set with explicit time, set with compact/spoken time, clarify missing time, cancel alarm, and local cleanup prompts.
-- Regression test photo/gallery flows: open gallery, answer the stock `shared/yes_no` prompt, hand into create, take one photo, and avoid blue-ring stale turns.
-- Live-test radio launch: `open the radio` and `play country music`.
-- Live-test first news path: `tell me the news` should use the Nimbus cloud-skill lane instead of generic chat.
+- Regression test alarm flows again after the `jibo test 22` fixes: set with explicit time, set with compact/spoken time, clarify missing time, cancel alarm, and local cleanup prompts.
+- Regression test photo/gallery flows again after the `jibo test 22` fixes: open gallery, answer the stock `shared/yes_no` prompt, hand into create, take one photo, and avoid blue-ring stale turns.
+- Live-test radio launch: `open the radio` passed in `jibo test 22`; re-run `play country music` if that exact phrase was not captured.
+- Live-test first news path again: `jibo test 22` reached the news intent, but the live behavior still needs a clean non-backup session.
 - Recheck constrained yes/no prompts for update/backup/share/gallery without leaking global rules.
 - Recheck that stock OS no longer logs OpenJibo-only websocket events such as synthetic pending/context/ack packets from the current build.
+- Recheck backup/update behavior with explicit attention to robot-local `jibo.scheduler.backupStatus`, CPU/load, and whether the deployed cloud is involved at all.
 - Treat remaining `ffmpeg` / `whisper.cpp` transcript failures as STT work unless the capture proves a separate turn-routing regression.
 
 ## Known Gaps
@@ -131,7 +148,8 @@ These are not blockers for calling `1.0.18` complete unless the live test shows
 - local `whisper.cpp` STT remains a discovery seam, not production ASR
 - media upload/body handling is not binary-safe enough for final gallery originals and thumbnails
 - state persistence is local JSON, not Azure SQL / Blob Storage
-- update, backup, and restore are not end-to-end proven
+- update, backup, and restore are not end-to-end proven, and the `jibo test 22` sluggishness appears tied to robot-local backup status/load
+- deployed-build verification needs to prove that synthetic OpenJibo websocket events are gone from the hosted artifact, not just from source
 - news content is synthetic
 - weather, calendar, commute, personal report, identity, memory, and proactivity are still mostly discovery or placeholder content paths
 - volume, stop, robot age, and command-versus-question personality routing are not implemented yet
@@ -37,6 +37,7 @@ Current release theme:
 - alarm and photo/gallery quirks have received the main bug-fix attention
 - Word of the Day cleanup, constrained yes/no routing, unknown websocket event suppression, and local state persistence are already in the current code
 - radio, ESML apostrophe cleanup, and first news are implemented in source/tests and need live confidence before the version is called complete
+- `jibo test 22` validated radio, exposed backup/load interference, exposed a shared yes/no no-input gap, exposed repeated create keeper prompts after photo handoff, and showed local whisper `ffmpeg` failures on unusable buffered audio
 
 ## Immediate `1.0.18` Queue
 
@@ -52,6 +53,7 @@ Current release theme:
 - Evidence:
   - JiboOS `@be/radio` treats `menu` as a play launch and reads `result.nlu.entities.station`
   - `Country` is a supported station key in the inspected genre metadata
+  - `jibo test 22` radio live validation passed
 - Exit criteria:
   - live `open the radio` resumes or opens radio without generic chat speech
   - live `play country music` opens a country station
@@ -70,6 +72,7 @@ Current release theme:
   - `SKILL_ACTION` uses skill id `news` and `mim_id = runtime-news`
 - Evidence:
   - JiboOS Nimbus checks `match.cloudSkill === "news"` and waits for a cloud response
+  - `jibo test 22` captured the phrase `So, play the news.` reaching the `news` intent, but live behavior was not cleanly confirmed
 - Exit criteria:
   - live `tell me the news` reaches a non-placeholder Nimbus path
   - the robot behavior feels like a cloud skill response, not generic chat playback
@@ -86,15 +89,22 @@ Current release theme:
   - covered prompt families include `settings/download_now_later`, `surprises-ota/want_to_download_now`, `surprises-date/offer_date_fact`, `shared/yes_no`, and `create/is_it_a_keeper`
   - outbound replies strip global rules and keep the local rule
   - no-input fallback for constrained prompts emits local `LISTEN`/`EOS`
+  - `shared/yes_no` now participates in the STT-failure no-input path instead of staying pending behind `$YESNO` hints
+  - repeated empty `create/is_it_a_keeper` replies redirect to `@be/idle` after the second miss
+- Latest evidence:
+  - `jibo test 22` did not show `Backup_*` HTTP traffic during the backup complaint
+  - stock `@be/surprises-ota` drives the backup notification from robot-local `jibo.scheduler.backupStatus`
+  - a spoken `take a backup` command currently routes as generic chat and is not the same as proving the local backup scheduler path
 - Exit criteria:
   - spoken `yes` and `no` work on update, backup, share/offer, and gallery/create prompts
   - empty or missed short replies retry locally instead of relaunching Nimbus or generic chat
 - Next action:
-  - include these prompt families in the `1.0.18` live regression pass
+  - re-run these prompt families in the `1.0.18` live regression pass after the shared yes/no and create no-input fixes
+  - keep explicit backup creation as part of the update/backup/restore proof slice, not as an assumed yes/no prompt test
 
 ### 4. Alarm And Photo Gallery Release Regression
 
-- Status: `ready`
+- Status: `polish`
 - Tags: `protocol`, `stt`
 - Why now: this is the main bug-fix theme for `1.0.18`.
 - Current code:
@@ -103,12 +113,17 @@ Current release theme:
   - alarm cancel can reuse the last active clock domain
   - gallery opens as `@be/gallery`; snapshot and photobooth open through `@be/create`
   - passive gallery/create context no longer reopens stale cloud turns
+  - `shared/yes_no` no-input fallback and repeated create keeper cleanup were added after `jibo test 22`
+- Latest evidence:
+  - gallery opened and handed into create, but repeated `create/is_it_a_keeper` prompts could leave the blue ring/listening state
+  - alarm recognition collapsed several attempts before a complete alarm value could be set
+  - `ffmpeg` failures were present during the same test window, so alarm/gallery retest should separate transcript quality from payload shape
 - Exit criteria:
   - gallery opens, offers to take a picture if empty, accepts `yes`, and hands into create
   - alarm set, clarify, and cancel flows behave locally without blue-ring stale turns
   - failures caused by collapsed STT transcripts are logged as STT issues rather than misdiagnosed as payload bugs
 - Next action:
-  - run a stock OS `1.9` regression bundle before declaring `1.0.18` complete
+  - re-run a stock OS `1.9` regression bundle before declaring `1.0.18` complete
 
 ### 5. Optional Small Feature Before `1.0.18` Freeze
 
@@ -176,10 +191,22 @@ Current release theme:
   - gallery, snapshot, and photobooth voice paths route to the correct local skills
   - media metadata persists locally
   - `/media/{path}` serves the current text-body placeholder payload
+  - repeated empty `create/is_it_a_keeper` turns redirect to `@be/idle` after the second miss
 - Follow-up:
   - live regression remains in the immediate queue
   - binary-safe media storage remains future work
 
+### Constrained Yes-No Cleanup
+
+- Status: `implemented`
+- Tags: `protocol`, `stt`
+- Result:
+  - `shared/yes_no` is included in yes/no STT-failure detection
+  - local no-input replies strip global rules and keep the active constrained rule
+  - update, OTA, share/date-offer, gallery shared yes/no, and create keeper rules share the same no-input fallback machinery
+- Follow-up:
+  - live update/backup/share/gallery prompts still need another clean pass
+
 ### Word Of The Day Cleanup
 
 - Status: `implemented`
@@ -202,7 +229,7 @@ Current release theme:
   - current websocket service drops unknown inbound message types silently
   - synthetic `OPENJIBO_TURN_PENDING`, `OPENJIBO_CONTEXT_ACK`, and fallback `OPENJIBO_ACK` should no longer be emitted by current source
 - Follow-up:
-  - if live logs show those event types, first verify the deployed process is actually the current build
+  - `jibo test 22` still captured those event types from the deployed run, so the next deployment must verify the artifact/build as well as source
 
 ### Update Phantom Manifest Fix
 
@@ -273,6 +300,7 @@ Current release theme:
   - feature paths are now often correct when a transcript exists, but short replies and low-quality audio still block otherwise-correct flows
 - Current evidence:
   - live captures still show `ffmpeg` and `whisper.cpp` failures
+  - current source now skips local whisper when buffered audio does not contain an Opus identification header
   - yes/no and alarm flows are especially sensitive to short or collapsed transcripts
 - Implementation notes:
   - add lightweight waveform or energy screening before transcription
@@ -413,11 +413,7 @@ public sealed class WebSocketTurnFinalizationService(
             session.LastIntent = null;
             session.LastListenType = "no-input";
             var localRule = ReadPrimaryNoInputRule(finalizedTurn);
-            var noInputReplies = ResponsePlanToSocketMessagesMapper.MapNoInput(
-                    turnState.TransId ?? session.LastTransId ?? string.Empty,
-                    string.IsNullOrWhiteSpace(localRule) ? turnState.ListenRules : [localRule])
-                .Select(map => new WebSocketReply { Text = map.Text, DelayMs = map.DelayMs })
-                .ToArray();
+            var noInputReplies = BuildLocalNoInputReplies(session, turnState, localRule);
             ResetBufferedAudio(session);
             turnState.SawListen = false;
             turnState.SawContext = false;
@@ -461,11 +457,7 @@ public sealed class WebSocketTurnFinalizationService(
             session.LastIntent = null;
             session.LastListenType = "no-input";
             var localRule = ReadPrimaryYesNoRule(finalizedTurn);
-            var noInputReplies = ResponsePlanToSocketMessagesMapper.MapNoInput(
-                    turnState.TransId ?? session.LastTransId ?? string.Empty,
-                    string.IsNullOrWhiteSpace(localRule) ? turnState.ListenRules : [localRule])
-                .Select(map => new WebSocketReply { Text = map.Text, DelayMs = map.DelayMs })
-                .ToArray();
+            var noInputReplies = BuildLocalNoInputReplies(session, turnState, localRule);
             ResetBufferedAudio(session);
             return noInputReplies;
         }
@@ -493,6 +485,8 @@ public sealed class WebSocketTurnFinalizationService(
         session.LastTranscript = finalizedTurn.NormalizedTranscript ?? finalizedTurn.RawTranscript;
         session.LastIntent = plan.IntentName;
         session.LastListenType = listenAction?.Mode;
+        turnState.LastLocalNoInputRule = null;
+        turnState.LocalNoInputCount = 0;
         if (plan.Actions.OfType<InvokeNativeSkillAction>().FirstOrDefault() is { SkillName: "@be/clock", Payload: not null } clockAction &&
             clockAction.Payload.TryGetValue("domain", out var lastClockDomainValue) &&
             lastClockDomainValue is not null)
@@ -720,12 +714,7 @@ public sealed class WebSocketTurnFinalizationService(
         return ReadRules(turn, "listenRules")
             .Concat(ReadRules(turn, "clientRules"))
             .Concat(ReadRules(turn, "listenAsrHints"))
-            .Any(static rule =>
-                string.Equals(rule, "$YESNO", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "settings/download_now_later", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "surprises-date/offer_date_fact", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "surprises-ota/want_to_download_now", StringComparison.OrdinalIgnoreCase));
+            .Any(IsYesNoRule);
     }
 
     private static bool ShouldHandleAsLocalNoInput(TurnContext turn)
@@ -737,8 +726,7 @@ public sealed class WebSocketTurnFinalizationService(
 
         return ReadRules(turn, "listenRules")
             .Concat(ReadRules(turn, "clientRules"))
-            .Any(static rule =>
-                string.Equals(rule, "clock/alarm_timer_okay", StringComparison.OrdinalIgnoreCase));
+            .Any(IsLocalNoInputRule);
     }
 
     private static string? ReadPrimaryNoInputRule(TurnContext turn)
@@ -757,12 +745,65 @@ public sealed class WebSocketTurnFinalizationService(
     {
         return ReadRules(turn, "listenRules")
             .Concat(ReadRules(turn, "clientRules"))
-            .FirstOrDefault(static rule =>
-                string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "shared/yes_no", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "settings/download_now_later", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "surprises-date/offer_date_fact", StringComparison.OrdinalIgnoreCase) ||
-                string.Equals(rule, "surprises-ota/want_to_download_now", StringComparison.OrdinalIgnoreCase));
+            .FirstOrDefault(IsConstrainedYesNoRule);
+    }
+
+    private static IReadOnlyList<WebSocketReply> BuildLocalNoInputReplies(
+        CloudSession session,
+        WebSocketTurnState turnState,
+        string? localRule)
+    {
+        var transId = turnState.TransId ?? session.LastTransId ?? string.Empty;
+        var effectiveRule = string.IsNullOrWhiteSpace(localRule)
+            ? turnState.ListenRules.FirstOrDefault(IsLocalNoInputRule)
+            : localRule;
+        IReadOnlyList<string> rules = string.IsNullOrWhiteSpace(effectiveRule) ? turnState.ListenRules : [effectiveRule];
+        var maps = ShouldRedirectRepeatedNoInputToIdle(turnState, effectiveRule)
+            ? ResponsePlanToSocketMessagesMapper.MapNoInputAndRedirectToSkill(transId, rules, "@be/idle")
+            : ResponsePlanToSocketMessagesMapper.MapNoInput(transId, rules);
+
+        return maps
+            .Select(map => new WebSocketReply { Text = map.Text, DelayMs = map.DelayMs })
+            .ToArray();
+    }
+
+    private static bool ShouldRedirectRepeatedNoInputToIdle(WebSocketTurnState turnState, string? localRule)
+    {
+        if (string.IsNullOrWhiteSpace(localRule))
+        {
+            turnState.LastLocalNoInputRule = null;
+            turnState.LocalNoInputCount = 0;
+            return false;
+        }
+
+        turnState.LocalNoInputCount = string.Equals(turnState.LastLocalNoInputRule, localRule, StringComparison.OrdinalIgnoreCase)
+            ? turnState.LocalNoInputCount + 1
+            : 1;
+        turnState.LastLocalNoInputRule = localRule;
+
+        return turnState.LocalNoInputCount >= 2 &&
+               string.Equals(localRule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase);
+    }
+
+    private static bool IsYesNoRule(string rule)
+    {
+        return string.Equals(rule, "$YESNO", StringComparison.OrdinalIgnoreCase) ||
+               IsConstrainedYesNoRule(rule);
+    }
+
+    private static bool IsLocalNoInputRule(string rule)
+    {
+        return string.Equals(rule, "clock/alarm_timer_okay", StringComparison.OrdinalIgnoreCase) ||
+               IsConstrainedYesNoRule(rule);
+    }
+
+    private static bool IsConstrainedYesNoRule(string rule)
+    {
+        return string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase) ||
+               string.Equals(rule, "shared/yes_no", StringComparison.OrdinalIgnoreCase) ||
+               string.Equals(rule, "settings/download_now_later", StringComparison.OrdinalIgnoreCase) ||
+               string.Equals(rule, "surprises-date/offer_date_fact", StringComparison.OrdinalIgnoreCase) ||
+               string.Equals(rule, "surprises-ota/want_to_download_now", StringComparison.OrdinalIgnoreCase);
     }
 
     private static IEnumerable<string> ReadRules(TurnContext turn, string key)
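The repeated no-input counting in `ShouldRedirectRepeatedNoInputToIdle` can be tried in isolation. The following is a minimal sketch, assuming two plain local variables stand in for `LastLocalNoInputRule` and `LocalNoInputCount` on `WebSocketTurnState`; the helper name `ShouldRedirectToIdle` is hypothetical and not part of the commit.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for the two counters kept on WebSocketTurnState.
string? lastRule = null;
int missCount = 0;

// Counts consecutive empty replies for the same rule. Only the create keeper
// prompt is redirected, and only from the second consecutive miss onward.
bool ShouldRedirectToIdle(string? rule)
{
    if (string.IsNullOrWhiteSpace(rule))
    {
        // No local rule in play: clear the streak.
        lastRule = null;
        missCount = 0;
        return false;
    }

    missCount = string.Equals(lastRule, rule, StringComparison.OrdinalIgnoreCase)
        ? missCount + 1
        : 1;
    lastRule = rule;

    return missCount >= 2 &&
           string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(ShouldRedirectToIdle("create/is_it_a_keeper")); // False: first miss
Console.WriteLine(ShouldRedirectToIdle("create/is_it_a_keeper")); // True: second miss in a row
Console.WriteLine(ShouldRedirectToIdle("shared/yes_no"));         // False: a different rule restarts the count
```

In the commit itself the streak is also cleared after a successfully finalized turn (`LastLocalNoInputRule = null`, `LocalNoInputCount = 0`), so a single answered prompt between two misses prevents the redirect.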
@@ -18,6 +18,8 @@ public sealed class WebSocketTurnState
     public int BufferedAudioBytes { get; set; }
     public List<byte[]> BufferedAudioFrames { get; } = [];
     public int FinalizeAttemptCount { get; set; }
+    public string? LastLocalNoInputRule { get; set; }
+    public int LocalNoInputCount { get; set; }
     public bool AwaitingTurnCompletion { get; set; }
     public bool SawListen { get; set; }
     public bool SawContext { get; set; }
@@ -15,7 +15,7 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategy(
             IsConfiguredPathAvailable(options.FfmpegPath, checkFileExists: false) &&
             IsConfiguredPathAvailable(options.WhisperCliPath, checkFileExists: true) &&
             IsConfiguredPathAvailable(options.WhisperModelPath, checkFileExists: true) &&
-            ReadBufferedAudioFrames(turn).Count > 0;
+            ReadBufferedAudioFrames(turn).Any(ContainsOpusIdentificationHeader);
     }
 
     public async Task<SttResult> TranscribeAsync(TurnContext turn, CancellationToken cancellationToken = default)
@@ -26,6 +26,11 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategy(
             throw new InvalidOperationException("Local whisper.cpp STT requires buffered websocket audio frames.");
         }
 
+        if (!frames.Any(ContainsOpusIdentificationHeader))
+        {
+            throw new InvalidOperationException("Local whisper.cpp STT requires buffered Ogg/Opus audio with an Opus identification header.");
+        }
+
         var tempDirectory = options.TempDirectory;
         if (string.IsNullOrWhiteSpace(tempDirectory))
         {
@@ -116,6 +121,11 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategy(
             : 0;
     }
 
+    private static bool ContainsOpusIdentificationHeader(byte[] frame)
+    {
+        return frame.AsSpan().IndexOf("OpusHead"u8) >= 0;
+    }
+
     private static string ExtractTranscript(string standardOutput)
     {
         var lines = standardOutput
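The `OpusHead` gate added to `CanHandle` and `TranscribeAsync` can be illustrated with a small standalone sketch. It assumes frames are plain byte arrays and builds simplified stand-in frames with a hypothetical `BuildFrame` helper; these are not valid Ogg pages, they only place an 8-byte marker where a single-segment page would start its payload (offset 28, the same offset the test helper in this commit overwrites).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// ASCII magic that opens the Opus identification header packet.
byte[] opusHead = Encoding.ASCII.GetBytes("OpusHead");

// True when the magic bytes appear anywhere in the frame.
bool ContainsOpusIdentificationHeader(byte[] frame) =>
    frame.AsSpan().IndexOf(new ReadOnlySpan<byte>(opusHead)) >= 0;

// Hypothetical helper: "OggS" capture pattern, zero filler, then an 8-byte marker at offset 28.
byte[] BuildFrame(string marker)
{
    var frame = new byte[36];
    Encoding.ASCII.GetBytes("OggS").CopyTo(frame, 0);
    Encoding.ASCII.GetBytes(marker).CopyTo(frame, 28);
    return frame;
}

var usableTurn = new List<byte[]> { BuildFrame("OpusHead"), new byte[16] };
var unusableTurn = new List<byte[]> { BuildFrame("NotAudio"), new byte[16] };

// A turn qualifies for external decoding when any buffered frame carries the header.
Console.WriteLine(usableTurn.Any(ContainsOpusIdentificationHeader));   // True
Console.WriteLine(unusableTurn.Any(ContainsOpusIdentificationHeader)); // False
```

Because the check is a byte search per frame, it would presumably miss a header that is split across two buffered frames; whether that can happen depends on how the robot frames its audio, which this capture does not settle.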
@@ -1124,6 +1124,117 @@ public sealed class JiboWebSocketServiceTests
         Assert.Equal("surprises-ota/want_to_download_now", rules[0].GetString());
     }
 
+    [Fact]
+    public async Task BufferedAudio_SharedYesNoPromptWithSttFailure_AutoFinalizesAsLocalNoInput()
+    {
+        await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-shared-yesno-noinput-token",
+            Text = """{"type":"LISTEN","transID":"trans-shared-yesno-noinput","data":{"rules":["shared/yes_no","globals/gui_nav","globals/mim_repeat","globals/global_commands_launch"],"asr":{"hints":["$YESNO"]}}}"""
+        });
+
+        await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-shared-yesno-noinput-token",
+            Text = """{"type":"CONTEXT","transID":"trans-shared-yesno-noinput","data":{"topic":"conversation"}}"""
+        });
+
+        for (var index = 0; index < 4; index += 1)
+        {
+            var interimReplies = await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+            {
+                HostName = "neo-hub.jibo.com",
+                Path = "/listen",
+                Kind = "neo-hub-listen",
+                Token = "hub-shared-yesno-noinput-token",
+                Binary = new byte[3000]
+            });
+
+            Assert.Empty(interimReplies);
+        }
+
+        var session = _store.FindSessionByToken("hub-shared-yesno-noinput-token");
+        Assert.NotNull(session);
+        session.TurnState.FirstAudioReceivedUtc = DateTimeOffset.UtcNow - TimeSpan.FromSeconds(2);
+        session.TurnState.LastSttError = "ffmpeg decode failed";
+
+        var replies = await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-shared-yesno-noinput-token",
+            Binary = new byte[3000]
+        });
+
+        Assert.Equal(2, replies.Count);
+        Assert.Equal("LISTEN", ReadReplyType(replies[0]));
+        Assert.Equal("EOS", ReadReplyType(replies[1]));
+
+        using var listenPayload = JsonDocument.Parse(replies[0].Text!);
+        var rules = listenPayload.RootElement.GetProperty("data").GetProperty("nlu").GetProperty("rules");
+        Assert.Single(rules.EnumerateArray());
+        Assert.Equal("shared/yes_no", rules[0].GetString());
+    }
+
+    [Fact]
+    public async Task ClientAsr_CreateKeeperRepeatedNoInput_RedirectsToIdle()
+    {
+        await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-create-noinput-token",
+            Text = """{"type":"LISTEN","transID":"trans-create-noinput-1","data":{"rules":["create/is_it_a_keeper","globals/gui_nav","globals/mim_repeat","globals/global_commands_launch"]}}"""
+        });
+
+        var firstReplies = await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-create-noinput-token",
+            Text = """{"type":"CLIENT_ASR","transID":"trans-create-noinput-1","data":{}}"""
+        });
+
+        Assert.Equal(2, firstReplies.Count);
+        Assert.Equal("LISTEN", ReadReplyType(firstReplies[0]));
+        Assert.Equal("EOS", ReadReplyType(firstReplies[1]));
+
+        await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-create-noinput-token",
+            Text = """{"type":"LISTEN","transID":"trans-create-noinput-2","data":{"rules":["create/is_it_a_keeper","globals/gui_nav","globals/mim_repeat","globals/global_commands_launch"]}}"""
+        });
+
+        var secondReplies = await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-create-noinput-token",
+            Text = """{"type":"CLIENT_ASR","transID":"trans-create-noinput-2","data":{}}"""
+        });
+
+        Assert.Equal(3, secondReplies.Count);
+        Assert.Equal("LISTEN", ReadReplyType(secondReplies[0]));
+        Assert.Equal("EOS", ReadReplyType(secondReplies[1]));
+        Assert.Equal("SKILL_REDIRECT", ReadReplyType(secondReplies[2]));
+
+        using var redirectPayload = JsonDocument.Parse(secondReplies[2].Text!);
+        Assert.Equal("@be/idle", redirectPayload.RootElement.GetProperty("data").GetProperty("match").GetProperty("skillID").GetString());
+    }
+
     [Fact]
     public async Task ClientAsr_SurprisesDateOfferPrompt_MapsYesWithoutGlobalRuleLeak()
     {
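The tests above send a `LISTEN` with one constrained rule plus three `globals/*` rules and assert that the local reply carries only the constrained rule. A minimal sketch of that selection follows, assuming a hypothetical `SelectLocalRules` helper. It is simplified relative to the commit: the real `BuildLocalNoInputReplies` also considers `clock/alarm_timer_okay` and delegates the reply payload to `ResponsePlanToSocketMessagesMapper`, whose internals are not shown in this diff.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Constrained yes/no prompt families named in IsConstrainedYesNoRule.
string[] constrainedYesNoRules =
{
    "create/is_it_a_keeper",
    "shared/yes_no",
    "settings/download_now_later",
    "surprises-date/offer_date_fact",
    "surprises-ota/want_to_download_now",
};

bool IsConstrainedYesNoRule(string rule) =>
    constrainedYesNoRules.Contains(rule, StringComparer.OrdinalIgnoreCase);

// Hypothetical helper: keep only the first constrained rule so unrelated
// globals/* rules are not echoed back; with no constrained rule, pass the
// inbound rules through unchanged.
IReadOnlyList<string> SelectLocalRules(IReadOnlyList<string> listenRules)
{
    var localRule = listenRules.FirstOrDefault(IsConstrainedYesNoRule);
    return localRule is null ? listenRules : new[] { localRule };
}

var selected = SelectLocalRules(new[]
{
    "shared/yes_no", "globals/gui_nav", "globals/mim_repeat", "globals/global_commands_launch",
});
Console.WriteLine(string.Join(",", selected)); // shared/yes_no
```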
@@ -53,6 +53,30 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategyTests
         Assert.False(strategy.CanHandle(turn));
     }
 
+    [Fact]
+    public void CanHandle_ReturnsFalse_WhenBufferedAudioHasNoOpusIdentificationHeader()
+    {
+        var strategy = new LocalWhisperCppBufferedAudioSttStrategy(
+            new BufferedAudioSttOptions
+            {
+                EnableLocalWhisperCpp = true,
+                FfmpegPath = "ffmpeg",
+                WhisperCliPath = "whisper-cli",
+                WhisperModelPath = "model.bin"
+            },
+            new FakeExternalProcessRunner());
+
+        var turn = new TurnContext
+        {
+            Attributes = new Dictionary<string, object?>
+            {
+                ["bufferedAudioFrames"] = new[] { BuildMinimalOggPageWithoutOpusHead() }
+            }
+        };
+
+        Assert.False(strategy.CanHandle(turn));
+    }
+
     [Fact]
     public async Task TranscribeAsync_UsesFfmpegAndWhisperCpp_WhenConfigured()
     {
@@ -119,6 +143,13 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategyTests
         ];
     }
 
+    private static byte[] BuildMinimalOggPageWithoutOpusHead()
+    {
+        var page = BuildMinimalOggPage();
+        "NotAudio"u8.CopyTo(page.AsSpan(28, 8));
+        return page;
+    }
+
     private sealed class FakeExternalProcessRunner : IExternalProcessRunner
     {
         public List<(string FileName, IReadOnlyList<string> Arguments)> Calls { get; } = [];