enhanced skill and yes/no routing
@@ -69,6 +69,27 @@ Near-term ASR work should stay staged:
 
 That keeps Node as the reverse-engineering oracle while letting the long-term `.NET` cloud gain real STT seams without pretending they are finished.
 
+## Latest Capture Findings
+
+The latest live test round tightened up three priorities:
+
+- yes/no turns need explicit constrained follow-up handling instead of generic chat routing
+- skill invocation still depends too much on narrow phrase matching and is vulnerable to STT drift
+- local buffered-audio STT in `.NET` is useful for discovery, but it is not yet stable enough to be the default live-test assumption
+
+Evidence from the latest `2026-04-18` captures:
+
+- several buffered-audio turns never produced a usable transcript because the local `whisper.cpp` path was missing or the temporary normalized Ogg file was rejected by `ffmpeg`
+- some recognized phrases fell into placeholder provider replies because the intent was recognized but the feature path behind it is still a stub
+- short yes/no responses need the same session-aware treatment already prototyped in Node, especially for create-flow style follow-ups
+
+Near-term interaction work should now prioritize:
+
+1. preserve and interpret yes/no turn constraints from observed listen rules
+2. broaden phrase-to-intent matching for the small set of known working skills before moving to larger NLU ambitions
+3. keep synthetic transcript hints as the most reliable parity path when captures already provide them
+4. continue evaluating whether local preprocessing is worth further investment or whether managed STT should replace it for the next serious testing phase
+
 ## Working Cloud Framework
 
 The current evidence in captures, fixtures, and Node behavior supports three main cloud interaction paths:
@@ -130,6 +130,23 @@ python3 ./scripts/cloud/import-websocket-capture-fixture.py \
 - whether EOS timing matched expectations
 - whether any unexpected message families appeared
 
+## Latest Test Notes To Carry Forward
+
+The most recent live round showed that startup and some Q-and-A paths are progressing, but audio-turn reliability is still uneven.
+
+Carry these expectations into the next run:
+
+- constrained yes/no replies should be tested intentionally because they need special handling and are easy to miss if STT drifts
+- phrases intended to trigger known skills should be repeated using a small, documented wording set so we can separate routing issues from Whisper errors
+- provider-backed placeholder answers are still expected for weather, commute, calendar, news, and similar routes unless that feature path is explicitly implemented
+
+For STT during live testing:
+
+- prefer runs where `audioTranscriptHint` or other synthetic replay cues are available
+- do not assume local `whisper.cpp` success means the audio pipeline is stable overall
+- if many turns stay pending or `ffmpeg` rejects normalized Ogg files, treat that as a speech-pipeline issue first, not an intent-mapping issue
+- keep the Node server available as the comparison path for yes/no and audio-preprocessing behavior
+
 ## What To Do If The Test Fails
 
 If the robot does not connect or the first turn fails:
54
OpenJibo/docs/prompts/cloud-deploy-and-jibo-rcm-path.md
Normal file
@@ -0,0 +1,54 @@
+# Cloud Deploy And Jibo RCM Path Prompt
+
+Prepare OpenJibo for a lightweight v1 cloud deployment and the cleanest practical Jibo configuration path for group testing.
+
+Current repo context:
+
+- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
+- the current `.NET` cloud is the target runtime
+- the Node server remains a discovery oracle and fallback
+- latest live-test guidance is in:
+  - `docs/live-jibo-test-runbook.md`
+  - `docs/live-jibo-capture.md`
+  - `docs/device-bootstrap.md`
+  - `docs/development-plan.md`
+  - `src/Jibo.Cloud/dotnet/README.md`
+
+What we need from this workstream:
+
+1. define the smallest, cleanest, easiest-to-repeat deployment path for a v1 hosted OpenJibo cloud
+2. define the lightest reliable way to configure Jibo devices to use that cloud, with as few manual error-prone steps as possible
+3. produce scripts and docs that make it realistic for additional revival-group testers to get connected quickly
+
+Important goals:
+
+- prefer a path that is easy for non-experts in the revival group to follow
+- minimize hand-edited device changes and confusing setup steps
+- preserve a clear fallback path when a deployment or routing change fails
+- keep the deployment practical for a small testing cohort first; enterprise polish can come later
+
+Areas to review:
+
+- current API host and routing logic in `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Api/Program.cs`
+- existing scripts under:
+  - `scripts/cloud/`
+  - `scripts/bootstrap/`
+- docs around routing and bootstrap in:
+  - `docs/device-bootstrap.md`
+  - `docs/live-jibo-test-runbook.md`
+  - `docs/live-jibo-capture.md`
+
+Deliverables:
+
+- a concrete v1 deployment recommendation
+- any needed deployment scripts or setup helpers
+- a clean Jibo configuration / routing / RCM procedure with the fewest practical steps
+- validation steps that clearly distinguish cloud issues from robot/network issues
+- doc updates aimed at making group adoption fast and low-risk
+
+Constraints:
+
+- do not over-design for full production scale yet
+- avoid adding multiple competing deployment paths unless there is a strong reason
+- optimize for reliability, repeatability, and low support burden for the next round of testers
+- keep the Node oracle available as a troubleshooting fallback until `.NET` parity is clearly strong enough
47
OpenJibo/docs/prompts/stt-upgrade-path.md
Normal file
@@ -0,0 +1,47 @@
+# STT Upgrade Path Prompt
+
+Improve the OpenJibo `.NET` speech-to-text path for live robot testing.
+
+Current repo context:
+
+- workspace root: `C:\Projects\JiboExperiments\OpenJibo`
+- current live captures from `2026-04-18` showed that some turns succeeded, but many buffered-audio turns failed before producing a usable transcript
+- the current local `.NET` STT path is in:
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/LocalWhisperCppBufferedAudioSttStrategy.cs`
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Infrastructure/Audio/OggOpusAudioNormalizer.cs`
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/WebSocketTurnFinalizationService.cs`
+  - `src/Jibo.Cloud/dotnet/src/Jibo.Cloud.Application/Services/DefaultSttStrategySelector.cs`
+- Node remains the oracle for current behavior:
+  - `src/Jibo.Cloud/node/open-jibo-link.js`
+- live test evidence and guidance are documented in:
+  - `docs/development-plan.md`
+  - `docs/live-jibo-test-runbook.md`
+  - `src/Jibo.Cloud/dotnet/README.md`
+
+Observed problems to ground the work:
+
+- one captured run could not find `whisper-cli` at the configured rooted path
+- many buffered-audio turns failed because `ffmpeg` rejected the normalized Ogg output
+- we need a more reliable path for testing than the current partially working local whisper chain
+
+Goals:
+
+1. review the current `.NET` STT seam and compare it against the Node preprocessing flow
+2. recommend and implement the best next STT path for testing, preferring reliability and simplicity over novelty
+3. keep the STT integration behind the existing abstractions so we can swap providers later
+4. preserve or improve telemetry so failed turns clearly show whether the problem is decode, tool lookup, provider failure, or unusable transcript quality
+5. update tests and docs to match the chosen direction
+
+Constraints:
+
+- do not remove the synthetic transcript-hint path; it is still valuable for fixture replay and parity
+- do not assume Azure-hosted STT is automatically the answer unless the codebase and testing needs support that choice
+- prefer an implementation that is easy for other revival-group testers to run consistently
+- avoid large speculative architecture changes that are not needed for a near-term v1 testable cloud
+
+Deliverables:
+
+- code changes for the improved STT path
+- tests covering strategy selection, success, and failure handling
+- doc updates with exact setup guidance and a recommendation on whether local whisper remains optional, fallback-only, or deprecated for testing
+- a short summary of the tradeoffs and why the chosen path is the best next step
@@ -108,6 +108,9 @@ Current raw-audio behavior is still a compatibility bridge:
 - if buffered audio has a synthetic transcript hint, the server now auto-finalizes the turn and emits `LISTEN` + `EOS` + `SKILL_ACTION`
 - if buffered audio crosses the finalize threshold without a usable transcript, the server now emits a Node-style fallback completion with `EOS` instead of hanging the turn forever
 - this is intentionally not a claim of real ASR parity
+- follow-up turns now preserve enough constraint state to distinguish yes/no-style replies from ordinary free-form chat
+- create-flow yes/no turns now preserve `create/is_it_a_keeper` and `domain=create` in the outbound synthetic `LISTEN` payload
+- phrase matching has been widened slightly for known test prompts such as joke, dance, surprise, weather, calendar, commute, and news variants
 
 ## Buffered Audio STT
 
@@ -138,6 +141,13 @@ Configuration lives under `OpenJibo:Stt`:
 
 This is not yet a claim of production-ready onboard ASR. It is a `.NET` discovery seam that keeps us compatible with the Node oracle while we evaluate longer-term options such as Azure-hosted STT or a managed decode/transcribe stack.
 
+Latest live-capture guidance after the `2026-04-18` round:
+
+- prefer synthetic transcript hints when they are present in the observed turn
+- only use local `whisper.cpp` when the configured tool paths are real and the decode chain is behaving
+- treat `ffmpeg` decode failures on normalized Ogg captures as evidence that the local audio path still needs more hardening before it can be the default live-test expectation
+- keep the Node implementation as the oracle for yes/no turn semantics and audio preprocessing details until the `.NET` port catches up
+
 ## Current Interaction Paths
 
 The working cloud model currently looks like three main paths:
@@ -1,5 +1,6 @@
 using Jibo.Cloud.Application.Abstractions;
 using Jibo.Runtime.Abstractions;
+using System.Text.Json;
 
 namespace Jibo.Cloud.Application.Services;
 
@@ -15,8 +16,9 @@ public sealed class JiboInteractionService(
         var clientIntent = turn.Attributes.TryGetValue("clientIntent", out var rawClientIntent)
             ? rawClientIntent?.ToString()
             : null;
+        var isYesNoTurn = IsYesNoTurn(turn);
 
-        var semanticIntent = ResolveSemanticIntent(lowered, clientIntent);
+        var semanticIntent = ResolveSemanticIntent(lowered, clientIntent, isYesNoTurn);
         return semanticIntent switch
         {
             "joke" => BuildJokeDecision(catalog),
@@ -25,6 +27,8 @@ public sealed class JiboInteractionService(
             "date" => new JiboInteractionDecision("date", $"Today is {DateTime.Now:dddd, MMMM d}."),
             "hello" => new JiboInteractionDecision("hello", randomizer.Choose(catalog.GreetingReplies)),
             "how_are_you" => new JiboInteractionDecision("how_are_you", randomizer.Choose(catalog.HowAreYouReplies)),
+            "yes" => new JiboInteractionDecision("yes", "Yes."),
+            "no" => new JiboInteractionDecision("no", "No."),
             "surprise" => new JiboInteractionDecision("surprise", randomizer.Choose(catalog.SurpriseReplies)),
             "personal_report" => new JiboInteractionDecision("personal_report", randomizer.Choose(catalog.PersonalReportReplies)),
             "weather" => new JiboInteractionDecision("weather", randomizer.Choose(catalog.WeatherReplies)),
@@ -86,7 +90,7 @@ public sealed class JiboInteractionService(
             .Replace("{transcript}", transcript, StringComparison.Ordinal);
     }
 
-    private static string ResolveSemanticIntent(string loweredTranscript, string? clientIntent)
+    private static string ResolveSemanticIntent(string loweredTranscript, string? clientIntent, bool isYesNoTurn)
     {
         if (string.Equals(clientIntent, "askForTime", StringComparison.OrdinalIgnoreCase))
         {
@@ -98,72 +102,112 @@ public sealed class JiboInteractionService(
             return "date";
         }
 
-        if (loweredTranscript.Contains("joke", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "joke", "funny", "make me laugh"))
         {
             return "joke";
         }
 
-        if (loweredTranscript.Contains("dance", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "dance", "boogie"))
         {
             return "dance";
         }
 
-        if (loweredTranscript.Contains("surprise", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "surprise", "surprise me", "show me something fun"))
         {
             return "surprise";
         }
 
-        if (loweredTranscript.Contains("personal report", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "personal report", "my report", "daily report", "my update"))
         {
             return "personal_report";
         }
 
-        if (loweredTranscript.Contains("weather", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "weather", "forecast", "weather report", "is it raining"))
         {
             return "weather";
         }
 
-        if (loweredTranscript.Contains("calendar", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "calendar", "schedule", "what's on my calendar", "what is on my calendar"))
         {
             return "calendar";
         }
 
-        if (loweredTranscript.Contains("commute", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "commute", "traffic", "drive to work", "how long to work"))
         {
             return "commute";
         }
 
-        if (loweredTranscript.Contains("news", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "news", "headlines", "news update", "tell me the news"))
         {
             return "news";
         }
 
-        if (loweredTranscript.Contains("how are you", StringComparison.Ordinal) ||
-            loweredTranscript.Contains("what's up", StringComparison.Ordinal) ||
-            loweredTranscript.Contains("what s up", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "how are you", "what's up", "what s up", "what up"))
         {
             return "how_are_you";
         }
 
-        if (loweredTranscript.Contains("hello", StringComparison.Ordinal) ||
-            loweredTranscript.Contains("hi", StringComparison.Ordinal) ||
-            loweredTranscript.Contains("hey", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "hello", "hi", "hey"))
         {
             return "hello";
         }
 
-        if (loweredTranscript.Contains("time", StringComparison.Ordinal))
+        if (isYesNoTurn && MatchesAny(loweredTranscript, "yes", "yeah", "yup", "sure", "uh huh"))
+        {
+            return "yes";
+        }
+
+        if (isYesNoTurn && MatchesAny(loweredTranscript, "no", "nope", "nah"))
+        {
+            return "no";
+        }
+
+        if (MatchesAny(loweredTranscript, "what time is it", "current time", "the time", "time is it") ||
+            loweredTranscript.Contains("time", StringComparison.Ordinal))
         {
             return "time";
         }
 
-        if (loweredTranscript.Contains("date", StringComparison.Ordinal) || loweredTranscript.Contains("day", StringComparison.Ordinal))
+        if (MatchesAny(loweredTranscript, "what day is it", "what is the date", "today s date", "today's date") ||
+            loweredTranscript.Contains("date", StringComparison.Ordinal) ||
+            loweredTranscript.Contains("day", StringComparison.Ordinal))
         {
             return "date";
         }
 
         return "chat";
     }
+
+    private static bool IsYesNoTurn(TurnContext turn)
+    {
+        return ReadRules(turn, "listenRules").Concat(ReadRules(turn, "clientRules"))
+            .Any(static rule =>
+                string.Equals(rule, "$YESNO", StringComparison.OrdinalIgnoreCase) ||
+                string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase));
+    }
+
+    private static IEnumerable<string> ReadRules(TurnContext turn, string key)
+    {
+        if (!turn.Attributes.TryGetValue(key, out var value) || value is null)
+        {
+            return [];
+        }
+
+        return value switch
+        {
+            IReadOnlyList<string> typed => typed,
+            IEnumerable<string> strings => strings,
+            JsonElement { ValueKind: JsonValueKind.Array } json => json.EnumerateArray()
+                .Where(static item => item.ValueKind == JsonValueKind.String)
+                .Select(static item => item.GetString() ?? string.Empty),
+            _ => []
+        };
+    }
+
+    private static bool MatchesAny(string loweredTranscript, params string[] candidates)
+    {
+        return candidates.Any(candidate => loweredTranscript.Contains(candidate, StringComparison.Ordinal));
+    }
 }
 
 public sealed record JiboInteractionDecision(
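Review note on the hunk above: `MatchesAny` is a plain substring test, so short candidates such as `"hi"` or `"no"` can match inside longer words. The sketch below contrasts that behavior with a word-boundary variant. It is written in Python only to keep it compact and runnable; `matches_any_substring` and `matches_any_word` are hypothetical names, not part of the repo, and the boundary rule is an assumption about what stricter matching might look like rather than a description of the Node oracle.

```python
import re


def matches_any_substring(lowered: str, *candidates: str) -> bool:
    """Same idea as MatchesAny in the diff: true if any candidate is a substring."""
    return any(candidate in lowered for candidate in candidates)


def matches_any_word(lowered: str, *candidates: str) -> bool:
    """Hypothetical stricter variant: a candidate must not sit inside a longer word."""
    return any(
        re.search(rf"(?<!\w){re.escape(candidate)}(?!\w)", lowered) is not None
        for candidate in candidates
    )


if __name__ == "__main__":
    phrase = "what is this thing"
    # Substring matching finds "hi" inside "this" and "thing".
    print(matches_any_substring(phrase, "hello", "hi", "hey"))
    # Word-level matching does not treat that as a greeting.
    print(matches_any_word(phrase, "hello", "hi", "hey"))
```

Multi-word candidates such as `"what's up"` still match under the stricter rule, because the boundary check applies only at the two ends of the phrase.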
@@ -17,13 +17,20 @@ public sealed class ResponsePlanToSocketMessagesMapper
         var transcript = turn.NormalizedTranscript ?? turn.RawTranscript ?? string.Empty;
         var clientIntent = ReadAttribute(turn, "clientIntent");
         var rules = ReadRules(turn, messageType);
+        var yesNoCreateRule = ReadYesNoCreateRule(turn);
+        var isYesNoTurn = !string.IsNullOrWhiteSpace(yesNoCreateRule);
+        var isYesNoIntent = string.Equals(plan.IntentName, "yes", StringComparison.OrdinalIgnoreCase) ||
+                            string.Equals(plan.IntentName, "no", StringComparison.OrdinalIgnoreCase);
         var outboundIntent = string.Equals(messageType, "CLIENT_NLU", StringComparison.OrdinalIgnoreCase) && !string.IsNullOrWhiteSpace(clientIntent)
             ? clientIntent
             : plan.IntentName ?? "unknown";
-        var outboundAsrText = string.Equals(messageType, "CLIENT_NLU", StringComparison.OrdinalIgnoreCase) && !string.IsNullOrWhiteSpace(clientIntent)
-            ? clientIntent
-            : transcript;
-        var entities = ReadEntities(turn, messageType);
+        var outboundAsrText = isYesNoTurn && isYesNoIntent
+            ? transcript
+            : string.Equals(messageType, "CLIENT_NLU", StringComparison.OrdinalIgnoreCase) && !string.IsNullOrWhiteSpace(clientIntent)
+                ? clientIntent
+                : transcript;
+        var outboundRules = isYesNoTurn && isYesNoIntent ? [yesNoCreateRule!] : rules;
+        var entities = ReadEntities(turn, messageType, isYesNoTurn && isYesNoIntent);
         var messages = new List<SocketReplyPlan>
         {
             new(JsonSerializer.Serialize(new
@@ -42,13 +49,13 @@ public sealed class ResponsePlanToSocketMessagesMapper
                     {
                         confidence = 0.95,
                         intent = outboundIntent,
-                        rules,
+                        rules = outboundRules,
                         entities
                     },
                     match = new
                     {
                         intent = outboundIntent,
-                        rule = rules.FirstOrDefault() ?? string.Empty,
+                        rule = outboundRules.FirstOrDefault() ?? string.Empty,
                         score = 0.95
                     }
                 }
@@ -135,8 +142,16 @@ public sealed class ResponsePlanToSocketMessagesMapper
         };
     }
 
-    private static object ReadEntities(TurnContext turn, string? messageType)
+    private static object ReadEntities(TurnContext turn, string? messageType, bool yesNoCreateTurn)
     {
+        if (yesNoCreateTurn)
+        {
+            return new Dictionary<string, object?>
+            {
+                ["domain"] = "create"
+            };
+        }
+
         if (!string.Equals(messageType, "CLIENT_NLU", StringComparison.OrdinalIgnoreCase))
         {
             return new Dictionary<string, object?>();
@@ -155,6 +170,35 @@ public sealed class ResponsePlanToSocketMessagesMapper
         };
     }
 
+    private static string? ReadYesNoCreateRule(TurnContext turn)
+    {
+        return ReadRuleValues(turn)
+            .FirstOrDefault(static rule => string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase));
+    }
+
+    private static IEnumerable<string> ReadRuleValues(TurnContext turn)
+    {
+        return ReadRuleValues(turn, "listenRules").Concat(ReadRuleValues(turn, "clientRules"));
+    }
+
+    private static IEnumerable<string> ReadRuleValues(TurnContext turn, string key)
+    {
+        if (!turn.Attributes.TryGetValue(key, out var value) || value is null)
+        {
+            return [];
+        }
+
+        return value switch
+        {
+            IReadOnlyList<string> typedRules => typedRules,
+            IEnumerable<string> rules => rules,
+            JsonElement { ValueKind: JsonValueKind.Array } jsonElement => jsonElement.EnumerateArray()
+                .Where(static item => item.ValueKind == JsonValueKind.String)
+                .Select(static item => item.GetString() ?? string.Empty),
+            _ => []
+        };
+    }
+
     private static string? ReadAttribute(TurnContext turn, string key)
     {
         return turn.Attributes.TryGetValue(key, out var value)
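Review note on the mapper hunks above: for a create-flow yes/no turn the outbound payload keeps the spoken transcript, narrows the rules to `create/is_it_a_keeper`, and sets `domain=create`. A minimal Python sketch of that selection follows, assuming a simplified input shape. `build_outbound_nlu` and its parameters are hypothetical, and the `CLIENT_NLU` branch of the real mapper is deliberately left out.

```python
CREATE_KEEPER_RULE = "create/is_it_a_keeper"


def build_outbound_nlu(intent, transcript, listen_rules, client_rules, default_rules):
    """Pick the outbound ASR text, rules, and entities for one turn (simplified)."""
    # Look for the create-flow rule in either rule list, ignoring case.
    keeper = next(
        (rule for rule in [*listen_rules, *client_rules] if rule.lower() == CREATE_KEEPER_RULE),
        None,
    )
    is_yes_no_intent = intent.lower() in {"yes", "no"}

    if keeper is not None and is_yes_no_intent:
        # Constrained reply: echo the transcript and pin the create-flow rule.
        return {
            "intent": intent,
            "asr_text": transcript,
            "rules": [keeper],
            "entities": {"domain": "create"},
        }

    # Ordinary turn: pass the default rules through unchanged.
    return {
        "intent": intent,
        "asr_text": transcript,
        "rules": list(default_rules),
        "entities": {},
    }


if __name__ == "__main__":
    print(build_outbound_nlu("yes", "yeah", ["create/is_it_a_keeper"], [], ["launch"]))
    print(build_outbound_nlu("joke", "tell me a joke", ["create/is_it_a_keeper"], [], ["launch"]))
```

Note that a bare `$YESNO` rule does not trigger this branch in the mapper hunk: only `create/is_it_a_keeper` does, which the sketch mirrors.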
@@ -2,6 +2,7 @@ using System.Text.Json;
 using Jibo.Cloud.Application.Abstractions;
 using Jibo.Cloud.Domain.Models;
 using Jibo.Runtime.Abstractions;
+using System.Text.RegularExpressions;
 
 namespace Jibo.Cloud.Application.Services;
 
@@ -302,6 +303,32 @@ public sealed class WebSocketTurnFinalizationService(
     {
         var turn = ProtocolToTurnContextMapper.MapListenMessage(envelope, session, messageType);
         var finalizedTurn = await ResolveTranscriptAsync(turn, session, cancellationToken);
+        if (!IsTranscriptUsable(finalizedTurn))
+        {
+            finalizedTurn = new TurnContext
+            {
+                TurnId = finalizedTurn.TurnId,
+                SessionId = finalizedTurn.SessionId,
+                TimestampUtc = finalizedTurn.TimestampUtc,
+                InputMode = finalizedTurn.InputMode,
+                SourceKind = finalizedTurn.SourceKind,
+                WakePhrase = finalizedTurn.WakePhrase,
+                RawTranscript = null,
+                NormalizedTranscript = null,
+                DeviceId = finalizedTurn.DeviceId,
+                HostName = finalizedTurn.HostName,
+                RequestId = finalizedTurn.RequestId,
+                ProtocolService = finalizedTurn.ProtocolService,
+                ProtocolOperation = finalizedTurn.ProtocolOperation,
+                FirmwareVersion = finalizedTurn.FirmwareVersion,
+                ApplicationVersion = finalizedTurn.ApplicationVersion,
+                Locale = finalizedTurn.Locale,
+                TimeZone = finalizedTurn.TimeZone,
+                IsFollowUpEligible = finalizedTurn.IsFollowUpEligible,
+                Attributes = finalizedTurn.Attributes
+            };
+        }
+
         var turnState = session.TurnState;
         if (string.IsNullOrWhiteSpace(finalizedTurn.NormalizedTranscript) &&
             string.IsNullOrWhiteSpace(finalizedTurn.RawTranscript))
@@ -460,4 +487,63 @@ public sealed class WebSocketTurnFinalizationService(
             return false;
         }
     }
+
+    private static bool IsTranscriptUsable(TurnContext turn)
+    {
+        var transcript = NormalizeTranscript(turn.NormalizedTranscript ?? turn.RawTranscript);
+        if (string.IsNullOrWhiteSpace(transcript))
+        {
+            return false;
+        }
+
+        if (transcript.Length >= 6)
+        {
+            return true;
+        }
+
+        if (IsYesNoTurn(turn) && transcript is "yes" or "no" or "sure" or "nope" or "yup" or "uh huh" or "yeah" or "nah")
+        {
+            return true;
+        }
+
+        return transcript is "joke" or "dance" or "time" or "date" or "today" or "day" or "hello" or "hi" or "hey";
+    }
+
+    private static bool IsYesNoTurn(TurnContext turn)
+    {
+        return ReadRules(turn, "listenRules").Concat(ReadRules(turn, "clientRules"))
+            .Any(static rule =>
+                string.Equals(rule, "$YESNO", StringComparison.OrdinalIgnoreCase) ||
+                string.Equals(rule, "create/is_it_a_keeper", StringComparison.OrdinalIgnoreCase));
+    }
+
+    private static IEnumerable<string> ReadRules(TurnContext turn, string key)
+    {
+        if (!turn.Attributes.TryGetValue(key, out var value) || value is null)
+        {
+            return [];
+        }
+
+        return value switch
+        {
+            IReadOnlyList<string> typed => typed,
+            IEnumerable<string> strings => strings,
+            JsonElement { ValueKind: JsonValueKind.Array } json => json.EnumerateArray()
+                .Where(static item => item.ValueKind == JsonValueKind.String)
+                .Select(static item => item.GetString() ?? string.Empty),
+            _ => []
+        };
+    }
+
+    private static string NormalizeTranscript(string? transcript)
+    {
+        if (string.IsNullOrWhiteSpace(transcript))
+        {
+            return string.Empty;
+        }
+
+        return Regex.Replace(transcript.Trim().ToLowerInvariant(), @"[^\w\s]", " ")
+            .Replace("  ", " ", StringComparison.Ordinal)
+            .Trim();
+    }
 }
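Review note on `IsTranscriptUsable` and `NormalizeTranscript` above: the gate accepts any normalized transcript of six or more characters, a short yes/no word only on a constrained turn, and a small set of one-word commands otherwise. The Python sketch below restates that gate so the thresholds are easy to exercise by hand. The function names are hypothetical, and one deliberate difference is flagged in the comments: whitespace runs are collapsed with a regex here, whereas the diff uses a single `Replace` pass.

```python
import re

# Word lists copied from the C# pattern matches in the hunk above.
SHORT_YES_NO = {"yes", "no", "sure", "nope", "yup", "uh huh", "yeah", "nah"}
SHORT_COMMANDS = {"joke", "dance", "time", "date", "today", "day", "hello", "hi", "hey"}


def normalize_transcript(transcript):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    if not transcript or not transcript.strip():
        return ""
    cleaned = re.sub(r"[^\w\s]", " ", transcript.strip().lower())
    # Assumption: collapse every whitespace run, not just one double space.
    return re.sub(r"\s+", " ", cleaned).strip()


def is_transcript_usable(transcript, is_yes_no_turn):
    """Return True when the transcript is worth routing to intent matching."""
    normalized = normalize_transcript(transcript)
    if not normalized:
        return False
    if len(normalized) >= 6:
        return True
    if is_yes_no_turn and normalized in SHORT_YES_NO:
        return True
    return normalized in SHORT_COMMANDS


if __name__ == "__main__":
    print(normalize_transcript("  Uh-huh!  "))
    print(is_transcript_usable("Yup.", is_yes_no_turn=True))
    print(is_transcript_usable("Yup.", is_yes_no_turn=False))
```

One consequence worth testing live: `"uh huh"` is exactly six characters after normalization, so it passes the length check even when the turn is not a yes/no turn.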
@@ -12,9 +12,9 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategy(
     public bool CanHandle(TurnContext turn)
     {
         return options.EnableLocalWhisperCpp &&
-               !string.IsNullOrWhiteSpace(options.FfmpegPath) &&
-               !string.IsNullOrWhiteSpace(options.WhisperCliPath) &&
-               !string.IsNullOrWhiteSpace(options.WhisperModelPath) &&
+               IsConfiguredPathAvailable(options.FfmpegPath, checkFileExists: false) &&
+               IsConfiguredPathAvailable(options.WhisperCliPath, checkFileExists: true) &&
+               IsConfiguredPathAvailable(options.WhisperModelPath, checkFileExists: true) &&
                ReadBufferedAudioFrames(turn).Count > 0;
     }
 
@@ -148,4 +148,19 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategy(
             // Best-effort cleanup only.
         }
     }
+
+    private static bool IsConfiguredPathAvailable(string? path, bool checkFileExists)
+    {
+        if (string.IsNullOrWhiteSpace(path))
+        {
+            return false;
+        }
+
+        if (!Path.IsPathRooted(path))
+        {
+            return true;
+        }
+
+        return checkFileExists ? File.Exists(path) : true;
+    }
 }
@@ -31,8 +31,8 @@ public static class ServiceCollectionExtensions
         services.AddSingleton<JiboInteractionService>();
         services.AddSingleton<IConversationBroker, DemoConversationBroker>();
         services.AddSingleton<IExternalProcessRunner, ExternalProcessRunner>();
-        services.AddSingleton<ISttStrategy, LocalWhisperCppBufferedAudioSttStrategy>();
         services.AddSingleton<ISttStrategy, SyntheticBufferedAudioSttStrategy>();
+        services.AddSingleton<ISttStrategy, LocalWhisperCppBufferedAudioSttStrategy>();
         services.AddSingleton<ISttStrategySelector, DefaultSttStrategySelector>();
         services.AddSingleton<IWebSocketTelemetrySink, FileWebSocketTelemetrySink>();
         services.AddSingleton<IProtocolTelemetrySink, FileProtocolTelemetrySink>();
@@ -56,6 +56,39 @@ public sealed class JiboInteractionServiceTests
         Assert.Contains("Today is", decision.ReplyText, StringComparison.Ordinal);
     }
 
+    [Fact]
+    public async Task BuildDecisionAsync_YesNoFollowUp_MapsShortAffirmationToYesIntent()
+    {
+        var service = CreateService();
+
+        var decision = await service.BuildDecisionAsync(new TurnContext
+        {
+            RawTranscript = "yeah",
+            NormalizedTranscript = "yeah",
+            Attributes = new Dictionary<string, object?>
+            {
+                ["listenRules"] = new[] { "create/is_it_a_keeper" }
+            }
+        });
+
+        Assert.Equal("yes", decision.IntentName);
+        Assert.Equal("Yes.", decision.ReplyText);
+    }
+
+    [Fact]
+    public async Task BuildDecisionAsync_SkillPhraseVariant_MapsToKnownIntent()
+    {
+        var service = CreateService();
+
+        var decision = await service.BuildDecisionAsync(new TurnContext
+        {
+            RawTranscript = "make me laugh",
+            NormalizedTranscript = "make me laugh"
+        });
+
+        Assert.Equal("joke", decision.IntentName);
+    }
+
     private static JiboInteractionService CreateService()
     {
         return new JiboInteractionService(
@@ -342,6 +342,37 @@ public sealed class JiboWebSocketServiceTests
         Assert.Equal("clock/clock_menu", listenPayload.RootElement.GetProperty("data").GetProperty("match").GetProperty("rule").GetString());
     }
 
+    [Fact]
+    public async Task ClientAsr_YesNoCreateFlow_PreservesCreateRuleAndDomain()
+    {
+        await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-yesno-token",
+            Text = """{"type":"LISTEN","transID":"trans-yesno","data":{"rules":["create/is_it_a_keeper","$YESNO"]}}"""
+        });
+
+        var replies = await _service.HandleMessageAsync(new WebSocketMessageEnvelope
+        {
+            HostName = "neo-hub.jibo.com",
+            Path = "/listen",
+            Kind = "neo-hub-listen",
+            Token = "hub-yesno-token",
+            Text = """{"type":"CLIENT_ASR","transID":"trans-yesno","data":{"text":"yeah"}}"""
+        });
+
+        Assert.Equal(3, replies.Count);
+
+        using var listenPayload = JsonDocument.Parse(replies[0].Text!);
+        Assert.Equal("yeah", listenPayload.RootElement.GetProperty("data").GetProperty("asr").GetProperty("text").GetString());
+        Assert.Equal("yes", listenPayload.RootElement.GetProperty("data").GetProperty("nlu").GetProperty("intent").GetString());
+        Assert.Equal("create/is_it_a_keeper", listenPayload.RootElement.GetProperty("data").GetProperty("nlu").GetProperty("rules")[0].GetString());
+        Assert.Equal("create", listenPayload.RootElement.GetProperty("data").GetProperty("nlu").GetProperty("entities").GetProperty("domain").GetString());
+        Assert.Equal("create/is_it_a_keeper", listenPayload.RootElement.GetProperty("data").GetProperty("match").GetProperty("rule").GetString());
+    }
+
     [Fact]
     public async Task BufferedAudio_WithSyntheticTranscriptHint_FinalizesThroughSttSeam()
     {
@@ -29,6 +29,30 @@ public sealed class LocalWhisperCppBufferedAudioSttStrategyTests
         Assert.False(strategy.CanHandle(turn));
     }
 
+    [Fact]
+    public void CanHandle_ReturnsFalse_WhenConfiguredAbsoluteWhisperPathIsMissing()
+    {
+        var strategy = new LocalWhisperCppBufferedAudioSttStrategy(
+            new BufferedAudioSttOptions
+            {
+                EnableLocalWhisperCpp = true,
+                FfmpegPath = "/usr/bin/ffmpeg",
+                WhisperCliPath = "/path/that/does/not/exist/whisper-cli",
+                WhisperModelPath = "/path/that/does/not/exist/model.bin"
+            },
+            new FakeExternalProcessRunner());
+
+        var turn = new TurnContext
+        {
+            Attributes = new Dictionary<string, object?>
+            {
+                ["bufferedAudioFrames"] = new[] { BuildMinimalOggPage() }
+            }
+        };
+
+        Assert.False(strategy.CanHandle(turn));
+    }
+
     [Fact]
     public async Task TranscribeAsync_UsesFfmpegAndWhisperCpp_WhenConfigured()
     {