Hub Service Design Document
Overview
The Hub Service is the central orchestrator of the Jibo cloud system. It coordinates all communication between the robot and cloud services, managing speech recognition, natural language understanding, skill routing, and proactive behaviors. The Hub exposes WebSocket endpoints for real-time bidirectional communication with the robot.
Location
packages/hub/src/
Key Components
HubService (HubService.ts)
Main service class extending BaseService from @jibo/utils. Initializes and manages all hub components.
HubComponents (dependency injection container):
- parser: ParserClient - NLU service client
- skillConfigManager: SkillConfigManager - Manages skill configurations
- intentRouter: IntentRouter - Routes intents to skills
- skillRequestMaker: SkillRequestMaker - Makes HTTP requests to skills
- history: HistoryServiceClient - History service client
- hubSettings: HubSettings - Hub configuration
- settingsClient: SettingsClient - Settings service client
WebSocket Handlers
- ListenHandler (listen/ListenHandler.ts) - Handles /listen and /v1/listen endpoints
- ProactiveSocketRequestHandler (proactive/ProactiveSocketRequestHandler.ts) - Handles /proactive and /v1/proactive endpoints
Transaction Handlers
- ListenTransactionHandler (listen/ListenTransactionHandler.ts) - State machine for listen transactions
- ProactiveTransactionHandler (proactive/ProactiveTransactionHandler.ts) - Handles proactive action selection
WebSocket Endpoints
Listen Endpoint
URL: ws://hub:9000/listen or ws://hub:9000/v1/listen
Authentication: Bearer JWT token in Authorization header
Headers:
- x-jibo-transid - Transaction ID
- x-jibo-robotid - Robot ID
- x-jibo-logging-config - Log level configuration
Proactive Endpoint
URL: ws://hub:9000/proactive or ws://hub:9000/v1/proactive
Authentication: Same as listen endpoint
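To make the connection parameters concrete, the sketch below assembles the URL and headers a client might present when opening either socket. The `buildHubConnection` helper and its option names are hypothetical; only the paths, the port, and the header names come from this document, and applying the `x-jibo-*` headers to the proactive endpoint as well is an assumption.

```typescript
// Hypothetical helper: assembles connection parameters for a Hub WebSocket endpoint.
type HubEndpoint = "listen" | "proactive";

interface HubConnectionOptions {
  host: string;           // e.g. "hub:9000"
  endpoint: HubEndpoint;
  token: string;          // JWT presented as a Bearer token
  transID: string;
  robotID: string;
  loggingConfig?: string; // optional log level configuration
  versioned?: boolean;    // use the /v1/ prefixed path
}

interface HubConnection {
  url: string;
  headers: Record<string, string>;
}

function buildHubConnection(opts: HubConnectionOptions): HubConnection {
  const path = opts.versioned ? `/v1/${opts.endpoint}` : `/${opts.endpoint}`;
  const headers: Record<string, string> = {
    Authorization: `Bearer ${opts.token}`,
    "x-jibo-transid": opts.transID,
    "x-jibo-robotid": opts.robotID,
  };
  if (opts.loggingConfig !== undefined) {
    headers["x-jibo-logging-config"] = opts.loggingConfig;
  }
  return { url: `ws://${opts.host}${path}`, headers };
}
```

The returned object can be handed to whichever WebSocket client the robot side uses; the sketch deliberately avoids depending on a specific client library.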
Listen Transaction Flow
The listen transaction follows a state machine with the following states:
WAIT_LISTEN → ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_NLU → ROUTE → DONE
State Machine Implementation
File: packages/hub/src/listen/ListenTransactionHandler.ts
States:
- WAIT_LISTEN - Waiting for LISTEN message from robot
- WAIT_CLIENT_ASR - Waiting for client-provided ASR result
- WAIT_CLIENT_NLU - Waiting for client-provided NLU result
- ASR - Performing speech recognition
- NLU - Performing natural language understanding
- ROUTE - Routing to appropriate skill
- DONE - Transaction complete
- STOP - Transaction stopped
Timeouts:
- ASR: 40 seconds (configurable via sosTimeout, maxSpeechTimeout)
- Parser: 10 seconds
- Context: 5 seconds
- Skill: 10 seconds
- Transaction: 60 seconds (default)
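The three flows listed above can be captured as a transition table. The following is a minimal sketch, not the contents of ListenTransactionHandler.ts: the `TRANSITIONS` map and `canTransition` function are illustrative names, and the rule that STOP is reachable from any state other than DONE and STOP is an assumption, since this document does not say when STOP is entered.

```typescript
type ListenState =
  | "WAIT_LISTEN" | "WAIT_CLIENT_ASR" | "WAIT_CLIENT_NLU"
  | "ASR" | "NLU" | "ROUTE" | "DONE" | "STOP";

// Forward transitions taken from the three documented flows.
const TRANSITIONS: Record<ListenState, ListenState[]> = {
  WAIT_LISTEN: ["ASR", "WAIT_CLIENT_ASR", "WAIT_CLIENT_NLU"],
  WAIT_CLIENT_ASR: ["NLU"],
  WAIT_CLIENT_NLU: ["ROUTE"],
  ASR: ["NLU"],
  NLU: ["ROUTE"],
  ROUTE: ["DONE"],
  DONE: [],
  STOP: [],
};

function canTransition(from: ListenState, to: ListenState): boolean {
  // Assumption: a transaction can be stopped from any non-terminal state.
  if (to === "STOP") {
    return from !== "DONE" && from !== "STOP";
  }
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

Keeping the table as data makes it straightforward to reject out-of-order messages, for example a CLIENT_NLU message that arrives while the transaction is in ASR.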
Robot-to-Hub Messages (Listen Flow)
- LISTEN - Initiates listen transaction
{
  type: "LISTEN",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    mode: "default" | "CLIENT_ASR" | "CLIENT_NLU",
    lang: "en-US",
    hotphrase: boolean,
    rules: string[],
    asr: {
      sosTimeout: number,
      maxSpeechTimeout: number,
      hints: string[],
      earlyEOS: string[]
    },
    agents: ExternalAgentRequest[]
  }
}
- Audio Packets - Binary audio data streamed after LISTEN
- CONTEXT - Runtime context from robot
{
  type: "CONTEXT",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    general: { accountID: string, robotID: string, lang: string, release: string },
    runtime: {
      character: { emotion, motivation },
      location: { city, state, country, lat, lng },
      loop: { users, jibo, owner, loopId },
      perception: { speaker, peoplePresent },
      dialog: { referent }
    },
    skill: {
      id: string,
      session: { id, nodeID, data, trace }
    }
  }
}
- CLIENT_ASR - Client-provided ASR result (for menu clicks, etc.)
{
  type: "CLIENT_ASR",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    text: string
  }
}
- CLIENT_NLU - Client-provided NLU result
{
  type: "CLIENT_NLU",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    intent: string,
    entities: {},
    rules: []
  }
}
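The JSON messages above share a common envelope (type, msgID, ts, data), which lends itself to a discriminated union. The sketch below is illustrative only: the type names and `parseRobotMessage` are hypothetical, the payload types are abbreviated, the element type of `rules` is assumed to be string, and only the envelope is validated, not the payload fields.

```typescript
// Common envelope shared by the JSON messages the robot sends.
interface Envelope<T extends string, D> {
  type: T;
  msgID: string;
  ts: number;
  data: D;
}

type ListenMode = "default" | "CLIENT_ASR" | "CLIENT_NLU";

type ListenMessage = Envelope<"LISTEN", { mode: ListenMode; lang: string; rules: string[] }>;
type ContextMessage = Envelope<"CONTEXT", Record<string, unknown>>;
type ClientAsrMessage = Envelope<"CLIENT_ASR", { text: string }>;
type ClientNluMessage = Envelope<
  "CLIENT_NLU",
  { intent: string; entities: Record<string, unknown>; rules: string[] }
>;

type RobotMessage = ListenMessage | ContextMessage | ClientAsrMessage | ClientNluMessage;

// Returns the message if the envelope is well-formed and the type is known, else null.
function parseRobotMessage(raw: string): RobotMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const m = parsed as { type?: unknown; msgID?: unknown; ts?: unknown; data?: unknown };
  if (typeof m.msgID !== "string" || typeof m.ts !== "number") return null;
  if (typeof m.data !== "object" || m.data === null) return null;
  switch (m.type) {
    case "LISTEN":
    case "CONTEXT":
    case "CLIENT_ASR":
    case "CLIENT_NLU":
      return parsed as RobotMessage;
    default:
      return null;
  }
}
```

Binary audio packets are not JSON and would be handled on a separate path before this parser is reached.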
Hub-to-Robot Messages (Listen Flow)
1. SOS (Start of Speech)
Emitted when: Speech is detected during ASR
Location: ListenTransactionHandler.emitSOS()
{
type: "SOS",
msgID: "uuid",
ts: 1234567890,
data: null,
timings: {
total: number
}
}
Trigger conditions:
- Google Cloud Speech API detects start of speech
- ASRSession calls onStartOfSpeech callback
- Clears SOS timeout timer
2. EOS (End of Speech)
Emitted when: Speech ends during ASR
Location: ListenTransactionHandler.emitEOS()
{
type: "EOS",
msgID: "uuid",
ts: 1234567890,
data: null,
timings: {
total: number
}
}
Trigger conditions:
- Google Cloud Speech API detects end of speech
- ASRSession calls onEndOfSpeech callback
- Clears max speech timeout timer
3. LISTEN Response (ASR/NLU Result)
Emitted when: ASR and NLU processing complete
Location: ListenTransactionHandler.emitListenResult()
{
type: "LISTEN",
msgID: "uuid",
ts: 1234567890,
data: {
asr: {
text: string,
confidence: number,
annotation: "GARBAGE" | "SOS_TIMEOUT" | "MAX_SPEECH_TIMEOUT"
},
nlu: {
intent: string,
entities: {},
rules: []
},
match: {
skillID: string,
launch: boolean,
onRobot: boolean
} | null
},
final: boolean,
timings: {
total: number,
asr: number,
nlu: number
}
}
Emission scenarios:
- No match: match: null, final: true - No skill matched the NLU result
- On-robot skill: match.onRobot: true, final: true - Skill runs on robot, Hub done
- Cloud skill: match.onRobot: false, final: false - Skill runs in cloud, Hub will send skill actions
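The three emission scenarios reduce to a small rule for the `final` flag. A minimal sketch, where `SkillMatch` mirrors the `match` object above and `isListenResultFinal` is a hypothetical name:

```typescript
interface SkillMatch {
  skillID: string;
  launch: boolean;
  onRobot: boolean;
}

// Derives the `final` flag of a LISTEN response from the match result.
function isListenResultFinal(match: SkillMatch | null): boolean {
  if (match === null) {
    return true; // no match: nothing more will be sent
  }
  // On-robot skill: the Hub is done. Cloud skill: skill actions follow.
  return match.onRobot;
}
```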
4. SKILL_ACTION
Emitted when: Cloud skill returns an action to execute
Location: TransactionHandler.emitSkillResult()
{
type: "SKILL_ACTION",
msgID: "uuid",
ts: 1234567890,
data: {
action: {
type: "JCP",
config: {
version: "1.0.0",
jcp: SupportedBehaviors // SLIM, Sequence, Parallel, SetPresentPerson, ImpactEmotion
}
},
analytics?: AnalyticsData,
fireAndForget?: boolean
},
final: boolean,
timings: {
total: number,
skill: number
}
}
JCP Behavior Types:
- SLIM - Single behavior execution
- Sequence - Sequential behavior execution
- Parallel - Parallel behavior execution
- SetPresentPerson - Set focused person
- ImpactEmotion - Modify Jibo's emotional state
Emission scenarios:
- Non-final: final: false - Robot should execute action and send CMD_RESULT back
- Final: final: true - Transaction complete, no more actions expected
5. SKILL_REDIRECT
Emitted when: Skill redirects to another skill
Location: TransactionHandler.emitSkillRedirectNotification()
{
type: "SKILL_REDIRECT",
msgID: "uuid",
ts: 1234567890,
data: {
match: {
skillID: string,
launch: boolean,
onRobot: boolean
},
nlu: NLUResult,
asr: ASRResult,
memo: any
},
final: boolean
}
Emission scenarios:
- Skill returns SKILL_REDIRECT response
- Hub launches new skill with provided context
- Only one level of redirect supported (error on second redirect)
6. ERROR
Emitted when: An error occurs during transaction
Location: TransactionHandler.emitSkillResult() (error case)
{
type: "ERROR",
msgID: "uuid",
ts: 1234567890,
data: {
message: string
},
final: true,
timings: {
total: number
}
}
Listen Transaction State Transitions
WAIT_LISTEN → ASR
Trigger: LISTEN message received with mode="default"
Actions:
- Initialize ASRSession with Google Cloud Speech API
- Start audio streaming
- Set up SOS timeout (if configured)
- Set up max speech timeout (if configured)
WAIT_LISTEN → WAIT_CLIENT_ASR
Trigger: LISTEN message received with mode="CLIENT_ASR"
Actions:
- Emit fake SOS (immediate)
- Wait for CLIENT_ASR message from robot
WAIT_LISTEN → WAIT_CLIENT_NLU
Trigger: LISTEN message received with mode="CLIENT_NLU"
Actions:
- Emit fake SOS (immediate)
- Wait for CLIENT_NLU message from robot
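The three transitions out of WAIT_LISTEN amount to a dispatch on `data.mode`. The sketch below summarizes that dispatch; `dispatchListen` and `ListenDispatch` are hypothetical names, and the real handler also starts the ASR session and timers, which are omitted here.

```typescript
type ListenMode = "default" | "CLIENT_ASR" | "CLIENT_NLU";
type NextState = "ASR" | "WAIT_CLIENT_ASR" | "WAIT_CLIENT_NLU";

interface ListenDispatch {
  next: NextState;
  emitFakeSOS: boolean; // client modes emit SOS immediately
}

function dispatchListen(mode: ListenMode): ListenDispatch {
  switch (mode) {
    case "CLIENT_ASR":
      return { next: "WAIT_CLIENT_ASR", emitFakeSOS: true };
    case "CLIENT_NLU":
      return { next: "WAIT_CLIENT_NLU", emitFakeSOS: true };
    case "default":
      // Real SOS is emitted later, when speech is actually detected.
      return { next: "ASR", emitFakeSOS: false };
  }
}
```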
ASR → NLU
Trigger: ASR completes successfully
Actions:
- Stop ASR session
- Normalize ASR text
- Check for garbage annotation (skip NLU if garbage)
- Wait for CONTEXT message (5 second timeout)
- Send ASR text to Parser service
WAIT_CLIENT_ASR → NLU
Trigger: CLIENT_ASR message received
Actions:
- Use provided ASR text
- Emit fake EOS
- Proceed to NLU
WAIT_CLIENT_NLU → ROUTE
Trigger: CLIENT_NLU message received
Actions:
- Use provided NLU result
- Emit fake EOS
- Skip NLU, proceed to routing
NLU → ROUTE
Trigger: Parser returns NLU result
Actions:
- Wait for CONTEXT message (5 second timeout)
- Call IntentRouter to match skill
- Apply DecisionMediator for external factors
- Route to matched skill or context skill
ROUTE → DONE
Trigger: Routing complete
Actions:
- For on-robot skills: Emit LISTEN with match, transaction done
- For cloud skills: Emit LISTEN with match (final: false), get skill response, emit SKILL_ACTION; transaction done once a final SKILL_ACTION is sent
- For no match: Emit LISTEN with match=null, transaction done
Intent Routing
IntentRouter (intent/IntentRouter.ts)
Matches NLU results to registered cloud skills.
Routing Logic:
- Check if NLU has intent and 'launch' rule
- Query all skill configurations
- Match intent against skill intent configurations
- Match entities against skill entity configurations
- Return first matching skill decision
DecisionMediator (intent/DecisionMediator.ts):
- Can alter routing decisions based on external factors
- Considers robot release version
- May redirect to different skill based on context
IRDecisionMaker (intent/IRDecisionMaker.ts):
- Core matching algorithm
- Compares intent names and entity values
- Supports exact match and NOT match rules
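The routing logic described in this section can be sketched as a first-match search over skill configurations. This is an illustration under stated assumptions, not the IRDecisionMaker implementation: the rule names `"EXACT"` and `"NOT"`, the flat string-valued `entities` map, and the treatment of a missing entity as satisfying a NOT rule are all assumptions.

```typescript
type MatchRule = "EXACT" | "NOT"; // hypothetical rule names

interface EntityConfig { name: string; value: string; matchRule: MatchRule; }
interface IntentConfig { name: string; entities?: EntityConfig[]; }
interface SkillConfig { id: string; intents: IntentConfig[]; onRobot?: boolean; }
interface NLUResult { intent: string; entities: Record<string, string>; rules: string[]; }
interface SkillMatch { skillID: string; launch: boolean; onRobot: boolean; }

function entityMatches(cfg: EntityConfig, entities: Record<string, string>): boolean {
  const actual = entities[cfg.name];
  // EXACT requires equality; NOT requires inequality (an absent entity passes NOT).
  return cfg.matchRule === "EXACT" ? actual === cfg.value : actual !== cfg.value;
}

function matchSkill(nlu: NLUResult, skills: SkillConfig[]): SkillMatch | null {
  // Routing only applies when there is an intent and the 'launch' rule fired.
  if (!nlu.intent || nlu.rules.indexOf("launch") === -1) return null;
  for (const skill of skills) {
    for (const intent of skill.intents) {
      if (intent.name !== nlu.intent) continue;
      const entitiesOk = (intent.entities ?? []).every((e) => entityMatches(e, nlu.entities));
      if (entitiesOk) {
        // First matching skill wins.
        return { skillID: skill.id, launch: true, onRobot: skill.onRobot === true };
      }
    }
  }
  return null;
}
```

Because the first match wins, the order of skill configurations affects routing when two skills register overlapping intents.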
Skill Request Maker (skill/SkillRequestMaker.ts)
Makes HTTP requests to cloud skills.
Methods:
- skillLaunch(skillID, data, jiboHeaders, log) - Launch new skill
- skillLaunchOrUpdate(skillID, data, jiboHeaders, log, update) - Launch or update skill
- proactiveLaunch(skillID, data, jiboHeaders, log) - Proactive launch
Request Format:
{
type: "LISTEN_LAUNCH" | "LISTEN_UPDATE" | "PROACTIVE_LAUNCH",
msgID: "uuid",
ts: 1234567890,
data: {
general: { accountID, robotID, lang, release },
runtime: { character, location, loop, perception, dialog },
skill: { id, session? },
result?: any, // For UPDATE
nlu: NLUResult,
asr: ASRResult,
memo?: any
}
}
Timeout: 10 seconds (configurable)
Error Handling:
- SKILL_NOT_FOUND - Skill does not exist or is on-robot
- TIMEOUT - Skill request timeout
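One common way to enforce a request timeout like the one described above is to race the request against a timer. The sketch below shows that pattern; it is not taken from SkillRequestMaker.ts, and `withTimeout` and `HubTimeoutError` are hypothetical names. The error code string is supplied by the caller.

```typescript
class HubTimeoutError extends Error {
  constructor(public readonly code: string, ms: number) {
    super(`${code} after ${ms} ms`);
    this.name = "HubTimeoutError";
  }
}

// Rejects with HubTimeoutError if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, code: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new HubTimeoutError(code, ms)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}
```

A caller might wrap the HTTP call as `withTimeout(sendRequest(), 10000, "TIMEOUT")`, where `sendRequest` stands in for whatever performs the skill request. Note that racing does not cancel the underlying request; it only stops waiting for it.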
Proactive Flow
Proactive Transaction Handler (proactive/ProactiveTransactionHandler.ts)
Handles proactive action selection based on context, history, and settings.
Robot-to-Hub Messages (Proactive Flow)
- TRIGGER - Initiates proactive selection
{
  type: "TRIGGER",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    triggerData: {
      triggerType: string,
      looperID?: string
    },
    triggerSource: "SURPRISE" | "OTHER"
  }
}
- CONTEXT - Runtime context (same as listen flow)
Hub-to-Robot Messages (Proactive Flow)
PROACTIVE Match Response
Emitted when: Proactive action selected
Location: ProactiveTransactionHandler.emitMatchResponse()
{
type: "PROACTIVE",
msgID: "uuid",
ts: 1234567890,
data: {
match: {
skillID: string,
onRobot: boolean,
isProactive: true,
launch: true,
skipSurprises: boolean
}
},
final: boolean
}
Emission scenarios:
- On-robot skill: final: true - Robot handles skill, Hub done
- Cloud skill: final: false - Hub will send skill actions
PROACTIVE No-Action Response
Emitted when: No eligible proactive action found
Location: ProactiveTransactionHandler.emitNoActionResponse()
{
type: "PROACTIVE",
msgID: "uuid",
ts: 1234567890,
data: {},
final: true
}
Proactive Action Selection Algorithm
File: ProactiveTransactionHandler.getEligibleActions()
Steps:
1. Get all proactive skill configurations
   - Query SkillConfigManager for skills with proactive registrations
2. Gather transaction data
   - Extract focused person, present people, loop ID, robot ID
   - Use ContextTools to extract context fields
3. Fetch user settings (if focused person)
   - Batch request to SettingsClient for all skill settings
   - Consolidate into skill settings map
4. Filter by context rules
   - Check time-based rules (time of day, day of week)
   - Check location rules
   - Check people present rules
   - Check robot state rules
5. Filter by interaction history rules
   - Query History service for past interactions
   - Check frequency rules (e.g., "at most once per hour")
   - Check recency rules (e.g., "not in last 10 minutes")
   - Check sequence rules (e.g., "after greeting skill")
6. Filter by settings rules
   - Check user preferences for each skill
   - Check enabled/disabled status
   - Check custom parameters
7. Select action
   - Currently: Random selection from eligible actions
   - Future: Heuristics based on context, engagement, topics
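The filtering and selection steps above form a pipeline: each rule family narrows the candidate list, and one survivor is picked at random. A minimal sketch, assuming each rule family can be expressed as a predicate; `selectProactiveAction` and its types are hypothetical, and the random source is injectable so the behavior can be tested deterministically.

```typescript
interface ProactiveCandidate {
  skillID: string;
  onRobot: boolean;
}

type CandidateFilter = (candidate: ProactiveCandidate) => boolean;

function selectProactiveAction(
  candidates: ProactiveCandidate[],
  filters: CandidateFilter[],         // e.g. context, history, settings checks, in order
  random: () => number = Math.random, // must return a value in [0, 1)
): ProactiveCandidate | null {
  // Apply each filter in turn to whatever survived the previous one.
  const eligible = filters.reduce<ProactiveCandidate[]>(
    (remaining, keep) => remaining.filter(keep),
    candidates,
  );
  if (eligible.length === 0) {
    return null; // corresponds to the no-action response
  }
  const index = Math.floor(random() * eligible.length);
  return eligible[index] ?? null;
}
```

Replacing the random pick with a scoring function is the natural extension point for the heuristics mentioned as future work.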
Context Tools (proactive/tools/ContextTools.ts)
Helper functions for context rule evaluation:
- extractContextData(field, context, requestData, log) - Extract specific context field
- checkContextRules(registration, context, requestData, log) - Evaluate all context rules
History Rules Checker (proactive/tools/IHRulesChecker.ts)
Evaluates interaction history rules:
- checkIHRules(registrations, IHQueries, data, log) - Filter by history rules
- Queries History service for past skill launches
- Applies frequency, recency, and sequence constraints
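Frequency and recency constraints can both be evaluated against a list of past launch timestamps. The sketch below illustrates that idea only; the `HistoryRule` shape and `passesHistoryRule` are hypothetical and do not reflect the actual IHRule format, and sequence rules are left out.

```typescript
interface HistoryRule {
  maxLaunches?: number; // frequency: at most this many launches...
  withinMs?: number;    // ...inside this trailing window
  minQuietMs?: number;  // recency: no launch within this many milliseconds
}

// `launchTimes` and `now` are epoch milliseconds.
function passesHistoryRule(rule: HistoryRule, launchTimes: number[], now: number): boolean {
  if (rule.maxLaunches !== undefined && rule.withinMs !== undefined) {
    const windowStart = now - rule.withinMs;
    const recent = launchTimes.filter((t) => t > windowStart).length;
    if (recent >= rule.maxLaunches) {
      return false; // one more launch would exceed the cap
    }
  }
  if (rule.minQuietMs !== undefined) {
    const quietStart = now - rule.minQuietMs;
    if (launchTimes.some((t) => t > quietStart)) {
      return false; // launched too recently
    }
  }
  return true;
}
```

With this shape, "at most once per hour" becomes `{ maxLaunches: 1, withinMs: 3600000 }` and "not in last 10 minutes" becomes `{ minQuietMs: 600000 }`.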
Settings Rules Checker (proactive/tools/SettingsRulesChecker.ts)
Evaluates user settings:
- getSkillSettingsMap(skillConfigs, accountID, loopID, transID) - Batch fetch settings
- checkSettingsRegistrations(registrations, skillSettingsMap) - Filter by settings
Skill Interaction Flow (Cloud Skills)
Initial Launch
1. Hub sends LISTEN_LAUNCH request to skill
2. Skill processes request, returns SKILL_ACTION
3. Hub sends SKILL_ACTION to robot
4. Robot executes action, sends CMD_RESULT to Hub
5. Hub sends LISTEN_UPDATE request to skill with action result
6. Skill processes result, returns next SKILL_ACTION or final=true
7. Repeat steps 3-6 until skill returns final=true
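The launch and update cycle above is a loop that ends when the skill marks its response final. The sketch below models it synchronously for clarity, although the real exchange is asynchronous over HTTP and WebSocket. `SkillClient`, `runSkillSession`, and the `maxTurns` guard are hypothetical, and actions and results are reduced to strings.

```typescript
interface SkillResponse {
  action: string;
  final: boolean;
}

interface SkillClient {
  launch(): SkillResponse;                  // stands in for LISTEN_LAUNCH
  update(cmdResult: string): SkillResponse; // stands in for LISTEN_UPDATE
}

// Returns the list of actions forwarded to the robot, in order.
function runSkillSession(
  skill: SkillClient,
  executeOnRobot: (action: string) => string, // returns the CMD_RESULT payload
  maxTurns = 20, // assumption: guard against a skill that never finishes
): string[] {
  const sent: string[] = [];
  let response = skill.launch();
  for (let turn = 0; turn < maxTurns; turn++) {
    sent.push(response.action); // Hub forwards SKILL_ACTION to the robot
    if (response.final) {
      return sent; // final=true: no further CMD_RESULT is expected
    }
    const result = executeOnRobot(response.action);
    response = skill.update(result);
  }
  throw new Error("skill did not finish within maxTurns");
}
```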
Skill Redirect
- Skill returns SKILL_REDIRECT response
- Hub emits SKILL_REDIRECT notification to robot
- Hub sends launch request to new skill
- New skill proceeds with normal flow
- Error if second redirect attempted
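The single-level redirect rule can be enforced by allowing exactly one follow-up launch. A minimal sketch, with a hypothetical `SkillOutcome` type and `resolveRedirects` function; the real handler also emits the SKILL_REDIRECT notification to the robot before launching the target, which is omitted here.

```typescript
type SkillOutcome =
  | { kind: "ACTION"; skillID: string }
  | { kind: "REDIRECT"; target: string };

function resolveRedirects(
  firstSkill: string,
  launch: (skillID: string) => SkillOutcome,
): { skillID: string; redirected: boolean } {
  const first = launch(firstSkill);
  if (first.kind === "ACTION") {
    return { skillID: first.skillID, redirected: false };
  }
  // One redirect is allowed: launch the target and require it to produce an action.
  const second = launch(first.target);
  if (second.kind === "REDIRECT") {
    throw new Error(`second redirect from ${first.target} to ${second.target} is not supported`);
  }
  return { skillID: second.skillID, redirected: true };
}
```

Capping the depth at one also rules out redirect loops between two skills.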
Message Timing
Listen Transaction Timing
Timings tracked:
- total - Total transaction time
- asr - ASR processing time
- nlu - NLU processing time
- skill - Skill processing time
Timing emission:
- SOS/EOS include the total time elapsed since the start of the transaction
- LISTEN response includes ASR and NLU timings
- SKILL_ACTION includes skill timing
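One way to produce the `timings` object attached to each message is a small accumulator that records per-phase durations and reports a chosen subset alongside the running total. `TransactionTimer` is a hypothetical class; all values are assumed to be milliseconds.

```typescript
type Phase = "asr" | "nlu" | "skill";

class TransactionTimer {
  private readonly phases: Partial<Record<Phase, number>> = {};

  constructor(private readonly startedAt: number) {}

  // Adds the duration of one phase run; repeated runs accumulate.
  record(phase: Phase, phaseStart: number, phaseEnd: number): void {
    this.phases[phase] = (this.phases[phase] ?? 0) + (phaseEnd - phaseStart);
  }

  // Builds a timings object with `total` plus any requested phases that were recorded.
  snapshot(now: number, include: Phase[] = []): Record<string, number> {
    const out: Record<string, number> = { total: now - this.startedAt };
    for (const phase of include) {
      const value = this.phases[phase];
      if (value !== undefined) {
        out[phase] = value;
      }
    }
    return out;
  }
}
```

Under this sketch, SOS and EOS would call `snapshot(now)`, the LISTEN response `snapshot(now, ["asr", "nlu"])`, and SKILL_ACTION `snapshot(now, ["skill"])`.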
Proactive Transaction Timing
Timings tracked:
- total - Total transaction time
- skill - Skill processing time
Error Handling
Hub Error Codes (HubErrorCode.ts)
- TIMEOUT_ASR - ASR timeout (40 seconds)
- TIMEOUT_PARSER - Parser timeout (10 seconds)
- TIMEOUT_CONTEXT - Context timeout (5 seconds)
- TIMEOUT_SKILL - Skill timeout (10 seconds)
- PARSER - Parser error
- ASR - ASR error
Error Response Format
{
type: "ERROR",
msgID: "uuid",
ts: 1234567890,
data: {
message: string,
code?: string
},
final: true,
timings: {
total: number
}
}
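A small builder keeps ERROR messages consistent with the format above. This is a sketch: `makeErrorMessage` is a hypothetical function, the `HubErrorCode` union lists only the codes named in this document, and using the emission time for `ts` and milliseconds for `timings.total` are assumptions.

```typescript
type HubErrorCode =
  | "TIMEOUT_ASR" | "TIMEOUT_PARSER" | "TIMEOUT_CONTEXT" | "TIMEOUT_SKILL"
  | "PARSER" | "ASR";

interface ErrorMessage {
  type: "ERROR";
  msgID: string;
  ts: number;
  data: { message: string; code?: HubErrorCode };
  final: true;
  timings: { total: number };
}

function makeErrorMessage(
  msgID: string,
  message: string,
  startedAt: number, // transaction start, epoch milliseconds
  now: number,       // emission time, epoch milliseconds
  code?: HubErrorCode,
): ErrorMessage {
  const data: ErrorMessage["data"] = { message };
  if (code !== undefined) {
    data.code = code; // omit the key entirely when there is no code
  }
  return {
    type: "ERROR",
    msgID,
    ts: now,
    data,
    final: true, // errors always end the transaction
    timings: { total: now - startedAt },
  };
}
```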
Speech History Recording
Optional Features
Configuration:
- ETCO_hub_recordLaunchHistory - Record skill launches to MongoDB
- ETCO_hub_recordSpeechHistory - Record speech interactions to MongoDB
- ETCO_hub_recordSpeechLogBucket - Upload speech logs to S3
Speech History Record
Data recorded:
- Robot ID, account ID, transaction ID
- Timestamp
- ASR result
- NLU result
- Match data
- Skill response
- Redirect data
- Error (if any)
S3 Upload
Format: JSON with audio as base64
Path: {robotID}/year={year}/month={month}/day={day}/{timestamp}-{transID}.json
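The path template above partitions uploads by robot and date. The sketch below builds such a key; `speechLogKey` is a hypothetical function, and the use of UTC, unpadded month and day values, and an epoch-millisecond timestamp are assumptions, since the document does not specify them.

```typescript
function speechLogKey(robotID: string, transID: string, when: Date): string {
  const year = when.getUTCFullYear();
  const month = when.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const day = when.getUTCDate();
  const timestamp = when.getTime();     // assumption: epoch milliseconds
  return `${robotID}/year=${year}/month=${month}/day=${day}/${timestamp}-${transID}.json`;
}
```

The `year=`/`month=`/`day=` segments follow the key=value partition naming that many query engines over S3 can use to prune by date.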
Hub Configuration
Environment Variables
Hub Settings:
- ETCO_hub_recordLaunchHistory - Enable launch history
- ETCO_hub_recordSpeechHistory - Enable speech history
- ETCO_hub_recordSpeechLogBucket - S3 bucket for speech logs
Authentication:
- ETCO_server_hubTokenSecret - JWT secret for token verification
Skill Configuration
Sources:
- skills-local.json - Local development configuration
- Environment variables - Production configuration
- Settings service - Dynamic configuration
Skill Config Structure:
{
id: string,
intents: [{
name: string,
entities?: [{ name, value, matchRule }],
memo?: any
}],
proactives?: [{
triggerType: string,
contextRules?: ContextRule[],
IHRules?: IHRule[],
settingsRules?: SettingsRule[],
memo?: any
}],
IHQueries?: IHQueryDefinitions,
onRobot?: boolean,
URL: string,
settings?: ManifestSettings
}
Summary of Server-to-Robot Communication
Listen Flow
- SOS - Speech detected
- EOS - Speech ended
- LISTEN - ASR/NLU result with match data
- SKILL_ACTION - JCP action to execute (repeated for multi-turn)
- SKILL_REDIRECT - Skill redirect notification
- ERROR - Error occurred
Proactive Flow
- PROACTIVE - Match or no-action response
- SKILL_ACTION - JCP action to execute (if cloud skill)
- SKILL_REDIRECT - Skill redirect notification
- ERROR - Error occurred
Key Design Principles
- State Machine - Clear state transitions with validation
- Timeouts - ASR, parser, context, and skill calls each have a timeout, as does the transaction as a whole, to prevent hanging
- Error Handling - Errors propagate to robot with clear messages
- Timing - All operations are timed for monitoring
- History - Skill launches and speech interactions can be recorded for analysis (optional, enabled through configuration)
- Flexibility - Supports on-robot and cloud skills
- Proactivity - Context-aware action selection