
Hub Service Design Document

Overview

The Hub Service is the central orchestrator of the Jibo cloud system. It coordinates all communication between the robot and cloud services, managing speech recognition, natural language understanding, skill routing, and proactive behaviors. The Hub exposes WebSocket endpoints for real-time bidirectional communication with the robot.

Location

packages/hub/src/

Key Components

HubService (HubService.ts)

Main service class extending BaseService from @jibo/utils. Initializes and manages all hub components.

HubComponents (dependency injection container):

  • parser: ParserClient - NLU service client
  • skillConfigManager: SkillConfigManager - Manages skill configurations
  • intentRouter: IntentRouter - Routes intents to skills
  • skillRequestMaker: SkillRequestMaker - Makes HTTP requests to skills
  • history: HistoryServiceClient - History service client
  • hubSettings: HubSettings - Hub configuration
  • settingsClient: SettingsClient - Settings service client

WebSocket Handlers

  • ListenHandler (listen/ListenHandler.ts) - Handles /listen and /v1/listen endpoints
  • ProactiveSocketRequestHandler (proactive/ProactiveSocketRequestHandler.ts) - Handles /proactive and /v1/proactive endpoints

Transaction Handlers

  • ListenTransactionHandler (listen/ListenTransactionHandler.ts) - State machine for listen transactions
  • ProactiveTransactionHandler (proactive/ProactiveTransactionHandler.ts) - Handles proactive action selection

WebSocket Endpoints

Listen Endpoint

URL: ws://hub:9000/listen or ws://hub:9000/v1/listen

Authentication: Bearer JWT token in Authorization header

Headers:

  • x-jibo-transid - Transaction ID
  • x-jibo-robotid - Robot ID
  • x-jibo-logging-config - Log level configuration
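The header handling above can be sketched as follows. This is a minimal, hypothetical helper: `extractJiboHeaders` and the `JiboHeaders` shape are not from the codebase. It assumes Node-style lowercase header keys and treats the transaction and robot IDs as required. JWT signature verification is left to a separate step.

```typescript
// Hypothetical shape for the headers the listen/proactive endpoints read.
interface JiboHeaders {
  token: string;
  transID: string;
  robotID: string;
  loggingConfig?: string;
}

// Node's http module lowercases incoming header names; values may be string[].
type RawHeaders = Record<string, string | string[] | undefined>;

function firstValue(v: string | string[] | undefined): string | undefined {
  return Array.isArray(v) ? v[0] : v;
}

// Returns null when the bearer token or a required ID header is missing.
function extractJiboHeaders(headers: RawHeaders): JiboHeaders | null {
  const auth = firstValue(headers["authorization"]);
  const match = auth ? /^Bearer\s+(\S+)$/i.exec(auth) : null;
  const transID = firstValue(headers["x-jibo-transid"]);
  const robotID = firstValue(headers["x-jibo-robotid"]);
  if (!match || !transID || !robotID) {
    return null;
  }
  return {
    token: match[1], // still needs JWT verification before it is trusted
    transID,
    robotID,
    loggingConfig: firstValue(headers["x-jibo-logging-config"]),
  };
}
```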

Proactive Endpoint

URL: ws://hub:9000/proactive or ws://hub:9000/v1/proactive

Authentication: Same as listen endpoint

Listen Transaction Flow

The listen transaction follows a state machine with the following states:

WAIT_LISTEN → ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_NLU → ROUTE → DONE

State Machine Implementation

File: packages/hub/src/listen/ListenTransactionHandler.ts

States:

  • WAIT_LISTEN - Waiting for LISTEN message from robot
  • WAIT_CLIENT_ASR - Waiting for client-provided ASR result
  • WAIT_CLIENT_NLU - Waiting for client-provided NLU result
  • ASR - Performing speech recognition
  • NLU - Performing natural language understanding
  • ROUTE - Routing to appropriate skill
  • DONE - Transaction complete
  • STOP - Transaction stopped

Timeouts:

  • ASR: 40 seconds (configurable via sosTimeout, maxSpeechTimeout)
  • Parser: 10 seconds
  • Context: 5 seconds
  • Skill: 10 seconds
  • Transaction: 60 seconds (default)

Robot-to-Hub Messages (Listen Flow)

  1. LISTEN - Initiates listen transaction

    {
      type: "LISTEN",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        mode: "default" | "CLIENT_ASR" | "CLIENT_NLU",
        lang: "en-US",
        hotphrase: boolean,
        rules: string[],
        asr: {
          sosTimeout: number,
          maxSpeechTimeout: number,
          hints: string[],
          earlyEOS: string[]
        },
        agents: ExternalAgentRequest[]
      }
    }
    
  2. Audio Packets - Binary audio data streamed after LISTEN

  3. CONTEXT - Runtime context from robot

    {
      type: "CONTEXT",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        general: {
          accountID: string,
          robotID: string,
          lang: string,
          release: string
        },
        runtime: {
          character: { emotion, motivation },
          location: { city, state, country, lat, lng },
          loop: { users, jibo, owner, loopId },
          perception: { speaker, peoplePresent },
          dialog: { referent }
        },
        skill: {
          id: string,
          session: { id, nodeID, data, trace }
        }
      }
    }
    
  4. CLIENT_ASR - Client-provided ASR result (for menu clicks, etc.)

    {
      type: "CLIENT_ASR",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        text: string
      }
    }
    
  5. CLIENT_NLU - Client-provided NLU result

    {
      type: "CLIENT_NLU",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        intent: string,
        entities: {},
        rules: []
      }
    }
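One way to model the robot-to-hub messages above is a discriminated union on `type`. The sketch below is illustrative only. Payloads are trimmed, and `parseRobotMessage` is a hypothetical name. It also assumes control messages arrive as JSON text frames while audio arrives as binary frames.

```typescript
// Envelope shared by every robot-to-hub message shown above.
interface Envelope<T extends string, D> {
  type: T;
  msgID: string;
  ts: number;
  data: D;
}

type ListenMode = "default" | "CLIENT_ASR" | "CLIENT_NLU";

// Payload types are trimmed to a few fields for brevity.
type ListenMsg = Envelope<"LISTEN", { mode: ListenMode; lang: string; hotphrase: boolean; rules: string[] }>;
type ContextMsg = Envelope<"CONTEXT", { general: { accountID: string; robotID: string; lang: string; release: string } }>;
type ClientAsrMsg = Envelope<"CLIENT_ASR", { text: string }>;
type ClientNluMsg = Envelope<"CLIENT_NLU", { intent: string; entities: Record<string, unknown>; rules: string[] }>;

type RobotMessage = ListenMsg | ContextMsg | ClientAsrMsg | ClientNluMsg;

const KNOWN_TYPES: string[] = ["LISTEN", "CONTEXT", "CLIENT_ASR", "CLIENT_NLU"];

// Parses a text frame. Only the envelope is checked here; each handler
// would still validate its own payload.
function parseRobotMessage(frame: string): RobotMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(frame);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as { type?: unknown; msgID?: unknown; ts?: unknown };
  if (typeof msg.type !== "string" || KNOWN_TYPES.indexOf(msg.type) === -1) return null;
  if (typeof msg.msgID !== "string" || typeof msg.ts !== "number") return null;
  return parsed as RobotMessage;
}
```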
    

Hub-to-Robot Messages (Listen Flow)

1. SOS (Start of Speech)

Emitted when: Speech is detected during ASR

Location: ListenTransactionHandler.emitSOS()

{
  type: "SOS",
  msgID: "uuid",
  ts: 1234567890,
  data: null,
  timings: {
    total: number
  }
}

Trigger conditions:

  • Google Cloud Speech API detects start of speech
  • ASRSession calls onStartOfSpeech callback
  • Clears SOS timeout timer

2. EOS (End of Speech)

Emitted when: Speech ends during ASR

Location: ListenTransactionHandler.emitEOS()

{
  type: "EOS",
  msgID: "uuid",
  ts: 1234567890,
  data: null,
  timings: {
    total: number
  }
}

Trigger conditions:

  • Google Cloud Speech API detects end of speech
  • ASRSession calls onEndOfSpeech callback
  • Clears max speech timeout timer
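The SOS and EOS trigger conditions imply two independent timers. Below is a minimal sketch. It assumes millisecond durations and that an expired timer yields the `SOS_TIMEOUT` / `MAX_SPEECH_TIMEOUT` annotations shown in the LISTEN response. `AsrTimers` is a hypothetical class, not the ASRSession API.

```typescript
type TimeoutAnnotation = "SOS_TIMEOUT" | "MAX_SPEECH_TIMEOUT";

// Hypothetical owner of the two ASR timers described above.
class AsrTimers {
  private sosTimer?: ReturnType<typeof setTimeout>;
  private maxSpeechTimer?: ReturnType<typeof setTimeout>;

  constructor(private readonly onTimeout: (annotation: TimeoutAnnotation) => void) {}

  // Called when the ASR session starts; each timer is armed only if configured.
  start(sosTimeoutMs?: number, maxSpeechTimeoutMs?: number): void {
    if (sosTimeoutMs !== undefined) {
      this.sosTimer = setTimeout(() => {
        this.sosTimer = undefined;
        this.onTimeout("SOS_TIMEOUT");
      }, sosTimeoutMs);
    }
    if (maxSpeechTimeoutMs !== undefined) {
      this.maxSpeechTimer = setTimeout(() => {
        this.maxSpeechTimer = undefined;
        this.onTimeout("MAX_SPEECH_TIMEOUT");
      }, maxSpeechTimeoutMs);
    }
  }

  // onStartOfSpeech: speech arrived, so the SOS timer must not fire.
  onStartOfSpeech(): void {
    clearTimeout(this.sosTimer);
    this.sosTimer = undefined;
  }

  // onEndOfSpeech: speech finished, so the max-speech timer must not fire.
  onEndOfSpeech(): void {
    clearTimeout(this.maxSpeechTimer);
    this.maxSpeechTimer = undefined;
  }

  get pending(): { sos: boolean; maxSpeech: boolean } {
    return { sos: this.sosTimer !== undefined, maxSpeech: this.maxSpeechTimer !== undefined };
  }

  // Clears both timers when the transaction ends or is stopped.
  dispose(): void {
    this.onStartOfSpeech();
    this.onEndOfSpeech();
  }
}
```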

3. LISTEN Response (ASR/NLU Result)

Emitted when: ASR and NLU processing complete

Location: ListenTransactionHandler.emitListenResult()

{
  type: "LISTEN",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    asr: {
      text: string,
      confidence: number,
      annotation: "GARBAGE" | "SOS_TIMEOUT" | "MAX_SPEECH_TIMEOUT"
    },
    nlu: {
      intent: string,
      entities: {},
      rules: []
    },
    match: {
      skillID: string,
      launch: boolean,
      onRobot: boolean
    } | null
  },
  final: boolean,
  timings: {
    total: number,
    asr: number,
    nlu: number
  }
}

Emission scenarios:

  • No match: match: null, final: true - No skill matched the NLU result
  • On-robot skill: match.onRobot: true, final: true - Skill runs on robot, Hub done
  • Cloud skill: match.onRobot: false, final: false - Skill runs in cloud, Hub will send skill actions
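The three emission scenarios reduce to one rule: the LISTEN response is final unless a cloud skill matched. A hypothetical helper expressing that rule:

```typescript
interface SkillMatch {
  skillID: string;
  launch: boolean;
  onRobot: boolean;
}

// Hypothetical helper: only a cloud-skill match keeps the transaction open,
// because SKILL_ACTION messages will follow.
function listenResponseFields(match: SkillMatch | null): { match: SkillMatch | null; final: boolean } {
  if (match === null) {
    return { match: null, final: true }; // no skill matched
  }
  return { match, final: match.onRobot }; // on-robot: Hub done; cloud: more to come
}
```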

4. SKILL_ACTION

Emitted when: Cloud skill returns an action to execute

Location: TransactionHandler.emitSkillResult()

{
  type: "SKILL_ACTION",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    action: {
      type: "JCP",
      config: {
        version: "1.0.0",
        jcp: SupportedBehaviors  // SLIM, Sequence, Parallel, SetPresentPerson, ImpactEmotion
      }
    },
    analytics?: AnalyticsData,
    fireAndForget?: boolean
  },
  final: boolean,
  timings: {
    total: number,
    skill: number
  }
}

JCP Behavior Types:

  • SLIM - Single behavior execution
  • Sequence - Sequential behavior execution
  • Parallel - Parallel behavior execution
  • SetPresentPerson - Set focused person
  • ImpactEmotion - Modify Jibo's emotional state

Emission scenarios:

  • Non-final: final: false - Robot should execute action and send CMD_RESULT back
  • Final: final: true - Transaction complete, no more actions expected

5. SKILL_REDIRECT

Emitted when: Skill redirects to another skill

Location: TransactionHandler.emitSkillRedirectNotification()

{
  type: "SKILL_REDIRECT",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    match: {
      skillID: string,
      launch: boolean,
      onRobot: boolean
    },
    nlu: NLUResult,
    asr: ASRResult,
    memo: any
  },
  final: boolean
}

Emission scenarios:

  • Skill returns SKILL_REDIRECT response
  • Hub launches new skill with provided context
  • Only one level of redirect supported (error on second redirect)

6. ERROR

Emitted when: An error occurs during transaction

Location: TransactionHandler.emitSkillResult() (error case)

{
  type: "ERROR",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    message: string
  },
  final: true,
  timings: {
    total: number
  }
}

Listen Transaction State Transitions

WAIT_LISTEN → ASR

Trigger: LISTEN message received with mode="default"

Actions:

  • Initialize ASRSession with Google Cloud Speech API
  • Start audio streaming
  • Set up SOS timeout (if configured)
  • Set up max speech timeout (if configured)

WAIT_LISTEN → WAIT_CLIENT_ASR

Trigger: LISTEN message received with mode="CLIENT_ASR"

Actions:

  • Emit fake SOS (immediate)
  • Wait for CLIENT_ASR message from robot

WAIT_LISTEN → WAIT_CLIENT_NLU

Trigger: LISTEN message received with mode="CLIENT_NLU"

Actions:

  • Emit fake SOS (immediate)
  • Wait for CLIENT_NLU message from robot

ASR → NLU

Trigger: ASR completes successfully

Actions:

  • Stop ASR session
  • Normalize ASR text
  • Check for garbage annotation (skip NLU if garbage)
  • Wait for CONTEXT message (5-second timeout)
  • Send ASR text to Parser service

WAIT_CLIENT_ASR → NLU

Trigger: CLIENT_ASR message received

Actions:

  • Use provided ASR text
  • Emit fake EOS
  • Proceed to NLU

WAIT_CLIENT_NLU → ROUTE

Trigger: CLIENT_NLU message received

Actions:

  • Use provided NLU result
  • Emit fake EOS
  • Skip NLU, proceed to routing

NLU → ROUTE

Trigger: Parser returns NLU result

Actions:

  • Wait for CONTEXT message (5-second timeout)
  • Call IntentRouter to match skill
  • Apply DecisionMediator for external factors
  • Route to matched skill or context skill

ROUTE → DONE

Trigger: Routing complete

Actions:

  • For on-robot skills: Emit LISTEN with match, transaction done
  • For cloud skills: Get the skill response and emit SKILL_ACTION; the transaction is done once the skill returns final=true (see Skill Interaction Flow)
  • For no match: Emit LISTEN with match=null, transaction done
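The transitions above can be encoded as a table and checked before each state change, in line with the "State Machine" design principle. This is a sketch, not the ListenTransactionHandler code. In particular, allowing STOP from any non-terminal state is an assumption, since the document does not list STOP transitions.

```typescript
type ListenState =
  | "WAIT_LISTEN" | "WAIT_CLIENT_ASR" | "WAIT_CLIENT_NLU"
  | "ASR" | "NLU" | "ROUTE" | "DONE" | "STOP";

// Forward transitions taken from the list above.
const TRANSITIONS: Record<ListenState, ListenState[]> = {
  WAIT_LISTEN: ["ASR", "WAIT_CLIENT_ASR", "WAIT_CLIENT_NLU"],
  ASR: ["NLU"],
  WAIT_CLIENT_ASR: ["NLU"],
  WAIT_CLIENT_NLU: ["ROUTE"],
  NLU: ["ROUTE"],
  ROUTE: ["DONE"],
  DONE: [],
  STOP: [],
};

class ListenStateMachine {
  private current: ListenState = "WAIT_LISTEN";

  get state(): ListenState {
    return this.current;
  }

  // Throws on any transition not in the table. STOP is assumed to be
  // reachable from every state that is not already terminal.
  transition(next: ListenState): void {
    const allowed = next === "STOP"
      ? this.current !== "DONE" && this.current !== "STOP"
      : TRANSITIONS[this.current].indexOf(next) !== -1;
    if (!allowed) {
      throw new Error(`Invalid transition ${this.current} -> ${next}`);
    }
    this.current = next;
  }
}
```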

Intent Routing

IntentRouter (intent/IntentRouter.ts)

Matches NLU results to registered cloud skills.

Routing Logic:

  1. Check if NLU has intent and 'launch' rule
  2. Query all skill configurations
  3. Match intent against skill intent configurations
  4. Match entities against skill entity configurations
  5. Return first matching skill decision

DecisionMediator (intent/DecisionMediator.ts):

  • Can alter routing decisions based on external factors
  • Considers robot release version
  • May redirect to different skill based on context

IRDecisionMaker (intent/IRDecisionMaker.ts):

  • Core matching algorithm
  • Compares intent names and entity values
  • Supports exact match and NOT match rules
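A simplified sketch of routing steps 1 to 5. It assumes string-valued entities and hypothetical `EXACT` / `NOT` match-rule names; the real matchRule values may differ, and DecisionMediator adjustments are not shown.

```typescript
// Hypothetical match-rule names; IRDecisionMaker may spell these differently.
type MatchRule = "EXACT" | "NOT";

interface EntityRule { name: string; value: string; matchRule: MatchRule; }
interface IntentConfig { name: string; entities?: EntityRule[]; memo?: unknown; }
interface SkillConfig { id: string; intents: IntentConfig[]; onRobot?: boolean; }
interface NLUResult { intent: string; entities: Record<string, string>; rules: string[]; }

function entityMatches(rule: EntityRule, entities: Record<string, string>): boolean {
  const actual = entities[rule.name];
  return rule.matchRule === "EXACT" ? actual === rule.value : actual !== rule.value;
}

// Returns the first skill whose intent config matches, or null.
function routeIntent(
  nlu: NLUResult,
  skills: SkillConfig[],
): { skillID: string; onRobot: boolean; memo?: unknown } | null {
  if (!nlu.intent || nlu.rules.indexOf("launch") === -1) {
    return null; // step 1: only intents carrying the 'launch' rule are routed
  }
  for (const skill of skills) {
    for (const intent of skill.intents) {
      if (intent.name !== nlu.intent) continue; // step 3: intent name
      const rules = intent.entities ?? [];
      if (rules.every((r) => entityMatches(r, nlu.entities))) { // step 4: entities
        return { skillID: skill.id, onRobot: skill.onRobot === true, memo: intent.memo };
      }
    }
  }
  return null;
}
```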

Skill Request Maker (skill/SkillRequestMaker.ts)

Makes HTTP requests to cloud skills.

Methods:

  • skillLaunch(skillID, data, jiboHeaders, log) - Launch new skill
  • skillLaunchOrUpdate(skillID, data, jiboHeaders, log, update) - Launch or update skill
  • proactiveLaunch(skillID, data, jiboHeaders, log) - Proactive launch

Request Format:

{
  type: "LISTEN_LAUNCH" | "LISTEN_UPDATE" | "PROACTIVE_LAUNCH",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    general: { accountID, robotID, lang, release },
    runtime: { character, location, loop, perception, dialog },
    skill: { id, session? },
    result?: any,  // For UPDATE
    nlu: NLUResult,
    asr: ASRResult,
    memo?: any
  }
}
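A hypothetical builder for the request envelope above. It assumes the launch/update choice depends on whether an action result is present, loosely mirroring the `update` argument of `skillLaunchOrUpdate`. The HTTP call, Jibo headers, and timeout are omitted.

```typescript
type SkillRequestType = "LISTEN_LAUNCH" | "LISTEN_UPDATE" | "PROACTIVE_LAUNCH";

// Trimmed payload: general/runtime/nlu/asr are typed loosely for brevity.
interface SkillRequest {
  type: SkillRequestType;
  msgID: string;
  ts: number;
  data: {
    general: Record<string, unknown>;
    runtime: Record<string, unknown>;
    skill: { id: string; session?: unknown };
    result?: unknown;
    nlu: unknown;
    asr: unknown;
    memo?: unknown;
  };
}

// Hypothetical: an action result (from CMD_RESULT) makes this an update;
// without one it is a fresh launch.
function buildListenRequest(
  msgID: string,
  base: Omit<SkillRequest["data"], "result">,
  result?: unknown,
): SkillRequest {
  const isUpdate = result !== undefined;
  return {
    type: isUpdate ? "LISTEN_UPDATE" : "LISTEN_LAUNCH",
    msgID,
    ts: Date.now(),
    data: isUpdate ? { ...base, result } : { ...base },
  };
}
```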

Timeout: 10 seconds (configurable)

Error Handling:

  • SKILL_NOT_FOUND - Skill does not exist or is on-robot
  • TIMEOUT - Skill request timeout

Proactive Flow

Proactive Transaction Handler (proactive/ProactiveTransactionHandler.ts)

Handles proactive action selection based on context, history, and settings.

Robot-to-Hub Messages (Proactive Flow)

  1. TRIGGER - Initiates proactive selection

    {
      type: "TRIGGER",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        triggerData: {
          triggerType: string,
          looperID?: string
        },
        triggerSource: "SURPRISE" | "OTHER"
      }
    }
    
  2. CONTEXT - Runtime context (same as listen flow)

Hub-to-Robot Messages (Proactive Flow)

PROACTIVE Match Response

Emitted when: Proactive action selected

Location: ProactiveTransactionHandler.emitMatchResponse()

{
  type: "PROACTIVE",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    match: {
      skillID: string,
      onRobot: boolean,
      isProactive: true,
      launch: true,
      skipSurprises: boolean
    }
  },
  final: boolean
}

Emission scenarios:

  • On-robot skill: final: true - Robot handles skill, Hub done
  • Cloud skill: final: false - Hub will send skill actions

PROACTIVE No-Action Response

Emitted when: No eligible proactive action found

Location: ProactiveTransactionHandler.emitNoActionResponse()

{
  type: "PROACTIVE",
  msgID: "uuid",
  ts: 1234567890,
  data: {},
  final: true
}
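Both PROACTIVE responses can come from one helper. This is a sketch with a hypothetical `proactiveResponse` function that follows the emission scenarios above: only a cloud-skill match leaves the transaction open.

```typescript
interface ProactiveMatch {
  skillID: string;
  onRobot: boolean;
  isProactive: true;
  launch: true;
  skipSurprises: boolean;
}

interface ProactiveResponse {
  type: "PROACTIVE";
  msgID: string;
  ts: number;
  data: { match?: ProactiveMatch };
  final: boolean;
}

// null selection -> no-action response; otherwise a match response.
function proactiveResponse(
  msgID: string,
  selected: { skillID: string; onRobot: boolean; skipSurprises: boolean } | null,
): ProactiveResponse {
  if (selected === null) {
    return { type: "PROACTIVE", msgID, ts: Date.now(), data: {}, final: true };
  }
  return {
    type: "PROACTIVE",
    msgID,
    ts: Date.now(),
    data: { match: { ...selected, isProactive: true, launch: true } },
    final: selected.onRobot, // cloud skill: Hub will still send skill actions
  };
}
```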

Proactive Action Selection Algorithm

Location: ProactiveTransactionHandler.getEligibleActions()

Steps:

  1. Get all proactive skill configurations

    • Query SkillConfigManager for skills with proactive registrations
  2. Gather transaction data

    • Extract focused person, present people, loop ID, robot ID
    • Use ContextTools to extract context fields
  3. Fetch user settings (if focused person)

    • Batch request to SettingsClient for all skill settings
    • Consolidate into skill settings map
  4. Filter by context rules

    • Check time-based rules (time of day, day of week)
    • Check location rules
    • Check people present rules
    • Check robot state rules
  5. Filter by interaction history rules

    • Query History service for past interactions
    • Check frequency rules (e.g., "at most once per hour")
    • Check recency rules (e.g., "not in last 10 minutes")
    • Check sequence rules (e.g., "after greeting skill")
  6. Filter by settings rules

    • Check user preferences for each skill
    • Check enabled/disabled status
    • Check custom parameters
  7. Select action

    • Currently: Random selection from eligible actions
    • Future: Heuristics based on context, engagement, topics
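The selection steps amount to a filter pipeline followed by a uniform random pick. Below is a minimal sketch with hypothetical names. It assumes each rule family (context, history, settings) can be expressed as a predicate. It also assumes registrations are first narrowed to the incoming `triggerType`.

```typescript
// A candidate proactive registration (fields trimmed to what the sketch needs).
interface Candidate {
  skillID: string;
  triggerType: string;
}

// Each filter stands in for one rule family (steps 4-6).
type CandidateFilter = (c: Candidate) => boolean;

// Keep registrations for this trigger, apply filters in order, then pick
// uniformly at random (step 7). `random` is injectable for testing.
function selectProactive(
  candidates: Candidate[],
  triggerType: string,
  filters: CandidateFilter[],
  random: () => number = Math.random,
): Candidate | null {
  let eligible = candidates.filter((c) => c.triggerType === triggerType);
  for (const f of filters) {
    eligible = eligible.filter(f);
  }
  if (eligible.length === 0) return null; // leads to the no-action response
  return eligible[Math.floor(random() * eligible.length)];
}
```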

Context Tools (proactive/tools/ContextTools.ts)

Helper functions for context rule evaluation:

  • extractContextData(field, context, requestData, log) - Extract specific context field
  • checkContextRules(registration, context, requestData, log) - Evaluate all context rules

History Rules Checker (proactive/tools/IHRulesChecker.ts)

Evaluates interaction history rules:

  • checkIHRules(registrations, IHQueries, data, log) - Filter by history rules
  • Queries History service for past skill launches
  • Applies frequency, recency, and sequence constraints
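Frequency and recency constraints can be checked against a list of past launch timestamps. The rule shapes below are hypothetical because the real IHRule type is not documented here. Sequence rules are left out for brevity.

```typescript
// Hypothetical rule shapes, e.g. "at most once per hour" or "not in last 10 minutes".
interface FrequencyRule { kind: "frequency"; maxCount: number; windowMs: number; }
interface RecencyRule { kind: "recency"; minGapMs: number; }
type HistoryRule = FrequencyRule | RecencyRule;

// `launches` holds past launch timestamps (ms) for one skill, e.g. from a History query.
function passesHistoryRule(rule: HistoryRule, launches: number[], now: number): boolean {
  if (rule.kind === "frequency") {
    const { windowMs, maxCount } = rule;
    // Launches inside the window must stay below the allowed count.
    const inWindow = launches.filter((t) => now - t < windowMs).length;
    return inWindow < maxCount;
  }
  // Recency: every past launch must be at least minGapMs old.
  const { minGapMs } = rule;
  return launches.every((t) => now - t >= minGapMs);
}
```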

Settings Rules Checker (proactive/tools/SettingsRulesChecker.ts)

Evaluates user settings:

  • getSkillSettingsMap(skillConfigs, accountID, loopID, transID) - Batch fetch settings
  • checkSettingsRegistrations(registrations, skillSettingsMap) - Filter by settings

Skill Interaction Flow (Cloud Skills)

Initial Launch

  1. Hub sends LISTEN_LAUNCH request to skill
  2. Skill processes request, returns SKILL_ACTION
  3. Hub sends SKILL_ACTION to robot
  4. Robot executes action, sends CMD_RESULT to Hub
  5. Hub sends LISTEN_UPDATE request to skill with action result
  6. Skill processes result, returns next SKILL_ACTION or final=true
  7. Repeat steps 3-6 until skill returns final=true
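The launch/update loop can be sketched synchronously, with stand-in callbacks for the skill HTTP call and the robot round trip. This is illustrative only. The real flow is asynchronous and bounded by timeouts, whereas this sketch caps the number of turns.

```typescript
interface SkillReply { action: string; final: boolean; }

// Hypothetical stand-ins for the HTTP call to the skill and the robot round trip.
type CallSkill = (type: "LISTEN_LAUNCH" | "LISTEN_UPDATE", result?: unknown) => SkillReply;
type RunOnRobot = (action: string) => unknown; // resolves to the CMD_RESULT payload

// Launch, then relay actions and results until the skill replies final=true.
// Returns the actions forwarded to the robot as SKILL_ACTION messages.
function runSkillTransaction(callSkill: CallSkill, runOnRobot: RunOnRobot, maxTurns = 20): string[] {
  const sent: string[] = [];
  let reply = callSkill("LISTEN_LAUNCH"); // step 1
  for (let turn = 0; turn < maxTurns; turn++) {
    sent.push(reply.action); // step 3: forward SKILL_ACTION
    if (reply.final) return sent; // final action: no CMD_RESULT is awaited
    const cmdResult = runOnRobot(reply.action); // step 4
    reply = callSkill("LISTEN_UPDATE", cmdResult); // step 5
  }
  throw new Error("skill did not finish within maxTurns");
}
```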

Skill Redirect

  1. Skill returns SKILL_REDIRECT response
  2. Hub emits SKILL_REDIRECT notification to robot
  3. Hub sends launch request to new skill
  4. New skill proceeds with normal flow
  5. Error if second redirect attempted
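The one-level redirect limit can be enforced with a per-transaction counter. A minimal, hypothetical guard:

```typescript
interface RedirectTarget { skillID: string; }

// Hypothetical guard: a skill launched because of a redirect may not redirect again.
class RedirectGuard {
  private redirects = 0;

  // Returns the skill to launch next, or throws on a second redirect.
  follow(target: RedirectTarget): string {
    if (this.redirects >= 1) {
      throw new Error(`Second redirect to ${target.skillID} rejected: only one level is supported`);
    }
    this.redirects += 1;
    return target.skillID;
  }
}
```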

Message Timing

Listen Transaction Timing

Timings tracked:

  • total - Total transaction time
  • asr - ASR processing time
  • nlu - NLU processing time
  • skill - Skill processing time

Timing emission:

  • SOS/EOS include the total time elapsed since the transaction started
  • LISTEN response includes ASR and NLU timings
  • SKILL_ACTION includes skill timing
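A small tracker can produce the `timings` objects these messages carry. This is a hypothetical sketch with an injectable clock and synchronous phases; the Hub's actual timing code may differ.

```typescript
// Hypothetical tracker; `now` is injectable so tests do not depend on wall-clock time.
class TransactionTimings {
  private readonly start: number;
  private readonly phases: Record<string, number> = {};

  constructor(private readonly now: () => number = Date.now) {
    this.start = now();
  }

  // Runs a synchronous phase (e.g. "asr", "nlu", "skill") and records its duration.
  measure<T>(phase: string, fn: () => T): T {
    const t0 = this.now();
    try {
      return fn();
    } finally {
      this.phases[phase] = this.now() - t0;
    }
  }

  // Snapshot for a message: total elapsed time plus any recorded requested phases.
  snapshot(...phases: string[]): Record<string, number> {
    const out: Record<string, number> = { total: this.now() - this.start };
    for (const p of phases) {
      if (p in this.phases) out[p] = this.phases[p];
    }
    return out;
  }
}
```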

Proactive Transaction Timing

Timings tracked:

  • total - Total transaction time
  • skill - Skill processing time

Error Handling

Hub Error Codes (HubErrorCode.ts)

  • TIMEOUT_ASR - ASR timeout (40 seconds)
  • TIMEOUT_PARSER - Parser timeout (10 seconds)
  • TIMEOUT_CONTEXT - Context timeout (5 seconds)
  • TIMEOUT_SKILL - Skill timeout (10 seconds)
  • PARSER - Parser error
  • ASR - ASR error

Error Response Format

{
  type: "ERROR",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    message: string,
    code?: string
  },
  final: true,
  timings: {
    total: number
  }
}
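A sketch of building this message from the listed error codes. The enum string values and the `errorResponse` helper are assumptions. The default budgets restate the timeouts given earlier in this document.

```typescript
// Codes as listed for HubErrorCode.ts; the string values are assumptions.
enum HubErrorCode {
  TIMEOUT_ASR = "TIMEOUT_ASR",
  TIMEOUT_PARSER = "TIMEOUT_PARSER",
  TIMEOUT_CONTEXT = "TIMEOUT_CONTEXT",
  TIMEOUT_SKILL = "TIMEOUT_SKILL",
  PARSER = "PARSER",
  ASR = "ASR",
}

// Default budgets (ms) paired with the TIMEOUT_* codes.
const DEFAULT_TIMEOUT_MS: Partial<Record<HubErrorCode, number>> = {
  [HubErrorCode.TIMEOUT_ASR]: 40_000,
  [HubErrorCode.TIMEOUT_PARSER]: 10_000,
  [HubErrorCode.TIMEOUT_CONTEXT]: 5_000,
  [HubErrorCode.TIMEOUT_SKILL]: 10_000,
};

interface ErrorResponse {
  type: "ERROR";
  msgID: string;
  ts: number;
  data: { message: string; code?: string };
  final: true;
  timings: { total: number };
}

// ERROR is always final; total is measured from the transaction start time.
function errorResponse(msgID: string, message: string, startTs: number, code?: HubErrorCode, now = Date.now()): ErrorResponse {
  return {
    type: "ERROR",
    msgID,
    ts: now,
    data: code === undefined ? { message } : { message, code },
    final: true,
    timings: { total: now - startTs },
  };
}
```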

Speech History Recording

Optional Features

Configuration:

  • ETCO_hub_recordLaunchHistory - Record skill launches to MongoDB
  • ETCO_hub_recordSpeechHistory - Record speech interactions to MongoDB
  • ETCO_hub_recordSpeechLogBucket - Upload speech logs to S3

Speech History Record

Data recorded:

  • Robot ID, account ID, transaction ID
  • Timestamp
  • ASR result
  • NLU result
  • Match data
  • Skill response
  • Redirect data
  • Error (if any)

S3 Upload

Format: JSON with audio as base64

Path: {robotID}/year={year}/month={month}/day={day}/{timestamp}-{transID}.json
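A hypothetical helper for the object key. It assumes UTC date parts, zero-padded month and day, and an epoch-millisecond `{timestamp}`. The document does not specify any of these.

```typescript
// Zero-pads a 1- or 2-digit number (assumption about the real key format).
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Builds {robotID}/year={year}/month={month}/day={day}/{timestamp}-{transID}.json
function speechLogKey(robotID: string, transID: string, timestampMs: number): string {
  const d = new Date(timestampMs);
  const year = d.getUTCFullYear();
  const month = pad2(d.getUTCMonth() + 1); // getUTCMonth() is 0-based
  const day = pad2(d.getUTCDate());
  return `${robotID}/year=${year}/month=${month}/day=${day}/${timestampMs}-${transID}.json`;
}
```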

Hub Configuration

Environment Variables

Hub Settings:

  • ETCO_hub_recordLaunchHistory - Enable launch history
  • ETCO_hub_recordSpeechHistory - Enable speech history
  • ETCO_hub_recordSpeechLogBucket - S3 bucket for speech logs

Authentication:

  • ETCO_server_hubTokenSecret - JWT secret for token verification
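A minimal config loader for these variables. It assumes boolean flags are enabled only by the literal string `"true"` and that a missing token secret should fail startup. `loadHubConfig` is hypothetical; in Node it would be called with `process.env`.

```typescript
interface HubEnvConfig {
  recordLaunchHistory: boolean;
  recordSpeechHistory: boolean;
  speechLogBucket?: string;
  hubTokenSecret: string;
}

// Takes the env map as a parameter so it stays testable.
function loadHubConfig(env: Record<string, string | undefined>): HubEnvConfig {
  const secret = env["ETCO_server_hubTokenSecret"];
  if (!secret) {
    throw new Error("ETCO_server_hubTokenSecret is required to verify robot JWTs");
  }
  return {
    recordLaunchHistory: env["ETCO_hub_recordLaunchHistory"] === "true",
    recordSpeechHistory: env["ETCO_hub_recordSpeechHistory"] === "true",
    speechLogBucket: env["ETCO_hub_recordSpeechLogBucket"] || undefined, // unset or empty: no S3 upload
    hubTokenSecret: secret,
  };
}
```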

Skill Configuration

Sources:

  • skills-local.json - Local development configuration
  • Environment variables - Production configuration
  • Settings service - Dynamic configuration

Skill Config Structure:

{
  id: string,
  intents: [{
    name: string,
    entities?: [{ name, value, matchRule }],
    memo?: any
  }],
  proactives?: [{
    triggerType: string,
    contextRules?: ContextRule[],
    IHRules?: IHRule[],
    settingsRules?: SettingsRule[],
    memo?: any
  }],
  IHQueries?: IHQueryDefinitions,
  onRobot?: boolean,
  URL: string,
  settings?: ManifestSettings
}

Summary of Server-to-Robot Communication

Listen Flow

  1. SOS - Speech detected
  2. EOS - Speech ended
  3. LISTEN - ASR/NLU result with match data
  4. SKILL_ACTION - JCP action to execute (repeated for multi-turn)
  5. SKILL_REDIRECT - Skill redirect notification
  6. ERROR - Error occurred

Proactive Flow

  1. PROACTIVE - Match or no-action response
  2. SKILL_ACTION - JCP action to execute (if cloud skill)
  3. SKILL_REDIRECT - Skill redirect notification
  4. ERROR - Error occurred

Key Design Principles

  1. State Machine - Clear state transitions with validation
  2. Timeouts - Every operation has a timeout to prevent hanging
  3. Error Handling - Errors propagate to robot with clear messages
  4. Timing - All operations are timed for monitoring
  5. History - Interactions can be recorded for analysis (launch and speech history recording are enabled via the ETCO_hub_* settings)
  6. Flexibility - Supports on-robot and cloud skills
  7. Proactivity - Context-aware action selection