Skill Framework Design Document

Overview

The Skill Framework provides the foundation for building cloud-based skills for the Jibo robot. It consists of a base class for all skills, a graph-based state machine for complex conversational flows, and a system for generating JCP (Jibo Command Protocol) actions that are sent to the robot.

Location

packages/baseskill/src/

Core Components

BaseSkill (BaseSkill.ts)

Abstract base class that all cloud skills must extend.

Purpose: Provides common HTTP handling and error handling for all skills.

Key Features:

  • Extends BaseHttpHandler from @jibo/utils
  • Registers POST handler at / endpoint
  • Validates request structure
  • Tracks timing for each request
  • Provides error response builder

Constructor:

constructor(public name: string)

Abstract Method:

protected abstract handle(request: PegasusRequest<SkillRequest>): Promise<SkillResponse>;

Lifecycle Methods:

  • init(): Promise<void> - Override to initialize resources (load files, connect to services)
  • buildErrorResponse(err: Error): ErrorResponse - Builds standardized error response

HTTP Handler:

  • Accepts POST requests at /
  • Logs request type
  • Calls handle() method
  • Adds timing information
  • Catches errors and returns error response
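The handler steps above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the BaseSkill source: `runHandler`, `makeErrorResponse`, `MiniResponse`, and the `timings` field name are hypothetical, and `msgID`/`ts` are omitted for brevity. The real `handle()` returns a Promise.

```typescript
// Hypothetical minimal response shape; the real types live in @jibo/interfaces.
interface MiniResponse {
  type: string;
  data: { [key: string]: unknown };
  timings?: { handleMs: number }; // assumed field name for timing information
}

type Handler = (body: unknown) => MiniResponse;

// Builds an ERROR response with the message and skill ID fields
// listed for ERROR responses in this document.
function makeErrorResponse(err: Error, skillID: string): MiniResponse {
  return { type: "ERROR", data: { message: err.message, skill: { id: skillID } } };
}

// Wraps a skill's handler with timing and error handling.
function runHandler(skillID: string, handle: Handler, body: unknown): MiniResponse {
  const started = Date.now();
  let response: MiniResponse;
  try {
    response = handle(body);
  } catch (err) {
    // Anything thrown by the skill becomes a standardized error response.
    const error = err instanceof Error ? err : new Error(String(err));
    response = makeErrorResponse(error, skillID);
  }
  response.timings = { handleMs: Date.now() - started };
  return response;
}
```

Because the wrapper catches everything thrown by the handler, a skill author only needs to throw an `Error`; the caller always receives a well-formed response.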

GraphSkill (GraphSkill.ts)

Extends BaseSkill with a graph-based state machine for complex conversational flows.

Purpose: Enables skills to define their logic as a series of interconnected nodes (states) with transitions.

Key Features:

  • Implements GraphFactory interface
  • Manages graph execution via GraphManager singleton
  • Supports skill redirects
  • Tracks analytics events
  • Supports supplemental behaviors (parallel/sequence)
  • Handles both launch and update requests

Constructor:

constructor(name: string)

Abstract Method:

abstract createGraph(): Graph<ExitTransition>

Request Handling:

Launch Requests (LISTEN_LAUNCH or PROACTIVE_LAUNCH):

  1. Validates request data (accountID, robotID, skill ID)
  2. Initializes skill session data
  3. Tracks SKILL_ENTRY analytics event
  4. Calls GraphManager.instance.start(graph, data) to begin graph execution
  5. Returns SKILL_ACTION or SKILL_REDIRECT response

Update Requests (LISTEN_UPDATE):

  1. Validates request data
  2. Calls GraphManager.instance.exitNode(data) to process action results
  3. Returns next SKILL_ACTION or final response
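The two request paths above amount to a dispatch on the message type. The sketch below illustrates that routing under simplifying assumptions: `MiniRequest`, `validate`, and `route` are hypothetical names, the request shape is trimmed down, and requiring a session on update requests is an assumption rather than documented behavior.

```typescript
// Message types as listed in this document.
type MessageType = "LISTEN_LAUNCH" | "PROACTIVE_LAUNCH" | "LISTEN_UPDATE";

// Hypothetical, trimmed-down request shape for illustration only.
interface MiniRequest {
  type: MessageType;
  data: {
    general: { accountID?: string; robotID?: string };
    skill: { id?: string; session?: { id: string; nodeID: number } };
  };
}

// Returns the names of missing fields; an empty list means the request is usable.
function validate(req: MiniRequest): string[] {
  const missing: string[] = [];
  if (!req.data.general.accountID) { missing.push("accountID"); }
  if (!req.data.general.robotID) { missing.push("robotID"); }
  if (!req.data.skill.id) { missing.push("skill.id"); }
  return missing;
}

// Decides which GraphManager entry point a request would be routed to.
function route(req: MiniRequest): "start" | "exitNode" {
  const missing = validate(req);
  if (missing.length > 0) {
    throw new Error("Invalid request, missing: " + missing.join(", "));
  }
  if (req.type === "LISTEN_UPDATE") {
    // Assumption: an update only makes sense for an existing session.
    if (!req.data.skill.session) { throw new Error("Update without session"); }
    return "exitNode";
  }
  return "start"; // LISTEN_LAUNCH or PROACTIVE_LAUNCH
}
```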

Response Types:

  1. SKILL_REDIRECT - Redirects to another skill

    {
      type: "SKILL_REDIRECT",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        skillID: string,
        nlu?: NLUResult,
        asr?: ASRResult,
        memo?: any
      }
    }
    
  2. SKILL_ACTION - Returns JCP action for robot to execute

    {
      type: "SKILL_ACTION",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        action: JCPAction,
        analytics: AnalyticsData,
        final: boolean,
        fireAndForget: boolean
      }
    }
    
  3. Final Response - No action, transaction complete

    {
      type: "SKILL_ACTION",
      msgID: "uuid",
      ts: 1234567890,
      data: {
        action: null,
        analytics: AnalyticsData,
        final: true,
        fireAndForget: true
      }
    }
    

Convenience Methods:

  • track(data, event, properties) - Track analytics event
  • overrideSpeaker(data, id) - Override current speaker in context
  • addParallelBehavior(data, behavior) - Add behavior to execute in parallel
  • addSequenceBehavior(data, behavior) - Add behavior to execute in sequence

Supplemental Behaviors Injection:

When a skill returns a JCP action, the framework injects any supplemental behaviors that were added during execution:

  1. If sequence behaviors exist, wraps main action in a Sequence
  2. If parallel behaviors exist, wraps result in a Parallel
  3. Final JCP action is sent to robot

Example:

// Skill adds parallel behavior
this.addParallelBehavior(data, SetPresentPersonBehavior);

// Skill returns main action
return { action: SayTextBehavior };

// Framework injects: Parallel([SetPresentPersonBehavior, SayTextBehavior])
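The wrapping order described above (sequence first, then parallel) can be expressed as a small pure function. This is a sketch only: the `Behavior` shape and `injectSupplemental` are hypothetical stand-ins, and real JCP behaviors carry more fields.

```typescript
// Hypothetical behavior shapes for illustration.
type Behavior =
  | { kind: "Leaf"; name: string }
  | { kind: "Sequence"; children: Behavior[] }
  | { kind: "Parallel"; children: Behavior[] };

interface Supplemental {
  parallel: Behavior[];
  sequence: Behavior[];
}

// Wraps the main action with any supplemental behaviors.
function injectSupplemental(main: Behavior, extra: Supplemental): Behavior {
  let result: Behavior = main;
  if (extra.sequence.length > 0) {
    // Sequence behaviors run first, followed by the main action.
    result = { kind: "Sequence", children: extra.sequence.concat([result]) };
  }
  if (extra.parallel.length > 0) {
    // Parallel behaviors run alongside whatever was built so far.
    result = { kind: "Parallel", children: extra.parallel.concat([result]) };
  }
  return result;
}
```

With no supplemental behaviors the main action is returned unchanged, so skills that never call `addParallelBehavior` or `addSequenceBehavior` are unaffected.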

Graph System

Graph (graph/Graph.ts)

Represents a directed graph of connected nodes (states).

Purpose: Defines the structure of a skill's conversation flow.

Key Properties:

  • name: string - Graph name
  • initial: Node - Starting node
  • nodes: Set<Node> - All nodes in graph
  • exitTransitions: Map<ExitTransition, TransitionContainer[]> - Exit transition mappings

Constructor:

constructor(name: string, exitTransitionNames: ExitTransition[])

Methods:

  • setInitialNode(node) - Sets the starting node
  • addNode(node, transitionMapping) - Adds a node and connects its transitions
  • addSubGraph(subGraph, transitionMapping) - Adds a subgraph and connects its exits
  • finalize() - Validates graph and locks it for execution
  • writeDotFile(filePath) - Generates GraphViz dot file for visualization

Transition Mapping:

[
  [TransitionName, DestinationNode],  // Transition to another node
  [TransitionName, ExitTransition]   // Exit from graph
]

Validation (in finalize):

  • All nodes must be reachable from initial node
  • All exit transitions must be connected
  • All transitions must have valid destinations
  • No duplicate transition names
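As an illustration of the first validation rule, the following sketch finds nodes that cannot be reached from the initial node using a breadth-first search. It works on a plain adjacency list rather than the real Graph and Node classes; `Adjacency` and `findUnreachable` are hypothetical names.

```typescript
// Hypothetical adjacency-list form: node name -> names of destination nodes.
type Adjacency = { [node: string]: string[] };

// Returns the nodes that cannot be reached from `initial` (breadth-first search).
function findUnreachable(adjacency: Adjacency, initial: string): string[] {
  const visited: { [node: string]: boolean } = {};
  const queue: string[] = [initial];
  visited[initial] = true;
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const next = adjacency[current] || [];
    for (let i = 0; i < next.length; i++) {
      if (!visited[next[i]]) {
        visited[next[i]] = true;
        queue.push(next[i]);
      }
    }
  }
  return Object.keys(adjacency).filter(function (name) {
    return !visited[name];
  });
}
```

A validator in the spirit of `finalize()` would reject the graph whenever this list is non-empty.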

Subgraphs:

  • Graphs can be nested within other graphs
  • Subgraph exit transitions connect to parent graph nodes
  • Enables hierarchical organization of complex flows
  • Nodes can belong to multiple graphs (for subgraph sharing)

GraphViz Visualization:

  • Generates .dot files for graph visualization
  • Color-codes initial node, regular nodes, and exit states
  • Shows hierarchical structure with clusters
  • Labels transitions with their names
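To make the output format concrete, here is a minimal sketch that renders a list of labeled edges as GraphViz DOT text. It is not the `writeDotFile` implementation: `Edge` and `toDot` are hypothetical, the fill color is arbitrary, and clusters for subgraphs are left out.

```typescript
interface Edge {
  from: string;
  to: string;
  label: string; // transition name
}

// Renders a flat graph as DOT source, highlighting the initial node.
function toDot(graphName: string, initial: string, edges: Edge[]): string {
  const lines: string[] = ['digraph "' + graphName + '" {'];
  lines.push('  "' + initial + '" [style=filled, fillcolor=lightblue];');
  for (let i = 0; i < edges.length; i++) {
    const e = edges[i];
    lines.push('  "' + e.from + '" -> "' + e.to + '" [label="' + e.label + '"];');
  }
  lines.push("}");
  return lines.join("\n");
}
```

The resulting text can be saved to a `.dot` file and rendered with the GraphViz `dot` tool, for example `dot -Tpng skill.dot -o skill.png`.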

GraphManager (graph/GraphManager.ts)

Singleton that manages graph execution and skill sessions.

Purpose: Coordinates node execution and maintains session state.

Singleton Pattern:

GraphManager.instance  // Access singleton

Key Responsibilities:

  • Assigns unique IDs to all nodes
  • Maps node IDs to node instances
  • Manages skill session lifecycle
  • Executes node enter/exit lifecycle
  • Handles transitions between nodes

Session Structure:

{
  id: string,           // Session UUID
  nodeID: number,      // Current node ID
  data: any,           // Skill-specific session data
  trace: [             // History of transitions
    { nodeID: number, transition: string }
  ]
}

Execution Flow:

Start Graph (launch request):

start(graph, data)
   Creates new session
   Sets initial node
   Calls enterNode()

Enter Node:

enterNode(data)
   Fetches current node
   Calls node.enter(data)
   Updates trace
   If action returned: return action
   Else: call exitNode()

Exit Node:

exitNode(data)
   Fetches current node
   Calls node.exit(data)
   If transition returned: executeTransition()
   Else: return (terminal)

Execute Transition:

executeTransition(node, result, data)
   Validates transition exists
   Updates trace with transition name
   If terminal: return null
   Else: update nodeID, call enterNode()
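The enter/exit/transition cycle above can be condensed into a small runnable model. This is a deliberately simplified, synchronous sketch: `MiniNode` and `MiniGraphManager` are hypothetical, actions are plain strings, the trace is only updated when a transition is taken, and the real node methods return Promises.

```typescript
interface Session {
  nodeID: number;
  trace: { nodeID: number; transition: string }[];
}

interface MiniNode {
  enter(): string | null;                 // action name, or null for "no action"
  exit(result: unknown): string | null;   // transition name, or null when terminal
  transitions: { [name: string]: number | null }; // destination node ID; null = graph exit
}

class MiniGraphManager {
  constructor(private nodes: MiniNode[]) {}

  // Launch: create a session at node 0 and enter it.
  start(): { session: Session; action: string | null } {
    const session: Session = { nodeID: 0, trace: [] };
    return { session: session, action: this.enterNode(session) };
  }

  enterNode(session: Session): string | null {
    const action = this.nodes[session.nodeID].enter();
    if (action !== null) { return action; }   // wait for the robot's result
    return this.exitNode(session, undefined); // no action: move on immediately
  }

  // Update: feed the action result to the current node and follow its transition.
  exitNode(session: Session, result: unknown): string | null {
    const node = this.nodes[session.nodeID];
    const transition = node.exit(result);
    if (transition === null) { return null; } // terminal node
    if (!(transition in node.transitions)) {
      throw new Error("Unknown transition: " + transition);
    }
    session.trace.push({ nodeID: session.nodeID, transition: transition });
    const next = node.transitions[transition];
    if (next === null) { return null; }       // exit transition leaves the graph
    session.nodeID = next;
    return this.enterNode(session);
  }
}
```

Note how nodes that return no action are chained within a single call, so one request can traverse several nodes before the next action (or the end of the graph) is reached.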

Node ID Assignment:

  • Counter starts at 0, increments for each node
  • Bidirectional mapping: node ↔ ID
  • Enables serialization of session state
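A minimal sketch of such a registry is shown below, together with the reason it matters: because a session stores only a numeric `nodeID`, it can be serialized to JSON and later resolved back to a node instance. `NodeRegistry` and `Named` are hypothetical names, not the GraphManager API.

```typescript
interface Named {
  name: string;
  id?: number; // assigned on registration
}

class NodeRegistry<T extends Named> {
  private byID: T[] = []; // array index doubles as the node ID

  // Assigns the next free ID (starting at 0) unless the node already has one.
  register(node: T): number {
    if (node.id !== undefined) { return node.id; }
    const id = this.byID.length;
    node.id = id;
    this.byID.push(node);
    return id;
  }

  // Resolves an ID from a deserialized session back to its node.
  lookup(id: number): T {
    const node = this.byID[id];
    if (!node) { throw new Error("Unknown node ID: " + id); }
    return node;
  }
}
```

For example, a session serialized as `{"nodeID":1}` can be parsed on the next request and passed to `lookup` to resume at the right node.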

Node (graph/nodes/Node.ts)

Abstract base class for all graph nodes.

Purpose: Defines a state in the skill's conversation flow.

Key Properties:

  • id: number - Unique ID assigned by GraphManager
  • name: string - Node name
  • transitionNames: Transition[] - Valid exit transitions
  • graphs: Graph[] - Graphs this node belongs to
  • transitions: Map<Transition, TransitionContainer> - Transition destinations

Constructor:

constructor(name: string, transitionNames: Transition[])

Abstract Methods:

abstract enter(data: Data): Promise<EnterResponse>

  • Called when node is entered
  • Returns action to execute, redirect, or nothing
  • Subclasses typically implement it as an async method

abstract exit(data: Data): Promise<ExitResponse>

  • Called with action results (if action was issued)
  • Returns next transition or nothing (terminal)
  • Subclasses typically implement it as an async method

Data Structure:

Data = {
  // From request
  general: { accountID, robotID, lang, release },
  runtime: { character, location, loop, perception, dialog },
  skill: { id, session },
  result?: any,  // Action results for UPDATE
  
  // Added by framework
  req: PegasusRequest,
  log: Log,
  local: any,           // Skill-local data
  analytics: {},        // Analytics events
  behaviors: {          // Supplemental behaviors
    parallel: [],
    sequence: []
  }
}

Response Types:

EnterResponse:

{
  action?: Action,      // JCP action to execute
  redirect?: RedirectData,  // Redirect to another skill
  final?: boolean       // Is this the final response?
}

ExitResponse:

{
  transition?: string,  // Next transition to take
  result?: any,         // Result to pass to next node
  redirect?: RedirectData
}

Built-in Node Types:

  1. DefaultNode - Simple terminal node

    • Returns no action
    • Transitions to Done

  2. NoOpNode - No operation node

    • Returns no action
    • Can have custom transitions

  3. JCPNode - Returns a JCP action

    • Returns specified JCP behavior
    • Can be terminal or continue

  4. TrueFalseNode - Conditional branching

    • Evaluates condition
    • Transitions based on true/false

  5. SetLooperIDNode - Sets speaker ID

    • Updates perception.speaker in context
    • Useful for multi-turn conversations
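To illustrate the conditional-branching idea behind TrueFalseNode, here is a standalone sketch of a node that issues no action and picks its transition from a predicate. It does not reproduce the real class: `ConditionNode`, `MiniData`, and the transition names "True" and "False" are assumptions, and the methods are synchronous for brevity.

```typescript
// Hypothetical, trimmed-down version of the Data object.
interface MiniData {
  local: { [key: string]: unknown };
}

type Condition = (data: MiniData) => boolean;

class ConditionNode {
  readonly transitionNames = ["True", "False"]; // assumed transition names

  constructor(public name: string, private condition: Condition) {}

  // No action is issued, so the framework proceeds straight to exit().
  enter(_data: MiniData): { action?: unknown } {
    return {};
  }

  // The predicate result selects the outgoing transition.
  exit(data: MiniData): { transition: string } {
    return { transition: this.condition(data) ? "True" : "False" };
  }
}
```

A graph could then route "True" and "False" to different nodes, for example branching on whether `data.local` already holds a selected response.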

Node Traversal:

  • forEachDescendent(handler) - BFS traversal of all descendant nodes
  • Used for graph validation and analysis

Skill Request/Response Protocol

Skill Request Types

Location: packages/interfaces/src/skill/request.ts

MessageType:

  • LISTEN_LAUNCH - Launch skill from listen interaction
  • LISTEN_UPDATE - Update skill with action results
  • PROACTIVE_LAUNCH - Launch skill proactively

Request Structure:

{
  type: MessageType,
  msgID: "uuid",
  ts: 1234567890,
  data: {
    general: {
      accountID: string,
      robotID: string,
      lang: string,
      release: string
    },
    runtime: {
      character: { emotion, motivation },
      location: { city, state, country, lat, lng },
      loop: { users, jibo, owner, loopId },
      perception: { speaker, peoplePresent },
      dialog: { referent }
    },
    skill: {
      id: string,
      session?: {
        id: string,
        nodeID: number,
        data: any,
        trace: [{ nodeID, transition }]
      }
    },
    result?: any,  // Action results for UPDATE
    nlu?: NLUResult,
    asr?: ASRResult,
    memo?: any
  }
}

Skill Response Types

Location: packages/interfaces/src/skill/response.ts

ResponseType:

  • SKILL_ACTION - Returns action to execute
  • SKILL_REDIRECT - Redirects to another skill
  • ERROR - Error response

SKILL_ACTION Response:

{
  type: "SKILL_ACTION",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    action: JCPAction | null,
    analytics: AnalyticsData,
    final: boolean,
    fireAndForget: boolean
  }
}

SKILL_REDIRECT Response:

{
  type: "SKILL_REDIRECT",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    skillID: string,
    nlu?: NLUResult,
    asr?: ASRResult,
    memo?: any
  }
}

ERROR Response:

{
  type: "ERROR",
  msgID: "uuid",
  ts: 1234567890,
  data: {
    message: string,
    skill: { id: string }
  }
}

JCP Actions

Location: packages/interfaces/src/skill/action.ts

Purpose: Defines behaviors that the robot should execute.

ActionType:

  • JCP - Jibo Command Protocol action

JCPAction Structure:

{
  type: "JCP",
  config: {
    version: "1.0.0",
    jcp: SupportedBehaviors
  }
}

SupportedBehaviors:

  • SLIM - Single behavior execution
  • Sequence - Sequential behavior execution
  • Parallel - Parallel behavior execution
  • SetPresentPerson - Set focused person
  • ImpactEmotion - Modify Jibo's emotional state

Helper Function:

generateJCPAction(behavior): JCPAction

Wraps a behavior as a JCP action with version "1.0.0", matching the JCPAction structure above.
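A helper of this kind can be very small. The sketch below is an assumption about its shape, not the library source: `generateJCPActionSketch` and `JCPActionSketch` are hypothetical names, and the version string is taken from the JCPAction structure shown above.

```typescript
// Generic stand-in for the JCPAction structure documented above.
interface JCPActionSketch<B> {
  type: "JCP";
  config: { version: string; jcp: B };
}

// Wraps any behavior object in the JCP action envelope.
function generateJCPActionSketch<B>(behavior: B): JCPActionSketch<B> {
  return {
    type: "JCP",
    config: { version: "1.0.0", jcp: behavior }
  };
}
```

Keeping the envelope in one helper means skills only construct behaviors and never hard-code the protocol version themselves.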

MIM (Multimodal Interaction Module) System

Location: packages/baseskill/src/graph/mims/

Purpose: Provides pre-built graph structures for playing MIM animations.

MIM Files:

  • .mim files contain animation definitions
  • Organized in directories:
    • scripted-responses - Pre-scripted responses
    • emotion-responses - Emotion-based responses
    • core-responses - Fallback responses

MIM Factories:

ANFactory - Animation Node Factory

  • Creates graph for playing a single MIM
  • Supports prompt data injection
  • Can be final or continue

MANFactory - Multiple Animation Node Factory

  • Creates graph for playing multiple MIMs
  • Supports random selection
  • Can be final or continue

MIMFactory - General MIM Factory

  • Creates graph for MIM playback
  • Supports semi-specific responses
  • Handles category-based selection

QNFactory - Question Node Factory

  • Creates graph for asking questions
  • Supports opt-in flows
  • Handles user responses

OptInFactory - Opt-In Node Factory

  • Creates graph for opt-in offers
  • Tracks user acceptance/rejection
  • Handles analytics

MIM Factory Options:

{
  mimDataProvider: (data) => string[],  // Function to get MIM paths
  promptDataProvider?: (data) => any,   // Function to get prompt data
  final: boolean                        // Is this the final action?
}

Example Usage (Chitchat Skill):

const doMIMOptions: MimFactoryOptions = {
  mimDataProvider: (data) => data.local.path,
  promptDataProvider: (data) => data.local.promptData,
  final: true
};
const doMIM = new ANFactory('Do MIM', doMIMOptions).createGraph();

Semi-Specific Responses:

  • MIMs with _SS_ suffix are semi-specific
  • Match specific categories (e.g., time, weather)
  • CSV files define category members
  • Enables context-aware responses
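The following sketch shows one way the pieces above could fit together. It rests on several assumptions that this document does not confirm: that a semi-specific MIM can be recognized by the `_SS_` marker in its file name, and that a category CSV is a simple comma- or newline-separated list of members. `isSemiSpecific`, `parseCategoryCsv`, and `inCategory` are hypothetical helpers.

```typescript
// Assumption: semi-specific MIMs carry the _SS_ marker in their file name.
function isSemiSpecific(mimPath: string): boolean {
  return mimPath.indexOf("_SS_") !== -1;
}

// Assumption: category members are separated by commas and/or newlines.
function parseCategoryCsv(csv: string): string[] {
  return csv
    .split(/[\r\n,]+/)
    .map(function (member) { return member.trim().toLowerCase(); })
    .filter(function (member) { return member.length > 0; });
}

// Case-insensitive membership test for an entity extracted from the query.
function inCategory(members: string[], entity: string): boolean {
  return members.indexOf(entity.trim().toLowerCase()) !== -1;
}
```

A skill could use `inCategory` to decide whether a semi-specific response applies to the current query and fall back to a core response otherwise.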

SkillService (SkillService.ts)

Service wrapper that hosts a skill as an HTTP service.

Purpose: Provides the service infrastructure for running a skill.

Constructor:

constructor(private skillV1: BaseSkill)

HTTP Handler:

  • Registers skill at /v1/main endpoint
  • No authentication required (handled by Hub)

Initialization:

async init(port: number)
   Starts HTTP server
   Calls skill.init()

Analytics

Location: packages/interfaces/src/skill/analytics.ts

Purpose: Track skill events for analysis.

AnalyticsData Structure:

{
  [skillName: string]: [
    {
      event: string,
      properties: any
    }
  ]
}

Built-in Events:

  • SKILL_ENTRY - Skill launched
  • SKILL_OFFER - Opt-in offer presented

Skill Entry Analytics:

{
  initial_intent: string,
  domain: string,
  was_hey_jibo_launch: boolean,
  user_initiated: boolean,
  last_skill: string
}

Tracking:

this.track(data, 'CustomEvent', { key: value });

Events are automatically included in SKILL_ACTION responses.
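To show how events end up in the AnalyticsData structure above, here is a minimal sketch that appends an event under the skill's name. It is not the GraphSkill `track` implementation: `trackEvent` is a hypothetical free function that takes the analytics object directly instead of the full `data` argument.

```typescript
interface AnalyticsEvent {
  event: string;
  properties: { [key: string]: unknown };
}

// Events grouped by skill name, as in the AnalyticsData structure above.
type AnalyticsData = { [skillName: string]: AnalyticsEvent[] };

// Appends an event to the list for the given skill, creating the list if needed.
function trackEvent(
  analytics: AnalyticsData,
  skillName: string,
  event: string,
  properties: { [key: string]: unknown }
): void {
  if (!analytics[skillName]) {
    analytics[skillName] = [];
  }
  analytics[skillName].push({ event: event, properties: properties });
}
```

Since events accumulate on a per-request object, the framework can attach the whole structure to the outgoing response without the skill doing anything further.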

Server-to-Robot Communication Flow

Skill Response to Hub

When a skill returns a response, the Hub either forwards it to the robot or, for a redirect, launches another skill:

SKILL_ACTION Response:

  1. Skill returns SKILL_ACTION with JCP behavior
  2. Hub adds timing information
  3. Hub sends SKILL_ACTION to robot via WebSocket
  4. Robot executes JCP behavior
  5. Robot sends CMD_RESULT back to Hub
  6. Hub sends LISTEN_UPDATE to skill
  7. Skill processes result, returns next action

Final SKILL_ACTION:

  1. Skill returns SKILL_ACTION with final: true
  2. Hub sends to robot
  3. Robot executes (if action present)
  4. Transaction complete

SKILL_REDIRECT:

  1. Skill returns SKILL_REDIRECT
  2. Hub emits SKILL_REDIRECT notification to robot
  3. Hub launches new skill
  4. New skill proceeds normally

JCP Action Execution

Single Behavior (SLIM):

{
  type: "JCP",
  config: {
    version: "1.0.0",
    jcp: SayTextBehavior
  }
}

Robot executes single behavior immediately.

Sequence Behavior:

{
  type: "JCP",
  config: {
    version: "1.0.0",
    jcp: Sequence([
      LookAtBehavior,
      SayTextBehavior,
      GestureBehavior
    ])
  }
}

Robot executes behaviors in order.

Parallel Behavior:

{
  type: "JCP",
  config: {
    version: "1.0.0",
    jcp: Parallel([
      SetPresentPersonBehavior,
      SayTextBehavior
    ])
  }
}

Robot executes behaviors simultaneously.

Supplemental Behaviors

Skills can add behaviors that execute alongside the main action:

Parallel Supplemental:

this.addParallelBehavior(data, SetPresentPersonBehavior);
// Main action: SayTextBehavior
// Result: Parallel([SetPresentPersonBehavior, SayTextBehavior])

Sequence Supplemental:

this.addSequenceBehavior(data, LookAtBehavior);
// Main action: SayTextBehavior
// Result: Sequence([LookAtBehavior, SayTextBehavior])

Combined:

this.addSequenceBehavior(data, LookAtBehavior);
this.addParallelBehavior(data, SetPresentPersonBehavior);
// Result: Parallel([SetPresentPersonBehavior, Sequence([LookAtBehavior, SayTextBehavior])])

Example Skill Implementation

Chitchat Skill

Location: packages/chitchat-skill/src/Chitchat.ts

Purpose: Handles conversational interactions with the robot.

Graph Structure:

  1. IntentSplitNode - Splits based on intent type
  2. ProcessQueryNode - Processes user query, selects response
  3. DoMIM (ANFactory) - Plays selected MIM animation
  4. Complete (DefaultNode) - Terminates skill

Initialization:

  • Loads MIM files from directories
  • Builds semi-specific mappings
  • Reads category CSV files

Response Selection:

  • Scripted responses for common queries
  • Emotion responses for emotional queries
  • Semi-specific responses for context-aware queries
  • Fallback responses for unknown queries

MIM Selection:

  • Based on intent and entities
  • Considers semi-specific categories
  • Falls back to core responses

Skill Development Guide

Creating a Simple Skill

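import { BaseSkill } from '@jibo/baseskill';
import { skill } from '@jibo/interfaces';
// PegasusRequest, SkillRequest, SkillResponse, generateJCPAction,
// SayTextBehavior, and getUUID must also be imported; their module
// paths are not given in this document.

export class MySkill extends BaseSkill {
  constructor() {
    super('my-skill');
  }

  protected async handle(req: PegasusRequest<SkillRequest>): Promise<SkillResponse> {
    // The request payload (general, runtime, skill, nlu, ...) is
    // available as req.body.data; this example does not need it.

    // Build a single speech action
    const action = generateJCPAction(SayTextBehavior("Hello!"));

    return {
      type: skill.response.ResponseType.SKILL_ACTION,
      msgID: getUUID(),
      ts: Date.now(),
      data: {
        action: action,
        analytics: {},       // no events tracked in this example
        final: true,         // transaction ends after this action
        fireAndForget: true
      }
    };
  }
}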

Creating a Graph Skill

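import { GraphSkill, graph } from '@jibo/baseskill';
// generateTransitions and MyStartNode (a custom Node subclass) are assumed
// to be imported or defined elsewhere; their locations are not given here.

// Exit transitions of the graph. finalize() requires every exit
// transition to be connected, so declare only the ones that are used.
enum Transition {
  Done = 'Done'
}

export class MyGraphSkill extends GraphSkill<Transition> {
  constructor() {
    super('my-graph-skill');
  }

  createGraph(): graph.Graph<Transition> {
    const g = new graph.Graph('My Skill', generateTransitions(Transition));

    const startNode = new MyStartNode('Start');
    const endNode = new graph.nodes.dn.DefaultNode('End');

    // Start --Done--> End
    g.addNode(startNode, [[Transition.Done, endNode]]);
    // End --Done--> graph exit
    g.addNode(endNode, [[graph.nodes.dn.Transition.Done, Transition.Done]]);

    // Validation needs a starting node to check reachability
    g.setInitialNode(startNode);

    g.finalize();
    return g;
  }
}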

Creating a Custom Node

import { Node, Data, EnterResponse, ExitResponse } from '@jibo/baseskill';
// generateJCPAction and SayTextBehavior must also be imported; their
// module paths are not given in this document.

enum MyTransition {
  Success = 'Success',
  Failure = 'Failure'
}

class MyNode extends Node<MyTransition> {
  constructor() {
    super('MyNode', [MyTransition.Success, MyTransition.Failure]);
  }

  async enter(data: Data): Promise<EnterResponse> {
    // Issue an action; exit() is called once its result arrives
    const action = generateJCPAction(SayTextBehavior("Processing..."));
    return { action };
  }

  async exit(data: Data): Promise<ExitResponse> {
    // data.result is optional, so guard against a missing result
    if (data.result && data.result.success) {
      return { transition: MyTransition.Success };
    } else {
      return { transition: MyTransition.Failure };
    }
  }
}

Key Design Principles

  1. State Machine - Graph-based state machine for complex flows
  2. Single Responsibility - Each node handles one piece of logic
  3. Reusability - Subgraphs and node types can be reused
  4. Testability - Nodes can be tested independently
  5. Visualization - GraphViz generation for debugging
  6. Analytics - Built-in event tracking
  7. Flexibility - Supports both simple and complex skills
  8. Supplemental Behaviors - Easy to add parallel/sequence actions