Original Jibo Server (Pegasus) Design Document
Executive Summary
The original Jibo server, codenamed "Pegasus" (formerly V1.X), is a cloud-based microservices architecture that powers the Jibo social robot's conversational AI capabilities. It is built as a Lerna monorepo using Node.js/TypeScript and deployed via Docker containers. The system processes speech, performs natural language understanding, routes to appropriate skills, and manages proactive behaviors.
Architecture Overview
Monorepo Structure
The codebase is organized as a Lerna monorepo with the following main packages:
- packages/hub - Central orchestration service
- packages/parser - NLU (Natural Language Understanding) service
- packages/history - Data persistence service (MongoDB)
- packages/baseskill - Base class and framework for cloud skills
- packages/interfaces - TypeScript interfaces and API contracts
- packages/utils - Shared utility libraries
- packages/chitchat-skill - Example conversational skill
- packages/report-skill - Reporting skill
- packages/lasso - External data integration service
- packages/hub-client - Client library for hub communication
- packages/history-client - Client library for history service
- packages/test-utils - Testing utilities
Technology Stack
- Language: TypeScript 2.5.3
- Runtime: Node.js 8.9.4
- Package Manager: Yarn 1.7.0
- Containerization: Docker
- Orchestration: Docker Compose (local), AWS ECS (production)
- Database: MongoDB 3.6.0
- Cache: Redis 3
- NLU: Dialogflow (formerly API.ai)
- ASR: Google Cloud Speech API
- WebSocket: ws library
- HTTP: Express.js
- Authentication: JWT (jsonwebtoken)
Core Services
1. Hub Service (packages/hub)
The Hub is the central orchestrator that coordinates all interactions between the robot and cloud services.
Key Components
HubService (HubService.ts)
- Main service class extending BaseService
- Initializes and manages all hub components
- Registers WebSocket and HTTP handlers
HubComponents - Dependency injection container:
- parser: ParserClient - NLU service client
- skillConfigManager: SkillConfigManager - Manages skill configurations
- intentRouter: IntentRouter - Routes intents to skills
- skillRequestMaker: SkillRequestMaker - Makes HTTP requests to skills
- history: HistoryServiceClient - History service client
- hubSettings: HubSettings - Hub configuration
- settingsClient: SettingsClient - Settings service client
Endpoints
WebSocket Endpoints:
- /listen and /v1/listen - Handles speech recognition and NLU
- /proactive and /v1/proactive - Handles proactive triggers
HTTP Endpoints:
- /skills and /v1/skills - Lists available skills
- /healthcheck - Service health check
Listen Flow
The listen transaction follows a state machine implemented in ListenTransactionHandler:
States:
WAIT_LISTEN → ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_NLU → ROUTE → DONE
State Transitions:
- WAIT_LISTEN - Receives LISTEN message from robot
- ASR - Performs Automatic Speech Recognition using Google Cloud Speech API
  - Streams audio packets
  - Emits SOS (Start of Speech) when speech detected
  - Emits EOS (End of Speech) when speech ends
  - Handles timeouts (SOS timeout, max speech timeout)
- NLU - Sends ASR text to Parser service for intent recognition
  - Includes context (loop users, perception, etc.)
  - Supports external Dialogflow agents
- ROUTE - Intent Router determines which skill to launch
  - Matches NLU result against skill intent configurations
  - Decision Mediator can alter decisions based on external factors
  - Routes to on-robot skills or cloud skills
- DONE - Transaction complete
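The three state paths above can be sketched as a transition table. This is a minimal illustration, not the actual ListenTransactionHandler: the names ListenState, TRANSITIONS, and ListenStateMachine are hypothetical, and only the state names come from this document.

```typescript
// Hypothetical sketch of the listen state machine; not the real ListenTransactionHandler.
type ListenState =
  | "WAIT_LISTEN"
  | "ASR"
  | "WAIT_CLIENT_ASR"
  | "WAIT_CLIENT_NLU"
  | "NLU"
  | "ROUTE"
  | "DONE";

// Allowed transitions, derived from the three paths listed under "States".
const TRANSITIONS: Record<ListenState, ListenState[]> = {
  WAIT_LISTEN: ["ASR", "WAIT_CLIENT_ASR", "WAIT_CLIENT_NLU"],
  ASR: ["NLU"],
  WAIT_CLIENT_ASR: ["NLU"],
  WAIT_CLIENT_NLU: ["ROUTE"],
  NLU: ["ROUTE"],
  ROUTE: ["DONE"],
  DONE: [],
};

class ListenStateMachine {
  private current: ListenState = "WAIT_LISTEN";

  get state(): ListenState {
    return this.current;
  }

  // Moves to `next`, or throws if the transition is not in the table.
  advance(next: ListenState): void {
    if (TRANSITIONS[this.current].indexOf(next) === -1) {
      throw new Error(`Illegal transition ${this.current} -> ${next}`);
    }
    this.current = next;
  }
}
```

Keeping the transitions in one table makes the client-provided ASR and NLU shortcuts explicit: they are simply alternative edges out of WAIT_LISTEN.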
Listen Transaction Handler (ListenTransactionHandler.ts):
- Manages audio streaming via AudioBuffer
- Creates ASRSession for speech recognition
- Handles timeouts (ASR: 40s, Parser: 10s, Context: 5s, Skill: 10s)
- Records speech history to MongoDB and optionally S3
- Supports client-provided ASR/NLU (for menu clicks, etc.)
- Handles skill redirects
Proactive Flow
The proactive system allows Jibo to initiate conversations based on context, history, and triggers.
Proactive Transaction Handler (ProactiveTransactionHandler.ts):
- Receives TRIGGER message from robot
- Waits for CONTEXT message (robot state)
- Action Selection:
  - Gets all proactive skill configurations
  - Filters by context rules (time, location, people present, etc.)
  - Filters by interaction history rules (frequency, recency)
  - Filters by user settings
  - Randomly selects from eligible actions
- Launches selected skill (on-robot or cloud)
- Returns match response or no-action response
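The action selection steps can be read as a chain of filters followed by a random pick. The sketch below illustrates that shape only. ProactiveContext, ProactiveCandidate, and selectProactive are hypothetical names, and the rule fields (a context predicate, a minimum interval, a settings key) are simplified stand-ins for the real registration data.

```typescript
// Hypothetical shapes; the real ProactiveRegistration and context types are richer.
interface ProactiveContext {
  hour: number; // local hour of day, 0-23
  peoplePresent: number;
}

interface ProactiveCandidate {
  skillID: string;
  contextRule: (ctx: ProactiveContext) => boolean;
  minIntervalMs: number; // history rule: minimum time between launches
  settingKey?: string; // settings rule: user preference that must be enabled
}

function selectProactive(
  candidates: ProactiveCandidate[],
  ctx: ProactiveContext,
  lastLaunchTs: Record<string, number>,
  settings: Record<string, boolean>,
  now: number,
  random: () => number = Math.random
): ProactiveCandidate | undefined {
  const eligible = candidates
    // 1. Context rules
    .filter((c) => c.contextRule(ctx))
    // 2. Interaction history rules
    .filter((c) => {
      const last = lastLaunchTs[c.skillID];
      return last === undefined || now - last >= c.minIntervalMs;
    })
    // 3. User settings
    .filter((c) => c.settingKey === undefined || settings[c.settingKey] === true);

  // No eligible action corresponds to the "no-action" response.
  if (eligible.length === 0) return undefined;
  // 4. Random selection among the survivors
  return eligible[Math.floor(random() * eligible.length)];
}
```

Passing the clock and the random source as parameters is a choice made for this sketch so that selection can be tested deterministically.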
Proactive Registration: Skills register proactive behaviors with:
- Trigger types (time-based, event-based, surprise)
- Context rules (when this can trigger)
- Interaction history rules (how often it can trigger)
- Settings rules (user preferences)
2. Parser Service (packages/parser)
The Parser service performs Natural Language Understanding using Dialogflow.
ParserService (ParserService.ts):
- Starts RobustParser process on port 8787 (optional)
- Initializes Dialogflow client
- Initializes Robust Parser client
- Handles POST requests to /v1/parse
- Exposes state at the /state endpoint
NLU Pipeline:
- Receives text, rules, and context
- Queries Dialogflow with configured agents
- Optionally queries Robust Parser (custom NLU)
- Returns intent, entities, and rules
Configuration:
- Dialogflow API key
- Robust Parser enable/disable
- Multiple external agents support
3. History Service (packages/history)
The History service persists interaction data to MongoDB.
HistoryService (HistoryService.ts):
- Two database clients:
  - SkillLaunchDBClient - Records skill launches
  - SpeechHistoryDBClient - Records speech interactions (optional)
- HTTP endpoints:
  - /v1/skill/launch - Skill launch history
  - /v1/speech - Speech history (if enabled)
- Health check endpoint
Data Stored:
- Skill launches (skill ID, intent, timestamp, robot ID, account ID)
- Speech interactions (ASR result, NLU result, audio file URL, error tracking)
4. Lasso Service (packages/lasso)
Lasso provides external data integration for skills.
Features:
- OAuth2 credential management
- Calendar client integration
- Weather data (Dark Sky API)
- Maps data (Google Maps API)
- News data (AP News)
- MongoDB for credential storage
- Redis for caching
LassoService (LassoService.ts):
- Manages OAuth2 flows
- Provides relay endpoints for external APIs
- Caches responses in Redis
Skill Framework
BaseSkill (packages/baseskill)
BaseSkill (BaseSkill.ts):
- Abstract base class for all cloud skills
- Extends BaseHttpHandler
- Handles POST requests to /
- Provides error handling
- Tracks timing
GraphSkill (GraphSkill.ts):
- Extends BaseSkill with graph-based state machine
- Implements node-based conversation flow
- Supports skill redirects
- Tracks analytics events
- Supports supplemental behaviors (parallel/sequence)
Graph System
The graph system provides a state machine framework for skills.
Graph (Graph.ts):
- Directed graph of connected nodes
- Supports subgraphs (hierarchical)
- Exit transitions for graph termination
- Validation (reachability, transition completeness)
- GraphViz dot file generation
GraphManager (GraphManager.ts):
- Singleton per skill
- Manages node IDs and mappings
- Executes graph:
  - start() - Creates session, enters initial node
  - enterNode() - Calls node's enter method
  - exitNode() - Calls node's exit method with action results
  - executeTransition() - Moves to next node
- Maintains session state (node ID, data, trace)
Node (Node.ts):
- Abstract base class for graph nodes
- Has transition names and destinations
- Two lifecycle methods:
  - enter(data) - Called when node is entered, returns action or redirect
  - exit(data) - Called with action results, returns next transition
- Supports graph traversal (BFS)
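To make the enter/exit lifecycle concrete, here is a deliberately simplified sketch of how a launch and a later update might walk a graph. It is not the real Graph, GraphManager, or Node API: SketchNode, SketchGraph, Session, and the string-valued actions are hypothetical, and redirects and subgraphs are left out.

```typescript
// Hypothetical, simplified types; the real Node, Graph and GraphManager classes differ.
interface SketchNode {
  enter(data: Record<string, unknown>): { action: string };
  exit(result: unknown): { transition: string };
}

interface Session {
  nodeID: string;
  data: Record<string, unknown>;
  trace: string[];
}

class SketchGraph {
  private nodes: Record<string, SketchNode> = {};
  // edges[nodeID][transitionName] = destination node ID, or null for an exit transition
  private edges: Record<string, Record<string, string | null>> = {};

  addNode(id: string, node: SketchNode, transitions: Record<string, string | null>): void {
    this.nodes[id] = node;
    this.edges[id] = transitions;
  }

  // Launch: create a session and enter the initial node.
  start(initialID: string): { session: Session; action: string } {
    const session: Session = { nodeID: initialID, data: {}, trace: [initialID] };
    return { session, action: this.nodes[initialID].enter(session.data).action };
  }

  // Update: exit the current node with the action result, follow the returned
  // transition, and enter the next node. An exit transition ends the session.
  update(session: Session, result: unknown): { action?: string; final: boolean } {
    const { transition } = this.nodes[session.nodeID].exit(result);
    const next = this.edges[session.nodeID][transition];
    if (next === undefined) throw new Error(`No destination for ${transition}`);
    if (next === null) return { final: true };
    session.nodeID = next;
    session.trace.push(next);
    return { action: this.nodes[next].enter(session.data).action, final: false };
  }
}
```

In this reading, start() corresponds to handling a launch request and update() to handling an update request that carries the robot's action result.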
Built-in Node Types:
- DefaultNode - Simple terminal node
- JCPNode - Returns JCP action
- NoOpNode - No operation
- TrueFalseNode - Conditional branching
- SetLooperIDNode - Sets speaker ID
MIM (Motion Interaction Model) System:
- ANFactory - Creates graph for playing MIM animations
- Supports scripted responses, emotion responses, fallback responses
- Semi-specific responses (context-aware)
Skill Request/Response Protocol
Skill Request Types (skill/request.ts):
- LISTEN_LAUNCH - Launch skill from listen interaction
- LISTEN_UPDATE - Update skill with action results
- PROACTIVE_LAUNCH - Launch skill proactively
Skill Request Data:
{
  type: MessageType,
  msgID: UUID,
  ts: number,
  data: {
    general: { accountID, robotID, lang, release },
    runtime: { character, location, loop, perception, dialog },
    skill: { id, session? },
    result: any, // Action results for UPDATE
    nlu: NLUResult,
    asr: ASRResult,
    memo?: any
  }
}
Skill Response Types (skill/response.ts):
- SKILL_ACTION - Returns action to execute
- SKILL_REDIRECT - Redirects to another skill
- ERROR - Error response
Skill Action Data:
{
  action: JCPAction, // JCP protocol behavior
  analytics?: AnalyticsData,
  final?: boolean, // Is this the final response?
  fireAndForget?: boolean
}
JCP Action (skill/action.ts):
{
  type: ActionType.JCP,
  config: {
    version: "1.0.0",
    jcp: SupportedBehaviors // SLIM, Sequence, Parallel, SetPresentPerson, ImpactEmotion
  }
}
Skill Configuration
SkillConfig (skill/config.ts):
{
  id: SkillID,
  intents: [{
    name: IntentName,
    entities?: EntityConfig[],
    memo?: any
  }],
  proactives?: ProactiveRegistration[],
  IHQueries?: IHQueryDefinitions,
  onRobot?: boolean,
  URL: string,
  settings?: ManifestSettings
}
Entity Config:
- name - Entity name
- value - Expected value
- matchRule - 'EXACT' or 'NOT'
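One way to picture how intent and entity configuration could drive routing is the sketch below. It is an illustration under assumptions, not the IntentRouter implementation: the type and function names are hypothetical, the first matching skill wins, and a 'NOT' rule is assumed to pass when the entity is absent.

```typescript
// Hypothetical, trimmed-down versions of the config and NLU result shapes.
type MatchRule = "EXACT" | "NOT";

interface EntityConfigSketch {
  name: string;
  value: string;
  matchRule: MatchRule;
}

interface IntentConfigSketch {
  name: string;
  entities?: EntityConfigSketch[];
}

interface SkillConfigSketch {
  id: string;
  intents: IntentConfigSketch[];
}

interface NLUResultSketch {
  intent: string;
  entities: Record<string, string>;
}

// EXACT: the entity must equal the expected value.
// NOT: the entity must differ from it (assumption: an absent entity also passes).
function entityMatches(rule: EntityConfigSketch, entities: Record<string, string>): boolean {
  const actual = entities[rule.name];
  return rule.matchRule === "EXACT" ? actual === rule.value : actual !== rule.value;
}

// Returns the ID of the first skill whose intent name and entity rules all match.
function routeIntent(nlu: NLUResultSketch, skills: SkillConfigSketch[]): string | undefined {
  for (const skill of skills) {
    for (const intent of skill.intents) {
      if (intent.name !== nlu.intent) continue;
      const rules = intent.entities || [];
      if (rules.every((r) => entityMatches(r, nlu.entities))) return skill.id;
    }
  }
  return undefined;
}
```

An intent with no entity rules matches on name alone, which is why the more specific skill is listed first in the usage below.

```typescript
const skills: SkillConfigSketch[] = [
  { id: "knock-knock-skill", intents: [{ name: "joke_tell", entities: [{ name: "kind", value: "knock-knock", matchRule: "EXACT" }] }] },
  { id: "joke-skill", intents: [{ name: "joke_tell" }] },
];
routeIntent({ intent: "joke_tell", entities: {} }, skills); // "joke-skill"
```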
Proactive Registration:
- Trigger type and conditions
- Context rules
- Interaction history rules
- Settings rules
Interfaces Package
The interfaces package defines all TypeScript interfaces for communication between services.
Key Interface Modules
service.ts - Base message types:
- BaseMessage<T, D> - Generic message with type, msgID, timestamp, data
- BaseResponse<T, D> - Response with final flag and timings
- IAuthDetails - Authentication details (account ID, access keys)
hub/ - Hub-specific interfaces:
- request.ts - LISTEN, CONTEXT, CLIENT_ASR, CLIENT_NLU messages
- response.ts - ASR, NLU, LISTEN, SKILL_REDIRECT, ERROR responses
- MessageType.ts - Message type enums
- HubErrorCode.ts - Error code enums
skill/ - Skill-specific interfaces:
- request.ts - LISTEN_LAUNCH, LISTEN_UPDATE, PROACTIVE_LAUNCH
- response.ts - SKILL_ACTION, SKILL_REDIRECT, ERROR
- action.ts - JCP action types
- config.ts - Skill configuration
- behaviors.ts - Supported JCP behaviors
- analytics.ts - Analytics event types
nlu.ts - NLU interfaces:
- NLURequestData - Text, rules, loop users, external agents
- NLUResult - Intent, entities, rules
- ExternalAgentRequest - External Dialogflow agent config
asr.ts - ASR interfaces:
- ASRResult - Text, confidence, annotation
- ASRConfig - Language, hints, timeouts
jibo/ - Jibo-specific data:
- data.ts - GeneralData (account, robot, language), SkillData (session, trace)
- runtime.ts - RuntimeContext (character, location, loop, perception, dialog)
proactive/ - Proactive interfaces:
- Context field definitions
- History rules
- Settings rules
- Proactive trigger/request/response
history/ - History interfaces:
- Skill launch data
- Speech history data
Utils Package
The utils package provides shared functionality.
BaseService (utils/service/BaseService.ts)
Base class for all Pegasus services:
Features:
- Express.js HTTP server
- WebSocket server (ws library)
- JWT authentication
- Request/response logging with jibo-log
- New Relic monitoring
- Health check endpoint
- Error handling middleware
Methods:
- addSocketHandler(path, handler) - Register WebSocket handler
- addHttpHandler(path, handler) - Register HTTP handler
- init(port) - Start server
- close() - Stop server
Authentication:
- JWT token verification
- Bearer token scheme
- Configurable secret via ETCO_server_hubTokenSecret
Logging:
- Per-request log instances
- Transaction ID tracking
- Robot ID tracking
- Configurable log levels per namespace
Other Utils
- PegasusRequest - Enhanced Express request with Jibo headers
- PegasusWebSocket - Enhanced WebSocket with auth and logging
- JiboHeaders - Parses Jibo-specific headers (transID, robotID, logging config)
- ResponseWrapper - Wraps WebSocket responses
- HttpError - HTTP error with status code
Communication Protocols
WebSocket Protocol
Connection:
- URL: ws://hub:9000/listen or ws://hub:9000/proactive
- Authentication: Bearer token in Authorization header
- Headers: x-jibo-transid, x-jibo-robotid, x-jibo-logging-config
Message Format:
{
  "type": "MESSAGE_TYPE",
  "msgID": "uuid",
  "ts": 1234567890,
  "data": { ... }
}
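A receiver would typically validate this envelope before dispatching on type. The following is a minimal sketch of such a check. BaseMessageSketch and parseMessage are hypothetical names, and the real BaseMessage<T, D> interface may carry additional fields.

```typescript
// Hypothetical envelope type mirroring the four fields shown above.
interface BaseMessageSketch<D = unknown> {
  type: string;
  msgID: string;
  ts: number;
  data: D;
}

// Parses a JSON text frame and checks that the envelope fields are present
// and correctly typed. Throws on malformed input.
function parseMessage(raw: string): BaseMessageSketch {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Message must be a JSON object");
  }
  const m = parsed as Record<string, unknown>;
  const { type, msgID, ts } = m;
  if (typeof type !== "string" || typeof msgID !== "string" || typeof ts !== "number") {
    throw new Error("Message is missing type, msgID or ts");
  }
  if (!("data" in m)) {
    throw new Error("Message is missing data");
  }
  return { type, msgID, ts, data: m.data };
}
```

Binary frames (audio packets) would bypass this path entirely; only text frames carry the JSON envelope.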
Listen Flow Messages:
- Robot → Hub: LISTEN (with ASR config, rules, language)
- Robot → Hub: Audio packets (binary)
- Hub → Robot: SOS (Start of Speech)
- Robot → Hub: CONTEXT (runtime context)
- Hub → Robot: EOS (End of Speech)
- Hub → Robot: LISTEN (with ASR result, NLU result, match)
- Hub → Robot: SKILL_ACTION (if cloud skill)
- Robot → Hub: CMD_RESULT (action results)
- Hub → Robot: SKILL_ACTION (next action) or final
Proactive Flow Messages:
- Robot → Hub: TRIGGER (trigger data)
- Robot → Hub: CONTEXT (runtime context)
- Hub → Robot: PROACTIVE (match or no-action)
- Hub → Robot: SKILL_ACTION (if cloud skill)
HTTP Protocol
Skill Request:
- Method: POST
- URL: http://skill-host:port/
- Headers: Authorization, x-jibo-transid, x-jibo-robotid
- Body: SkillRequest JSON
Parser Request:
- Method: POST
- URL: http://parser:8080/v1/parse
- Body: NLURequestData JSON
Authentication & Security
JWT Authentication
Token Format:
{
  "id": "account-id",
  "accessKeyId": "client-id",
  "secretAccessKey": "client-secret",
  "friendlyId": "robot-name"
}
Verification:
- Secret: ETCO_server_hubTokenSecret environment variable
- Scheme: Bearer
- Applied to WebSocket connections and HTTP endpoints
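The steps around signature verification can be sketched without any library: pulling the token out of the Authorization header and checking the shape of the decoded payload. The helper names below are hypothetical, and treating all four payload fields as required strings is an assumption. The signature check itself (done with the jsonwebtoken library per the technology stack) is deliberately omitted.

```typescript
// Hypothetical payload type based on the token format shown above.
interface TokenPayloadSketch {
  id: string;
  accessKeyId: string;
  secretAccessKey: string;
  friendlyId: string;
}

// Returns the token from an "Authorization: Bearer <token>" header value,
// or undefined if the header is missing or uses another scheme.
function extractBearerToken(header: string | undefined): string | undefined {
  if (header === undefined) return undefined;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : undefined;
}

// Type guard for an already-verified, already-decoded payload.
// Assumption: all four fields are required strings.
function isTokenPayload(value: unknown): value is TokenPayloadSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return ["id", "accessKeyId", "secretAccessKey", "friendlyId"].every(
    (key) => typeof v[key] === "string"
  );
}
```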
Network Security
- All services run in Docker containers
- Services communicate via Docker network (pegasus-nw)
- External access via load balancer
- TLS termination at load balancer
Deployment
Docker Compose (Local Development)
Services:
- hub - Hub service (port 9000)
- parser - Parser service (port 9005)
- history - History service (port 9006)
- chitchat-skill - Chitchat skill (port 9004)
- report-skill - Report skill (port 9003)
- lasso - Lasso service (port 9007)
- redis - Redis cache (port 6379)
- mongo_lasso - MongoDB for Lasso (port 27017)
- history_cluster - MongoDB for History (from docker-compose-history-db.yml)
Configuration:
- Environment variables prefixed with ETCO_ (ETCO = Environment TO Configuration)
- Volume mounting: ./:/pegasus:consistent for live code editing
- Debug ports: 5850-5855 for Node.js debugging
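The variable names used in this document (for example ETCO_server_hubTokenSecret and ETCO_hub_recordSpeechHistory) suggest that the segments after the prefix form a path into a nested configuration object. The sketch below shows that reading. It is an assumption about the convention, not the actual loader, and etcoToConfig and ConfigTree are hypothetical names.

```typescript
// Nested configuration: leaves are strings, branches are further trees.
type ConfigTree = { [key: string]: string | ConfigTree };

// Assumed convention: ETCO_<section>_<key> becomes config.<section>.<key>.
// Variables without the ETCO_ prefix are ignored.
function etcoToConfig(env: Record<string, string | undefined>): ConfigTree {
  const root: ConfigTree = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (key.indexOf("ETCO_") !== 0 || value === undefined) continue;
    const path = key.slice("ETCO_".length).split("_");
    let node = root;
    // Walk or create the intermediate branches.
    for (let i = 0; i < path.length - 1; i++) {
      const child = node[path[i]];
      if (typeof child === "object") {
        node = child;
      } else {
        const created: ConfigTree = {};
        node[path[i]] = created;
        node = created;
      }
    }
    node[path[path.length - 1]] = value;
  }
  return root;
}
```

In a service this would be called with process.env. Values stay as strings here, so any conversion to booleans or numbers would be a separate step.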
Build Process
Commands:
docker build -t pegasus_base:latest .
yarn docker:bootstrap
yarn docker:build
./pegasus.js build-docker-image --services hub
CLI Tool (cli/):
- bootstrap - Install dependencies
- build - Build TypeScript
- test - Run tests
- docker-run - Run commands in Docker
- build-docker-image - Build Docker images for services
Production Deployment
- AWS ECS (Elastic Container Service)
- ECR (Elastic Container Registry) for Docker images
- Application Load Balancer
- MongoDB Atlas for production databases
- ElastiCache for Redis
- CloudWatch for logging
- New Relic for monitoring
Data Flow Examples
Example 1: User Says "Tell Me a Joke"
- Robot → Hub: LISTEN message with ASR config
- Robot → Hub: Audio stream
- Hub: Detects SOS, emits SOS message
- Hub: Streams audio to Google Cloud Speech API
- Hub: Detects EOS, emits EOS message
- Robot → Hub: CONTEXT message (runtime state)
- Hub → Parser: POST /v1/parse with text "tell me a joke"
- Parser → Dialogflow: Query with "joke" intent rules
- Dialogflow → Parser: Intent="joke_tell", entities={}
- Parser → Hub: NLU result
- Hub → IntentRouter: Match intent to "joke-skill"
- Hub → joke-skill: POST LISTEN_LAUNCH request
- joke-skill: Executes graph, selects joke
- joke-skill → Hub: SKILL_ACTION with JCP behavior (SayText)
- Hub → Robot: SKILL_ACTION message
- Robot: Executes behavior, speaks joke
- Robot → Hub: CMD_RESULT with action result
- Hub → joke-skill: POST LISTEN_UPDATE request
- joke-skill: Returns final=true
- Hub → Robot: Final SKILL_ACTION
Example 2: Proactive Greeting
- Robot: Detects person entering room
- Robot → Hub: TRIGGER message with trigger data
- Robot → Hub: CONTEXT message (runtime state)
- Hub: Queries all proactive skill configs
- Hub: Filters by context (time, people present)
- Hub: Filters by history (last greeting time)
- Hub: Filters by settings (user greeting preference)
- Hub: Selects "greeting-skill"
- Hub → greeting-skill: POST PROACTIVE_LAUNCH request
- greeting-skill → Hub: SKILL_ACTION with greeting behavior
- Hub → Robot: PROACTIVE response with match
- Hub → Robot: SKILL_ACTION message
- Robot: Executes greeting
Error Handling
Error Types
Hub Error Codes (HubErrorCode.ts):
- TIMEOUT_ASR - ASR timeout
- TIMEOUT_PARSER - Parser timeout
- TIMEOUT_CONTEXT - Context timeout
- TIMEOUT_SKILL - Skill timeout
- PARSER - Parser error
- ASR - ASR error
Skill Request Errors (SkillRequestError):
- SKILL_NOT_FOUND - Skill does not exist
- TIMEOUT - Skill request timeout
Error Response Format
{
  "type": "ERROR",
  "msgID": "uuid",
  "ts": 1234567890,
  "final": true,
  "data": {
    "message": "Error description",
    "code": "ERROR_CODE"
  },
  "timings": {
    "total": 1234
  }
}
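A small builder keeps error responses consistent with this format. The sketch below is illustrative: HubErrorResponseSketch and makeErrorResponse are hypothetical names, and it assumes that ts is the send time and that timings.total is the elapsed time in milliseconds since the transaction started.

```typescript
// Hypothetical type mirroring the error response shown above.
interface HubErrorResponseSketch {
  type: "ERROR";
  msgID: string;
  ts: number;
  final: true;
  data: { message: string; code: string };
  timings: { total: number };
}

// Builds an error response. `startedAt` and `now` are epoch milliseconds;
// timings.total is assumed to be the elapsed transaction time.
function makeErrorResponse(
  code: string,
  message: string,
  msgID: string,
  startedAt: number,
  now: number
): HubErrorResponseSketch {
  return {
    type: "ERROR",
    msgID,
    ts: now,
    final: true,
    data: { message, code },
    timings: { total: now - startedAt },
  };
}
```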
Timeout Handling
- ASR: 40 seconds (configurable via sosTimeout, maxSpeechTimeout)
- Parser: 10 seconds
- Context: 5 seconds
- Skill: 10 seconds
- Transaction: 60 seconds (configurable)
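Per-stage timeouts of this kind are commonly implemented by racing the work against a timer. The sketch below shows one way to do that and to map a timeout onto the TIMEOUT_* error codes listed earlier. TIMEOUTS_MS, Stage, and withTimeout are hypothetical names, the defaults are the values from the list above, and this is not the hub's actual timeout code.

```typescript
// Default per-stage timeouts in milliseconds, taken from the list above.
const TIMEOUTS_MS = { asr: 40000, parser: 10000, context: 5000, skill: 10000 };
type Stage = keyof typeof TIMEOUTS_MS;

interface StageTimeout {
  code: string; // e.g. "TIMEOUT_PARSER"
  message: string;
}

// Resolves or rejects with `work` if it settles first; otherwise rejects
// with a StageTimeout once `ms` milliseconds have elapsed.
function withTimeout<T>(
  work: Promise<T>,
  stage: Stage,
  ms: number = TIMEOUTS_MS[stage]
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const err: StageTimeout = {
        code: `TIMEOUT_${stage.toUpperCase()}`,
        message: `${stage} timed out after ${ms} ms`,
      };
      reject(err);
    }, ms);
    // Clear the timer as soon as the work settles so it cannot fire late.
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

Note that this only stops waiting; it does not cancel the underlying request, which would need its own abort mechanism.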
Monitoring & Logging
Logging
jibo-log Integration:
- Per-namespace log levels
- Transaction ID correlation
- Robot ID tracking
- Structured logging support
Log Levels:
- Configured via x-jibo-logging-config header
- Per-namespace granularity
- Environment variable: ETCO_server_logLevel
Monitoring
New Relic:
- HTTP request tracking
- WebSocket transaction tracking
- Error tracking
- Custom attributes (transID, robotID)
Health Checks:
- /healthcheck endpoint on all services
- Returns service-specific health data
- Database connection status
Speech History Recording
Optional Features:
- Record skill launches to MongoDB
- Record speech interactions to MongoDB
- Upload speech logs to S3 (JSON with audio base64)
Configuration:
- ETCO_hub_recordLaunchHistory - Enable launch history
- ETCO_hub_recordSpeechHistory - Enable speech history
- ETCO_hub_recordSpeechLogBucket - S3 bucket for speech logs
Skill Development Guide
Creating a New Skill
- Extend GraphSkill:
export class MySkill extends GraphSkill<Transition> {
  constructor() {
    super('my-skill');
  }

  createGraph(): Graph<Transition> {
    const g = new Graph('My Skill', generateTransitions<Transition>(Transition));
    // Add nodes and transitions
    g.finalize();
    return g;
  }
}
- Define Transitions:
enum Transition {
  Done = 'Done',
  Retry = 'Retry'
}
- Create Nodes:
class MyNode extends Node<Transition> {
  async enter(data: Data): Promise<EnterResponse> {
    // Return action or redirect
    return { action: myJCPAction };
  }

  async exit(data: Data): Promise<ExitResponse> {
    // Return next transition
    return { transition: Transition.Done };
  }
}
- Create Skill Manifest:
{
  "id": "my-skill",
  "intents": [
    {
      "name": "my_intent",
      "entities": []
    }
  ],
  "onRobot": false
}
- Register with Hub:
  - Add skill config to skills-local.json or environment
  - Deploy skill service
  - Hub will load configuration
Skill Best Practices
- Use graph for complex flows, direct responses for simple ones
- Track analytics events for monitoring
- Handle errors gracefully with try-catch
- Use supplemental behaviors for parallel actions
- Set appropriate timeouts
- Log important events
- Test with both LISTEN_LAUNCH and PROACTIVE_LAUNCH
Key Design Decisions
Why Graph-Based Skills?
- State Management: Explicit state machine with session tracking
- Visualization: GraphViz generation for debugging
- Reusability: Subgraphs for common patterns
- Testability: Isolated node testing
- Maintainability: Clear flow structure
Why WebSocket for Robot Communication?
- Low Latency: Real-time bidirectional communication
- Audio Streaming: Binary message support for audio
- Stateful: Single connection per transaction
- Efficiency: No HTTP overhead for each message
Why Separate Services?
- Scalability: Scale each service independently
- Isolation: Failure in one service doesn't affect others
- Technology: Different services can use different tech stacks
- Deployment: Independent deployment cycles
Why Lerna Monorepo?
- Code Sharing: Easy to share interfaces and utils
- Versioning: Linked versioning for interdependent packages
- Development: Single repository for all services
- Testing: Integration tests across packages
Limitations & Known Issues
- Single Graph Manager: Skills cannot have concurrent sessions (singleton pattern)
- Sequential Skill Redirects: Only one level of redirect supported
- No Skill-to-Skill Communication: Skills must go through hub
- Fixed Timeouts: Hardcoded timeouts in some places
- No Skill Hot-Reload: Requires container rebuild for skill changes
- Limited NLU: Dialogflow dependency, no custom model training
- No Skill Versioning: Skills identified by ID only
- Synchronous Skill Requests: Hub waits for skill response (no async)
Future Considerations
- Skill Versioning: Support multiple versions of same skill
- Skill-to-Skill Direct Communication: Allow skills to call each other
- Async Skill Responses: Long-running skills with callback pattern
- Custom NLU Models: Support for custom trained models
- Skill Hot-Reload: Dynamic skill loading without restart
- Multi-Session Skills: Support concurrent skill sessions
- Skill Marketplace: Third-party skill distribution
- A/B Testing: Framework for testing skill variations
Conclusion
The original Jibo server (Pegasus) is a microservices system that provides the cloud side of the Jibo robot's conversational AI: real-time speech processing, natural language understanding, skill routing, and proactive behaviors. The graph-based skill framework makes conversation flow explicit and testable, while the separation into Hub, Parser, History, and Lasso services allows each to be scaled, deployed, and developed independently.