# Original Jibo Server (Pegasus) Design Document

## Executive Summary

The original Jibo server, codenamed "Pegasus" (formerly V1.X), is a cloud-based microservices architecture that powers the Jibo social robot's conversational AI capabilities. It is built as a Lerna monorepo using Node.js/TypeScript and deployed via Docker containers. The system processes speech, performs natural language understanding, routes to appropriate skills, and manages proactive behaviors.

## Architecture Overview

### Monorepo Structure

The codebase is organized as a Lerna monorepo with the following main packages:

- **packages/hub** - Central orchestration service
- **packages/parser** - NLU (Natural Language Understanding) service
- **packages/history** - Data persistence service (MongoDB)
- **packages/baseskill** - Base class and framework for cloud skills
- **packages/interfaces** - TypeScript interfaces and API contracts
- **packages/utils** - Shared utility libraries
- **packages/chitchat-skill** - Example conversational skill
- **packages/report-skill** - Reporting skill
- **packages/lasso** - External data integration service
- **packages/hub-client** - Client library for hub communication
- **packages/history-client** - Client library for history service
- **packages/test-utils** - Testing utilities

### Technology Stack

- **Language**: TypeScript 2.5.3
- **Runtime**: Node.js 8.9.4
- **Package Manager**: Yarn 1.7.0
- **Containerization**: Docker
- **Orchestration**: Docker Compose (local), AWS ECS (production)
- **Database**: MongoDB 3.6.0
- **Cache**: Redis 3
- **NLU**: Dialogflow (API.ai)
- **ASR**: Google Cloud Speech API
- **WebSocket**: ws library
- **HTTP**: Express.js
- **Authentication**: JWT (jsonwebtoken)

## Core Services

### 1. Hub Service (`packages/hub`)

The Hub is the central orchestrator that coordinates all interactions between the robot and cloud services.

#### Key Components

**HubService** (`HubService.ts`)
- Main service class extending `BaseService`
- Initializes and manages all hub components
- Registers WebSocket and HTTP handlers

**HubComponents** - Dependency injection container:
- `parser: ParserClient` - NLU service client
- `skillConfigManager: SkillConfigManager` - Manages skill configurations
- `intentRouter: IntentRouter` - Routes intents to skills
- `skillRequestMaker: SkillRequestMaker` - Makes HTTP requests to skills
- `history: HistoryServiceClient` - History service client
- `hubSettings: HubSettings` - Hub configuration
- `settingsClient: SettingsClient` - Settings service client

#### Endpoints

**WebSocket Endpoints:**
- `/listen` and `/v1/listen` - Handles speech recognition and NLU
- `/proactive` and `/v1/proactive` - Handles proactive triggers

**HTTP Endpoints:**
- `/skills` and `/v1/skills` - Lists available skills
- `/healthcheck` - Service health check

#### Listen Flow

The listen transaction follows a state machine implemented in `ListenTransactionHandler`:

```
States:
WAIT_LISTEN → ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_ASR → NLU → ROUTE → DONE
WAIT_LISTEN → WAIT_CLIENT_NLU → ROUTE → DONE
```

**State Transitions:**

1. **WAIT_LISTEN** - Receives the LISTEN message from the robot. It then moves to ASR, or to WAIT_CLIENT_ASR / WAIT_CLIENT_NLU when the robot supplies its own ASR text or NLU result (e.g., menu clicks).
2. **ASR** - Performs Automatic Speech Recognition using the Google Cloud Speech API
   - Streams audio packets
   - Emits SOS (Start of Speech) when speech is detected
   - Emits EOS (End of Speech) when speech ends
   - Handles timeouts (SOS timeout, max speech timeout)
3. **NLU** - Sends the ASR text to the Parser service for intent recognition
   - Includes context (loop users, perception, etc.)
   - Supports external Dialogflow agents
4. **ROUTE** - The Intent Router determines which skill to launch
   - Matches the NLU result against skill intent configurations
   - A Decision Mediator can alter the decision based on external factors
   - Routes to on-robot skills or cloud skills
5. **DONE** - Transaction complete
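
The three paths in the diagram can be enforced with a small transition table. The sketch below is illustrative and is not the `ListenTransactionHandler` code. The state names come from this document. `ListenStateMachine`, `firstStateAfterListen`, and the `clientASR`/`clientNLU` flags on the LISTEN message are hypothetical.

```typescript
// Illustrative sketch only: state names follow the design doc, everything else is hypothetical.
type ListenState =
  | 'WAIT_LISTEN'
  | 'ASR'
  | 'WAIT_CLIENT_ASR'
  | 'WAIT_CLIENT_NLU'
  | 'NLU'
  | 'ROUTE'
  | 'DONE';

// Legal next states for each state, matching the three documented paths.
const NEXT_STATES: Record<ListenState, ListenState[]> = {
  WAIT_LISTEN: ['ASR', 'WAIT_CLIENT_ASR', 'WAIT_CLIENT_NLU'],
  ASR: ['NLU'],
  WAIT_CLIENT_ASR: ['NLU'],
  WAIT_CLIENT_NLU: ['ROUTE'],
  NLU: ['ROUTE'],
  ROUTE: ['DONE'],
  DONE: [],
};

// Hypothetical flags on the LISTEN message describing what the robot will supply.
interface ListenOptions {
  clientASR?: boolean; // robot sends its own ASR text
  clientNLU?: boolean; // robot sends its own NLU result (e.g. a menu click)
}

// Picks which branch of the diagram a LISTEN message starts.
function firstStateAfterListen(opts: ListenOptions): ListenState {
  if (opts.clientNLU) return 'WAIT_CLIENT_NLU';
  if (opts.clientASR) return 'WAIT_CLIENT_ASR';
  return 'ASR';
}

class ListenStateMachine {
  private state: ListenState = 'WAIT_LISTEN';
  readonly trace: ListenState[] = ['WAIT_LISTEN'];

  get current(): ListenState {
    return this.state;
  }

  // Moves to `next`, rejecting any transition the diagram does not allow.
  transition(next: ListenState): void {
    if (NEXT_STATES[this.state].indexOf(next) === -1) {
      throw new Error(`Illegal transition ${this.state} -> ${next}`);
    }
    this.state = next;
    this.trace.push(next);
  }
}
```

A server-side ASR request walks `ASR`, `NLU`, `ROUTE`, `DONE`. A menu click flagged with `clientNLU` goes from `WAIT_CLIENT_NLU` straight to `ROUTE`, and any attempt to enter `NLU` from there throws.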

**Listen Transaction Handler** (`ListenTransactionHandler.ts`):
- Manages audio streaming via `AudioBuffer`
- Creates `ASRSession` for speech recognition
- Handles timeouts (ASR: 40s, Parser: 10s, Context: 5s, Skill: 10s)
- Records speech history to MongoDB and optionally S3
- Supports client-provided ASR/NLU (for menu clicks, etc.)
- Handles skill redirects

#### Proactive Flow

The proactive system allows Jibo to initiate conversations based on context, history, and triggers.

**Proactive Transaction Handler** (`ProactiveTransactionHandler.ts`):

1. Receives TRIGGER message from robot
2. Waits for CONTEXT message (robot state)
3. **Action Selection**:
   - Gets all proactive skill configurations
   - Filters by context rules (time, location, people present, etc.)
   - Filters by interaction history rules (frequency, recency)
   - Filters by user settings
   - Randomly selects from eligible actions
4. Launches selected skill (on-robot or cloud)
5. Returns match response or no-action response
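
Action selection amounts to three successive filters followed by a uniform random pick. The sketch below assumes simplified rule shapes: a context predicate, a minimum re-trigger interval standing in for the interaction-history rules, and an optional boolean user setting. The real proactive registrations are declarative and richer. `selectProactiveAction`, `RobotContext`, and `ProactiveAction` are hypothetical names.

```typescript
// Hypothetical, simplified view of the robot state sent in the CONTEXT message.
interface RobotContext {
  hour: number;          // local hour of day, 0-23
  peoplePresent: number; // number of people the robot currently perceives
}

// Hypothetical, simplified proactive registration.
interface ProactiveAction {
  skillID: string;
  contextRule: (ctx: RobotContext) => boolean; // context rule
  minIntervalMs: number;                       // interaction-history rule
  settingKey?: string;                         // user setting that must not be disabled
}

function selectProactiveAction(
  actions: ProactiveAction[],
  ctx: RobotContext,
  lastLaunchMs: Record<string, number>, // skillID -> time of last launch
  settings: Record<string, boolean>,    // user settings
  nowMs: number,
  rng: () => number = Math.random       // injectable for deterministic tests
): ProactiveAction | null {
  const eligible = actions
    // 1. Context rules: time of day, people present, etc.
    .filter((a) => a.contextRule(ctx))
    // 2. Interaction-history rules: skip actions that fired too recently.
    .filter((a) => {
      const last = lastLaunchMs[a.skillID];
      return last === undefined || nowMs - last >= a.minIntervalMs;
    })
    // 3. User settings: skip actions the user has switched off.
    .filter((a) => a.settingKey === undefined || settings[a.settingKey] !== false);

  if (eligible.length === 0) return null; // corresponds to a no-action response
  return eligible[Math.floor(rng() * eligible.length)];
}
```

Injecting `rng` keeps the random choice testable. The document only says the hub picks randomly among eligible actions, so the uniform weighting here is an assumption.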

**Proactive Registration**: Skills register proactive behaviors with:
- Trigger types (time-based, event-based, surprise)
- Context rules (when this can trigger)
- Interaction history rules (how often it can trigger)
- Settings rules (user preferences)

### 2. Parser Service (`packages/parser`)

The Parser service performs Natural Language Understanding using Dialogflow.

**ParserService** (`ParserService.ts`):
- Starts RobustParser process on port 8787 (optional)
- Initializes Dialogflow client
- Initializes Robust Parser client
- Handles POST requests to `/v1/parse`
- Exposes state at `/state` endpoint

**NLU Pipeline:**
1. Receives text, rules, and context
2. Queries Dialogflow with configured agents
3. Optionally queries Robust Parser (custom NLU)
4. Returns intent, entities, and rules

**Configuration:**
- Dialogflow API key
- Robust Parser enable/disable
- Multiple external agents support

### 3. History Service (`packages/history`)

The History service persists interaction data to MongoDB.

**HistoryService** (`HistoryService.ts`):
- Two database clients:
  - `SkillLaunchDBClient` - Records skill launches
  - `SpeechHistoryDBClient` - Records speech interactions (optional)
- HTTP endpoints:
  - `/v1/skill/launch` - Skill launch history
  - `/v1/speech` - Speech history (if enabled)
  - Health check endpoint

**Data Stored:**
- Skill launches (skill ID, intent, timestamp, robot ID, account ID)
- Speech interactions (ASR result, NLU result, audio file URL, error tracking)

### 4. Lasso Service (`packages/lasso`)

Lasso provides external data integration for skills.

**Features:**
- OAuth2 credential management
- Calendar client integration
- Weather data (Dark Sky API)
- Maps data (Google Maps API)
- News data (AP News)
- MongoDB for credential storage
- Redis for caching

**LassoService** (`LassoService.ts`):
- Manages OAuth2 flows
- Provides relay endpoints for external APIs
- Caches responses in Redis

## Skill Framework

### BaseSkill (`packages/baseskill`)

**BaseSkill** (`BaseSkill.ts`):
- Abstract base class for all cloud skills
- Extends `BaseHttpHandler`
- Handles POST requests to `/`
- Provides error handling
- Tracks timing

**GraphSkill** (`GraphSkill.ts`):
- Extends BaseSkill with a graph-based state machine
- Implements node-based conversation flow
- Supports skill redirects
- Tracks analytics events
- Supports supplemental behaviors (parallel/sequence)

### Graph System

The graph system provides a state machine framework for skills.

**Graph** (`Graph.ts`):
- Directed graph of connected nodes
- Supports subgraphs (hierarchical)
- Exit transitions for graph termination
- Validation (reachability, transition completeness)
- GraphViz dot file generation
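
Both validation checks fit in one pass over the transitions plus a breadth-first search from the initial node. This is a sketch over a plain adjacency map and not the `Graph.ts` API. It uses an `EXIT` sentinel for exit transitions and `null` for an unwired transition; both conventions, and the `validateGraph` name, are assumptions for illustration.

```typescript
// Hypothetical sentinel destination meaning "leave the graph".
const EXIT = '__EXIT__';

// nodeID -> (transition name -> destination nodeID, EXIT, or null if unwired)
type GraphSpec = Record<string, Record<string, string | null>>;

// Returns human-readable problems; an empty array means the graph is valid.
function validateGraph(graph: GraphSpec, startID: string): string[] {
  if (!(startID in graph)) return [`start node "${startID}" does not exist`];
  const errors: string[] = [];

  // Transition completeness: every transition must lead to a real node or EXIT.
  for (const nodeID of Object.keys(graph)) {
    const transitions = graph[nodeID];
    for (const name of Object.keys(transitions)) {
      const dest = transitions[name];
      if (dest === null) {
        errors.push(`${nodeID}.${name} is not connected`);
      } else if (dest !== EXIT && !(dest in graph)) {
        errors.push(`${nodeID}.${name} points to unknown node "${dest}"`);
      }
    }
  }

  // Reachability: BFS from the start node over wired transitions.
  const visited: Record<string, boolean> = { [startID]: true };
  const queue: string[] = [startID];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const transitions = graph[current];
    for (const name of Object.keys(transitions)) {
      const dest = transitions[name];
      if (dest !== null && dest !== EXIT && dest in graph && !visited[dest]) {
        visited[dest] = true;
        queue.push(dest);
      }
    }
  }
  for (const nodeID of Object.keys(graph)) {
    if (!visited[nodeID]) errors.push(`${nodeID} is unreachable from ${startID}`);
  }
  return errors;
}

// Example: an unwired transition also leaves its intended target unreachable.
const errors = validateGraph({ Ask: { Answered: null }, Respond: { Done: EXIT } }, 'Ask');
// errors: ['Ask.Answered is not connected', 'Respond is unreachable from Ask']
```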

**GraphManager** (`GraphManager.ts`):
- Singleton per skill
- Manages node IDs and mappings
- Executes graph:
  - `start()` - Creates session, enters initial node
  - `enterNode()` - Calls node's enter method
  - `exitNode()` - Calls node's exit method with action results
  - `executeTransition()` - Moves to next node
- Maintains session state (node ID, data, trace)

**Node** (`Node.ts`):
- Abstract base class for graph nodes
- Has transition names and destinations
- Two lifecycle methods:
  - `enter(data)` - Called when node is entered, returns action or redirect
  - `exit(data)` - Called with action results, returns next transition
- Supports graph traversal (BFS)
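
The launch/update cycle that `GraphManager` drives reduces to two calls:

- `start()` enters the initial node and returns its action.
- `update()` hands the robot's action result to the current node's `exit`, follows the returned transition, and enters the next node.

The sketch below is synchronous and uses hypothetical names (`MiniGraphManager`, `SketchNode`, an `EXIT` sentinel). The real `enter`/`exit` methods are async and return JCP actions or redirects rather than strings.

```typescript
// Hypothetical sentinel meaning "this transition leaves the graph".
const EXIT = '__EXIT__';

interface Session {
  nodeID: string;                // current node
  trace: string[];               // nodes entered so far
  data: Record<string, unknown>; // skill-specific scratch data
}

interface SketchNode {
  id: string;
  transitions: Record<string, string>;             // transition name -> node ID or EXIT
  enter(session: Session): string;                 // returns an action (stand-in for JCP)
  exit(session: Session, result: unknown): string; // returns a transition name
}

interface StepResult {
  session: Session;
  action?: string;
  final: boolean;
}

class MiniGraphManager {
  private readonly nodes: Record<string, SketchNode> = {};
  private readonly initialID: string;

  constructor(nodes: SketchNode[], initialID: string) {
    for (const node of nodes) this.nodes[node.id] = node;
    this.initialID = initialID;
  }

  // Analogous to LISTEN_LAUNCH: create a session and enter the initial node.
  start(): StepResult {
    const session: Session = { nodeID: this.initialID, trace: [], data: {} };
    return this.enterNode(session, this.initialID);
  }

  // Analogous to LISTEN_UPDATE: exit the current node with the result, then transition.
  update(session: Session, result: unknown): StepResult {
    const node = this.nodes[session.nodeID];
    const transition = node.exit(session, result);
    const dest = node.transitions[transition];
    if (dest === undefined) {
      throw new Error(`Node ${node.id} has no transition "${transition}"`);
    }
    if (dest === EXIT) return { session, final: true };
    return this.enterNode(session, dest);
  }

  private enterNode(session: Session, id: string): StepResult {
    const node = this.nodes[id];
    if (!node) throw new Error(`Unknown node ${id}`);
    session.nodeID = id;
    session.trace.push(id);
    return { session, action: node.enter(session), final: false };
  }
}
```

Keeping all per-conversation state in `Session` means the manager itself holds only the graph. That is one way a singleton manager can serve repeated requests.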

**Built-in Node Types:**
- `DefaultNode` - Simple terminal node
- `JCPNode` - Returns JCP action
- `NoOpNode` - No operation
- `TrueFalseNode` - Conditional branching
- `SetLooperIDNode` - Sets speaker ID

**MIM (Motion Interaction Model) System:**
- `ANFactory` - Creates graph for playing MIM animations
- Supports scripted responses, emotion responses, fallback responses
- Semi-specific responses (context-aware)

### Skill Request/Response Protocol

**Skill Request Types** (`skill/request.ts`):
- `LISTEN_LAUNCH` - Launch skill from listen interaction
- `LISTEN_UPDATE` - Update skill with action results
- `PROACTIVE_LAUNCH` - Launch skill proactively

**Skill Request Data:**
```typescript
{
    type: MessageType,
    msgID: UUID,
    ts: number,
    data: {
        general: { accountID, robotID, lang, release },
        runtime: { character, location, loop, perception, dialog },
        skill: { id, session? },
        result: any, // Action results for UPDATE
        nlu: NLUResult,
        asr: ASRResult,
        memo?: any
    }
}
```

**Skill Response Types** (`skill/response.ts`):
- `SKILL_ACTION` - Returns action to execute
- `SKILL_REDIRECT` - Redirects to another skill
- `ERROR` - Error response

**Skill Action Data:**
```typescript
{
    action: JCPAction, // JCP protocol behavior
    analytics?: AnalyticsData,
    final?: boolean, // Is this the final response?
    fireAndForget?: boolean
}
```

**JCP Action** (`skill/action.ts`):
```typescript
{
    type: ActionType.JCP,
    config: {
        version: "1.0.0",
        jcp: SupportedBehaviors // SLIM, Sequence, Parallel, SetPresentPerson, ImpactEmotion
    }
}
```
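
Combining the three shapes, a cloud skill's reply is a message envelope whose `data` carries the JCP action. The builder below is a hedged sketch. The field names follow the shapes above. The literal `'SKILL_ACTION'` and `'JCP'` strings, the envelope's `ts` field, and the caller-supplied `msgID` are assumptions. The behavior tree is left as `unknown` because `SupportedBehaviors` is not spelled out here.

```typescript
// Assumed string values for the enums referenced above.
type ActionType = 'JCP';
type SkillResponseType = 'SKILL_ACTION' | 'SKILL_REDIRECT' | 'ERROR';

interface JCPAction {
  type: ActionType;
  config: { version: string; jcp: unknown }; // jcp: a SupportedBehaviors tree
}

interface SkillActionData {
  action: JCPAction;
  analytics?: unknown;
  final?: boolean; // true on the skill's last response
  fireAndForget?: boolean;
}

interface SkillActionResponse {
  type: SkillResponseType;
  msgID: string;
  ts: number;
  data: SkillActionData;
}

// Wraps a behavior tree in the JCP action and SKILL_ACTION envelope shapes.
function buildSkillAction(
  msgID: string,
  behavior: unknown,
  final: boolean,
  now: number = Date.now()
): SkillActionResponse {
  return {
    type: 'SKILL_ACTION',
    msgID,
    ts: now,
    data: {
      action: { type: 'JCP', config: { version: '1.0.0', jcp: behavior } },
      final,
    },
  };
}
```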

### Skill Configuration

**SkillConfig** (`skill/config.ts`):
```typescript
{
    id: SkillID,
    intents: [{
        name: IntentName,
        entities?: EntityConfig[],
        memo?: any
    }],
    proactives?: ProactiveRegistration[],
    IHQueries?: IHQueryDefinitions,
    onRobot?: boolean,
    URL: string,
    settings?: ManifestSettings
}
```

**Entity Config**:
- `name` - Entity name
- `value` - Expected value
- `matchRule` - 'EXACT' or 'NOT'
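
During the ROUTE step, an NLU result is matched against these configs. A sketch of that matching follows. It assumes NLU entities arrive as a flat name-to-value map and that the first matching skill wins. The Decision Mediator is not modeled. `routeIntent`, `entityMatches`, and `RoutableSkill` are hypothetical names.

```typescript
// Shapes mirror SkillConfig / Entity Config above, reduced to what routing needs.
type MatchRule = 'EXACT' | 'NOT';

interface EntityConfig {
  name: string;
  value: string;
  matchRule: MatchRule;
}

interface IntentConfig {
  name: string;
  entities?: EntityConfig[];
}

interface RoutableSkill {
  id: string;
  intents: IntentConfig[];
}

// Assumed NLU shape: intent name plus a flat entity-name -> value map.
interface NLUResult {
  intent: string;
  entities: Record<string, string>;
}

// EXACT requires the entity to equal the value; NOT requires it to differ,
// so a missing entity satisfies NOT.
function entityMatches(rule: EntityConfig, entities: Record<string, string>): boolean {
  const actual = entities[rule.name];
  return rule.matchRule === 'EXACT' ? actual === rule.value : actual !== rule.value;
}

// Returns the ID of the first skill with a matching intent, or null if none match.
function routeIntent(nlu: NLUResult, skills: RoutableSkill[]): string | null {
  for (const skill of skills) {
    for (const intent of skill.intents) {
      if (intent.name !== nlu.intent) continue;
      const rules = intent.entities || [];
      if (rules.every((rule) => entityMatches(rule, nlu.entities))) return skill.id;
    }
  }
  return null;
}
```

With EXACT and NOT rules on the same entity, two skills can share one intent name without overlapping. One skill claims a specific value and the other takes everything else.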

**Proactive Registration**:
- Trigger type and conditions
- Context rules
- Interaction history rules
- Settings rules

## Interfaces Package

The `interfaces` package defines all TypeScript interfaces for communication between services.

### Key Interface Modules

**service.ts** - Base message types:
- `BaseMessage<T, D>` - Generic message with type, msgID, timestamp, data
- `BaseResponse<T, D>` - Response with final flag and timings
- `IAuthDetails` - Authentication details (account ID, access keys)

**hub/** - Hub-specific interfaces:
- `request.ts` - LISTEN, CONTEXT, CLIENT_ASR, CLIENT_NLU messages
- `response.ts` - ASR, NLU, LISTEN, SKILL_REDIRECT, ERROR responses
- `MessageType.ts` - Message type enums
- `HubErrorCode.ts` - Error code enums

**skill/** - Skill-specific interfaces:
- `request.ts` - LISTEN_LAUNCH, LISTEN_UPDATE, PROACTIVE_LAUNCH
- `response.ts` - SKILL_ACTION, SKILL_REDIRECT, ERROR
- `action.ts` - JCP action types
- `config.ts` - Skill configuration
- `behaviors.ts` - Supported JCP behaviors
- `analytics.ts` - Analytics event types

**nlu.ts** - NLU interfaces:
- `NLURequestData` - Text, rules, loop users, external agents
- `NLUResult` - Intent, entities, rules
- `ExternalAgentRequest` - External Dialogflow agent config

**asr.ts** - ASR interfaces:
- `ASRResult` - Text, confidence, annotation
- `ASRConfig` - Language, hints, timeouts

**jibo/** - Jibo-specific data:
- `data.ts` - GeneralData (account, robot, language), SkillData (session, trace)
- `runtime.ts` - RuntimeContext (character, location, loop, perception, dialog)

**proactive/** - Proactive interfaces:
- Context field definitions
- History rules
- Settings rules
- Proactive trigger/request/response

**history/** - History interfaces:
- Skill launch data
- Speech history data

## Utils Package

The `utils` package provides shared functionality.

### BaseService (`utils/service/BaseService.ts`)

Base class for all Pegasus services:

**Features:**
- Express.js HTTP server
- WebSocket server (ws library)
- JWT authentication
- Request/response logging with jibo-log
- New Relic monitoring
- Health check endpoint
- Error handling middleware

**Methods:**
- `addSocketHandler(path, handler)` - Register WebSocket handler
- `addHttpHandler(path, handler)` - Register HTTP handler
- `init(port)` - Start server
- `close()` - Stop server

**Authentication:**
- JWT token verification
- Bearer token scheme
- Configurable secret via `ETCO_server_hubTokenSecret`
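
Under the Bearer scheme, the token has to be pulled out of the `Authorization` header before it can be verified. A minimal sketch of that step follows; `extractBearerToken` is a hypothetical helper. The verification itself is only indicated in a comment so the block stays dependency-free. With the `jsonwebtoken` package that would be `jwt.verify(token, secret)`, with the secret taken from `ETCO_server_hubTokenSecret`.

```typescript
// Returns the raw token from an "Authorization: Bearer <token>" header,
// or null when the header is missing or uses a different scheme.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  // Scheme names are case-insensitive; the token itself must contain no spaces.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}

// A service would then verify the token (not run here), e.g.:
//   const claims = jwt.verify(token, secret); // secret from ETCO_server_hubTokenSecret
// and reject the HTTP request or WebSocket upgrade if either step fails.
```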

**Logging:**
- Per-request log instances
- Transaction ID tracking
- Robot ID tracking
- Configurable log levels per namespace

### Other Utils

- `PegasusRequest` - Enhanced Express request with Jibo headers
- `PegasusWebSocket` - Enhanced WebSocket with auth and logging
- `JiboHeaders` - Parses Jibo-specific headers (transID, robotID, logging config)
- `ResponseWrapper` - Wraps WebSocket responses
- `HttpError` - HTTP error with status code
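
Parsing the three Jibo headers mostly means reading lowercase keys from the request's header map. This is a hedged sketch, and `parseJiboHeaders` is a hypothetical stand-in for `JiboHeaders`. The header names come from this document. The idea that `x-jibo-logging-config` carries a JSON object mapping log namespaces to levels is an unverified assumption.

```typescript
// Header map as Node exposes it: lowercase names, string or string[] values.
type HeaderMap = Record<string, string | string[] | undefined>;

interface JiboHeaderValues {
  transID?: string;
  robotID?: string;
  loggingConfig?: Record<string, string>; // namespace -> level (assumed format)
}

// Repeated headers may arrive as arrays; use the first value.
function firstValue(v: string | string[] | undefined): string | undefined {
  return Array.isArray(v) ? v[0] : v;
}

function parseJiboHeaders(headers: HeaderMap): JiboHeaderValues {
  const result: JiboHeaderValues = {
    transID: firstValue(headers['x-jibo-transid']),
    robotID: firstValue(headers['x-jibo-robotid']),
  };
  const rawLogging = firstValue(headers['x-jibo-logging-config']);
  if (rawLogging) {
    try {
      // Assumed JSON payload; not validated beyond parsing.
      result.loggingConfig = JSON.parse(rawLogging);
    } catch (err) {
      // Ignore a malformed logging config rather than failing the request.
    }
  }
  return result;
}
```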

## Communication Protocols

### WebSocket Protocol

**Connection:**
- URL: `ws://hub:9000/listen` or `ws://hub:9000/proactive`
- Authentication: Bearer token in Authorization header
- Headers: `x-jibo-transid`, `x-jibo-robotid`, `x-jibo-logging-config`

**Message Format:**
```json
{
    "type": "MESSAGE_TYPE",
    "msgID": "uuid",
    "ts": 1234567890,
    "data": { ... }
}
```
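
Audio travels as binary frames and control messages as JSON text, both on the same socket. A receiver therefore has to branch on frame type before decoding. The sketch assumes each decoded text frame holds exactly one envelope in the format above. `decodeFrame` and the `Frame` union are hypothetical.

```typescript
interface BaseMessage<T extends string = string, D = unknown> {
  type: T;
  msgID: string;
  ts: number;
  data: D;
}

type Frame =
  | { kind: 'audio'; bytes: Uint8Array }
  | { kind: 'message'; message: BaseMessage };

// Text frames carry JSON envelopes; binary frames carry raw audio packets.
function decodeFrame(raw: string | Uint8Array): Frame {
  if (typeof raw !== 'string') return { kind: 'audio', bytes: raw };
  const parsed = JSON.parse(raw);
  // Reject anything that does not have the four envelope fields.
  if (
    typeof parsed !== 'object' || parsed === null ||
    typeof parsed.type !== 'string' ||
    typeof parsed.msgID !== 'string' ||
    typeof parsed.ts !== 'number' ||
    !('data' in parsed)
  ) {
    throw new Error('Malformed message envelope');
  }
  return { kind: 'message', message: parsed as BaseMessage };
}
```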

**Listen Flow Messages:**
1. Robot → Hub: LISTEN (with ASR config, rules, language)
2. Robot → Hub: Audio packets (binary)
3. Hub → Robot: SOS (Start of Speech)
4. Robot → Hub: CONTEXT (runtime context)
5. Hub → Robot: EOS (End of Speech)
6. Hub → Robot: LISTEN (with ASR result, NLU result, match)
7. Hub → Robot: SKILL_ACTION (if cloud skill)
8. Robot → Hub: CMD_RESULT (action results)
9. Hub → Robot: SKILL_ACTION (next action) or final

**Proactive Flow Messages:**
1. Robot → Hub: TRIGGER (trigger data)
2. Robot → Hub: CONTEXT (runtime context)
3. Hub → Robot: PROACTIVE (match or no-action)
4. Hub → Robot: SKILL_ACTION (if cloud skill)

### HTTP Protocol

**Skill Request:**
- Method: POST
- URL: `http://skill-host:port/`
- Headers: Authorization, x-jibo-transid, x-jibo-robotid
- Body: SkillRequest JSON
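
A hub-side call to a cloud skill is therefore a JSON POST carrying three headers. The helper below only assembles the request, so the result can be handed to `fetch` or any other HTTP client. `buildSkillRequest` and its parameters are hypothetical. The `Content-Type` header is an assumption implied by the JSON body. Enforcing the 10-second skill timeout is left to the caller.

```typescript
interface OutgoingRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildSkillRequest(
  skillURL: string,    // the URL from the skill's SkillConfig
  token: string,       // hub JWT, sent with the Bearer scheme
  transID: string,     // propagated for log correlation
  robotID: string,
  skillRequest: object // LISTEN_LAUNCH, LISTEN_UPDATE, or PROACTIVE_LAUNCH body
): OutgoingRequest {
  return {
    url: skillURL,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
      'x-jibo-transid': transID,
      'x-jibo-robotid': robotID,
    },
    body: JSON.stringify(skillRequest),
  };
}
```

In Node 18+ or a browser, `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` would send it.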

**Parser Request:**
- Method: POST
- URL: `http://parser:8080/v1/parse`
- Body: NLURequestData JSON

## Authentication & Security

### JWT Authentication

**Token Format:**
```json
{
    "id": "account-id",
    "accessKeyId": "client-id",
    "secretAccessKey": "client-secret",
    "friendlyId": "robot-name"
}
```

**Verification:**
- Secret: `ETCO_server_hubTokenSecret` environment variable
- Scheme: Bearer
- Applied to WebSocket connections and HTTP endpoints

### Network Security

- All services run in Docker containers
- Services communicate via Docker network (pegasus-nw)
- External access via load balancer
- TLS termination at load balancer

## Deployment

### Docker Compose (Local Development)

**Services:**
- `hub` - Hub service (port 9000)
- `parser` - Parser service (port 9005)
- `history` - History service (port 9006)
- `chitchat-skill` - Chitchat skill (port 9004)
- `report-skill` - Report skill (port 9003)
- `lasso` - Lasso service (port 9007)
- `redis` - Redis cache (port 6379)
- `mongo_lasso` - MongoDB for Lasso (port 27017)
- `history_cluster` - MongoDB for History (from docker-compose-history-db.yml)

**Configuration:**
- Environment variables prefixed with `ETCO_` (ETCO = Environment TO Configuration)
- Volume mounting: `./:/pegasus:consistent` for live code editing
- Debug ports: 5850-5855 for Node.js debugging

### Build Process

**Commands:**
```bash
docker build -t pegasus_base:latest .
yarn docker:bootstrap
yarn docker:build
./pegasus.js build-docker-image --services hub
```

**CLI Tool** (`cli/`):
- `bootstrap` - Install dependencies
- `build` - Build TypeScript
- `test` - Run tests
- `docker-run` - Run commands in Docker
- `build-docker-image` - Build Docker images for services

### Production Deployment

- AWS ECS (Elastic Container Service)
- ECR (Elastic Container Registry) for Docker images
- Application Load Balancer
- MongoDB Atlas for production databases
- ElastiCache for Redis
- CloudWatch for logging
- New Relic for monitoring

## Data Flow Examples

### Example 1: User Says "Tell Me a Joke"

1. **Robot → Hub**: LISTEN message with ASR config
2. **Robot → Hub**: Audio stream
3. **Hub**: Detects SOS, emits SOS message
4. **Hub**: Streams audio to Google Cloud Speech API
5. **Hub**: Detects EOS, emits EOS message
6. **Robot → Hub**: CONTEXT message (runtime state)
7. **Hub → Parser**: POST `/v1/parse` with text "tell me a joke"
8. **Parser → Dialogflow**: Query with "joke" intent rules
9. **Dialogflow → Parser**: Intent="joke_tell", entities={}
10. **Parser → Hub**: NLU result
11. **Hub → IntentRouter**: Match intent to "joke-skill"
12. **Hub → joke-skill**: POST LISTEN_LAUNCH request
13. **joke-skill**: Executes graph, selects joke
14. **joke-skill → Hub**: SKILL_ACTION with JCP behavior (SayText)
15. **Hub → Robot**: SKILL_ACTION message
16. **Robot**: Executes behavior, speaks joke
17. **Robot → Hub**: CMD_RESULT with action result
18. **Hub → joke-skill**: POST LISTEN_UPDATE request
19. **joke-skill**: Returns final=true
20. **Hub → Robot**: Final SKILL_ACTION

### Example 2: Proactive Greeting

1. **Robot**: Detects person entering room
2. **Robot → Hub**: TRIGGER message with trigger data
3. **Robot → Hub**: CONTEXT message (runtime state)
4. **Hub**: Queries all proactive skill configs
5. **Hub**: Filters by context (time, people present)
6. **Hub**: Filters by history (last greeting time)
7. **Hub**: Filters by settings (user greeting preference)
8. **Hub**: Selects "greeting-skill"
9. **Hub → greeting-skill**: POST PROACTIVE_LAUNCH request
10. **greeting-skill → Hub**: SKILL_ACTION with greeting behavior
11. **Hub → Robot**: PROACTIVE response with match
12. **Hub → Robot**: SKILL_ACTION message
13. **Robot**: Executes greeting

## Error Handling

### Error Types

**Hub Error Codes** (`HubErrorCode.ts`):
- `TIMEOUT_ASR` - ASR timeout
- `TIMEOUT_PARSER` - Parser timeout
- `TIMEOUT_CONTEXT` - Context timeout
- `TIMEOUT_SKILL` - Skill timeout
- `PARSER` - Parser error
- `ASR` - ASR error

**Skill Request Errors** (`SkillRequestError`):
- `SKILL_NOT_FOUND` - Skill does not exist
- `TIMEOUT` - Skill request timeout

### Error Response Format

```json
{
    "type": "ERROR",
    "msgID": "uuid",
    "ts": 1234567890,
    "final": true,
    "data": {
        "message": "Error description",
        "code": "ERROR_CODE"
    },
    "timings": {
        "total": 1234
    }
}
```

### Timeout Handling

- ASR: 40 seconds (configurable via `sosTimeout`, `maxSpeechTimeout`)
- Parser: 10 seconds
- Context: 5 seconds
- Skill: 10 seconds
- Transaction: 60 seconds (configurable)
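
Each per-stage limit can be enforced by racing the stage's promise against a timer. On expiry, the hub replies with an `ERROR` message in the format above. This is a sketch with hypothetical helpers (`withTimeout`, `HubError`, `toErrorResponse`). The code strings reuse the `HubErrorCode` names listed under Error Types. How the real handler wires these together is not described in this document. The assumption that `timings.total` is milliseconds since the transaction started is labeled in the code.

```typescript
type TimeoutCode = 'TIMEOUT_ASR' | 'TIMEOUT_PARSER' | 'TIMEOUT_CONTEXT' | 'TIMEOUT_SKILL';

interface HubError extends Error {
  code: string;
}

// Resolves or rejects with `work`, unless `ms` elapses first, in which case it
// rejects with an error carrying the given timeout code.
function withTimeout<T>(work: Promise<T>, ms: number, code: TimeoutCode): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`${code} after ${ms} ms`) as HubError;
      err.code = code;
      reject(err);
    }, ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).then(
    (value) => { clearTimeout(timer); return value; },
    (err) => { clearTimeout(timer); throw err; }
  );
}

interface ErrorResponse {
  type: 'ERROR';
  msgID: string;
  ts: number;
  final: true;
  data: { message: string; code: string };
  timings: { total: number };
}

// Assumed: timings.total is milliseconds since the transaction started.
function toErrorResponse(err: HubError, msgID: string, startedAt: number, now: number = Date.now()): ErrorResponse {
  return {
    type: 'ERROR',
    msgID,
    ts: now,
    final: true,
    data: { message: err.message, code: err.code },
    timings: { total: now - startedAt },
  };
}
```

A skill call, for instance, would be wrapped as `withTimeout(skillCall, 10000, 'TIMEOUT_SKILL')`, where `skillCall` is whatever promise issues the HTTP request.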

## Monitoring & Logging

### Logging

**jibo-log Integration:**
- Per-namespace log levels
- Transaction ID correlation
- Robot ID tracking
- Structured logging support

**Log Levels:**
- Configured via `x-jibo-logging-config` header
- Per-namespace granularity
- Environment variable: `ETCO_server_logLevel`

### Monitoring

**New Relic:**
- HTTP request tracking
- WebSocket transaction tracking
- Error tracking
- Custom attributes (transID, robotID)

**Health Checks:**
- `/healthcheck` endpoint on all services
- Returns service-specific health data
- Database connection status

### Speech History Recording

**Optional Features:**
- Record skill launches to MongoDB
- Record speech interactions to MongoDB
- Upload speech logs to S3 (JSON with base64-encoded audio)

**Configuration:**
- `ETCO_hub_recordLaunchHistory` - Enable launch history
- `ETCO_hub_recordSpeechHistory` - Enable speech history
- `ETCO_hub_recordSpeechLogBucket` - S3 bucket for speech logs
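
Reading these settings is a small parsing step over the `ETCO_`-prefixed environment. The sketch takes the environment as a plain map, so in Node one would pass `process.env`. Two things here are assumptions: that only the string `"true"` enables a flag, and that S3 upload happens only when a bucket is named. The `RecordingConfig` shape and function names are hypothetical.

```typescript
// Environment as a plain map; in Node, pass process.env.
type Env = Record<string, string | undefined>;

interface RecordingConfig {
  recordLaunchHistory: boolean;
  recordSpeechHistory: boolean;
  speechLogBucket?: string; // assumed: S3 upload happens only when a bucket is set
}

// Assumed convention: only the (case-insensitive) string "true" enables a flag.
function envFlag(env: Env, name: string): boolean {
  return (env[name] || '').toLowerCase() === 'true';
}

function readRecordingConfig(env: Env): RecordingConfig {
  const bucket = env['ETCO_hub_recordSpeechLogBucket'];
  return {
    recordLaunchHistory: envFlag(env, 'ETCO_hub_recordLaunchHistory'),
    recordSpeechHistory: envFlag(env, 'ETCO_hub_recordSpeechHistory'),
    speechLogBucket: bucket ? bucket : undefined,
  };
}
```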

## Skill Development Guide

### Creating a New Skill

1. **Extend GraphSkill:**
```typescript
// GraphSkill, Graph, and generateTransitions come from packages/baseskill
// (the import path is not shown in this document).
export class MySkill extends GraphSkill<Transition> {
    constructor() {
        super('my-skill'); // presumably the skill ID, matching "id" in the manifest
    }

    createGraph(): Graph<Transition> {
        const g = new Graph('My Skill', generateTransitions<Transition>(Transition));
        // Add nodes and transitions
        g.finalize();
        return g;
    }
}
```

2. **Define Transitions:**
```typescript
enum Transition {
    Done = 'Done',
    Retry = 'Retry'
}
```

3. **Create Nodes:**
```typescript
// Node, Data, EnterResponse, and ExitResponse also come from packages/baseskill.
class MyNode extends Node<Transition> {
    async enter(data: Data): Promise<EnterResponse> {
        // Return an action or a redirect; myJCPAction is a placeholder
        // for a JCP action built by the skill.
        return { action: myJCPAction };
    }

    async exit(data: Data): Promise<ExitResponse> {
        // `data` carries the action results; return the next transition
        return { transition: Transition.Done };
    }
}
```

4. **Create Skill Manifest:**
```json
{
    "id": "my-skill",
    "intents": [
        {
            "name": "my_intent",
            "entities": []
        }
    ],
    "onRobot": false
}
```

5. **Register with Hub:**
   - Add the skill config to skills-local.json or the environment
   - Deploy the skill service
   - The hub loads the configuration

### Skill Best Practices

- Use a graph for complex flows and direct responses for simple ones
- Track analytics events for monitoring
- Handle errors gracefully with try-catch
- Use supplemental behaviors for parallel actions
- Set appropriate timeouts
- Log important events
- Test with both LISTEN_LAUNCH and PROACTIVE_LAUNCH

## Key Design Decisions

### Why Graph-Based Skills?

- **State Management**: Explicit state machine with session tracking
- **Visualization**: GraphViz generation for debugging
- **Reusability**: Subgraphs for common patterns
- **Testability**: Isolated node testing
- **Maintainability**: Clear flow structure

### Why WebSocket for Robot Communication?

- **Low Latency**: Real-time bidirectional communication
- **Audio Streaming**: Binary message support for audio
- **Stateful**: Single connection per transaction
- **Efficiency**: No HTTP overhead for each message

### Why Separate Services?

- **Scalability**: Scale each service independently
- **Isolation**: Failure in one service doesn't affect others
- **Technology**: Different services can use different tech stacks
- **Deployment**: Independent deployment cycles

### Why Lerna Monorepo?

- **Code Sharing**: Easy to share interfaces and utils
- **Versioning**: Linked versioning for interdependent packages
- **Development**: Single repository for all services
- **Testing**: Integration tests across packages

## Limitations & Known Issues

1. **Single Graph Manager**: Skills cannot have concurrent sessions (singleton pattern)
2. **Sequential Skill Redirects**: Only one level of redirect is supported
3. **No Skill-to-Skill Communication**: Skills must go through the hub
4. **Fixed Timeouts**: Timeouts are hardcoded in some places
5. **No Skill Hot-Reload**: Skill changes require a container rebuild
6. **Limited NLU**: Dialogflow dependency, no custom model training
7. **No Skill Versioning**: Skills are identified by ID only
8. **Synchronous Skill Requests**: The hub waits for each skill response (no async)

## Future Considerations

1. **Skill Versioning**: Support multiple versions of the same skill
2. **Skill-to-Skill Direct Communication**: Allow skills to call each other
3. **Async Skill Responses**: Long-running skills with a callback pattern
4. **Custom NLU Models**: Support for custom trained models
5. **Skill Hot-Reload**: Dynamic skill loading without restart
6. **Multi-Session Skills**: Support concurrent skill sessions
7. **Skill Marketplace**: Third-party skill distribution
8. **A/B Testing**: Framework for testing skill variations

## Conclusion

The original Jibo server (Pegasus) is a well-architected microservices system that provides a robust foundation for conversational AI on the Jibo robot. The graph-based skill framework offers flexibility and maintainability, while the separation of concerns enables independent scaling and development. The system successfully handles real-time speech processing, natural language understanding, skill routing, and proactive behaviors in a distributed cloud environment.