Original server design doc

2026-05-23 00:21:42 +03:00
parent 764a2b2d4f
commit bca138ecc8
2 changed files with 793 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -420,3 +420,4 @@ FodyWeavers.xsd
 OpenJibo/captures/
 OpenJibo/.tmp/

+OpenJibo/docs/DesignDoc/original server
--- a/OpenJibo/docs/DesignDoc/original-server-design.md
+++ b/OpenJibo/docs/DesignDoc/original-server-design.md
@@ -0,0 +1,792 @@
+# Original Jibo Server (Pegasus) Design Document
+
+## Executive Summary
+
+The original Jibo server, codenamed "Pegasus" (formerly V1.X), is a cloud-based microservices architecture that powers the Jibo social robot's conversational AI capabilities. It is built as a Lerna monorepo using Node.js/TypeScript and deployed via Docker containers. The system processes speech, performs natural language understanding, routes to appropriate skills, and manages proactive behaviors.
+
+## Architecture Overview
+
+### Monorepo Structure
+
+The codebase is organized as a Lerna monorepo with the following main packages:
+
+- **packages/hub** - Central orchestration service
+- **packages/parser** - NLU (Natural Language Understanding) service
+- **packages/history** - Data persistence service (MongoDB)
+- **packages/baseskill** - Base class and framework for cloud skills
+- **packages/interfaces** - TypeScript interfaces and API contracts
+- **packages/utils** - Shared utility libraries
+- **packages/chitchat-skill** - Example conversational skill
+- **packages/report-skill** - Reporting skill
+- **packages/lasso** - External data integration service
+- **packages/hub-client** - Client library for hub communication
+- **packages/history-client** - Client library for history service
+- **packages/test-utils** - Testing utilities
+
+### Technology Stack
+
+- **Language**: TypeScript 2.5.3
+- **Runtime**: Node.js 8.9.4
+- **Package Manager**: Yarn 1.7.0
+- **Containerization**: Docker
+- **Orchestration**: Docker Compose (local), AWS ECS (production)
+- **Database**: MongoDB 3.6.0
+- **Cache**: Redis 3
+- **NLU**: Dialogflow (formerly API.ai)
+- **ASR**: Google Cloud Speech API
+- **WebSocket**: ws library
+- **HTTP**: Express.js
+- **Authentication**: JWT (jsonwebtoken)
+
+## Core Services
+
+### 1. Hub Service (`packages/hub`)
+
+The Hub is the central orchestrator that coordinates all interactions between the robot and cloud services.
+
+#### Key Components
+
+**HubService** (`HubService.ts`)
+- Main service class extending `BaseService`
+- Initializes and manages all hub components
+- Registers WebSocket and HTTP handlers
+
+**HubComponents** - Dependency injection container:
+- `parser: ParserClient` - NLU service client
+- `skillConfigManager: SkillConfigManager` - Manages skill configurations
+- `intentRouter: IntentRouter` - Routes intents to skills
+- `skillRequestMaker: SkillRequestMaker` - Makes HTTP requests to skills
+- `history: HistoryServiceClient` - History service client
+- `hubSettings: HubSettings` - Hub configuration
+- `settingsClient: SettingsClient` - Settings service client
+
+#### Endpoints
+
+**WebSocket Endpoints:**
+- `/listen` and `/v1/listen` - Handles speech recognition and NLU
+- `/proactive` and `/v1/proactive` - Handles proactive triggers
+
+**HTTP Endpoints:**
+- `/skills` and `/v1/skills` - Lists available skills
+- `/healthcheck` - Service health check
+
+#### Listen Flow
+
+The listen transaction follows a state machine implemented in `ListenTransactionHandler`:
+
+```
+States:
+  WAIT_LISTEN → ASR → NLU → ROUTE → DONE
+  WAIT_LISTEN → WAIT_CLIENT_ASR → NLU → ROUTE → DONE
+  WAIT_LISTEN → WAIT_CLIENT_NLU → ROUTE → DONE
+```
+
+**State Transitions:**
+
+1. **WAIT_LISTEN** - Receives LISTEN message from robot
+2. **ASR** - Performs Automatic Speech Recognition using Google Cloud Speech API
+   - Streams audio packets
+   - Emits SOS (Start of Speech) when speech detected
+   - Emits EOS (End of Speech) when speech ends
+   - Handles timeouts (SOS timeout, max speech timeout)
+3. **NLU** - Sends ASR text to Parser service for intent recognition
+   - Includes context (loop users, perception, etc.)
+   - Supports external Dialogflow agents
+4. **ROUTE** - Intent Router determines which skill to launch
+   - Matches NLU result against skill intent configurations
+   - Decision Mediator can alter decisions based on external factors
+   - Routes to on-robot skills or cloud skills
+5. **DONE** - Transaction complete
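+
+The three paths above can be read as a transition table. The following is a minimal sketch of that idea, not the actual `ListenTransactionHandler` code: the state names come from the diagram, while `ListenStateMachine`, `LISTEN_TRANSITIONS`, and `advance` are hypothetical names introduced for illustration.
+
+```typescript
+// Sketch only: state names come from the diagram above; the class and
+// table names are hypothetical, not taken from ListenTransactionHandler.ts.
+type ListenState =
+  | "WAIT_LISTEN" | "ASR" | "WAIT_CLIENT_ASR" | "WAIT_CLIENT_NLU"
+  | "NLU" | "ROUTE" | "DONE";
+
+// Allowed next states for each state, read off the three listed paths.
+const LISTEN_TRANSITIONS: { [S in ListenState]: ListenState[] } = {
+  WAIT_LISTEN: ["ASR", "WAIT_CLIENT_ASR", "WAIT_CLIENT_NLU"],
+  ASR: ["NLU"],
+  WAIT_CLIENT_ASR: ["NLU"],
+  WAIT_CLIENT_NLU: ["ROUTE"],
+  NLU: ["ROUTE"],
+  ROUTE: ["DONE"],
+  DONE: [],
+};
+
+class ListenStateMachine {
+  private current: ListenState = "WAIT_LISTEN";
+
+  getState(): ListenState {
+    return this.current;
+  }
+
+  // Moves to `next`, or throws if the diagram has no such edge.
+  advance(next: ListenState): void {
+    if (LISTEN_TRANSITIONS[this.current].indexOf(next) === -1) {
+      throw new Error("Illegal transition " + this.current + " -> " + next);
+    }
+    this.current = next;
+  }
+}
+```
+
+A client-provided NLU result (for example a menu click) would then call `advance("WAIT_CLIENT_NLU")`, `advance("ROUTE")`, and `advance("DONE")`, skipping ASR and the Parser entirely.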
+
+**Listen Transaction Handler** (`ListenTransactionHandler.ts`):
+- Manages audio streaming via `AudioBuffer`
+- Creates `ASRSession` for speech recognition
+- Handles timeouts (ASR: 40s, Parser: 10s, Context: 5s, Skill: 10s)
+- Records speech history to MongoDB and optionally S3
+- Supports client-provided ASR/NLU (for menu clicks, etc.)
+- Handles skill redirects
+
+#### Proactive Flow
+
+The proactive system allows Jibo to initiate conversations based on context, history, and triggers.
+
+**Proactive Transaction Handler** (`ProactiveTransactionHandler.ts`):
+
+1. Receives TRIGGER message from robot
+2. Waits for CONTEXT message (robot state)
+3. **Action Selection**:
+   - Gets all proactive skill configurations
+   - Filters by context rules (time, location, people present, etc.)
+   - Filters by interaction history rules (frequency, recency)
+   - Filters by user settings
+   - Randomly selects from eligible actions
+4. Launches selected skill (on-robot or cloud)
+5. Returns match response or no-action response
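+
+The action selection step can be sketched as a chain of filters followed by a random pick. This is an illustration under assumptions, not the real `ProactiveTransactionHandler`: the rule fields (`minHour`, `requiresPerson`, `minGapMs`, `settingKey`) and the function names are hypothetical simplifications of the context, history, and settings rules.
+
+```typescript
+// Sketch only: field names are hypothetical stand-ins for the context,
+// history, and settings rules described above.
+interface ProactiveAction {
+  skillID: string;
+  minHour: number;          // context rule: earliest local hour
+  maxHour: number;          // context rule: latest local hour (inclusive)
+  requiresPerson: boolean;  // context rule: someone must be present
+  minGapMs: number;         // history rule: minimum time since last launch
+  settingKey: string;       // settings rule: user preference that must be on
+}
+
+interface ProactiveContext {
+  hour: number;
+  peoplePresent: number;
+  nowMs: number;
+  lastLaunchMs: { [skillID: string]: number };
+  settings: { [key: string]: boolean };
+}
+
+function eligibleActions(actions: ProactiveAction[], ctx: ProactiveContext): ProactiveAction[] {
+  return actions
+    .filter(a => ctx.hour >= a.minHour && ctx.hour <= a.maxHour)
+    .filter(a => !a.requiresPerson || ctx.peoplePresent > 0)
+    .filter(a => {
+      const last = ctx.lastLaunchMs[a.skillID];
+      return last === undefined || ctx.nowMs - last >= a.minGapMs;
+    })
+    .filter(a => ctx.settings[a.settingKey] === true);
+}
+
+// Picks uniformly at random; `rng` is injectable so tests can be deterministic.
+function selectAction(
+  actions: ProactiveAction[],
+  ctx: ProactiveContext,
+  rng: () => number = Math.random
+): ProactiveAction | null {
+  const eligible = eligibleActions(actions, ctx);
+  if (eligible.length === 0) return null; // corresponds to a no-action response
+  return eligible[Math.floor(rng() * eligible.length)];
+}
+```
+
+Keeping each rule family in its own filter mirrors the order of the steps above and makes it easy to see which rule removed a candidate.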
+
+**Proactive Registration**:
+Skills register proactive behaviors with:
+- Trigger types (time-based, event-based, surprise)
+- Context rules (when this can trigger)
+- Interaction history rules (how often it can trigger)
+- Settings rules (user preferences)
+
+### 2. Parser Service (`packages/parser`)
+
+The Parser service performs Natural Language Understanding using Dialogflow.
+
+**ParserService** (`ParserService.ts`):
+- Starts RobustParser process on port 8787 (optional)
+- Initializes Dialogflow client
+- Initializes Robust Parser client
+- Handles POST requests to `/v1/parse`
+- Exposes state at `/state` endpoint
+
+**NLU Pipeline:**
+1. Receives text, rules, and context
+2. Queries Dialogflow with configured agents
+3. Optionally queries Robust Parser (custom NLU)
+4. Returns intent, entities, and rules
+
+**Configuration:**
+- Dialogflow API key
+- Robust Parser enable/disable
+- Multiple external agents support
+
+### 3. History Service (`packages/history`)
+
+The History service persists interaction data to MongoDB.
+
+**HistoryService** (`HistoryService.ts`):
+- Two database clients:
+  - `SkillLaunchDBClient` - Records skill launches
+  - `SpeechHistoryDBClient` - Records speech interactions (optional)
+- HTTP endpoints:
+  - `/v1/skill/launch` - Skill launch history
+  - `/v1/speech` - Speech history (if enabled)
+- Health check endpoint
+
+**Data Stored:**
+- Skill launches (skill ID, intent, timestamp, robot ID, account ID)
+- Speech interactions (ASR result, NLU result, audio file URL, error tracking)
+
+### 4. Lasso Service (`packages/lasso`)
+
+Lasso provides external data integration for skills.
+
+**Features:**
+- OAuth2 credential management
+- Calendar client integration
+- Weather data (Dark Sky API)
+- Maps data (Google Maps API)
+- News data (AP News)
+- MongoDB for credential storage
+- Redis for caching
+
+**LassoService** (`LassoService.ts`):
+- Manages OAuth2 flows
+- Provides relay endpoints for external APIs
+- Caches responses in Redis
+
+## Skill Framework
+
+### BaseSkill (`packages/baseskill`)
+
+**BaseSkill** (`BaseSkill.ts`):
+- Abstract base class for all cloud skills
+- Extends `BaseHttpHandler`
+- Handles POST requests to `/`
+- Provides error handling
+- Tracks timing
+
+**GraphSkill** (`GraphSkill.ts`):
+- Extends BaseSkill with graph-based state machine
+- Implements node-based conversation flow
+- Supports skill redirects
+- Tracks analytics events
+- Supports supplemental behaviors (parallel/sequence)
+
+### Graph System
+
+The graph system provides a state machine framework for skills.
+
+**Graph** (`Graph.ts`):
+- Directed graph of connected nodes
+- Supports subgraphs (hierarchical)
+- Exit transitions for graph termination
+- Validation (reachability, transition completeness)
+- GraphViz dot file generation
+
+**GraphManager** (`GraphManager.ts`):
+- Singleton per skill
+- Manages node IDs and mappings
+- Executes graph:
+  - `start()` - Creates session, enters initial node
+  - `enterNode()` - Calls node's enter method
+  - `exitNode()` - Calls node's exit method with action results
+  - `executeTransition()` - Moves to next node
+- Maintains session state (node ID, data, trace)
+
+**Node** (`Node.ts`):
+- Abstract base class for graph nodes
+- Has transition names and destinations
+- Two lifecycle methods:
+  - `enter(data)` - Called when node is entered, returns action or redirect
+  - `exit(data)` - Called with action results, returns next transition
+- Supports graph traversal (BFS)
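+
+To make the enter/exit lifecycle concrete, here is a stripped-down, synchronous sketch of how a manager might walk a graph. It is not the real `GraphManager` or `Node` API: `SketchNode`, `SketchGraph`, and `runGraph` are hypothetical, actions are plain strings, and `runAction` stands in for the round trip to the robot.
+
+```typescript
+// Sketch only: a synchronous stand-in for Node/GraphManager. All names are
+// hypothetical; the real classes are async and carry session state.
+interface SketchNode {
+  enter(data: { [k: string]: any }): string;                // returns an action to run
+  exit(data: { [k: string]: any }, result: string): string; // returns a transition name
+}
+
+interface SketchGraph {
+  initial: string;
+  nodes: { [id: string]: SketchNode };
+  // edges[nodeID][transitionName] -> next node ID; a missing entry exits the graph
+  edges: { [id: string]: { [transition: string]: string } };
+}
+
+// Walks the graph: enter node -> run action -> exit node -> follow transition.
+// Returns the list of visited node IDs as a trace.
+function runGraph(graph: SketchGraph, runAction: (action: string) => string): string[] {
+  const trace: string[] = [];
+  const data: { [k: string]: any } = {};
+  let current: string | undefined = graph.initial;
+  while (current !== undefined) {
+    const node = graph.nodes[current];
+    trace.push(current);
+    const action = node.enter(data);
+    const transition = node.exit(data, runAction(action));
+    const out = graph.edges[current] || {};
+    current = out[transition];
+  }
+  return trace;
+}
+```
+
+In Pegasus the action result arrives in a later `LISTEN_UPDATE` request rather than from a direct function call, so the loop above is spread across several HTTP requests with the session carrying the current node ID.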
+
+**Built-in Node Types:**
+- `DefaultNode` - Simple terminal node
+- `JCPNode` - Returns JCP action
+- `NoOpNode` - No operation
+- `TrueFalseNode` - Conditional branching
+- `SetLooperIDNode` - Sets speaker ID
+
+**MIM (Multimodal Interaction Module) System:**
+- `ANFactory` - Creates graph for playing MIM animations
+- Supports scripted responses, emotion responses, fallback responses
+- Semi-specific responses (context-aware)
+
+### Skill Request/Response Protocol
+
+**Skill Request Types** (`skill/request.ts`):
+- `LISTEN_LAUNCH` - Launch skill from listen interaction
+- `LISTEN_UPDATE` - Update skill with action results
+- `PROACTIVE_LAUNCH` - Launch skill proactively
+
+**Skill Request Data:**
+```typescript
+{
+  type: MessageType,
+  msgID: UUID,
+  ts: number,
+  data: {
+    general: { accountID, robotID, lang, release },
+    runtime: { character, location, loop, perception, dialog },
+    skill: { id, session? },
+    result: any,  // Action results for UPDATE
+    nlu: NLUResult,
+    asr: ASRResult,
+    memo?: any
+  }
+}
+```
+
+**Skill Response Types** (`skill/response.ts`):
+- `SKILL_ACTION` - Returns action to execute
+- `SKILL_REDIRECT` - Redirects to another skill
+- `ERROR` - Error response
+
+**Skill Action Data:**
+```typescript
+{
+  action: JCPAction,  // JCP protocol behavior
+  analytics?: AnalyticsData,
+  final?: boolean,  // Is this the final response?
+  fireAndForget?: boolean
+}
+```
+
+**JCP Action** (`skill/action.ts`):
+```typescript
+{
+  type: ActionType.JCP,
+  config: {
+    version: "1.0.0",
+    jcp: SupportedBehaviors  // SLIM, Sequence, Parallel, SetPresentPerson, ImpactEmotion
+  }
+}
+```
+
+### Skill Configuration
+
+**SkillConfig** (`skill/config.ts`):
+```typescript
+{
+  id: SkillID,
+  intents: [{
+    name: IntentName,
+    entities?: EntityConfig[],
+    memo?: any
+  }],
+  proactives?: ProactiveRegistration[],
+  IHQueries?: IHQueryDefinitions,
+  onRobot?: boolean,
+  URL: string,
+  settings?: ManifestSettings
+}
+```
+
+**Entity Config**:
+- `name` - Entity name
+- `value` - Expected value
+- `matchRule` - 'EXACT' or 'NOT'
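+
+As an illustration of how intent and entity configs could drive routing, the sketch below matches an NLU result against skill configs. It assumes `'EXACT'` means the entity must equal `value` and `'NOT'` means it must not; that reading, and the names `routeIntent`, `entityMatches`, and `NluLike`, are assumptions rather than the actual `IntentRouter` behavior.
+
+```typescript
+// Sketch only: assumes 'EXACT' means "entity must equal value" and 'NOT'
+// means "entity must not equal value"; the real IntentRouter may differ.
+interface EntityRule { name: string; value: string; matchRule: "EXACT" | "NOT"; }
+interface IntentRule { name: string; entities?: EntityRule[]; }
+interface SkillRule { id: string; intents: IntentRule[]; }
+interface NluLike { intent: string; entities: { [name: string]: string }; }
+
+function entityMatches(rule: EntityRule, nlu: NluLike): boolean {
+  const actual = nlu.entities[rule.name];
+  return rule.matchRule === "EXACT" ? actual === rule.value : actual !== rule.value;
+}
+
+// Returns the ID of the first skill with an intent whose name and entity
+// rules all match the NLU result, or null when nothing matches.
+function routeIntent(skills: SkillRule[], nlu: NluLike): string | null {
+  for (let i = 0; i < skills.length; i++) {
+    const hit = skills[i].intents.some(intent =>
+      intent.name === nlu.intent &&
+      (intent.entities || []).every(rule => entityMatches(rule, nlu)));
+    if (hit) return skills[i].id;
+  }
+  return null;
+}
+```
+
+First-match-wins is only one possible policy; the Decision Mediator described earlier could still override whatever this step returns.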
+
+**Proactive Registration**:
+- Trigger type and conditions
+- Context rules
+- Interaction history rules
+- Settings rules
+
+## Interfaces Package
+
+The `interfaces` package defines all TypeScript interfaces for communication between services.
+
+### Key Interface Modules
+
+**service.ts** - Base message types:
+- `BaseMessage<T, D>` - Generic message with type, msgID, timestamp, data
+- `BaseResponse<T, D>` - Response with final flag and timings
+- `IAuthDetails` - Authentication details (account ID, access keys)
+
+**hub/** - Hub-specific interfaces:
+- `request.ts` - LISTEN, CONTEXT, CLIENT_ASR, CLIENT_NLU messages
+- `response.ts` - ASR, NLU, LISTEN, SKILL_REDIRECT, ERROR responses
+- `MessageType.ts` - Message type enums
+- `HubErrorCode.ts` - Error code enums
+
+**skill/** - Skill-specific interfaces:
+- `request.ts` - LISTEN_LAUNCH, LISTEN_UPDATE, PROACTIVE_LAUNCH
+- `response.ts` - SKILL_ACTION, SKILL_REDIRECT, ERROR
+- `action.ts` - JCP action types
+- `config.ts` - Skill configuration
+- `behaviors.ts` - Supported JCP behaviors
+- `analytics.ts` - Analytics event types
+
+**nlu.ts** - NLU interfaces:
+- `NLURequestData` - Text, rules, loop users, external agents
+- `NLUResult` - Intent, entities, rules
+- `ExternalAgentRequest` - External Dialogflow agent config
+
+**asr.ts** - ASR interfaces:
+- `ASRResult` - Text, confidence, annotation
+- `ASRConfig` - Language, hints, timeouts
+
+**jibo/** - Jibo-specific data:
+- `data.ts` - GeneralData (account, robot, language), SkillData (session, trace)
+- `runtime.ts` - RuntimeContext (character, location, loop, perception, dialog)
+
+**proactive/** - Proactive interfaces:
+- Context field definitions
+- History rules
+- Settings rules
+- Proactive trigger/request/response
+
+**history/** - History interfaces:
+- Skill launch data
+- Speech history data
+
+## Utils Package
+
+The `utils` package provides shared functionality.
+
+### BaseService (`utils/service/BaseService.ts`)
+
+Base class for all Pegasus services:
+
+**Features:**
+- Express.js HTTP server
+- WebSocket server (ws library)
+- JWT authentication
+- Request/response logging with jibo-log
+- New Relic monitoring
+- Health check endpoint
+- Error handling middleware
+
+**Methods:**
+- `addSocketHandler(path, handler)` - Register WebSocket handler
+- `addHttpHandler(path, handler)` - Register HTTP handler
+- `init(port)` - Start server
+- `close()` - Stop server
+
+**Authentication:**
+- JWT token verification
+- Bearer token scheme
+- Configurable secret via `ETCO_server_hubTokenSecret`
+
+**Logging:**
+- Per-request log instances
+- Transaction ID tracking
+- Robot ID tracking
+- Configurable log levels per namespace
+
+### Other Utils
+
+- `PegasusRequest` - Enhanced Express request with Jibo headers
+- `PegasusWebSocket` - Enhanced WebSocket with auth and logging
+- `JiboHeaders` - Parses Jibo-specific headers (transID, robotID, logging config)
+- `ResponseWrapper` - Wraps WebSocket responses
+- `HttpError` - HTTP error with status code
+
+## Communication Protocols
+
+### WebSocket Protocol
+
+**Connection:**
+- URL: `ws://hub:9000/listen` or `ws://hub:9000/proactive`
+- Authentication: Bearer token in Authorization header
+- Headers: `x-jibo-transid`, `x-jibo-robotid`, `x-jibo-logging-config`
+
+**Message Format:**
+```json
+{
+  "type": "MESSAGE_TYPE",
+  "msgID": "uuid",
+  "ts": 1234567890,
+  "data": { ... }
+}
+```
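+
+A receiver would normally check this envelope before acting on a message. The sketch below shows one way to build and validate it; `Envelope`, `makeMessage`, and `parseMessage` are hypothetical helper names, and the `msgID` generator is a placeholder counter rather than a real UUID.
+
+```typescript
+// Sketch only: mirrors the four envelope fields shown above. Helper names
+// are hypothetical and the msgID generator is a placeholder, not a UUID.
+interface Envelope<D> { type: string; msgID: string; ts: number; data: D; }
+
+let nextMsgNumber = 0;
+function makeMessage<D>(type: string, data: D, nowMs: number = Date.now()): Envelope<D> {
+  nextMsgNumber += 1;
+  return { type: type, msgID: "msg-" + nextMsgNumber, ts: nowMs, data: data };
+}
+
+// Parses a text frame and checks the envelope shape before it is trusted.
+function parseMessage(raw: string): Envelope<any> | null {
+  let value: any;
+  try {
+    value = JSON.parse(raw);
+  } catch (e) {
+    return null; // not JSON at all
+  }
+  const ok = value !== null && typeof value === "object" &&
+    typeof value.type === "string" && typeof value.msgID === "string" &&
+    typeof value.ts === "number" &&
+    typeof value.data === "object" && value.data !== null;
+  return ok ? (value as Envelope<any>) : null;
+}
+```
+
+Binary audio packets do not use this envelope, so a handler would apply `parseMessage` to text frames only.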
+
+**Listen Flow Messages:**
+1. Robot → Hub: LISTEN (with ASR config, rules, language)
+2. Robot → Hub: Audio packets (binary)
+3. Hub → Robot: SOS (Start of Speech)
+4. Robot → Hub: CONTEXT (runtime context)
+5. Hub → Robot: EOS (End of Speech)
+6. Hub → Robot: LISTEN (with ASR result, NLU result, match)
+7. Hub → Robot: SKILL_ACTION (if cloud skill)
+8. Robot → Hub: CMD_RESULT (action results)
+9. Hub → Robot: SKILL_ACTION (next action) or final
+
+**Proactive Flow Messages:**
+1. Robot → Hub: TRIGGER (trigger data)
+2. Robot → Hub: CONTEXT (runtime context)
+3. Hub → Robot: PROACTIVE (match or no-action)
+4. Hub → Robot: SKILL_ACTION (if cloud skill)
+
+### HTTP Protocol
+
+**Skill Request:**
+- Method: POST
+- URL: `http://skill-host:port/`
+- Headers: Authorization, x-jibo-transid, x-jibo-robotid
+- Body: SkillRequest JSON
+
+**Parser Request:**
+- Method: POST
+- URL: `http://parser:8080/v1/parse`
+- Body: NLURequestData JSON
+
+## Authentication & Security
+
+### JWT Authentication
+
+**Token Format:**
+```json
+{
+  "id": "account-id",
+  "accessKeyId": "client-id",
+  "secretAccessKey": "client-secret",
+  "friendlyId": "robot-name"
+}
+```
+
+**Verification:**
+- Secret: `ETCO_server_hubTokenSecret` environment variable
+- Scheme: Bearer
+- Applied to WebSocket connections and HTTP endpoints
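+
+Before a token can be verified it has to be pulled out of the `Authorization` header. The following minimal sketch covers only that extraction step; `parseBearerToken` is a hypothetical helper, and signature verification against the secret (handled by `jsonwebtoken` in Pegasus) is intentionally left out.
+
+```typescript
+// Sketch only: extracts the token from an Authorization header value.
+// Signature verification is deliberately not shown here.
+function parseBearerToken(header: string | undefined): string | null {
+  if (!header) return null;
+  const parts = header.trim().split(/\s+/);
+  // Expect exactly "<scheme> <token>" with a case-insensitive "Bearer" scheme.
+  if (parts.length !== 2 || parts[0].toLowerCase() !== "bearer") return null;
+  return parts[1];
+}
+```
+
+A `null` result would map to an authentication failure before any handler runs.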
+
+### Network Security
+
+- All services run in Docker containers
+- Services communicate via Docker network (pegasus-nw)
+- External access via load balancer
+- TLS termination at load balancer
+
+## Deployment
+
+### Docker Compose (Local Development)
+
+**Services:**
+- `hub` - Hub service (port 9000)
+- `parser` - Parser service (port 9005)
+- `history` - History service (port 9006)
+- `chitchat-skill` - Chitchat skill (port 9004)
+- `report-skill` - Report skill (port 9003)
+- `lasso` - Lasso service (port 9007)
+- `redis` - Redis cache (port 6379)
+- `mongo_lasso` - MongoDB for Lasso (port 27017)
+- `history_cluster` - MongoDB for History (from docker-compose-history-db.yml)
+
+**Configuration:**
+- Environment variables prefixed with `ETCO_` (ETCO = Environment TO Configuration)
+- Volume mounting: `./:/pegasus:consistent` for live code editing
+- Debug ports: 5850-5855 for Node.js debugging
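+
+The variable names in this document (`ETCO_server_logLevel`, `ETCO_hub_recordSpeechHistory`) suggest an `ETCO_<section>_<key>` pattern. The sketch below turns such variables into a nested config object; that mapping is an inference from the names, not confirmed behavior, and `loadEtcoConfig` is a hypothetical function.
+
+```typescript
+// Sketch only: assumes ETCO_<section>_<key> maps to config[section][key].
+// The mapping is inferred from variable names and is not confirmed by the
+// original code.
+type EtcoConfig = { [section: string]: { [key: string]: string } };
+
+function loadEtcoConfig(env: { [name: string]: string | undefined }): EtcoConfig {
+  const config: EtcoConfig = {};
+  Object.keys(env).forEach(name => {
+    const parts = name.split("_");
+    const value = env[name];
+    // Skip anything that is not ETCO_<section>_<key> or has no value.
+    if (parts.length < 3 || parts[0] !== "ETCO" || value === undefined) return;
+    const section = parts[1];
+    const key = parts.slice(2).join("_");
+    if (!config[section]) config[section] = {};
+    config[section][key] = value;
+  });
+  return config;
+}
+```
+
+Taking the environment as a parameter, rather than reading it globally, keeps the function easy to test with a plain object.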
+
+### Build Process
+
+**Commands:**
+```bash
+docker build -t pegasus_base:latest .
+yarn docker:bootstrap
+yarn docker:build
+./pegasus.js build-docker-image --services hub
+```
+
+**CLI Tool** (`cli/`):
+- `bootstrap` - Install dependencies
+- `build` - Build TypeScript
+- `test` - Run tests
+- `docker-run` - Run commands in Docker
+- `build-docker-image` - Build Docker images for services
+
+### Production Deployment
+
+- AWS ECS (Elastic Container Service)
+- ECR (Elastic Container Registry) for Docker images
+- Application Load Balancer
+- MongoDB Atlas for production databases
+- ElastiCache for Redis
+- CloudWatch for logging
+- New Relic for monitoring
+
+## Data Flow Examples
+
+### Example 1: User Says "Tell Me a Joke"
+
+1. **Robot → Hub**: LISTEN message with ASR config
+2. **Robot → Hub**: Audio stream
+3. **Hub**: Streams audio to Google Cloud Speech API
+4. **Hub**: Detects SOS, emits SOS message
+5. **Hub**: Detects EOS, emits EOS message
+6. **Robot → Hub**: CONTEXT message (runtime state)
+7. **Hub → Parser**: POST /v1/parse with text "tell me a joke"
+8. **Parser → Dialogflow**: Query with "joke" intent rules
+9. **Dialogflow → Parser**: Intent="joke_tell", entities={}
+10. **Parser → Hub**: NLU result
+11. **Hub → IntentRouter**: Match intent to "joke-skill"
+12. **Hub → joke-skill**: POST LISTEN_LAUNCH request
+13. **joke-skill**: Executes graph, selects joke
+14. **joke-skill → Hub**: SKILL_ACTION with JCP behavior (SayText)
+15. **Hub → Robot**: SKILL_ACTION message
+16. **Robot**: Executes behavior, speaks joke
+17. **Robot → Hub**: CMD_RESULT with action result
+18. **Hub → joke-skill**: POST LISTEN_UPDATE request
+19. **joke-skill**: Returns final=true
+20. **Hub → Robot**: Final SKILL_ACTION
+
+### Example 2: Proactive Greeting
+
+1. **Robot**: Detects person entering room
+2. **Robot → Hub**: TRIGGER message with trigger data
+3. **Robot → Hub**: CONTEXT message (runtime state)
+4. **Hub**: Queries all proactive skill configs
+5. **Hub**: Filters by context (time, people present)
+6. **Hub**: Filters by history (last greeting time)
+7. **Hub**: Filters by settings (user greeting preference)
+8. **Hub**: Selects "greeting-skill"
+9. **Hub → greeting-skill**: POST PROACTIVE_LAUNCH request
+10. **greeting-skill → Hub**: SKILL_ACTION with greeting behavior
+11. **Hub → Robot**: PROACTIVE response with match
+12. **Hub → Robot**: SKILL_ACTION message
+13. **Robot**: Executes greeting
+
+## Error Handling
+
+### Error Types
+
+**Hub Error Codes** (`HubErrorCode.ts`):
+- `TIMEOUT_ASR` - ASR timeout
+- `TIMEOUT_PARSER` - Parser timeout
+- `TIMEOUT_CONTEXT` - Context timeout
+- `TIMEOUT_SKILL` - Skill timeout
+- `PARSER` - Parser error
+- `ASR` - ASR error
+
+**Skill Request Errors** (`SkillRequestError`):
+- `SKILL_NOT_FOUND` - Skill does not exist
+- `TIMEOUT` - Skill request timeout
+
+### Error Response Format
+
+```json
+{
+  "type": "ERROR",
+  "msgID": "uuid",
+  "ts": 1234567890,
+  "final": true,
+  "data": {
+    "message": "Error description",
+    "code": "ERROR_CODE"
+  },
+  "timings": {
+    "total": 1234
+  }
+}
+```
+
+### Timeout Handling
+
+- ASR: 40 seconds (configurable via `sosTimeout`, `maxSpeechTimeout`)
+- Parser: 10 seconds
+- Context: 5 seconds
+- Skill: 10 seconds
+- Transaction: 60 seconds (configurable)
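+
+The per-stage budgets and the `TIMEOUT_*` error codes fit together as shown in this sketch, which produces a response in the Error Response Format when a stage overruns. The names `Stage`, `TIMEOUT_MS`, and `checkTimeout` are hypothetical, and the real handler presumably uses timers instead of comparing timestamps.
+
+```typescript
+// Sketch only: budgets come from the list above; function and type names
+// are hypothetical, and the real handler likely uses timers, not polling.
+type Stage = "ASR" | "PARSER" | "CONTEXT" | "SKILL";
+
+const TIMEOUT_MS: { [S in Stage]: number } = {
+  ASR: 40000,
+  PARSER: 10000,
+  CONTEXT: 5000,
+  SKILL: 10000,
+};
+
+interface ErrorResponse {
+  type: "ERROR";
+  msgID: string;
+  ts: number;
+  final: true;
+  data: { message: string; code: string };
+  timings: { total: number };
+}
+
+// Returns an ERROR response if `stage` has exceeded its budget, else null.
+function checkTimeout(stage: Stage, startedMs: number, nowMs: number, msgID: string): ErrorResponse | null {
+  const elapsed = nowMs - startedMs;
+  if (elapsed <= TIMEOUT_MS[stage]) return null;
+  return {
+    type: "ERROR",
+    msgID: msgID,
+    ts: nowMs,
+    final: true,
+    data: { message: stage + " timed out after " + elapsed + " ms", code: "TIMEOUT_" + stage },
+    timings: { total: elapsed },
+  };
+}
+```
+
+Because every timeout response sets `final` to true, the robot can treat a timeout as the end of the transaction.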
+
+## Monitoring & Logging
+
+### Logging
+
+**jibo-log Integration:**
+- Per-namespace log levels
+- Transaction ID correlation
+- Robot ID tracking
+- Structured logging support
+
+**Log Levels:**
+- Configured via `x-jibo-logging-config` header
+- Per-namespace granularity
+- Environment variable: `ETCO_server_logLevel`
+
+### Monitoring
+
+**New Relic:**
+- HTTP request tracking
+- WebSocket transaction tracking
+- Error tracking
+- Custom attributes (transID, robotID)
+
+**Health Checks:**
+- `/healthcheck` endpoint on all services
+- Returns service-specific health data
+- Database connection status
+
+### Speech History Recording
+
+**Optional Features:**
+- Record skill launches to MongoDB
+- Record speech interactions to MongoDB
+- Upload speech logs to S3 (JSON with audio base64)
+
+**Configuration:**
+- `ETCO_hub_recordLaunchHistory` - Enable launch history
+- `ETCO_hub_recordSpeechHistory` - Enable speech history
+- `ETCO_hub_recordSpeechLogBucket` - S3 bucket for speech logs
+
+## Skill Development Guide
+
+### Creating a New Skill
+
+1. **Extend GraphSkill:**
+```typescript
+export class MySkill extends GraphSkill<Transition> {
+  constructor() {
+    super('my-skill');
+  }
+
+  createGraph(): Graph<Transition> {
+    const g = new Graph('My Skill', generateTransitions<Transition>(Transition));
+    // Add nodes and transitions
+    g.finalize();
+    return g;
+  }
+}
+```
+
+2. **Define Transitions:**
+```typescript
+enum Transition {
+  Done = 'Done',
+  Retry = 'Retry'
+}
+```
+
+3. **Create Nodes:**
+```typescript
+class MyNode extends Node<Transition> {
+  async enter(data: Data): Promise<EnterResponse> {
+    // Return action or redirect
+    return { action: myJCPAction };
+  }
+
+  async exit(data: Data): Promise<ExitResponse> {
+    // Return next transition
+    return { transition: Transition.Done };
+  }
+}
+```
+
+4. **Create Skill Manifest:**
+```json
+{
+  "id": "my-skill",
+  "intents": [
+    {
+      "name": "my_intent",
+      "entities": []
+    }
+  ],
+  "onRobot": false,
+  "URL": "http://skill-host:port/"
+}
+```
+
+5. **Register with Hub:**
+- Add the skill config to `skills-local.json` or the environment
+- Deploy skill service
+- Hub will load configuration
+
+### Skill Best Practices
+
+- Use graph for complex flows, direct responses for simple ones
+- Track analytics events for monitoring
+- Handle errors gracefully with try-catch
+- Use supplemental behaviors for parallel actions
+- Set appropriate timeouts
+- Log important events
+- Test with both LISTEN_LAUNCH and PROACTIVE_LAUNCH
+
+## Key Design Decisions
+
+### Why Graph-Based Skills?
+
+- **State Management**: Explicit state machine with session tracking
+- **Visualization**: GraphViz generation for debugging
+- **Reusability**: Subgraphs for common patterns
+- **Testability**: Isolated node testing
+- **Maintainability**: Clear flow structure
+
+### Why WebSocket for Robot Communication?
+
+- **Low Latency**: Real-time bidirectional communication
+- **Audio Streaming**: Binary message support for audio
+- **Stateful**: Single connection per transaction
+- **Efficiency**: No HTTP overhead for each message
+
+### Why Separate Services?
+
+- **Scalability**: Scale each service independently
+- **Isolation**: A crash in one service does not take down the others, although requests that depend on the failed service still fail or time out
+- **Technology**: Different services can use different tech stacks
+- **Deployment**: Independent deployment cycles
+
+### Why Lerna Monorepo?
+
+- **Code Sharing**: Easy to share interfaces and utils
+- **Versioning**: Linked versioning for interdependent packages
+- **Development**: Single repository for all services
+- **Testing**: Integration tests across packages
+
+## Limitations & Known Issues
+
+1. **Single Graph Manager**: Skills cannot have concurrent sessions (singleton pattern)
+2. **Sequential Skill Redirects**: Only one level of redirect supported
+3. **No Skill-to-Skill Communication**: Skills must go through hub
+4. **Fixed Timeouts**: Hardcoded timeouts in some places
+5. **No Skill Hot-Reload**: Requires container rebuild for skill changes
+6. **Limited NLU**: Dialogflow dependency, no custom model training
+7. **No Skill Versioning**: Skills identified by ID only
+8. **Synchronous Skill Requests**: Hub waits for skill response (no async)
+
+## Future Considerations
+
+1. **Skill Versioning**: Support multiple versions of same skill
+2. **Skill-to-Skill Direct Communication**: Allow skills to call each other
+3. **Async Skill Responses**: Long-running skills with callback pattern
+4. **Custom NLU Models**: Support for custom trained models
+5. **Skill Hot-Reload**: Dynamic skill loading without restart
+6. **Multi-Session Skills**: Support concurrent skill sessions
+7. **Skill Marketplace**: Third-party skill distribution
+8. **A/B Testing**: Framework for testing skill variations
+
+## Conclusion
+
+The original Jibo server (Pegasus) is a microservices system that provides the cloud side of conversational AI for the Jibo robot. Its graph-based skill framework gives each skill an explicit, testable flow, and the split into Hub, Parser, History, and Lasso services lets each one be scaled and deployed independently. Together these services handle real-time speech processing, natural language understanding, skill routing, and proactive behaviors, subject to the limitations listed above.