# Hybrid Jibo Runtime Plan ## Goal Build a **modern local-first Jibo runtime** while preserving the parts of native Jibo that are still useful: * native wake/turn plumbing where helpful * native skills where helpful * native embodiment and rendering * fast experimentation in **.NET 10** off-robot Jibo’s native runtime already exposes a layered service model centered around **Jetstream** for turn/event flow, **GlobalManagerService** for routing, **SkillsService** for skill lifecycle, and **ExpressionService** for embodiment/rendering. The SSM startup is config-driven and mode-driven, which suggests a hybrid mode is a viable path. --- ## Architecture Direction We will keep the **main experimental runtime in .NET 10** and treat Jibo as an embodied endpoint with a thin bridge layer. That means: * **off-robot**: conversation logic, planning, AI routing, capabilities * **on-robot**: thin adapter/bridge to native Jibo services * **native Jibo**: reuse rendering, skill hosting, and useful event seams --- ## High-Level ASCII Flowchart ```text +--------------------------------------------------------------+ | NATIVE JIBO LAYER | |--------------------------------------------------------------| | Wake / Turn Events | | - Jetstream | | - hjHeard / turn started / turn result | | | | Native Services | | - GlobalManagerService | | - SkillsService | | - ExpressionService | | - TTS / Body / Visual / Motion services | +------------------------------+-------------------------------+ | | events / hooks / commands v +--------------------------------------------------------------+ | JIBO BRIDGE LAYER | |--------------------------------------------------------------| | Thin adapter between Jibo and modern runtime | | | | Responsibilities: | | - receive turn/wake events | | - receive skill context / native state | | - forward normalized events to .NET runtime | | - accept ResponsePlans / commands from .NET runtime | | - invoke native skills / expression / TTS / visuals | +------------------------------+-------------------------------+ | | normalized turn context v +--------------------------------------------------------------+ | MODERN .NET 10 RUNTIME | |--------------------------------------------------------------| | Conversation Broker | | - session state | | - follow-up windows | | - topic/context tracking | | | | STT Strategy Selector | | - native transcript | | - local STT | | - cloud STT | | | | Brain Strategy Selector | | - skill/rules path | | - local AI | | - cloud AI | | - hybrid routing | | | | Action / Orchestration Planner | | - gestures / visuals / ESML / delegation | | - capability/tool calls | | - build final ResponsePlan | | | | Capability Registry | | - weather / time / reminders / tools | | - native skill delegation | | - robot expression helpers | +------------------------------+-------------------------------+ | | ResponsePlan / commands v +--------------------------------------------------------------+ | EXECUTION TARGETS | |--------------------------------------------------------------| | - Native SkillsService | | - Native ExpressionService | | - Native TTS / visuals / motion | | - Local AI backends | | - Cloud AI backends | | - External APIs / tools | +--------------------------------------------------------------+ ``` --- ## Runtime Flow ```text [Wake Word / Turn / Follow-up] | v [Jibo Native Events] | v [Jibo Bridge Layer] | v [Conversation Broker (.NET)] | v [STT Strategy Selection] | v [Brain Strategy Selection] / | \ / | \ [Skill/Rules] [Local AI] [Cloud AI] \ | / \ | / [Planner] | v [ResponsePlan Built] | v [Jibo Bridge Layer] | v [Skills / Expression / TTS / Motion / Visuals] | v [Follow-up Window or Timeout] ``` --- ## Planned Hybrid Mode Jibo’s startup and service composition are mode-driven and config-driven, so the long-term plan is to add a **new custom mode** rather than replacing stock behavior outright. ### Candidate mode names * `hybrid` * `openjibo` * `revival` * `local-first` ### Intent of the mode The custom mode should: * preserve normal mode for stock behavior * preserve developer mode for native debugging * enable the bridge/runtime path for hybrid experiments * allow selective routing between old and new Jibo behavior --- ## Design Principles ### 1. Keep Jibo-specific code at the edges The .NET runtime should know about: * turns * sessions * plans * capabilities * render actions It should **not** depend directly on: * Electron internals * SSM implementation quirks * old Linux deployment constraints ### 2. Reuse native embodiment Native Jibo rendering is valuable. ExpressionService appears to own animation, attention, DOF arbitration, and embodied output, so it should be reused as long as possible. ### 3. Replace cognition before replacing embodiment The first thing to modernize is: * routing * planning * AI selection * follow-up conversation behavior Not necessarily: * body motion * TTS * expression plumbing ### 4. Favor thin robot-side code The bridge on Jibo should stay small and stable. Fast-moving logic belongs in .NET 10. ### 5. Everything should converge to a ResponsePlan Regardless of source: * skill * rules engine * local AI * cloud AI the result should become a single normalized response/output plan. --- ## Native Jibo Mapping Based on current reverse engineering, the native service boundaries map roughly like this: Jetstream is the turn/event seam, GlobalManagerService performs routing and skill-launch logic, SkillsService manages skill lifecycle, and ExpressionService handles embodiment/rendering. ```text Our Concept Native Jibo Equivalent ---------------------------- -------------------------------- Wake / Turn Source Jetstream Conversation Broker split across Jetstream + routing Brain Selection GlobalManagerService + skills Skill Execution SkillsService Renderer / Embodiment ExpressionService ``` --- ## Proposed Project Layout ```text /src /Jibo.Runtime Core runtime orchestration - ConversationBroker - Session state - Turn pipeline - ResponsePlan builder /Jibo.Runtime.Abstractions Interfaces and models - ITurnSource - ISttStrategy - IBrainStrategy - IResponsePlanner - IRobotAdapter - TurnContext - ResponsePlan /Jibo.Bridge Jibo adapter / compatibility layer - robot event ingestion - command dispatch back to Jibo - native hook integration /Jibo.Brain.Rules deterministic routing / skills / decision tree /Jibo.Brain.Local local AI experiments /Jibo.Brain.Cloud cloud AI experiments /Jibo.Capabilities tools and callable capabilities - weather - time - reminders - skill delegation - expression helpers /Jibo.Simulator fake robot target for testing ResponsePlans /docs architecture notes traces ``` --- ## Initial Build Plan ### Phase 1 — Contracts and runtime skeleton Build the core models and interfaces first: * `TurnContext` * `ConversationSession` * `SttResult` * `BrainDecision` * `ResponsePlan` * `RenderAction` * `FollowupPolicy` ### Phase 2 — Minimal broker Implement: * session open/close * follow-up timeout * topic/context tracking ### Phase 3 — Bridge skeleton Create the adapter boundary for: * inbound Jibo events * outbound robot commands Even if the first version is mocked, keep the interface stable. ### Phase 4 — First working path Implement a narrow vertical slice: * input turn * decision/rules path * weather example * TTS response * follow-up window ### Phase 5 — Native integration expansion Add native delegation for: * skills * expression * visuals * gestures * local turn/open follow-up behavior ### Phase 6 — Hybrid AI routing Add: * local AI path * cloud AI path * confidence/routing policy --- ## First Vertical Slice Recommended first demonstration: ### Example User says: > Hey Jibo, what’s the weather? System flow: 1. Jibo event arrives through bridge 2. .NET broker opens a session 3. transcript enters routing 4. weather capability is called 5. planner builds a `ResponsePlan` 6. bridge sends speech + visual action back to Jibo 7. follow-up window stays open Then: > What about the low tonight? The same session stays active without wake word if the follow-up window is still open. --- ## Near-Term Questions to Answer * What is the cleanest robot-side bridge seam: * Jetstream hook * skill hook * local service calls * mixed approach * What is the smallest command set needed to drive Jibo usefully: * speak * gesture * visual * launch skill * keep listening * Which pieces should remain native the longest: * expression * skill hosting * turn engine * wake-word flow * How should custom mode selection activate the hybrid path --- ## Practical Strategy For now: * **develop fast in .NET 10** * **use Jibo as an embodied endpoint** * **keep the robot-side integration thin** * **delay deep on-robot porting until architecture proves itself** This keeps experimentation fast while preserving a path toward deeper integration later. --- ## Current Working Hypothesis The best long-term shape is: ```text stock Jibo embodiment + modern external cognition + thin hybrid bridge ``` That gives us: * rapid iteration * local-first experiments * preserved native robot personality/expression * reduced dependence on brittle legacy cloud paths