Hybrid Jibo Runtime Plan
Goal
Build a modern local-first Jibo runtime while preserving the parts of native Jibo that are still useful:
- native wake/turn plumbing where helpful
- native skills where helpful
- native embodiment and rendering
- fast experimentation in .NET 10 off-robot
Jibo’s native runtime already exposes a layered service model centered around Jetstream for turn/event flow, GlobalManagerService for routing, SkillsService for skill lifecycle, and ExpressionService for embodiment/rendering. The SSM startup is config-driven and mode-driven, which suggests a hybrid mode is a viable path.
Architecture Direction
We will keep the main experimental runtime in .NET 10 and treat Jibo as an embodied endpoint with a thin bridge layer.
That means:
- off-robot: conversation logic, planning, AI routing, capabilities
- on-robot: thin adapter/bridge to native Jibo services
- native Jibo: reuse rendering, skill hosting, and useful event seams
High-Level ASCII Flowchart
+--------------------------------------------------------------+
|                      NATIVE JIBO LAYER                       |
|--------------------------------------------------------------|
|  Wake / Turn Events                                          |
|    - Jetstream                                               |
|    - hjHeard / turn started / turn result                    |
|                                                              |
|  Native Services                                             |
|    - GlobalManagerService                                    |
|    - SkillsService                                           |
|    - ExpressionService                                       |
|    - TTS / Body / Visual / Motion services                   |
+------------------------------+-------------------------------+
                               |
                               |  events / hooks / commands
                               v
+--------------------------------------------------------------+
|                      JIBO BRIDGE LAYER                       |
|--------------------------------------------------------------|
|  Thin adapter between Jibo and modern runtime                |
|                                                              |
|  Responsibilities:                                           |
|    - receive turn/wake events                                |
|    - receive skill context / native state                    |
|    - forward normalized events to .NET runtime               |
|    - accept ResponsePlans / commands from .NET runtime       |
|    - invoke native skills / expression / TTS / visuals       |
+------------------------------+-------------------------------+
                               |
                               |  normalized turn context
                               v
+--------------------------------------------------------------+
|                    MODERN .NET 10 RUNTIME                    |
|--------------------------------------------------------------|
|  Conversation Broker                                         |
|    - session state                                           |
|    - follow-up windows                                       |
|    - topic/context tracking                                  |
|                                                              |
|  STT Strategy Selector                                       |
|    - native transcript                                       |
|    - local STT                                               |
|    - cloud STT                                               |
|                                                              |
|  Brain Strategy Selector                                     |
|    - skill/rules path                                        |
|    - local AI                                                |
|    - cloud AI                                                |
|    - hybrid routing                                          |
|                                                              |
|  Action / Orchestration Planner                              |
|    - gestures / visuals / ESML / delegation                  |
|    - capability/tool calls                                   |
|    - build final ResponsePlan                                |
|                                                              |
|  Capability Registry                                         |
|    - weather / time / reminders / tools                      |
|    - native skill delegation                                 |
|    - robot expression helpers                                |
+------------------------------+-------------------------------+
                               |
                               |  ResponsePlan / commands
                               v
+--------------------------------------------------------------+
|                      EXECUTION TARGETS                       |
|--------------------------------------------------------------|
|  - Native SkillsService                                      |
|  - Native ExpressionService                                  |
|  - Native TTS / visuals / motion                             |
|  - Local AI backends                                         |
|  - Cloud AI backends                                         |
|  - External APIs / tools                                     |
+--------------------------------------------------------------+
Runtime Flow
         [Wake Word / Turn / Follow-up]
                        |
                        v
              [Jibo Native Events]
                        |
                        v
               [Jibo Bridge Layer]
                        |
                        v
          [Conversation Broker (.NET)]
                        |
                        v
            [STT Strategy Selection]
                        |
                        v
           [Brain Strategy Selection]
                  /     |     \
            /           |           \
[Skill/Rules]      [Local AI]        [Cloud AI]
            \           |           /
                  \     |     /
                    [Planner]
                        |
                        v
              [ResponsePlan Built]
                        |
                        v
               [Jibo Bridge Layer]
                        |
                        v
 [Skills / Expression / TTS / Motion / Visuals]
                        |
                        v
          [Follow-up Window or Timeout]
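The flow above can be read as a chain of small, swappable stages that ends in a ResponsePlan. The following is a minimal sketch of that shape, written in Python for brevity rather than the planned C#. Every name, field, and reply string in it is hypothetical; it only illustrates how the stages could compose, not how the real runtime is built.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TurnContext:
    """Normalized turn as the bridge might forward it (hypothetical shape)."""
    session_id: str
    native_transcript: str = ""  # transcript supplied by the robot, if any
    transcript: str = ""         # filled in by the STT stage
    intent: str = ""             # filled in by the brain stage


@dataclass
class ResponsePlan:
    speech: str
    actions: List[str] = field(default_factory=list)
    keep_listening: bool = False


Stage = Callable[[TurnContext], TurnContext]


def stt_stage(turn: TurnContext) -> TurnContext:
    # Assumption: reuse the native transcript when present; a real selector
    # could fall back to local or cloud STT here.
    turn.transcript = turn.native_transcript.strip().lower()
    return turn


def brain_stage(turn: TurnContext) -> TurnContext:
    # Stand-in for the skill/rules path: a single keyword rule.
    turn.intent = "weather" if "weather" in turn.transcript else "unknown"
    return turn


def planner(turn: TurnContext) -> ResponsePlan:
    if turn.intent == "weather":
        return ResponsePlan("Here is the forecast.", ["show_weather_card"], True)
    return ResponsePlan("Sorry, I did not catch that.")


def run_turn(turn: TurnContext, stages: List[Stage]) -> ResponsePlan:
    # Each stage enriches the turn; the planner always produces the final plan.
    for stage in stages:
        turn = stage(turn)
    return planner(turn)


plan = run_turn(
    TurnContext(session_id="s1", native_transcript="What's the weather?"),
    [stt_stage, brain_stage],
)
```

Keeping each stage as a plain callable means an STT or brain strategy can be swapped without touching the broker or the planner.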
Planned Hybrid Mode
Jibo’s startup and service composition are mode-driven and config-driven, so the long-term plan is to add a new custom mode rather than replacing stock behavior outright.
Candidate mode names
- hybrid
- openjibo
- revival
- local-first
Intent of the mode
The custom mode should:
- preserve normal mode for stock behavior
- preserve developer mode for native debugging
- enable the bridge/runtime path for hybrid experiments
- allow selective routing between old and new Jibo behavior
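One way to picture selective routing is a per-mode table that says whether the bridge is active and which intents it takes over. The sketch below is in Python for brevity and does not reflect Jibo's actual mode configuration format, which has not been confirmed; the `ModeConfig` type, the intent names, and the fallback rule are all assumptions.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class ModeConfig:
    name: str
    bridge_enabled: bool = False
    # Intents handed to the external runtime; everything else stays native.
    hybrid_intents: FrozenSet[str] = field(default_factory=frozenset)


# Hypothetical mode table; "hybrid" is one of the candidate names.
MODES = {
    "normal": ModeConfig("normal"),
    "developer": ModeConfig("developer"),
    "hybrid": ModeConfig(
        "hybrid",
        bridge_enabled=True,
        hybrid_intents=frozenset({"weather", "chat"}),
    ),
}


def route(mode_name: str, intent: str) -> str:
    """Return "hybrid" or "native" for a given mode and intent."""
    # Assumption: unknown modes fall back to stock behavior.
    mode = MODES.get(mode_name, MODES["normal"])
    if mode.bridge_enabled and intent in mode.hybrid_intents:
        return "hybrid"
    return "native"
```

With this shape, stock and developer modes are untouched by construction, and the hybrid mode can widen its intent set gradually as the external runtime matures.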
Design Principles
1. Keep Jibo-specific code at the edges
The .NET runtime should know about:
- turns
- sessions
- plans
- capabilities
- render actions
It should not depend directly on:
- Electron internals
- SSM implementation quirks
- old Linux deployment constraints
2. Reuse native embodiment
Native Jibo rendering is valuable. ExpressionService appears to own animation, attention, DOF arbitration, and embodied output, so it should be reused as long as possible.
3. Replace cognition before replacing embodiment
The first thing to modernize is:
- routing
- planning
- AI selection
- follow-up conversation behavior
Not necessarily:
- body motion
- TTS
- expression plumbing
4. Favor thin robot-side code
The bridge on Jibo should stay small and stable. Fast-moving logic belongs in .NET 10.
5. Everything should converge to a ResponsePlan
Regardless of source:
- skill
- rules engine
- local AI
- cloud AI
the result should become a single normalized response/output plan.
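As an illustration of that convergence, the sketch below shows three different sources producing the same plan shape. It is written in Python for brevity; the field names, action kinds, and helper functions are assumptions, not a settled contract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RenderAction:
    kind: str      # e.g. "speak", "gesture", "visual", "launch_skill"
    payload: str


@dataclass
class ResponsePlan:
    source: str    # "skill", "rules", "local_ai", or "cloud_ai"
    actions: List[RenderAction] = field(default_factory=list)
    keep_listening: bool = False


def plan_from_rule(text: str, gesture: str) -> ResponsePlan:
    return ResponsePlan(
        "rules",
        [RenderAction("gesture", gesture), RenderAction("speak", text)],
    )


def plan_from_ai(source: str, reply: str) -> ResponsePlan:
    # Assumption: free-form AI output is reduced to speech plus an open
    # follow-up window.
    return ResponsePlan(source, [RenderAction("speak", reply.strip())], keep_listening=True)


def plan_from_skill(skill_name: str) -> ResponsePlan:
    return ResponsePlan("skill", [RenderAction("launch_skill", skill_name)])


def describe(plan: ResponsePlan) -> List[str]:
    """What a bridge would see: the same shape regardless of origin."""
    return [f"{action.kind}:{action.payload}" for action in plan.actions]
```

Because the bridge only ever consumes `describe`-style output, adding a new brain strategy should not require any robot-side change.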
Native Jibo Mapping
Based on current reverse engineering, the native service boundaries map roughly like this: Jetstream is the turn/event seam, GlobalManagerService performs routing and skill-launch logic, SkillsService manages skill lifecycle, and ExpressionService handles embodiment/rendering.
Our Concept                   Native Jibo Equivalent
----------------------------  --------------------------------
Wake / Turn Source            Jetstream
Conversation Broker           split across Jetstream + routing
Brain Selection               GlobalManagerService + skills
Skill Execution               SkillsService
Renderer / Embodiment         ExpressionService
Proposed Project Layout
/src
  /Jibo.Runtime
    Core runtime orchestration
    - ConversationBroker
    - Session state
    - Turn pipeline
    - ResponsePlan builder
  /Jibo.Runtime.Abstractions
    Interfaces and models
    - ITurnSource
    - ISttStrategy
    - IBrainStrategy
    - IResponsePlanner
    - IRobotAdapter
    - TurnContext
    - ResponsePlan
  /Jibo.Bridge
    Jibo adapter / compatibility layer
    - robot event ingestion
    - command dispatch back to Jibo
    - native hook integration
  /Jibo.Brain.Rules
    deterministic routing / skills / decision tree
  /Jibo.Brain.Local
    local AI experiments
  /Jibo.Brain.Cloud
    cloud AI experiments
  /Jibo.Capabilities
    tools and callable capabilities
    - weather
    - time
    - reminders
    - skill delegation
    - expression helpers
  /Jibo.Simulator
    fake robot target for testing ResponsePlans
/docs
  architecture
  notes
  traces
Initial Build Plan
Phase 1 — Contracts and runtime skeleton
Build the core models and interfaces first:
- TurnContext
- ConversationSession
- SttResult
- BrainDecision
- ResponsePlan
- RenderAction
- FollowupPolicy
Phase 2 — Minimal broker
Implement:
- session open/close
- follow-up timeout
- topic/context tracking
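A minimal broker along these lines might look like the sketch below. It is in Python for brevity, with the clock passed in explicitly so the timeout can be tested without sleeping. The class and method names are hypothetical, and the 8-second window is an arbitrary placeholder rather than a measured Jibo value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConversationSession:
    session_id: str
    topic: str = ""
    followup_deadline: float = 0.0  # seconds on the injected clock


class ConversationBroker:
    def __init__(self, followup_seconds: float = 8.0):
        self.followup_seconds = followup_seconds
        self._sessions: Dict[str, ConversationSession] = {}
        self._next_id = 0

    def open_session(self, now: float) -> ConversationSession:
        self._next_id += 1
        session = ConversationSession(f"s{self._next_id}")
        session.followup_deadline = now + self.followup_seconds
        self._sessions[session.session_id] = session
        return session

    def active_session(self, session_id: str, now: float) -> Optional[ConversationSession]:
        """Return the session if its follow-up window is open, else close it."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if now > session.followup_deadline:
            self.close_session(session_id)
            return None
        return session

    def record_turn(self, session_id: str, topic: str, now: float) -> None:
        session = self._sessions[session_id]
        session.topic = topic
        # Each completed turn extends the follow-up window.
        session.followup_deadline = now + self.followup_seconds

    def close_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

Injecting `now` instead of reading the system clock keeps the broker deterministic, which matters once traces from the robot are replayed against it.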
Phase 3 — Bridge skeleton
Create the adapter boundary for:
- inbound Jibo events
- outbound robot commands
Even if the first version is mocked, keep the interface stable.
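A mocked first version of that boundary could be as small as the sketch below, shown in Python for brevity. `RobotAdapter` loosely mirrors the planned IRobotAdapter, but its signature is an assumption. The raw event field names (`event`, `text`) and the command names are invented for illustration; the real Jetstream payload shape has not been confirmed.

```python
import json
from typing import List, Protocol


class RobotAdapter(Protocol):
    """Outbound half of the bridge boundary (hypothetical signature)."""
    def send(self, command: str, **args: str) -> None: ...


class MockRobotAdapter:
    """Simulator-style target: records commands as JSON instead of driving a robot."""
    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, command: str, **args: str) -> None:
        self.sent.append(json.dumps({"command": command, "args": args}, sort_keys=True))


def normalize_event(raw: dict) -> dict:
    """Inbound half: reduce a native-looking event to the fields the runtime needs."""
    return {
        "type": raw.get("event", "unknown"),
        "transcript": raw.get("text", "").strip(),
    }


def handle(raw_event: dict, robot: RobotAdapter) -> None:
    event = normalize_event(raw_event)
    # Only completed turns with a transcript produce outbound commands.
    if event["type"] == "turn_result" and event["transcript"]:
        robot.send("speak", text=f"You said: {event['transcript']}")
        robot.send("keep_listening")
```

Swapping `MockRobotAdapter` for a real transport later should leave `handle` unchanged, which is the point of keeping the interface stable from the start.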
Phase 4 — First working path
Implement a narrow vertical slice:
- input turn
- decision/rules path
- weather example
- TTS response
- follow-up window
Phase 5 — Native integration expansion
Add native delegation for:
- skills
- expression
- visuals
- gestures
- local turn/open follow-up behavior
Phase 6 — Hybrid AI routing
Add:
- local AI path
- cloud AI path
- confidence/routing policy
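One possible confidence/routing policy is "try strategies in a local-first order and accept the first answer that clears its threshold". The sketch below shows that policy in Python for brevity. The strategies are toy stand-ins, and the thresholds, confidence values, and ordering are assumptions chosen only to make the behavior visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class BrainDecision:
    source: str
    reply: str
    confidence: float  # 0.0 to 1.0, as reported by the strategy itself


Strategy = Callable[[str], Optional[BrainDecision]]


def rules_brain(text: str) -> Optional[BrainDecision]:
    if "time" in text:
        return BrainDecision("rules", "It is noon.", 1.0)
    return None  # rules either match or abstain


def local_ai_brain(text: str) -> Optional[BrainDecision]:
    # Stand-in for a local model: confident only on short inputs.
    confidence = 0.9 if len(text.split()) <= 4 else 0.3
    return BrainDecision("local_ai", f"local answer to '{text}'", confidence)


def cloud_ai_brain(text: str) -> Optional[BrainDecision]:
    return BrainDecision("cloud_ai", f"cloud answer to '{text}'", 0.8)


def route(text: str, strategies: List[Tuple[Strategy, float]]) -> BrainDecision:
    """Try strategies in order; accept the first whose confidence meets its threshold."""
    best: Optional[BrainDecision] = None
    for strategy, threshold in strategies:
        decision = strategy(text)
        if decision is None:
            continue
        if decision.confidence >= threshold:
            return decision
        if best is None or decision.confidence > best.confidence:
            best = decision
    # Nothing cleared its threshold: fall back to the most confident attempt.
    return best or BrainDecision("none", "", 0.0)


LOCAL_FIRST = [(rules_brain, 1.0), (local_ai_brain, 0.7), (cloud_ai_brain, 0.5)]
```

Ordering the list this way keeps the cloud path as a last resort, which matches the local-first goal; a different policy is just a different list.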
First Vertical Slice
Recommended first demonstration:
Example
User says:
Hey Jibo, what’s the weather?
System flow:
- Jibo event arrives through bridge
- .NET broker opens a session
- transcript enters routing
- weather capability is called
- planner builds a ResponsePlan
- bridge sends speech + visual action back to Jibo
- follow-up window stays open
Then:
What about the low tonight?
The same session stays active without a wake word as long as the follow-up window is still open.
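The two turns in this example can be sketched end to end against canned data. The code below is Python for brevity, and everything in it is hypothetical: the forecast values are fake, the matching is plain keyword lookup, and the `speak:` / `visual:` strings stand in for whatever the bridge command format turns out to be.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Canned data standing in for a real weather capability.
FAKE_FORECAST: Dict[str, str] = {"now": "Sunny and 72", "low_tonight": "55"}


@dataclass
class Session:
    topic: str = ""
    window_open: bool = False
    sent: List[str] = field(default_factory=list)  # what the bridge would receive


def handle_turn(session: Session, transcript: str, has_wake_word: bool) -> bool:
    """Process one turn; return False if it was ignored."""
    if not has_wake_word and not session.window_open:
        return False  # no wake word and no open follow-up window

    text = transcript.lower()
    if "weather" in text:
        session.topic = "weather"
        reply = f"Right now: {FAKE_FORECAST['now']}."
    elif session.topic == "weather" and "low tonight" in text:
        # Follow-up resolved through the session topic, not the words alone.
        reply = f"Tonight's low is {FAKE_FORECAST['low_tonight']}."
    else:
        # Assumption: an unhandled turn closes the follow-up window.
        session.window_open = False
        return False

    session.sent.append(f"speak:{reply}")
    session.sent.append("visual:weather_card")
    session.window_open = True
    return True
```

The second turn only makes sense because the session remembers the weather topic, which is why topic tracking belongs in the broker rather than in the individual capability.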
Near-Term Questions to Answer
- What is the cleanest robot-side bridge seam:
  - Jetstream hook
  - skill hook
  - local service calls
  - mixed approach
- What is the smallest command set needed to drive Jibo usefully:
  - speak
  - gesture
  - visual
  - launch skill
  - keep listening
- Which pieces should remain native the longest:
  - expression
  - skill hosting
  - turn engine
  - wake-word flow
- How should custom mode selection activate the hybrid path?
Practical Strategy
For now:
- develop fast in .NET 10
- use Jibo as an embodied endpoint
- keep the robot-side integration thin
- delay deep on-robot porting until the architecture proves itself
This keeps experimentation fast while preserving a path toward deeper integration later.
Current Working Hypothesis
The best long-term shape is:
stock Jibo embodiment + modern external cognition + thin hybrid bridge
That gives us:
- rapid iteration
- local-first experiments
- preserved native robot personality/expression
- reduced dependence on brittle legacy cloud paths