# Hybrid Jibo Runtime Plan

## Goal

Build a **modern local-first Jibo runtime** while preserving the parts of native Jibo that are still useful:

* native wake/turn plumbing where helpful
* native skills where helpful
* native embodiment and rendering
* fast experimentation in **.NET 10** off-robot

Jibo’s native runtime already exposes a layered service model centered around **Jetstream** for turn/event flow, **GlobalManagerService** for routing, **SkillsService** for skill lifecycle, and **ExpressionService** for embodiment/rendering. The SSM startup is config-driven and mode-driven, which suggests a hybrid mode is a viable path.

---

## Architecture Direction

We will keep the **main experimental runtime in .NET 10** and treat Jibo as an embodied endpoint with a thin bridge layer.

That means:

* **off-robot**: conversation logic, planning, AI routing, capabilities
* **on-robot**: thin adapter/bridge to native Jibo services
* **native Jibo**: reuse rendering, skill hosting, and useful event seams

---

## High-Level ASCII Flowchart

```text
+--------------------------------------------------------------+
|                      NATIVE JIBO LAYER                       |
|--------------------------------------------------------------|
| Wake / Turn Events                                           |
|   - Jetstream                                                |
|   - hjHeard / turn started / turn result                     |
|                                                              |
| Native Services                                              |
|   - GlobalManagerService                                     |
|   - SkillsService                                            |
|   - ExpressionService                                        |
|   - TTS / Body / Visual / Motion services                    |
+------------------------------+-------------------------------+
                               |
                               | events / hooks / commands
                               v
+--------------------------------------------------------------+
|                      JIBO BRIDGE LAYER                       |
|--------------------------------------------------------------|
| Thin adapter between Jibo and modern runtime                 |
|                                                              |
| Responsibilities:                                            |
|   - receive turn/wake events                                 |
|   - receive skill context / native state                     |
|   - forward normalized events to .NET runtime                |
|   - accept ResponsePlans / commands from .NET runtime        |
|   - invoke native skills / expression / TTS / visuals        |
+------------------------------+-------------------------------+
                               |
                               | normalized turn context
                               v
+--------------------------------------------------------------+
|                    MODERN .NET 10 RUNTIME                    |
|--------------------------------------------------------------|
| Conversation Broker                                          |
|   - session state                                            |
|   - follow-up windows                                        |
|   - topic/context tracking                                   |
|                                                              |
| STT Strategy Selector                                        |
|   - native transcript                                        |
|   - local STT                                                |
|   - cloud STT                                                |
|                                                              |
| Brain Strategy Selector                                      |
|   - skill/rules path                                         |
|   - local AI                                                 |
|   - cloud AI                                                 |
|   - hybrid routing                                           |
|                                                              |
| Action / Orchestration Planner                               |
|   - gestures / visuals / ESML / delegation                   |
|   - capability/tool calls                                    |
|   - build final ResponsePlan                                 |
|                                                              |
| Capability Registry                                          |
|   - weather / time / reminders / tools                       |
|   - native skill delegation                                  |
|   - robot expression helpers                                 |
+------------------------------+-------------------------------+
                               |
                               | ResponsePlan / commands
                               v
+--------------------------------------------------------------+
|                      EXECUTION TARGETS                       |
|--------------------------------------------------------------|
|   - Native SkillsService                                     |
|   - Native ExpressionService                                 |
|   - Native TTS / visuals / motion                            |
|   - Local AI backends                                        |
|   - Cloud AI backends                                        |
|   - External APIs / tools                                    |
+--------------------------------------------------------------+
```

---

## Runtime Flow

```text
       [Wake Word / Turn / Follow-up]
                      |
                      v
            [Jibo Native Events]
                      |
                      v
             [Jibo Bridge Layer]
                      |
                      v
        [Conversation Broker (.NET)]
                      |
                      v
          [STT Strategy Selection]
                      |
                      v
         [Brain Strategy Selection]
               /      |      \
            /         |         \
   [Skill/Rules]  [Local AI]  [Cloud AI]
            \         |         /
               \      |      /
                  [Planner]
                      |
                      v
            [ResponsePlan Built]
                      |
                      v
             [Jibo Bridge Layer]
                      |
                      v
[Skills / Expression / TTS / Motion / Visuals]
                      |
                      v
        [Follow-up Window or Timeout]
```
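
The flow above can be read as two priority-ordered selections (STT, then brain) followed by a fallback. The following is a minimal Python sketch of that shape, not the planned .NET implementation: `Turn`, `ResponsePlan`, `first_result`, `run_turn`, and the stand-in strategies are hypothetical names chosen for illustration, and Python is used only to keep the example short. The "first strategy that returns a result wins" rule is an assumption; the plan does not yet say how selection is decided.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Turn:
    """Hypothetical normalized turn, as forwarded by the bridge."""
    audio_ref: str
    native_transcript: Optional[str] = None


@dataclass
class ResponsePlan:
    """Hypothetical plan: which path produced it and what to say."""
    source: str
    speech: str


def first_result(candidates: Sequence[Callable], arg):
    """Try candidates in priority order; return the first non-None result."""
    for candidate in candidates:
        result = candidate(arg)
        if result is not None:
            return result
    return None


def run_turn(turn: Turn, stt_strategies, brains) -> ResponsePlan:
    """STT selection, then brain selection, then a fallback plan."""
    transcript = first_result(stt_strategies, turn)
    if transcript is None:
        return ResponsePlan("fallback", "Sorry, I did not catch that.")
    plan = first_result(brains, transcript)
    if plan is None:
        return ResponsePlan("fallback", "I am not sure how to help with that.")
    return plan


def native_stt(turn: Turn) -> Optional[str]:
    # Reuse the transcript Jibo already produced, when there is one.
    return turn.native_transcript


def rules_brain(text: str) -> Optional[ResponsePlan]:
    # Deterministic path: only claims turns it recognizes.
    if "weather" in text:
        return ResponsePlan("rules", "Checking the weather.")
    return None


def local_ai_brain(text: str) -> Optional[ResponsePlan]:
    # Stand-in for a local model: accepts anything the rules path declined.
    return ResponsePlan("local_ai", f"You said: {text}")


plan = run_turn(Turn("a1", "what is the weather"), [native_stt], [rules_brain, local_ai_brain])
print(plan.source)  # rules
```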

---

## Planned Hybrid Mode

Jibo’s startup and service composition are mode-driven and config-driven, so the long-term plan is to add a **new custom mode** rather than replacing stock behavior outright.

### Candidate mode names

* `hybrid`
* `openjibo`
* `revival`
* `local-first`

### Intent of the mode

The custom mode should:

* preserve normal mode for stock behavior
* preserve developer mode for native debugging
* enable the bridge/runtime path for hybrid experiments
* allow selective routing between old and new Jibo behavior
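
One way to picture "selective routing" is a per-mode table that says whether the bridge is active and which intents the new runtime takes over. The sketch below is illustrative Python under stated assumptions: `ModeConfig`, `MODES`, `route`, and the intent names are hypothetical, and nothing here reflects the real SSM configuration schema, which this plan has not documented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    """Hypothetical per-mode settings (not the real SSM config format)."""
    bridge_enabled: bool
    # Intents the hybrid runtime takes over; everything else stays native.
    hybrid_intents: frozenset = frozenset()


MODES = {
    "normal": ModeConfig(bridge_enabled=False),     # stock behavior, untouched
    "developer": ModeConfig(bridge_enabled=False),  # native debugging, untouched
    "hybrid": ModeConfig(
        bridge_enabled=True,
        hybrid_intents=frozenset({"weather", "chat"}),
    ),
}


def route(mode: str, intent: str) -> str:
    """Return "hybrid" or "native"; unknown modes fall back to stock behavior."""
    config = MODES.get(mode, MODES["normal"])
    if config.bridge_enabled and intent in config.hybrid_intents:
        return "hybrid"
    return "native"


print(route("hybrid", "weather"))  # hybrid
print(route("hybrid", "timer"))    # native
print(route("normal", "weather"))  # native
```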

---

## Design Principles

### 1. Keep Jibo-specific code at the edges

The .NET runtime should know about:

* turns
* sessions
* plans
* capabilities
* render actions

It should **not** depend directly on:

* Electron internals
* SSM implementation quirks
* old Linux deployment constraints

### 2. Reuse native embodiment

Native Jibo rendering is valuable. ExpressionService appears to own animation, attention, DOF arbitration, and embodied output, so it should be reused as long as possible.

### 3. Replace cognition before replacing embodiment

The first thing to modernize is:

* routing
* planning
* AI selection
* follow-up conversation behavior

Not necessarily:

* body motion
* TTS
* expression plumbing

### 4. Favor thin robot-side code

The bridge on Jibo should stay small and stable. Fast-moving logic belongs in .NET 10.

### 5. Everything should converge to a ResponsePlan

The result should become a single normalized response/output plan, regardless of which source produced it:

* skill
* rules engine
* local AI
* cloud AI
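
As a rough illustration of that convergence, each source can get a small adapter function that returns the same plan type. This is a Python sketch under assumptions, not the planned .NET contract: the `Plan` fields and the `from_skill`, `from_rules`, and `from_ai` helpers are hypothetical, and the choice to keep listening after an AI reply is only an example policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    """Hypothetical stand-in for a ResponsePlan with only three fields."""
    source: str
    speech: Optional[str] = None
    skill: Optional[str] = None
    keep_listening: bool = False


def from_skill(skill_name: str) -> Plan:
    # A native skill delegation carries no speech of its own.
    return Plan(source="skill", skill=skill_name)


def from_rules(reply: str) -> Plan:
    # A deterministic reply is spoken as-is.
    return Plan(source="rules", speech=reply)


def from_ai(source: str, text: str) -> Plan:
    # Free-form model text is trimmed and leaves the follow-up window open.
    return Plan(source=source, speech=text.strip(), keep_listening=True)


plans = [
    from_skill("clock"),
    from_rules("It is sunny."),
    from_ai("local_ai", "  Sure, here is a joke.  "),
    from_ai("cloud_ai", "Here is a longer answer."),
]

# Downstream code sees one type, whatever produced it.
print(all(isinstance(p, Plan) for p in plans))  # True
```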

---

## Native Jibo Mapping

Based on current reverse engineering, the native service boundaries map roughly like this: Jetstream is the turn/event seam, GlobalManagerService performs routing and skill-launch logic, SkillsService manages skill lifecycle, and ExpressionService handles embodiment/rendering.

```text
Our Concept                  Native Jibo Equivalent
---------------------------- --------------------------------
Wake / Turn Source           Jetstream
Conversation Broker          split across Jetstream + routing
Brain Selection              GlobalManagerService + skills
Skill Execution              SkillsService
Renderer / Embodiment        ExpressionService
```

---

## Proposed Project Layout

```text
/src
  /Jibo.Runtime
    Core runtime orchestration
    - ConversationBroker
    - Session state
    - Turn pipeline
    - ResponsePlan builder

  /Jibo.Runtime.Abstractions
    Interfaces and models
    - ITurnSource
    - ISttStrategy
    - IBrainStrategy
    - IResponsePlanner
    - IRobotAdapter
    - TurnContext
    - ResponsePlan

  /Jibo.Bridge
    Jibo adapter / compatibility layer
    - robot event ingestion
    - command dispatch back to Jibo
    - native hook integration

  /Jibo.Brain.Rules
    deterministic routing / skills / decision tree

  /Jibo.Brain.Local
    local AI experiments

  /Jibo.Brain.Cloud
    cloud AI experiments

  /Jibo.Capabilities
    tools and callable capabilities
    - weather
    - time
    - reminders
    - skill delegation
    - expression helpers

  /Jibo.Simulator
    fake robot target for testing ResponsePlans

/docs
  architecture
  notes
  traces
```

---

## Initial Build Plan

### Phase 1: Contracts and runtime skeleton

Build the core models and interfaces first:

* `TurnContext`
* `ConversationSession`
* `SttResult`
* `BrainDecision`
* `ResponsePlan`
* `RenderAction`
* `FollowupPolicy`
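
To make the list above more concrete, here is one possible shape for those models. Only the type names come from this plan; every field is an assumption made for illustration, and the sketch is written in Python for brevity even though the real contracts would be .NET types in `Jibo.Runtime.Abstractions`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SttResult:
    text: str
    source: str               # assumed values: "native", "local", "cloud"
    confidence: float


@dataclass(frozen=True)
class TurnContext:
    turn_id: str
    session_id: Optional[str]  # None when the turn starts a new session
    stt: SttResult


@dataclass(frozen=True)
class BrainDecision:
    brain: str                # which strategy answered
    intent: str
    reply_text: str


@dataclass(frozen=True)
class RenderAction:
    kind: str                 # assumed values: "speak", "gesture", "visual"
    payload: str


@dataclass(frozen=True)
class FollowupPolicy:
    keep_listening: bool = False
    timeout_seconds: float = 0.0


@dataclass(frozen=True)
class ResponsePlan:
    turn_id: str
    actions: tuple = ()
    followup: FollowupPolicy = FollowupPolicy()


@dataclass
class ConversationSession:
    # Mutable on purpose: the broker updates it as turns arrive.
    session_id: str
    topic: Optional[str] = None
    turn_ids: list = field(default_factory=list)


stt = SttResult("what is the weather", "native", 0.92)
turn = TurnContext("t1", None, stt)
decision = BrainDecision("rules", "weather", "It is sunny.")
plan = ResponsePlan(
    turn.turn_id,
    (RenderAction("speak", decision.reply_text),),
    FollowupPolicy(keep_listening=True, timeout_seconds=8.0),
)
print(plan.actions[0].kind)  # speak
```

Making the per-turn models immutable is a design choice in this sketch, not a requirement of the plan; it keeps a plan from changing after it has been handed to the bridge.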

### Phase 2: Minimal broker

Implement:

* session open/close
* follow-up timeout
* topic/context tracking
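
A minimal sketch of those three behaviors, assuming a single active session and a fixed follow-up window. The `ConversationBroker` API shown here (`on_turn`, `close`, the `woke` flag, the 8-second default) is hypothetical, and Python stands in for the eventual .NET code. The clock is injected so the timeout can be tested without sleeping.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Session:
    session_id: str
    expires_at: float
    topic: Optional[str] = None


class ConversationBroker:
    """Tracks one active session with a follow-up window (hypothetical API)."""

    def __init__(self, followup_seconds: float = 8.0,
                 clock: Callable[[], float] = time.monotonic):
        self._followup = followup_seconds
        self._clock = clock
        self._ids = itertools.count(1)
        self.session: Optional[Session] = None

    def on_turn(self, woke: bool, topic: Optional[str] = None) -> Optional[Session]:
        """Return the session that owns this turn, or None if it is ignored."""
        now = self._clock()
        if self.session is not None and now >= self.session.expires_at:
            self.session = None                  # follow-up window timed out
        if self.session is None:
            if not woke:
                return None                      # no wake word, no open window
            self.session = Session(f"s{next(self._ids)}", now + self._followup)
        self.session.expires_at = now + self._followup  # each turn extends the window
        if topic is not None:
            self.session.topic = topic           # remember the latest topic
        return self.session

    def close(self) -> None:
        self.session = None


now = [0.0]
broker = ConversationBroker(followup_seconds=8.0, clock=lambda: now[0])
first = broker.on_turn(woke=True, topic="weather")
now[0] = 5.0
follow_up = broker.on_turn(woke=False)           # inside the window, no wake word
print(follow_up is first, follow_up.topic)       # True weather
```

Because each turn extends the window, a steady back-and-forth keeps the session alive; whether that is the desired policy is still an open design question.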

### Phase 3: Bridge skeleton

Create the adapter boundary for:

* inbound Jibo events
* outbound robot commands

Even if the first version is mocked, keep the interface stable.
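
The boundary can be as small as "subscribe to events" plus "send a command", with a mock that satisfies the same interface. The sketch below is illustrative Python: `RobotAdapter` loosely mirrors the `IRobotAdapter` name from the project layout, while the method names, the dictionary message shape, and the `turn_result` and `speak` type strings are assumptions rather than anything observed on the robot.

```python
from typing import Callable, Protocol


class RobotAdapter(Protocol):
    """Hypothetical adapter boundary between the runtime and the robot."""

    def subscribe(self, handler: Callable[[dict], None]) -> None: ...

    def send(self, command: dict) -> None: ...


class MockRobotAdapter:
    """In-memory stand-in: records commands and lets tests inject events."""

    def __init__(self) -> None:
        self._handlers: list = []
        self.sent: list = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._handlers.append(handler)

    def send(self, command: dict) -> None:
        self.sent.append(command)

    def emit(self, event: dict) -> None:
        """Simulate an inbound Jibo event."""
        for handler in self._handlers:
            handler(event)


def attach_echo_runtime(adapter: RobotAdapter) -> None:
    """Toy runtime: answer every turn result by speaking the transcript back."""

    def on_event(event: dict) -> None:
        if event.get("type") == "turn_result":
            adapter.send({"type": "speak", "text": event["transcript"]})

    adapter.subscribe(on_event)


robot = MockRobotAdapter()
attach_echo_runtime(robot)
robot.emit({"type": "wake"})                                   # ignored by the toy runtime
robot.emit({"type": "turn_result", "transcript": "hello"})
print(robot.sent)  # [{'type': 'speak', 'text': 'hello'}]
```

The runtime side only ever sees `RobotAdapter`, so swapping the mock for a real bridge later should not require changes above this boundary.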

### Phase 4: First working path

Implement a narrow vertical slice:

* input turn
* decision/rules path
* weather example
* TTS response
* follow-up window

### Phase 5: Native integration expansion

Add native delegation for:

* skills
* expression
* visuals
* gestures
* local turn/open follow-up behavior

### Phase 6: Hybrid AI routing

Add:

* local AI path
* cloud AI path
* confidence/routing policy
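
A confidence/routing policy could start as a few thresholds checked in local-first order. The following Python sketch is only one possible reading: `RoutingPolicy`, `choose_brain`, the threshold values, and the idea that each path reports a confidence score are all assumptions that would need to be validated by experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    """Hypothetical thresholds; real values would come from experiments."""
    rules_min_confidence: float = 0.8
    local_min_confidence: float = 0.5
    allow_cloud: bool = True


def choose_brain(policy: RoutingPolicy,
                 rules_confidence: float,
                 local_confidence: float) -> str:
    """Prefer the cheapest path that is confident enough, local-first."""
    if rules_confidence >= policy.rules_min_confidence:
        return "rules"
    if local_confidence >= policy.local_min_confidence:
        return "local_ai"
    if policy.allow_cloud:
        return "cloud_ai"
    return "local_ai"  # offline fallback: best local effort


default = RoutingPolicy()
offline = RoutingPolicy(allow_cloud=False)

print(choose_brain(default, 0.9, 0.1))  # rules
print(choose_brain(default, 0.3, 0.7))  # local_ai
print(choose_brain(default, 0.3, 0.2))  # cloud_ai
print(choose_brain(offline, 0.3, 0.2))  # local_ai
```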

---

## First Vertical Slice

Recommended first demonstration:

### Example

User says:

> Hey Jibo, what’s the weather?

System flow:

1. Jibo event arrives through bridge
2. .NET broker opens a session
3. transcript enters routing
4. weather capability is called
5. planner builds a `ResponsePlan`
6. bridge sends speech + visual action back to Jibo
7. follow-up window stays open

Then:

> What about the low tonight?

The same session stays active without a wake word if the follow-up window is still open.

---

## Near-Term Questions to Answer

* What is the cleanest robot-side bridge seam:

  * Jetstream hook
  * skill hook
  * local service calls
  * mixed approach

* What is the smallest command set needed to drive Jibo usefully:

  * speak
  * gesture
  * visual
  * launch skill
  * keep listening

* Which pieces should remain native the longest:

  * expression
  * skill hosting
  * turn engine
  * wake-word flow

* How should custom mode selection activate the hybrid path?

---

## Practical Strategy

For now:

* **develop fast in .NET 10**
* **use Jibo as an embodied endpoint**
* **keep the robot-side integration thin**
* **delay deep on-robot porting until the architecture proves itself**

This keeps experimentation fast while preserving a path toward deeper integration later.

---

## Current Working Hypothesis

The best long-term shape is:

```text
stock Jibo embodiment + modern external cognition + thin hybrid bridge
```

That gives us:

* rapid iteration
* local-first experiments
* preserved native robot personality/expression
* reduced dependence on brittle legacy cloud paths