# Hybrid Jibo Runtime Plan
|
|
## Goal

Build a **modern, local-first Jibo runtime** while preserving the parts of native Jibo that are still useful:

* native wake/turn plumbing, where helpful
* native skills, where helpful
* native embodiment and rendering
* fast off-robot experimentation in **.NET 10**

Jibo’s native runtime already exposes a layered service model: **Jetstream** for turn/event flow, **GlobalManagerService** for routing, **SkillsService** for skill lifecycle, and **ExpressionService** for embodiment and rendering. SSM startup is both config-driven and mode-driven, which suggests that a dedicated hybrid mode is a viable path.
|
|
---

## Architecture Direction

We will keep the **main experimental runtime in .NET 10** and treat Jibo as an embodied endpoint reached through a thin bridge layer.

That means:

* **off-robot**: conversation logic, planning, AI routing, capabilities
* **on-robot**: a thin adapter/bridge to native Jibo services
* **native Jibo**: reused rendering, skill hosting, and useful event seams
|
|
---

## High-Level ASCII Flowchart

```text
+--------------------------------------------------------------+
|                      NATIVE JIBO LAYER                       |
|--------------------------------------------------------------|
|  Wake / Turn Events                                          |
|    - Jetstream                                               |
|    - hjHeard / turn started / turn result                    |
|                                                              |
|  Native Services                                             |
|    - GlobalManagerService                                    |
|    - SkillsService                                           |
|    - ExpressionService                                       |
|    - TTS / Body / Visual / Motion services                   |
+------------------------------+-------------------------------+
                               |
                               | events / hooks / commands
                               v
+--------------------------------------------------------------+
|                      JIBO BRIDGE LAYER                       |
|--------------------------------------------------------------|
|  Thin adapter between Jibo and modern runtime                |
|                                                              |
|  Responsibilities:                                           |
|    - receive turn/wake events                                |
|    - receive skill context / native state                    |
|    - forward normalized events to .NET runtime               |
|    - accept ResponsePlans / commands from .NET runtime       |
|    - invoke native skills / expression / TTS / visuals       |
+------------------------------+-------------------------------+
                               |
                               | normalized turn context
                               v
+--------------------------------------------------------------+
|                    MODERN .NET 10 RUNTIME                    |
|--------------------------------------------------------------|
|  Conversation Broker                                         |
|    - session state                                           |
|    - follow-up windows                                       |
|    - topic/context tracking                                  |
|                                                              |
|  STT Strategy Selector                                       |
|    - native transcript                                       |
|    - local STT                                               |
|    - cloud STT                                               |
|                                                              |
|  Brain Strategy Selector                                     |
|    - skill/rules path                                        |
|    - local AI                                                |
|    - cloud AI                                                |
|    - hybrid routing                                          |
|                                                              |
|  Action / Orchestration Planner                              |
|    - gestures / visuals / ESML / delegation                  |
|    - capability/tool calls                                   |
|    - build final ResponsePlan                                |
|                                                              |
|  Capability Registry                                         |
|    - weather / time / reminders / tools                      |
|    - native skill delegation                                 |
|    - robot expression helpers                                |
+------------------------------+-------------------------------+
                               |
                               | ResponsePlan / commands
                               v
+--------------------------------------------------------------+
|                      EXECUTION TARGETS                       |
|--------------------------------------------------------------|
|  - Native SkillsService                                      |
|  - Native ExpressionService                                  |
|  - Native TTS / visuals / motion                             |
|  - Local AI backends                                         |
|  - Cloud AI backends                                         |
|  - External APIs / tools                                     |
+--------------------------------------------------------------+
```
|
|
|
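The bridge's "forward normalized events" responsibility can be pictured as a pure mapping from native events to runtime-level events. The TypeScript sketch below is illustrative only: the event names loosely follow the `hjHeard` / turn started / turn result labels in the diagram, but the exact type strings and every payload field (`timestamp`, `transcript`, `confidence`) are assumptions, not Jibo's documented Jetstream schema.

```typescript
// Hypothetical raw event as the on-robot bridge might receive it from native services.
// Field names and type strings are assumptions, not Jibo's documented Jetstream schema.
interface NativeJiboEvent {
  type: string; // e.g. "hjHeard", "turnStarted", "turnResult" (assumed spellings)
  timestamp: number; // milliseconds since epoch (assumed)
  payload?: { transcript?: string; confidence?: number };
}

// Normalized events forwarded to the .NET runtime: runtime concepts only, no Jibo internals.
type NormalizedTurnEvent =
  | { kind: "wake"; at: number }
  | { kind: "turnStarted"; at: number }
  | { kind: "transcript"; at: number; text: string; confidence: number | null };

// Map one native event to a normalized event, or null if the runtime should not see it.
function normalizeEvent(evt: NativeJiboEvent): NormalizedTurnEvent | null {
  switch (evt.type) {
    case "hjHeard":
      return { kind: "wake", at: evt.timestamp };
    case "turnStarted":
      return { kind: "turnStarted", at: evt.timestamp };
    case "turnResult": {
      const text = (evt.payload?.transcript ?? "").trim();
      if (text.length === 0) return null; // nothing usable was heard
      return { kind: "transcript", at: evt.timestamp, text, confidence: evt.payload?.confidence ?? null };
    }
    default:
      return null; // unknown native events stay on the robot side
  }
}
```

Dropping unrecognized events at the bridge keeps Jibo's native event vocabulary from leaking into the .NET runtime.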
|
|
---

## Runtime Flow

```text
     [Wake Word / Turn / Follow-up]
                    |
                    v
          [Jibo Native Events]
                    |
                    v
           [Jibo Bridge Layer]
                    |
                    v
      [Conversation Broker (.NET)]
                    |
                    v
        [STT Strategy Selection]
                    |
                    v
       [Brain Strategy Selection]
               /    |    \
          /         |         \
[Skill/Rules]   [Local AI]   [Cloud AI]
          \         |         /
               \    |    /
                [Planner]
                    |
                    v
          [ResponsePlan Built]
                    |
                    v
           [Jibo Bridge Layer]
                    |
                    v
[Skills / Expression / TTS / Motion / Visuals]
                    |
                    v
      [Follow-up Window or Timeout]
```
|
|
|
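The flow above amounts to an ordered fallback at two points: try STT strategies until one yields a transcript, then try brain strategies until one yields a decision. A minimal synchronous TypeScript sketch of that pipeline follows; the interface names echo `ISttStrategy` and `IBrainStrategy` from the project layout, but every field, the 0.6 native-confidence threshold, and the stub strategies are hypothetical. The real .NET runtime would be asynchronous and would add timeouts.

```typescript
// Hypothetical shapes; the real contracts would live in Jibo.Runtime.Abstractions (C#).
interface TurnInput { nativeTranscript?: string; nativeConfidence?: number; hasAudio?: boolean }
interface SttResult { text: string; source: string }
interface BrainDecision { intent: string; reply: string }
interface PlannedResponse { speak: string; followupOpen: boolean; trace: string[] }

// A strategy returns null to decline, so the next strategy in the list is tried.
interface SttStrategy { name: string; transcribe(input: TurnInput): SttResult | null }
interface BrainStrategy { name: string; decide(stt: SttResult): BrainDecision | null }

// Use Jibo's own transcript when it is confident enough (0.6 is an arbitrary threshold).
const nativeStt: SttStrategy = {
  name: "native",
  transcribe: (input) =>
    input.nativeTranscript !== undefined && (input.nativeConfidence ?? 0) >= 0.6
      ? { text: input.nativeTranscript, source: "native" }
      : null,
};

// Stand-in for a local STT engine: it only signals that it would have been used.
const localSttStub: SttStrategy = {
  name: "local",
  transcribe: (input) => (input.hasAudio ? { text: "<local transcript>", source: "local" } : null),
};

// Deterministic rules first, then a catch-all AI stub.
const rulesBrain: BrainStrategy = {
  name: "rules",
  decide: (stt) => (/\bweather\b/i.test(stt.text) ? { intent: "weather", reply: "Here is the weather." } : null),
};
const aiBrainStub: BrainStrategy = {
  name: "ai",
  decide: (stt) => ({ intent: "chat", reply: `You said: ${stt.text}` }),
};

function runTurn(input: TurnInput, stts: SttStrategy[], brains: BrainStrategy[]): PlannedResponse | null {
  const trace: string[] = [];

  let stt: SttResult | null = null;
  for (const s of stts) {
    stt = s.transcribe(input);
    if (stt) { trace.push(`stt:${s.name}`); break; }
  }
  if (!stt) return null; // nothing heard: let the follow-up window time out

  let decision: BrainDecision | null = null;
  for (const b of brains) {
    decision = b.decide(stt);
    if (decision) { trace.push(`brain:${b.name}`); break; }
  }
  if (!decision) return null;

  // Planner step, reduced to wrapping the reply; a real planner would add gestures/visuals.
  return { speak: decision.reply, followupOpen: true, trace };
}
```

Keeping strategies as ordered lists makes the local-first preference a configuration choice: reordering a list changes the routing without touching the pipeline.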
|
|
---

## Planned Hybrid Mode

Jibo’s startup and service composition are mode-driven and config-driven, so the long-term plan is to add a **new custom mode** rather than replacing stock behavior outright.

### Candidate mode names

* `hybrid`
* `openjibo`
* `revival`
* `local-first`

### Intent of the mode

The custom mode should:

* preserve normal mode for stock behavior
* preserve developer mode for native debugging
* enable the bridge/runtime path for hybrid experiments
* allow selective routing between old and new Jibo behavior
|
|
|
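At startup, the selected mode could be resolved once into a few routing switches. This sketch assumes the mode arrives as a plain string, uses `hybrid` as the mode name, and hard-codes which intents the new runtime owns; `RuntimeRouting`, `resolveRouting`, and the intent names are hypothetical and are not part of SSM's actual configuration.

```typescript
// Routing switches derived once from the selected startup mode (all names hypothetical).
interface RuntimeRouting {
  bridgeEnabled: boolean; // start the bridge / .NET runtime path
  nativeSkillsEnabled: boolean; // keep stock skill routing available
  routeToRuntime: (intent: string) => boolean; // selective old-vs-new routing per intent
}

// Intents the experimental runtime would own in hybrid mode; everything else stays native.
const RUNTIME_OWNED_INTENTS: string[] = ["weather", "chat"];

const STOCK_ROUTING: RuntimeRouting = {
  bridgeEnabled: false,
  nativeSkillsEnabled: true,
  routeToRuntime: () => false,
};

function resolveRouting(mode: string): RuntimeRouting {
  switch (mode) {
    case "hybrid": // candidate name; "openjibo", "revival", or "local-first" would work the same way
      return {
        bridgeEnabled: true,
        nativeSkillsEnabled: true,
        routeToRuntime: (intent) => RUNTIME_OWNED_INTENTS.indexOf(intent) >= 0,
      };
    default:
      // "normal", "developer", and unknown strings all keep stock behavior: no bridge.
      return STOCK_ROUTING;
  }
}
```

Unknown mode strings fall back to stock behavior, so a misconfigured robot degrades to normal operation instead of a half-enabled bridge.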
|
|
---

## Design Principles

### 1. Keep Jibo-specific code at the edges

The .NET runtime should know about:

* turns
* sessions
* plans
* capabilities
* render actions

It should **not** depend directly on:

* Electron internals
* SSM implementation quirks
* old Linux deployment constraints

### 2. Reuse native embodiment

Native Jibo rendering is valuable. ExpressionService appears to own animation, attention, DOF (degree-of-freedom) arbitration, and embodied output, so it should be reused for as long as possible.

### 3. Replace cognition before replacing embodiment

The first things to modernize are:

* routing
* planning
* AI selection
* follow-up conversation behavior

Not necessarily:

* body motion
* TTS
* expression plumbing

### 4. Favor thin robot-side code

The bridge on Jibo should stay small and stable. Fast-moving logic belongs in .NET 10.

### 5. Everything should converge to a ResponsePlan

Regardless of source:

* skill
* rules engine
* local AI
* cloud AI

the result should become a single normalized response/output plan.
|
|
|
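One way to enforce this principle is to give every source a small adapter whose only output type is `ResponsePlan`. The sketch below is a minimal illustration, not a settled schema: the `RenderAction` kinds, the `FollowupPolicy` fields, the 8-second window, and the functions `planFromRule` and `planFromModelText` are all hypothetical.

```typescript
// Render actions the bridge knows how to execute (hypothetical set).
type RenderAction =
  | { kind: "speak"; text: string }
  | { kind: "gesture"; name: string }
  | { kind: "visual"; name: string };

interface FollowupPolicy { keepListening: boolean; windowMs: number }

// The single normalized output, whichever brain produced it.
interface ResponsePlan {
  source: "skill" | "rules" | "localAI" | "cloudAI";
  actions: RenderAction[];
  followup: FollowupPolicy;
}

const DEFAULT_WINDOW_MS = 8000; // assumed follow-up window

// A rules-engine match already knows exactly what to say and which gesture to play.
function planFromRule(reply: string, gesture?: string): ResponsePlan {
  const actions: RenderAction[] = [];
  if (gesture) actions.push({ kind: "gesture", name: gesture });
  actions.push({ kind: "speak", text: reply });
  return { source: "rules", actions, followup: { keepListening: true, windowMs: DEFAULT_WINDOW_MS } };
}

// Free-form model text becomes one speak action per sentence; a trailing question keeps listening.
function planFromModelText(text: string, source: "localAI" | "cloudAI"): ResponsePlan {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? []).map((s) => s.trim()).filter((s) => s.length > 0);
  const actions = sentences.map((s): RenderAction => ({ kind: "speak", text: s }));
  return { source, actions, followup: { keepListening: /\?\s*$/.test(text), windowMs: DEFAULT_WINDOW_MS } };
}
```

Because both paths end in the same type, the bridge and any simulator only need to understand `ResponsePlan`, never the brain that produced it.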
|
|
---

## Native Jibo Mapping

Based on current reverse engineering, the native service boundaries map roughly as follows: Jetstream is the turn/event seam, GlobalManagerService performs routing and skill-launch logic, SkillsService manages skill lifecycle, and ExpressionService handles embodiment and rendering.

```text
Our Concept                  Native Jibo Equivalent
---------------------------- --------------------------------
Wake / Turn Source           Jetstream
Conversation Broker          split across Jetstream + routing
Brain Selection              GlobalManagerService + skills
Skill Execution              SkillsService
Renderer / Embodiment        ExpressionService
```
|
|
---

## Proposed Project Layout

```text
/src
  /Jibo.Runtime
    Core runtime orchestration
    - ConversationBroker
    - Session state
    - Turn pipeline
    - ResponsePlan builder

  /Jibo.Runtime.Abstractions
    Interfaces and models
    - ITurnSource
    - ISttStrategy
    - IBrainStrategy
    - IResponsePlanner
    - IRobotAdapter
    - TurnContext
    - ResponsePlan

  /Jibo.Bridge
    Jibo adapter / compatibility layer
    - robot event ingestion
    - command dispatch back to Jibo
    - native hook integration

  /Jibo.Brain.Rules
    deterministic routing / skills / decision tree

  /Jibo.Brain.Local
    local AI experiments

  /Jibo.Brain.Cloud
    cloud AI experiments

  /Jibo.Capabilities
    tools and callable capabilities
    - weather
    - time
    - reminders
    - skill delegation
    - expression helpers

  /Jibo.Simulator
    fake robot target for testing ResponsePlans

/docs
  architecture
  notes
  traces
```
|
|
|
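`Jibo.Simulator` could implement the same adapter contract as the real bridge so that ResponsePlans can be exercised without hardware. A minimal sketch, assuming a hypothetical `RobotAdapter` interface standing in for `IRobotAdapter` and a reduced command set; the simulator records a readable timeline and rejects obviously invalid commands.

```typescript
// Reduced command set for illustration (hypothetical).
type RobotCommand =
  | { kind: "speak"; text: string }
  | { kind: "gesture"; name: string }
  | { kind: "keepListening"; windowMs: number };

// Stand-in for IRobotAdapter: the only seam the runtime uses to drive a robot.
interface RobotAdapter {
  execute(commands: RobotCommand[]): void;
}

// Fake robot: validates commands and records a readable timeline instead of moving hardware.
class SimulatedRobot implements RobotAdapter {
  readonly timeline: string[] = [];

  execute(commands: RobotCommand[]): void {
    for (const cmd of commands) {
      switch (cmd.kind) {
        case "speak":
          if (cmd.text.trim() === "") throw new Error("empty speech");
          this.timeline.push(`say: ${cmd.text}`);
          break;
        case "gesture":
          this.timeline.push(`gesture: ${cmd.name}`);
          break;
        case "keepListening":
          if (cmd.windowMs <= 0) throw new Error("follow-up window must be positive");
          this.timeline.push(`listen: ${cmd.windowMs}ms`);
          break;
      }
    }
  }
}

// Runtime code depends only on RobotAdapter, so the same call works against the real bridge.
function deliverReply(robot: RobotAdapter, reply: string): void {
  robot.execute([
    { kind: "gesture", name: "nod" },
    { kind: "speak", text: reply },
    { kind: "keepListening", windowMs: 8000 },
  ]);
}
```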
|
|
---

## Initial Build Plan

### Phase 1 — Contracts and runtime skeleton

Build the core models and interfaces first:

* `TurnContext`
* `ConversationSession`
* `SttResult`
* `BrainDecision`
* `ResponsePlan`
* `RenderAction`
* `FollowupPolicy`

### Phase 2 — Minimal broker

Implement:

* session open/close
* follow-up timeout
* topic/context tracking
|
|
|
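A minimal broker covering these three items can be written against an injected clock, which keeps follow-up timeouts testable without real timers. This sketch assumes a single robot, one active session at a time, and an 8-second default window; the class and field names are illustrative, not the project's final `ConversationBroker`.

```typescript
interface ConversationSession {
  id: number;
  openedAt: number;
  lastActivityAt: number;
  topic: string | null;
  turns: number;
}

// Minimal single-robot broker; `now` is injected so follow-up timeouts are testable.
class ConversationBroker {
  private session: ConversationSession | null = null;
  private nextId = 1;
  private readonly now: () => number;
  private readonly followupMs: number;

  constructor(now: () => number, followupMs = 8000) {
    this.now = now;
    this.followupMs = followupMs; // 8 s default is an assumption
  }

  /** True while a session exists and its follow-up window has not expired. */
  isOpen(): boolean {
    return this.session !== null && this.now() - this.session.lastActivityAt <= this.followupMs;
  }

  /**
   * A wake word always opens a fresh session. A turn without a wake word continues the
   * current session only inside the follow-up window; otherwise it is ignored (null).
   */
  onTurn(viaWakeWord: boolean, topic?: string): ConversationSession | null {
    if (!viaWakeWord && !this.isOpen()) {
      this.close();
      return null;
    }
    let s = this.session;
    if (viaWakeWord || s === null) {
      const t = this.now();
      s = { id: this.nextId++, openedAt: t, lastActivityAt: t, topic: null, turns: 0 };
      this.session = s;
    }
    s.lastActivityAt = this.now();
    s.turns += 1;
    if (topic) s.topic = topic; // keep the previous topic when a follow-up does not set one
    return s;
  }

  close(): void {
    this.session = null;
  }

  current(): ConversationSession | null {
    return this.session;
  }
}
```

A wake word always starts a fresh session, while a turn without one only continues a session whose window is still open; this is the rule the first vertical slice relies on.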
|
|
### Phase 3 — Bridge skeleton

Create the adapter boundary for:

* inbound Jibo events
* outbound robot commands

Even if the first version is mocked, keep the interface stable.
|
|
|
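One way to keep the boundary stable while it is still mocked is to put the transport behind an interface and have even the in-memory mock exchange JSON, so nothing non-serializable sneaks into the contract. The message envelopes (`v`, `type`, `turnId`), `BridgeTransport`, and `LoopbackTransport` below are hypothetical; a real transport (for example WebSocket, HTTP, or local IPC) would implement the same interface.

```typescript
// Hypothetical wire envelopes for the robot <-> runtime boundary, versioned from day one.
interface InboundMessage { v: 1; type: "turn"; turnId: string; text: string }
interface OutboundMessage {
  v: 1;
  type: "command";
  turnId: string; // ties each command back to the turn that caused it
  command: { kind: string; [field: string]: unknown };
}

// The stable boundary: a mock today, a real transport later, same interface.
interface BridgeTransport {
  onInbound(handler: (msg: InboundMessage) => void): void;
  sendOutbound(msg: OutboundMessage): void;
}

// In-memory mock that still round-trips through JSON so the contract stays serializable.
class LoopbackTransport implements BridgeTransport {
  private handler: ((msg: InboundMessage) => void) | null = null;
  readonly sent: string[] = [];

  onInbound(handler: (msg: InboundMessage) => void): void {
    this.handler = handler;
  }

  sendOutbound(msg: OutboundMessage): void {
    this.sent.push(JSON.stringify(msg));
  }

  // Test helper standing in for "a Jibo event arrived on the robot side".
  simulateRobotEvent(raw: string): void {
    const msg = JSON.parse(raw) as InboundMessage;
    if (msg.v !== 1 || msg.type !== "turn") throw new Error(`unsupported message: ${raw}`);
    if (this.handler) this.handler(msg);
  }
}

// Runtime side of the mock: answer every turn with a speak command for the same turnId.
function attachEchoRuntime(transport: BridgeTransport): void {
  transport.onInbound((msg) => {
    transport.sendOutbound({
      v: 1,
      type: "command",
      turnId: msg.turnId,
      command: { kind: "speak", text: `Heard: ${msg.text}` },
    });
  });
}
```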
|
|
### Phase 4 — First working path

Implement a narrow vertical slice:

* input turn
* decision/rules path
* weather example
* TTS response
* follow-up window

### Phase 5 — Native integration expansion

Add native delegation for:

* skills
* expression
* visuals
* gestures
* local turn/open follow-up behavior

### Phase 6 — Hybrid AI routing

Add:

* local AI path
* cloud AI path
* confidence/routing policy
|
|
|
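The confidence/routing policy can start as a small, explicit, testable function: use the rules path when it matches confidently, stay local by default, and escalate to the cloud only when allowed and needed. This sketch assumes hypothetical inputs (`rulesConfidence`, `localConfidence`, `cloudAllowed`, `needsFreshData`) and arbitrary thresholds (0.8 for rules, 0.5 for local AI); real values would have to be tuned from traces.

```typescript
type Route = "rules" | "localAI" | "cloudAI";

interface RoutingSignals {
  rulesConfidence: number; // 0..1, how well a deterministic rule matched
  localConfidence: number; // 0..1, estimated confidence of the local model
  cloudAllowed: boolean; // user/privacy setting and network availability
  needsFreshData: boolean; // e.g. questions a local model cannot answer offline
}

interface RoutingDecision { route: Route; reason: string }

const RULES_THRESHOLD = 0.8; // assumed
const LOCAL_THRESHOLD = 0.5; // assumed

function chooseRoute(s: RoutingSignals): RoutingDecision {
  if (s.rulesConfidence >= RULES_THRESHOLD) {
    return { route: "rules", reason: "deterministic match" };
  }
  if (s.cloudAllowed && (s.needsFreshData || s.localConfidence < LOCAL_THRESHOLD)) {
    return { route: "cloudAI", reason: s.needsFreshData ? "needs fresh data" : "low local confidence" };
  }
  // Local-first default: stay local whenever the cloud is not needed or not allowed.
  return { route: "localAI", reason: s.cloudAllowed ? "local confident enough" : "cloud disabled" };
}
```

Returning a reason alongside the route makes decisions easy to log and compare while the thresholds are being tuned.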
|
|
---

## First Vertical Slice

Recommended first demonstration:

### Example

User says:

> Hey Jibo, what’s the weather?

System flow:

1. A Jibo event arrives through the bridge.
2. The .NET broker opens a session.
3. The transcript enters routing.
4. The weather capability is called.
5. The planner builds a `ResponsePlan`.
6. The bridge sends speech and a visual action back to Jibo.
7. The follow-up window stays open.

Then:

> What about the low tonight?

If the follow-up window is still open, the same session stays active and no wake word is needed.
|
|
|
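The whole slice can be prototyped off-robot before any Jibo integration. In this sketch the weather capability is a stub returning placeholder values, routing is two keyword rules, and the follow-up window is an assumed 8 seconds; `WeatherSlice` and its types are hypothetical. The second question carries no wake word and never mentions weather, so it only resolves because the session still holds the `weather` topic.

```typescript
interface WeatherReport { summary: string; highF: number; lowF: number }

// Stubbed capability with placeholder values; a real one would call a weather API.
function getWeather(): WeatherReport {
  return { summary: "partly cloudy", highF: 68, lowF: 51 };
}

interface SliceSession { topic: string | null; lastActivityAt: number }
interface SliceResponse { speak: string; visual: string | null; followupOpen: boolean }

const FOLLOWUP_MS = 8000; // assumed follow-up window

class WeatherSlice {
  private session: SliceSession | null = null;

  handle(text: string, viaWakeWord: boolean, now: number): SliceResponse | null {
    let s = this.session;
    const open = s !== null && now - s.lastActivityAt <= FOLLOWUP_MS;
    if (!viaWakeWord && !open) {
      this.session = null; // window closed: a new wake word is required
      return null;
    }
    if (viaWakeWord || s === null) {
      s = { topic: null, lastActivityAt: now };
      this.session = s;
    }
    s.lastActivityAt = now;

    const lower = text.toLowerCase();
    if (/\bweather\b/.test(lower)) {
      s.topic = "weather";
      const w = getWeather();
      return { speak: `It's ${w.summary} with a high of ${w.highF}.`, visual: "weather-card", followupOpen: true };
    }
    // "the low" only makes sense while the session topic is still weather.
    if (s.topic === "weather" && /\blow\b/.test(lower)) {
      return { speak: `Tonight's low is ${getWeather().lowF}.`, visual: null, followupOpen: true };
    }
    return { speak: "Sorry, this demo only knows about the weather.", visual: null, followupOpen: false };
  }
}
```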
|
|
---

## Near-Term Questions to Answer

* What is the cleanest robot-side bridge seam?
  * Jetstream hook
  * skill hook
  * local service calls
  * mixed approach
* What is the smallest command set needed to drive Jibo usefully?
  * speak
  * gesture
  * visual
  * launch skill
  * keep listening
* Which pieces should remain native the longest?
  * expression
  * skill hosting
  * turn engine
  * wake-word flow
* How should custom mode selection activate the hybrid path?
|
|
|
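As a starting point for the command-set question, the five candidate commands could form one discriminated union that the bridge validates before dispatching to a native service. The field names are hypothetical, and the service assignments follow the Native Jibo Mapping section only as a working assumption; routing `keepListening` through Jetstream in particular is unverified.

```typescript
// Candidate minimal command set; field names are hypothetical.
type JiboCommand =
  | { kind: "speak"; text: string }
  | { kind: "gesture"; animation: string }
  | { kind: "visual"; screen: string }
  | { kind: "launchSkill"; skillId: string }
  | { kind: "keepListening"; windowMs: number };

type NativeTarget = "TTS" | "ExpressionService" | "SkillsService" | "Jetstream";

// Which native service the bridge would call for each command (working assumption).
function targetFor(cmd: JiboCommand): NativeTarget {
  switch (cmd.kind) {
    case "speak":
      return "TTS";
    case "gesture":
    case "visual":
      return "ExpressionService";
    case "launchSkill":
      return "SkillsService";
    case "keepListening":
      return "Jetstream";
  }
}

// Validate untrusted JSON from the runtime before anything reaches a native service.
function parseCommand(raw: string): JiboCommand {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) throw new Error("command must be an object");
  const o = parsed as Record<string, unknown>;
  const str = (key: string): string => {
    const v = o[key];
    if (typeof v !== "string" || v.length === 0) throw new Error(`missing or empty ${key}`);
    return v;
  };
  switch (o["kind"]) {
    case "speak":
      return { kind: "speak", text: str("text") };
    case "gesture":
      return { kind: "gesture", animation: str("animation") };
    case "visual":
      return { kind: "visual", screen: str("screen") };
    case "launchSkill":
      return { kind: "launchSkill", skillId: str("skillId") };
    case "keepListening": {
      const ms = o["windowMs"];
      if (typeof ms !== "number" || ms <= 0) throw new Error("windowMs must be a positive number");
      return { kind: "keepListening", windowMs: ms };
    }
    default:
      throw new Error(`unknown command kind: ${String(o["kind"])}`);
  }
}
```

Rejecting anything outside the union at the bridge keeps the robot-side code small: native services only ever see the five known command shapes.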
|
|
---

## Practical Strategy

For now:

* **develop fast in .NET 10**
* **use Jibo as an embodied endpoint**
* **keep the robot-side integration thin**
* **delay deep on-robot porting until the architecture proves itself**

This keeps experimentation fast while preserving a path toward deeper integration later.
|
|
---

## Current Working Hypothesis

The best long-term shape is:

```text
stock Jibo embodiment + modern external cognition + thin hybrid bridge
```

That gives us:

* rapid iteration
* local-first experiments
* preserved native robot personality/expression
* reduced dependence on brittle legacy cloud paths