# Hybrid Jibo Runtime Plan
## Goal
Build a **modern local-first Jibo runtime** while preserving the parts of native Jibo that are still useful:
* native wake/turn plumbing where helpful
* native skills where helpful
* native embodiment and rendering
* fast experimentation in **.NET 10** off-robot
Jibo's native runtime already exposes a layered service model centered around **Jetstream** for turn/event flow, **GlobalManagerService** for routing, **SkillsService** for skill lifecycle, and **ExpressionService** for embodiment/rendering. The SSM startup is config-driven and mode-driven, which suggests a hybrid mode is a viable path.
---
## Architecture Direction
We will keep the **main experimental runtime in .NET 10** and treat Jibo as an embodied endpoint with a thin bridge layer.
That means:
* **off-robot**: conversation logic, planning, AI routing, capabilities
* **on-robot**: thin adapter/bridge to native Jibo services
* **native Jibo**: reuse rendering, skill hosting, and useful event seams
---
## High-Level ASCII Flowchart
```text
+--------------------------------------------------------------+
|                      NATIVE JIBO LAYER                       |
|--------------------------------------------------------------|
| Wake / Turn Events                                           |
|   - Jetstream                                                |
|   - hjHeard / turn started / turn result                     |
|                                                              |
| Native Services                                              |
|   - GlobalManagerService                                     |
|   - SkillsService                                            |
|   - ExpressionService                                        |
|   - TTS / Body / Visual / Motion services                    |
+------------------------------+-------------------------------+
                               |
                               | events / hooks / commands
                               v
+--------------------------------------------------------------+
|                      JIBO BRIDGE LAYER                       |
|--------------------------------------------------------------|
| Thin adapter between Jibo and modern runtime                 |
|                                                              |
| Responsibilities:                                            |
|   - receive turn/wake events                                 |
|   - receive skill context / native state                     |
|   - forward normalized events to .NET runtime                |
|   - accept ResponsePlans / commands from .NET runtime        |
|   - invoke native skills / expression / TTS / visuals        |
+------------------------------+-------------------------------+
                               |
                               | normalized turn context
                               v
+--------------------------------------------------------------+
|                    MODERN .NET 10 RUNTIME                    |
|--------------------------------------------------------------|
| Conversation Broker                                          |
|   - session state                                            |
|   - follow-up windows                                        |
|   - topic/context tracking                                   |
|                                                              |
| STT Strategy Selector                                        |
|   - native transcript                                        |
|   - local STT                                                |
|   - cloud STT                                                |
|                                                              |
| Brain Strategy Selector                                      |
|   - skill/rules path                                         |
|   - local AI                                                 |
|   - cloud AI                                                 |
|   - hybrid routing                                           |
|                                                              |
| Action / Orchestration Planner                               |
|   - gestures / visuals / ESML / delegation                   |
|   - capability/tool calls                                    |
|   - build final ResponsePlan                                 |
|                                                              |
| Capability Registry                                          |
|   - weather / time / reminders / tools                       |
|   - native skill delegation                                  |
|   - robot expression helpers                                 |
+------------------------------+-------------------------------+
                               |
                               | ResponsePlan / commands
                               v
+--------------------------------------------------------------+
|                      EXECUTION TARGETS                       |
|--------------------------------------------------------------|
|   - Native SkillsService                                     |
|   - Native ExpressionService                                 |
|   - Native TTS / visuals / motion                            |
|   - Local AI backends                                        |
|   - Cloud AI backends                                        |
|   - External APIs / tools                                    |
+--------------------------------------------------------------+
```
---
## Runtime Flow
```text
       [Wake Word / Turn / Follow-up]
                      |
                      v
            [Jibo Native Events]
                      |
                      v
             [Jibo Bridge Layer]
                      |
                      v
        [Conversation Broker (.NET)]
                      |
                      v
          [STT Strategy Selection]
                      |
                      v
         [Brain Strategy Selection]
                /     |     \
           /          |          \
  [Skill/Rules]  [Local AI]  [Cloud AI]
           \          |          /
                \     |     /
                  [Planner]
                      |
                      v
            [ResponsePlan Built]
                      |
                      v
             [Jibo Bridge Layer]
                      |
                      v
[Skills / Expression / TTS / Motion / Visuals]
                      |
                      v
        [Follow-up Window or Timeout]
```
---
## Planned Hybrid Mode
Jibo's startup and service composition are mode-driven and config-driven, so the long-term plan is to add a **new custom mode** rather than replacing stock behavior outright.
### Candidate mode names
* `hybrid`
* `openjibo`
* `revival`
* `local-first`
### Intent of the mode
The custom mode should:
* preserve normal mode for stock behavior
* preserve developer mode for native debugging
* enable the bridge/runtime path for hybrid experiments
* allow selective routing between old and new Jibo behavior
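
As a rough illustration of "selective routing" under a custom mode, the sketch below decides per turn whether stock Jibo or the .NET runtime should handle it. Everything here is hypothetical: the `RuntimeMode` enum, the `HybridRouter` class, and the intent names are invented for illustration, and nothing is implied about how native Jibo actually stores or reads its mode configuration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mode names; "Hybrid" stands in for whichever candidate name is chosen.
public enum RuntimeMode { Normal, Developer, Hybrid }

public enum TurnRoute { NativeJibo, DotNetRuntime }

public sealed class HybridRouter
{
    private readonly RuntimeMode _mode;
    private readonly HashSet<string> _nativeIntents;

    // nativeIntents: intents that should keep stock behavior even in hybrid mode.
    public HybridRouter(RuntimeMode mode, IEnumerable<string> nativeIntents)
    {
        _mode = mode;
        _nativeIntents = new HashSet<string>(nativeIntents, StringComparer.OrdinalIgnoreCase);
    }

    public TurnRoute Route(string intent)
    {
        // Normal and developer modes are left untouched.
        if (_mode != RuntimeMode.Hybrid) return TurnRoute.NativeJibo;

        return _nativeIntents.Contains(intent)
            ? TurnRoute.NativeJibo
            : TurnRoute.DotNetRuntime;
    }
}

public static class Demo
{
    public static void Main()
    {
        var router = new HybridRouter(RuntimeMode.Hybrid, new[] { "clock" });
        Console.WriteLine(router.Route("clock"));   // NativeJibo
        Console.WriteLine(router.Route("weather")); // DotNetRuntime
    }
}
```

Keeping the allow-list explicit means a hybrid experiment can be narrowed to one intent at a time while everything else stays on the stock path.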
---
## Design Principles
### 1. Keep Jibo-specific code at the edges
The .NET runtime should know about:
* turns
* sessions
* plans
* capabilities
* render actions
It should **not** depend directly on:
* Electron internals
* SSM implementation quirks
* old Linux deployment constraints
### 2. Reuse native embodiment
Native Jibo rendering is valuable. ExpressionService appears to own animation, attention, DOF arbitration, and embodied output, so it should be reused as long as possible.
### 3. Replace cognition before replacing embodiment
The first thing to modernize is:
* routing
* planning
* AI selection
* follow-up conversation behavior
Not necessarily:
* body motion
* TTS
* expression plumbing
### 4. Favor thin robot-side code
The bridge on Jibo should stay small and stable. Fast-moving logic belongs in .NET 10.
### 5. Everything should converge to a ResponsePlan
Regardless of source:
* skill
* rules engine
* local AI
* cloud AI
the result should become a single normalized response/output plan.
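
A minimal sketch of that convergence, assuming a plan is an ordered list of render actions plus a follow-up flag. The `PlanSource` values, the `Kind`/`Payload` fields, and the `ResponsePlans` helper are illustrative assumptions, not a settled schema and not a native Jibo format.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum PlanSource { Skill, Rules, LocalAi, CloudAi }

// One step of robot output; Kind and Payload are placeholder fields.
public sealed record RenderAction(string Kind, string Payload);

public sealed record ResponsePlan(
    PlanSource Source,
    IReadOnlyList<RenderAction> Actions,
    bool KeepListening);

public static class ResponsePlans
{
    // A deterministic rules hit becomes a one-action plan.
    public static ResponsePlan FromRules(string cannedReply) =>
        new(PlanSource.Rules, new[] { new RenderAction("speak", cannedReply) }, KeepListening: false);

    // An AI reply (local or cloud) ends up in exactly the same shape.
    public static ResponsePlan FromAi(PlanSource source, string reply, string? gesture = null)
    {
        var actions = new List<RenderAction> { new("speak", reply) };
        if (gesture is not null) actions.Add(new RenderAction("gesture", gesture));
        return new ResponsePlan(source, actions, KeepListening: true);
    }
}

public static class Demo
{
    public static void Main()
    {
        var plans = new[]
        {
            ResponsePlans.FromRules("It is 3 PM."),
            ResponsePlans.FromAi(PlanSource.LocalAi, "Sure!", "nod"),
        };
        foreach (var p in plans)
            Console.WriteLine($"{p.Source}: {p.Actions.Count} action(s)");
        // Rules: 1 action(s)
        // LocalAi: 2 action(s)
    }
}
```

Because the bridge only ever sees `ResponsePlan`, a new brain strategy can be added without touching robot-side code.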
---
## Native Jibo Mapping
Based on current reverse engineering, the native service boundaries map roughly like this: Jetstream is the turn/event seam, GlobalManagerService performs routing and skill-launch logic, SkillsService manages skill lifecycle, and ExpressionService handles embodiment/rendering.
```text
Our Concept                  Native Jibo Equivalent
---------------------------- --------------------------------
Wake / Turn Source           Jetstream
Conversation Broker          split across Jetstream + routing
Brain Selection              GlobalManagerService + skills
Skill Execution              SkillsService
Renderer / Embodiment        ExpressionService
```
---
## Proposed Project Layout
```text
/src
  /Jibo.Runtime
    Core runtime orchestration
    - ConversationBroker
    - Session state
    - Turn pipeline
    - ResponsePlan builder
  /Jibo.Runtime.Abstractions
    Interfaces and models
    - ITurnSource
    - ISttStrategy
    - IBrainStrategy
    - IResponsePlanner
    - IRobotAdapter
    - TurnContext
    - ResponsePlan
  /Jibo.Bridge
    Jibo adapter / compatibility layer
    - robot event ingestion
    - command dispatch back to Jibo
    - native hook integration
  /Jibo.Brain.Rules
    deterministic routing / skills / decision tree
  /Jibo.Brain.Local
    local AI experiments
  /Jibo.Brain.Cloud
    cloud AI experiments
  /Jibo.Capabilities
    tools and callable capabilities
    - weather
    - time
    - reminders
    - skill delegation
    - expression helpers
  /Jibo.Simulator
    fake robot target for testing ResponsePlans
/docs
  architecture
  notes
  traces
```
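
To make the `Jibo.Runtime.Abstractions` entry more concrete, here is one possible shape for the listed interfaces. Only the type names come from the layout above; every method signature, record field, and the `NativeTranscriptStt` class are assumptions made for this sketch, and the records are deliberately trimmed-down placeholders.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Placeholder models; real versions would carry more fields.
public sealed record TurnContext(string SessionId, string? Transcript, byte[]? Audio);
public sealed record SttResult(string Text, double Confidence);
public sealed record BrainDecision(string Intent, string ReplyText);
public sealed record ResponsePlan(string Speech, bool KeepListening);

public interface ITurnSource
{
    // Streams normalized turns delivered by the bridge.
    IAsyncEnumerable<TurnContext> ReadTurnsAsync(CancellationToken ct);
}

public interface ISttStrategy
{
    Task<SttResult> TranscribeAsync(TurnContext turn, CancellationToken ct);
}

public interface IBrainStrategy
{
    Task<BrainDecision> DecideAsync(TurnContext turn, SttResult stt, CancellationToken ct);
}

public interface IResponsePlanner
{
    ResponsePlan Plan(BrainDecision decision);
}

public interface IRobotAdapter
{
    Task ExecuteAsync(ResponsePlan plan, CancellationToken ct);
}

// Example strategy: trust the transcript Jibo already produced, if any.
public sealed class NativeTranscriptStt : ISttStrategy
{
    public Task<SttResult> TranscribeAsync(TurnContext turn, CancellationToken ct) =>
        Task.FromResult(new SttResult(
            turn.Transcript ?? string.Empty,
            turn.Transcript is null ? 0.0 : 1.0));
}

public static class Demo
{
    public static async Task Main()
    {
        ISttStrategy stt = new NativeTranscriptStt();
        var turn = new TurnContext("s1", "what time is it", Audio: null);
        var result = await stt.TranscribeAsync(turn, CancellationToken.None);
        Console.WriteLine(result.Text); // what time is it
    }
}
```

Local and cloud STT would be further `ISttStrategy` implementations, which is what lets the selector swap them without the broker noticing.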
---
## Initial Build Plan
### Phase 1 — Contracts and runtime skeleton
Build the core models and interfaces first:
* `TurnContext`
* `ConversationSession`
* `SttResult`
* `BrainDecision`
* `ResponsePlan`
* `RenderAction`
* `FollowupPolicy`
### Phase 2 — Minimal broker
Implement:
* session open/close
* follow-up timeout
* topic/context tracking
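
The three broker responsibilities can be sketched in a few lines. This is an assumption-heavy illustration: the `OnTurn` signature, the single-session model, and the 8-second window in the demo are all invented here, and the caller passes the clock in so the timeout logic stays testable.

```csharp
#nullable enable
using System;

public sealed class ConversationSession
{
    public string Id { get; } = Guid.NewGuid().ToString("N");
    public string? Topic { get; set; }
    public DateTimeOffset FollowUpDeadline { get; set; }
}

public sealed class ConversationBroker
{
    private readonly TimeSpan _followUpWindow;
    private ConversationSession? _current;

    public ConversationBroker(TimeSpan followUpWindow) => _followUpWindow = followUpWindow;

    // Returns the active session, opening one if the wake word was heard.
    // Returns null when there is no wake word and no open follow-up window.
    public ConversationSession? OnTurn(bool wakeWordHeard, string? topic, DateTimeOffset now)
    {
        // Follow-up timeout: close a session whose window has lapsed.
        if (_current is not null && now > _current.FollowUpDeadline)
            _current = null;

        if (_current is null)
        {
            if (!wakeWordHeard) return null;
            _current = new ConversationSession();
        }

        if (topic is not null) _current.Topic = topic; // remember the last topic
        _current.FollowUpDeadline = now + _followUpWindow; // each turn extends the window
        return _current;
    }
}

public static class Demo
{
    public static void Main()
    {
        var broker = new ConversationBroker(TimeSpan.FromSeconds(8));
        var t0 = DateTimeOffset.UtcNow;

        var first = broker.OnTurn(true, "weather", t0);
        var followUp = broker.OnTurn(false, null, t0.AddSeconds(5));
        var late = broker.OnTurn(false, null, t0.AddSeconds(30));

        Console.WriteLine(followUp?.Id == first?.Id); // True
        Console.WriteLine(followUp?.Topic);           // weather
        Console.WriteLine(late is null);              // True
    }
}
```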
### Phase 3 — Bridge skeleton
Create the adapter boundary for:
* inbound Jibo events
* outbound robot commands
Even if the first version is mocked, keep the interface stable.
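
One way to pin the boundary down early is an interface with a mock behind it. The sketch below is hypothetical throughout: `IJiboBridge`, `RobotEvent`, `RobotCommand`, and the `"turn-result"` event type are names made up for illustration, and the mock records commands instead of talking to a robot.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Inbound: a native event after normalization (field names are assumptions).
public sealed record RobotEvent(string Type, string Payload);

// Outbound: a command the bridge would translate into native service calls.
public sealed record RobotCommand(string Name, string Argument);

public interface IJiboBridge
{
    event Action<RobotEvent>? EventReceived;
    Task SendAsync(RobotCommand command);
}

// Mock used until a real on-robot bridge exists.
public sealed class MockJiboBridge : IJiboBridge
{
    public event Action<RobotEvent>? EventReceived;

    // Commands are recorded here rather than sent anywhere.
    public List<RobotCommand> Sent { get; } = new();

    public Task SendAsync(RobotCommand command)
    {
        Sent.Add(command);
        return Task.CompletedTask;
    }

    // Test hook: pretend Jibo raised an event.
    public void Simulate(RobotEvent e) => EventReceived?.Invoke(e);
}

public static class Demo
{
    public static async Task Main()
    {
        var bridge = new MockJiboBridge();
        var heard = new List<string>();
        bridge.EventReceived += e => heard.Add(e.Payload);

        bridge.Simulate(new RobotEvent("turn-result", "hello"));
        await bridge.SendAsync(new RobotCommand("speak", $"You said: {heard[0]}"));

        Console.WriteLine(bridge.Sent[0].Argument); // You said: hello
    }
}
```

If the runtime only depends on `IJiboBridge`, the mock can later be swapped for a real transport without changing broker or planner code.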
### Phase 4 — First working path
Implement a narrow vertical slice:
* input turn
* decision/rules path
* weather example
* TTS response
* follow-up window
### Phase 5 — Native integration expansion
Add native delegation for:
* skills
* expression
* visuals
* gestures
* local turn/open follow-up behavior
### Phase 6 — Hybrid AI routing
Add:
* local AI path
* cloud AI path
* confidence/routing policy
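
A confidence/routing policy could start as simple thresholds. The sketch below is one local-first possibility, not a chosen design: the `RoutingPolicy` record, its parameters, and the threshold values are placeholders that have not been tuned or measured.

```csharp
using System;

public enum BrainRoute { Rules, LocalAi, CloudAi }

// Thresholds are placeholders; nothing here is measured.
public sealed record RoutingPolicy(double RulesThreshold, double LocalThreshold, bool CloudAllowed)
{
    public BrainRoute Choose(double rulesConfidence, double localConfidence)
    {
        // Prefer the deterministic path when it is confident.
        if (rulesConfidence >= RulesThreshold) return BrainRoute.Rules;

        // Local-first: stay local when confident, or when cloud is disabled.
        if (localConfidence >= LocalThreshold || !CloudAllowed) return BrainRoute.LocalAi;

        return BrainRoute.CloudAi;
    }
}

public static class Demo
{
    public static void Main()
    {
        var policy = new RoutingPolicy(0.8, 0.6, CloudAllowed: true);
        Console.WriteLine(policy.Choose(0.9, 0.1)); // Rules
        Console.WriteLine(policy.Choose(0.2, 0.7)); // LocalAi
        Console.WriteLine(policy.Choose(0.2, 0.3)); // CloudAi

        var offline = policy with { CloudAllowed = false };
        Console.WriteLine(offline.Choose(0.2, 0.3)); // LocalAi
    }
}
```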
---
## First Vertical Slice
Recommended first demonstration:
### Example
User says:
> Hey Jibo, what's the weather?
System flow:
1. Jibo event arrives through bridge
2. .NET broker opens a session
3. transcript enters routing
4. weather capability is called
5. planner builds a `ResponsePlan`
6. bridge sends speech + visual action back to Jibo
7. follow-up window stays open
Then:
> What about the low tonight?
The same session stays active without a wake word if the follow-up window is still open.
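
The exchange above can be mocked end to end before any robot is involved. In this sketch the weather values are canned, keyword matching stands in for real routing, and the follow-up timeout is left out for brevity. `VerticalSlice`, `FakeWeatherCapability`, and the trimmed-down `ResponsePlan` are hypothetical names used only here.

```csharp
#nullable enable
using System;

public sealed record ResponsePlan(string Speech, string Visual, bool KeepListening);

// Stub capability with canned values; a real one would call a weather service.
public sealed class FakeWeatherCapability
{
    public (int High, int Low) GetForecast() => (72, 55);
}

public sealed class VerticalSlice
{
    private readonly FakeWeatherCapability _weather = new();
    private string? _topic; // non-null while a session is open

    public ResponsePlan? HandleTurn(string transcript, bool wakeWordHeard)
    {
        // Without a wake word, only an open session may continue.
        if (!wakeWordHeard && _topic is null) return null;

        var text = transcript.ToLowerInvariant();
        var (high, low) = _weather.GetForecast();

        if (text.Contains("weather"))
        {
            _topic = "weather";
            return new ResponsePlan($"Today's high is {high}.", "weather-card", KeepListening: true);
        }

        // Follow-up resolved against the remembered topic.
        if (_topic == "weather" && text.Contains("low"))
            return new ResponsePlan($"Tonight's low is {low}.", "weather-card", KeepListening: true);

        _topic = null; // unrecognized request: close the session
        return new ResponsePlan("Sorry, I can't help with that yet.", "none", KeepListening: false);
    }
}

public static class Demo
{
    public static void Main()
    {
        var slice = new VerticalSlice();
        Console.WriteLine(slice.HandleTurn("Hey Jibo, what's the weather?", true)?.Speech);
        // Today's high is 72.
        Console.WriteLine(slice.HandleTurn("What about the low tonight?", false)?.Speech);
        // Tonight's low is 55.
    }
}
```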
---
## Near-Term Questions to Answer
* What is the cleanest robot-side bridge seam?
  * Jetstream hook
  * skill hook
  * local service calls
  * mixed approach
* What is the smallest command set needed to drive Jibo usefully?
  * speak
  * gesture
  * visual
  * launch skill
  * keep listening
* Which pieces should remain native the longest?
  * expression
  * skill hosting
  * turn engine
  * wake-word flow
* How should custom mode selection activate the hybrid path?
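
To help reason about the command-set question, the five candidate commands can be written as a closed set of records with a trivial text encoding. This is only a thinking aid: the record names mirror the list above, while `BridgeCommand`, `CommandWire`, and the `verb:argument` format are assumptions, not a native Jibo protocol.

```csharp
using System;

// Hypothetical minimal command vocabulary.
public abstract record BridgeCommand;
public sealed record Speak(string Text) : BridgeCommand;
public sealed record Gesture(string Name) : BridgeCommand;
public sealed record Visual(string AssetId) : BridgeCommand;
public sealed record LaunchSkill(string SkillId) : BridgeCommand;
public sealed record KeepListening(TimeSpan Window) : BridgeCommand;

public static class CommandWire
{
    // Flattens a command to a "verb:argument" line for a text-based bridge.
    public static string Encode(BridgeCommand command) => command switch
    {
        Speak s => $"speak:{s.Text}",
        Gesture g => $"gesture:{g.Name}",
        Visual v => $"visual:{v.AssetId}",
        LaunchSkill l => $"launch-skill:{l.SkillId}",
        KeepListening k => $"keep-listening:{(int)k.Window.TotalSeconds}",
        _ => throw new ArgumentOutOfRangeException(nameof(command)),
    };
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(CommandWire.Encode(new Speak("Hello")));
        // speak:Hello
        Console.WriteLine(CommandWire.Encode(new KeepListening(TimeSpan.FromSeconds(8))));
        // keep-listening:8
    }
}
```

If a sixth command turns out to be necessary, the `switch` is the single place where the bridge contract visibly grows.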
---
## Practical Strategy
For now:
* **develop fast in .NET 10**
* **use Jibo as an embodied endpoint**
* **keep the robot-side integration thin**
* **delay deep on-robot porting until architecture proves itself**
This keeps experimentation fast while preserving a path toward deeper integration later.
---
## Current Working Hypothesis
The best long-term shape is:
```text
stock Jibo embodiment + modern external cognition + thin hybrid bridge
```
That gives us:
* rapid iteration
* local-first experiments
* preserved native robot personality/expression
* reduced dependence on brittle legacy cloud paths