JiboExperiments/OpenJibo/docs/development-plan.md
2026-04-16 07:18:33 -05:00

# Development Plan
## Summary
This document is the working implementation plan after the initial hosted-cloud scaffold.
It is intentionally broader than the current Node server. The Node server is a protocol oracle and discovery tool, not the complete map of Jibo.
## Current Scope
- stable .NET cloud scaffold
- Azure-oriented architecture and data ownership
- normalized runtime contracts for cloud-to-runtime handoff
- bootstrap documentation for region injection and targeted device patching
- starter endpoint coverage for account, notification, robot, loop, update, uploads, and core WebSocket acceptance
- starter xUnit coverage for the .NET application layer
## Next Implementation Scope
- expand HTTP `X-Amz-Target` coverage from observed traffic and fixtures
- grow WebSocket compatibility from stub acceptance into realistic turn orchestration
- keep WebSocket parity fixture-driven: start with exact sequencing and payload-shape fidelity for the successful joke vertical slice before claiming broader skill coverage
- replace in-memory state with Azure SQL-backed persistence
- add structured fixture replay tests
- harden region/bootstrap docs by software version
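
One way to drive the `X-Amz-Target` expansion from observed traffic is a small coverage report that lists targets seen in captures but not yet handled. The following is a minimal sketch, assuming captures can be reduced to header maps. The `CapturedRequest` shape and the `ExampleService.*` target names are placeholders for illustration, not real Jibo targets or the project's fixture format.

```typescript
// Hypothetical shape for one captured HTTP request (an assumption).
interface CapturedRequest {
  headers: Record<string, string>;
}

// HTTP header names are case-insensitive, so match without assuming casing.
function targetOf(request: CapturedRequest): string | undefined {
  for (const key of Object.keys(request.headers)) {
    if (key.toLowerCase() === "x-amz-target") {
      return request.headers[key];
    }
  }
  return undefined;
}

// Return observed targets that have no handler yet, de-duplicated and sorted.
function missingTargets(
  captures: CapturedRequest[],
  implemented: ReadonlySet<string>
): string[] {
  const missing = new Set<string>();
  for (const capture of captures) {
    const target = targetOf(capture);
    if (target !== undefined && !implemented.has(target)) {
      missing.add(target);
    }
  }
  return Array.from(missing).sort();
}

// Placeholder data for illustration only.
const sampleCaptures: CapturedRequest[] = [
  { headers: { "X-Amz-Target": "ExampleService.OperationA" } },
  { headers: { "x-amz-target": "ExampleService.OperationB" } },
  { headers: { "X-Amz-Target": "ExampleService.OperationB" } },
  { headers: { "Content-Type": "application/json" } },
];
const implementedTargets = new Set<string>(["ExampleService.OperationA"]);

console.log(missingTargets(sampleCaptures, implementedTargets));
```

With the sample data, only `ExampleService.OperationB` is reported. Requests without the header are skipped rather than counted as gaps.
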
## Discovery Scope
We still need to map more than the current Node server expresses. Priority discovery areas:
- all hostnames and service prefixes observed in real startup and turn traffic
- skill launch and skill lifecycle flows
- interactivity command families beyond the current joke flow
- richer embodied speech and animation behaviors
- upload, logging, backup, and key-sharing flows
- per-version configuration differences and region handling
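
For the hostname inventory, a first pass could simply tally hostnames across captured request URLs. This is a rough sketch under stated assumptions: captures are available as absolute URL strings, URLs carry no `user@` prefix, and the `example.test` hosts are placeholders rather than observed Jibo hosts.

```typescript
// Tally how often each hostname appears in a list of captured request URLs.
function hostnameCounts(urls: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  // Capture the host between "scheme://" and the next ":", "/", "?", or "#".
  const hostPattern = /^[a-z][a-z0-9+.-]*:\/\/([^\/:?#]+)/i;
  for (const url of urls) {
    const match = hostPattern.exec(url);
    if (match === null) {
      continue; // Not an absolute URL; skip rather than guess.
    }
    // Hostnames are case-insensitive, so normalize before counting.
    const host = String(match[1]).toLowerCase();
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return counts;
}

// Placeholder URLs for illustration only.
const capturedUrls: string[] = [
  "https://api.example.test/v1/things",
  "wss://socket.example.test:443/listen",
  "https://API.example.test/",
  "not a url",
];
const hostTally = hostnameCounts(capturedUrls);

console.log(Array.from(hostTally.entries()));
```
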
## Current WebSocket Discovery Focus
The next fixture-driven WebSocket work should continue to separate three buckets:
- discovered behavior
  Grounded by the Node oracle, sanitized fixtures, and live captures
- implemented parity
  Only the narrow slices currently replayed and tested in .NET
- future hypotheses
  Ideas to investigate later, but not behaviors to silently bake into the hosted cloud

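
The three-bucket separation could be made explicit in data, so that nothing outside the implemented bucket is served by accident. The sketch below is illustrative only: `ParityBucket`, `BehaviorEntry`, and `servableBehaviors` are hypothetical names, and every catalog entry other than the joke turn is a placeholder.

```typescript
// The three buckets named in the list above.
type ParityBucket = "discovered" | "implemented" | "hypothesis";

// Hypothetical catalog entry; the project does not define this type.
interface BehaviorEntry {
  label: string;
  bucket: ParityBucket;
}

// Keep only behaviors whose parity is already replayed and tested, so
// discoveries and hypotheses cannot leak into served behavior by accident.
function servableBehaviors(entries: BehaviorEntry[]): string[] {
  return entries
    .filter((entry) => entry.bucket === "implemented")
    .map((entry) => entry.label);
}

const behaviorCatalog: BehaviorEntry[] = [
  { label: "joke turn", bucket: "implemented" },
  { label: "example captured flow", bucket: "discovered" }, // placeholder
  { label: "example untested idea", bucket: "hypothesis" }, // placeholder
];

console.log(servableBehaviors(behaviorCatalog)); // [ 'joke turn' ]
```
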
Right now the strongest implemented vertical slice beyond basic listen completion is the successful joke turn:
- `CLIENT_ASR` transcript-carrying turn completion
- synthetic `LISTEN` result shaping
- `EOS`
- delayed joke `SKILL_ACTION`

That slice should remain the model for future WebSocket work: capture first, fixture second, parity third.
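
A fixture replay check for sequencing and payload shape could look roughly like the sketch below. It compares frame order and top-level payload keys while ignoring values. The `WsFrame` shape, the `exampleField` key, and treating the order listed above as the expected server sequence are all assumptions, not the project's actual fixture schema.

```typescript
// Hypothetical frame shape; the real fixture schema may differ.
interface WsFrame {
  type: string;
  payload: Record<string, unknown>;
}

// Describe a frame by its type plus sorted top-level payload keys, ignoring
// values so volatile fields (ids, timestamps) do not cause noise.
function frameShape(frame: WsFrame): string {
  return frame.type + "{" + Object.keys(frame.payload).sort().join(",") + "}";
}

// Return the index of the first frame whose shape differs, or -1 if both
// sequences match in length, order, and shape.
function firstMismatch(expected: WsFrame[], actual: WsFrame[]): number {
  const limit = Math.max(expected.length, actual.length);
  for (let i = 0; i < limit; i++) {
    const want = expected[i];
    const got = actual[i];
    if (want === undefined || got === undefined || frameShape(want) !== frameShape(got)) {
      return i;
    }
  }
  return -1;
}

// Placeholder frames: message types come from the list above, payload keys are invented.
const fixtureFrames: WsFrame[] = [
  { type: "LISTEN", payload: { exampleField: "a" } },
  { type: "EOS", payload: {} },
  { type: "SKILL_ACTION", payload: { exampleField: "b" } },
];
const replayInOrder: WsFrame[] = [
  { type: "LISTEN", payload: { exampleField: "x" } },
  { type: "EOS", payload: {} },
  { type: "SKILL_ACTION", payload: { exampleField: "y" } },
];
const replayOutOfOrder: WsFrame[] = [
  { type: "LISTEN", payload: { exampleField: "x" } },
  { type: "SKILL_ACTION", payload: { exampleField: "y" } },
  { type: "EOS", payload: {} },
];

console.log(firstMismatch(fixtureFrames, replayInOrder)); // -1
console.log(firstMismatch(fixtureFrames, replayOutOfOrder)); // 1
```

Comparing shapes rather than full payloads keeps the check stable across captures. A stricter value-level comparison can be layered on once the volatile fields are known.
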
## Speech, Animation, And ESML
The current joke flow is only a small foothold into Jibo expressiveness.
Future work should map:
- direct speech modifiers
- animation selection and filtering
- embodied speech behaviors
- ESML and SSML subsets
- interactions between speech, visuals, and timing

Useful external references:
- [Speak-Tweak Docs](https://hri2024.jibo.media.mit.edu/Speak-Tweak-Docs)
- [ESML PDF](https://hri2024.jibo.media.mit.edu/attachments/SDK-SDK---ESML-121023-203758.pdf)
## Future Scope
- full endpoint inventory beyond the current Node mapping
- OTA-driven recovery
- paid hosted plans or donation-supported hosting
- deeper on-device bridge and OS modernization
- more capable skill/runtime integration
- possible LLM or tool-use patterns inspired by workshop-era experimentation
## MCP-Like Ideas
Recent MIT workshop materials suggest experimentation around modern AI tooling for Jibo, including an MCP-oriented idea. We should treat that as inspiration for future OpenJibo directions, not as a present dependency or supported integration.