2026-04-09 21:30:23 -05:00
# Jibo.Cloud.DotNet
## Summary
`Jibo.Cloud.DotNet` is the stable hosted implementation of the OpenJibo cloud.

This is the production-oriented path for restoring device connectivity and creating a foundation for future runtime, AI, and OTA work.
## Architecture
The first implementation is a modular monolith:
```text
Api -> Application -> Domain -> Infrastructure
```
This keeps deployment simple while preserving clean boundaries.
## Azure Direction
The target Azure footprint is:
- Azure App Service for HTTP and WebSocket traffic
- Azure SQL for relational persistence
- Azure Blob Storage for uploads and update artifacts
- Azure Key Vault for secrets and certificates
- Application Insights for observability

Azure SQL is the primary system of record for:
- accounts
- devices
- sessions
- update metadata
- host mappings
- bootstrap and provisioning records
## Compatibility Goal
The first compatibility milestone is `core revive`.

That means the .NET cloud should handle:
- token and session issuance
- account and robot identity flows needed for startup
- core `X-Amz-Target` dispatch
- listen and proactive WebSocket paths
- basic media and update metadata responses
- handoff into normalized `TurnContext` and `ResponsePlan` contracts
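As a rough illustration of the `X-Amz-Target` dispatch item above, the sketch below routes a request by its target header value. `TargetDispatcher`, its methods, and the string-in/string-out handler shape are hypothetical names chosen for this example, not the project's actual types. It also assumes targets such as `Notification.NewRobotToken` arrive as the header value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: maps an X-Amz-Target header value to a handler.
// Handlers take the raw request body and return a raw response body.
public sealed class TargetDispatcher
{
    private readonly Dictionary<string, Func<string, string>> _handlers =
        new(StringComparer.Ordinal);

    // Registers (or replaces) the handler for one target name.
    public void Register(string target, Func<string, string> handler) =>
        _handlers[target] = handler;

    // Returns false for a missing or unknown target so the caller can
    // log the request and capture it as a fixture instead of guessing.
    public bool TryDispatch(string? target, string body, out string response)
    {
        if (target is not null && _handlers.TryGetValue(target, out var handler))
        {
            response = handler(body);
            return true;
        }

        response = string.Empty;
        return false;
    }
}
```

Returning `false` for unknown targets, rather than inventing a reply, fits the stated goal of avoiding speculative protocol design: unhandled targets stay visible as gaps.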
## Relationship To The Node Prototype
The Node server remains the discovery harness and fixture source.

The .NET implementation should:
- copy observed behavior where needed
- use fixtures captured from Node and real robots
- avoid speculative protocol design
- separate HTTP parity, websocket parity, and future discovery work so coverage stays honest
## Current State
This folder now contains the first hosted scaffold, not just a README.

The intent is to grow from a runnable dev monolith into the real Azure deployment target without abandoning the existing abstractions work.

Current websocket scope is still intentionally narrow:
- token-backed socket sessions
- explicit websocket turn-state tracking separate from long-lived cloud session state
- synthetic `LISTEN` result shaping for `LISTEN`, `CLIENT_NLU`, and `CLIENT_ASR`
- buffered audio state tracking behind a dedicated turn-finalization layer
- raw audio auto-finalization once `LISTEN` + `CONTEXT` + minimum buffered audio thresholds are present
- synthetic STT strategy selection for fixture-driven audio turn completion
- structured websocket telemetry and live-run fixture export
- `CONTEXT` capture and follow-up turn state
- `EOS` completion
- delayed `SKILL_ACTION` emission after `EOS` to preserve the current Node-observed turn sequence
- first skill vertical for joke/chat `SKILL_ACTION` playback
- repo-root live-run capture support for both `captures/http/` and `captures/websocket/`

Not yet covered:
- real binary audio / ASR finalization parity
- provider-backed ASR integration
- timed finalize/fallback behavior matching richer Node turn-state semantics
- upstream Nimbus or broader skill lifecycle behavior
- animation / expression command families
- ESML feature parity beyond the narrow synthetic playback payloads used in the current scaffold
## Live Capture Status
The first real `.NET` robot test has confirmed:
- startup HTTP traffic reaches the `.NET` cloud
- `Notification.NewRobotToken` is in the active startup path
- `api-socket.jibo.com` connections are being accepted live

It has not yet confirmed:
- full startup parity with the successful Node run cadence
- consistent eye-open / wake completion on the robot
- the later health/log upload sequence currently seen in the working Node run

Current raw-audio behavior is still a compatibility bridge:
- if buffered audio has a synthetic transcript hint, the server now auto-finalizes the turn and emits `LISTEN` + `EOS` + `SKILL_ACTION`
- if buffered audio crosses the finalize threshold without a usable transcript, the server now emits a Node-style fallback completion with `EOS` instead of hanging the turn forever
- this is intentionally not a claim of real ASR parity
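The bridge behavior described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the scaffold's actual code. `TurnState`, `TurnOutcome`, `TurnFinalizer`, and the byte threshold are hypothetical. The sketch also assumes the transcript-hint path and the fallback path share the same `LISTEN` + `CONTEXT` + minimum-audio gate.

```csharp
using System;

// Hypothetical outcome of evaluating one in-flight websocket turn.
public enum TurnOutcome { Pending, FinalizeWithTranscript, FallbackEos }

// Hypothetical snapshot of what the server has seen for a turn so far.
public sealed record TurnState(
    bool HasListen,
    bool HasContext,
    int BufferedAudioBytes,
    string? TranscriptHint);

public static class TurnFinalizer
{
    // Placeholder threshold; the real value is not stated in this README.
    public const int MinBufferedAudioBytes = 16_000;

    public static TurnOutcome Decide(TurnState turn)
    {
        // Keep waiting until LISTEN, CONTEXT, and enough audio are present.
        if (!turn.HasListen || !turn.HasContext ||
            turn.BufferedAudioBytes < MinBufferedAudioBytes)
        {
            return TurnOutcome.Pending;
        }

        // No usable transcript: complete the turn instead of hanging it.
        return string.IsNullOrWhiteSpace(turn.TranscriptHint)
            ? TurnOutcome.FallbackEos
            : TurnOutcome.FinalizeWithTranscript;
    }

    // Message order to emit; SKILL_ACTION follows EOS, as described above.
    public static string[] MessagesFor(TurnOutcome outcome) => outcome switch
    {
        TurnOutcome.FinalizeWithTranscript => new[] { "LISTEN", "EOS", "SKILL_ACTION" },
        TurnOutcome.FallbackEos => new[] { "EOS" },
        _ => Array.Empty<string>(),
    };
}
```

Keeping the decision free of I/O means recorded fixtures can be replayed through it without a live socket.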
## Buffered Audio STT
The current `.NET` websocket stack now preserves buffered Ogg/Opus websocket frames in memory for each in-flight turn.

That enables two distinct STT paths:
- fixture-oriented synthetic transcript hints for replay and parity tests
- an opt-in local tool-based path that can normalize the buffered Ogg pages, call `ffmpeg`, and then call `whisper.cpp`

The local tool path is intentionally off by default. It exists to help map real robot audio behavior while the stable hosted cloud remains the primary goal.
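To make the local tool path more concrete, here is a hedged sketch of how the two external calls might be assembled. `LocalWhisperSketch` and its method names are hypothetical. The sketch assumes the normalized Ogg data has already been written to a file, and that the audio is converted to 16 kHz mono 16-bit PCM WAV before transcription. The `ffmpeg` and `whisper.cpp` CLI flags are standard ones, but the exact arguments the project passes are not stated here.

```csharp
using System.Diagnostics;

// Hypothetical helper that only builds process descriptions; the caller
// decides when (and whether) to start them.
public static class LocalWhisperSketch
{
    // ffmpeg: decode Ogg/Opus to 16 kHz mono 16-bit PCM WAV, overwriting output.
    public static ProcessStartInfo BuildFfmpeg(string ffmpegPath, string oggPath, string wavPath)
    {
        var psi = new ProcessStartInfo(ffmpegPath)
        {
            RedirectStandardError = true,
            UseShellExecute = false,
        };
        foreach (var arg in new[]
                 { "-y", "-i", oggPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath })
        {
            psi.ArgumentList.Add(arg);
        }
        return psi;
    }

    // whisper.cpp CLI: -m model, -l language, -f input file, -nt no timestamps.
    public static ProcessStartInfo BuildWhisper(
        string cliPath, string modelPath, string language, string wavPath)
    {
        var psi = new ProcessStartInfo(cliPath)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        foreach (var arg in new[] { "-m", modelPath, "-l", language, "-f", wavPath, "-nt" })
        {
            psi.ArgumentList.Add(arg);
        }
        return psi;
    }
}
```

Using `ArgumentList` rather than a single argument string avoids quoting problems when paths contain spaces.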
Configuration lives under `OpenJibo:Stt`:
- `EnableLocalWhisperCpp`
- `FfmpegPath`
- `WhisperCliPath`
- `WhisperModelPath`
- `WhisperLanguage`
- `TempDirectory`

This is not yet a claim of production-ready onboard ASR. It is a `.NET` discovery seam that keeps us compatible with the Node oracle while we evaluate longer-term options such as Azure-hosted STT or a managed decode/transcribe stack.
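For orientation, the `OpenJibo:Stt` keys listed above might appear in an `appsettings.json`-style file roughly as follows. Every value is a placeholder chosen for illustration, and the value types are assumptions. Only the key names and the off-by-default behavior come from this document.

```json
{
  "OpenJibo": {
    "Stt": {
      "EnableLocalWhisperCpp": false,
      "FfmpegPath": "/path/to/ffmpeg",
      "WhisperCliPath": "/path/to/whisper-cli",
      "WhisperModelPath": "/path/to/model.bin",
      "WhisperLanguage": "en",
      "TempDirectory": "/path/to/temp"
    }
  }
}
```

Setting `EnableLocalWhisperCpp` to `true` would opt in to the local tool path. Leaving it `false` keeps the fixture-oriented synthetic path as the only STT route.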