Avoiding the Blue Bubble: Why VR Collaboration Needs an Open Protocol

TL;DR: If tomorrow’s workplaces depend on VR, we can’t repeat the phone-messaging mistake where features only work inside one vendor’s walled garden. “Zoom-for-VR” or “WhatsApp-for-VR” will always leave someone out—or lock everyone in. The fix isn’t another app; it’s a protocol: an open, interoperable way to discover sessions, exchange avatars, synchronize shared 3D objects, and protect privacy across any headset or runtime. Think RCS-for-VR, not another silo.


The problem: VR’s looming “blue bubble”

In VR, the value is being able to stand around the same object—a design mock, a 3D scan of an arena, a data viz—and collaborate in real time. If that only works when everyone buys the same headset and same meeting app, the tech fails the moment a teammate, client, or vendor uses different gear.

We’ve seen this movie: modern messaging took years to claw back from proprietary silos until the industry moved toward RCS and standardized features like high-quality media, typing indicators, and (increasingly) interoperable E2EE. That change came from standards, not yet another app. (GSMA)

VR must follow the same path. “Let’s just make a Zoom-for-VR” sounds sensible—until procurement asks whether your counterpart is on Teams, Zoom, or something else, and legal asks who holds the data keys. A protocol-first approach lets different apps, engines, and devices interoperate without forcing everyone into the same subscription or hardware.


What already exists to stand on

We already have strong building blocks:

  • OpenXR gives apps a common API to talk to many XR runtimes and devices. It reduces per-device porting and is supported across major ecosystems. (The Khronos Group)
  • glTF 2.0 (and OpenUSD) are widely adopted 3D asset formats; they’re efficient, extensible ways to ship meshes, materials, and animations. (Khronos Registry)
  • WebRTC data channels provide low-latency, encrypted, reliable/ordered or unreliable/unordered transport for multi-peer state sync. (IETF Datatracker)
  • Matrix demonstrates open, federated, E2EE-capable real-time messaging with room semantics and bridging—useful inspiration for session discovery and federation. (Matrix Specification)
  • MPEG-I Scene Description (ISO/IEC 23090-14) is formalizing how complex immersive scenes reference media and interact—another anchor for consistency. (ISO)
  • Industry groups are actively aligning USD & glTF and mapping the standards landscape for metaverse interoperability. (Metaverse Standards Forum)

These pieces don’t yet form a complete “collaboration protocol,” but they’re close.


Proposal: VRCSP — VR Collaboration Session Protocol (sketch)

A protocol, not a product. Vendors can compete on UX while interoperating at the wire level.

Design goals

  1. Hardware-agnostic sessions: any OpenXR/engine runtime can join. (The Khronos Group)
  2. Asset-format neutrality with glTF 2.0 / OpenUSD as first-class, and room for MPEG-I SD alignment. (Khronos Registry)
  3. Real-time, encrypted state sync over WebRTC data channels (with MLS-style group E2EE for control/state). (MDN Web Docs)
  4. Federated discovery: Matrix-like rooms so organizations can host their own servers yet collaborate across domains. (Matrix Specification)
  5. Conflict-free collaboration: CRDT-based scene graph edits so concurrent changes merge without foot-guns.
  6. Extensible capabilities: hand tracking, passthrough/AR, haptics negotiated per session.
  7. Privacy & enterprise posture: org-managed identity, audit, and key control.

High-level architecture

  • Signaling & Discovery:
    • Federation layer (Matrix-style) to create/join rooms, exchange offers, publish capabilities, and manage access control. (Matrix Specification)
  • Transport:
    • WebRTC for media + RTCDataChannel for state (unordered/unreliable for fast transforms; ordered/reliable for scene edits). (IETF Datatracker)
  • Assets & Scene:
    • Shared scene graph references glTF/OpenUSD assets; optional mapping to MPEG-I Scene Description for media integration. (Khronos Registry)
  • Runtime:
    • Clients render via their native engine/runtime (OpenXR where available). (The Khronos Group)
  • Security:
    • DTLS/SRTP on WebRTC plus MLS group keys for control/state channels; enterprise can host the MLS/identity service (RCS’s movement toward interoperable E2EE shows the path). (MDN Web Docs)
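The transport split above can be sketched as a routing table. This is illustrative only — the channel labels and the Python shape are assumptions, not spec text — but it shows the core rule: control and scene traffic is reliable and ordered, pose traffic tolerates loss and never head-of-line blocks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChannelConfig:
    """WebRTC data-channel settings for one class of VRCSP traffic."""
    label: str
    ordered: bool
    max_retransmits: Optional[int]  # None = fully reliable

# Hypothetical channel plan matching the architecture above.
CHANNEL_PLAN = {
    "hello":         ChannelConfig("control", ordered=True,  max_retransmits=None),
    "policy.update": ChannelConfig("control", ordered=True,  max_retransmits=None),
    "scene.op":      ChannelConfig("scene",   ordered=True,  max_retransmits=None),
    "avatar.pose":   ChannelConfig("pose",    ordered=False, max_retransmits=0),
}

def channel_for(message_type: str) -> ChannelConfig:
    """Route a VRCSP message to its data channel (default: control)."""
    return CHANNEL_PLAN.get(message_type, CHANNEL_PLAN["hello"])
```

The `ordered`/`max_retransmits` pair maps directly onto the options WebRTC exposes when creating a data channel, which is why the split costs almost nothing to implement.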

Wire-level sketch

Below is a conceptual message model for VRCSP (JSON for readability). This is not a finalized spec—it shows how a fully open, vendor-neutral session might work.

1) Session discovery & join

// Matrix-like room create (federated) — discovery layer
{
  "type": "room.create",
  "room_id": "!abc123:example.com",
  "visibility": "private",
  "purpose": "design-review",
  "policy": { "invite_only": true, "orgs_allowed": ["example.com", "partner.org"] }
}
// Capability handshake over reliable control channel
{
  "type": "hello",
  "protocol": "VRCSP/0.1",
  "identity": { "user": "alice@example.com", "org": "example.com" },
  "runtime": {
    "api": "OpenXR",
    "version": "1.0",
    "features": { "handTracking": true, "passthrough": false, "eyeGaze": true }
  },
  "formats": { "assets": ["glTF2.0", "OpenUSD"], "scene": ["MPEG-I-SD"] },
  "transport": { "webrtc": { "sdp": "…", "iceServers": ["stun:stun.example.org"] } },
  "crypto": { "mls": { "cipherSuite": "MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519" } }
}
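After two `hello` messages are exchanged, the session needs a shared capability set. A minimal sketch, assuming a simple both-must-support rule (a real spec would define richer negotiation and tie-breaking):

```python
def negotiate(hello_a: dict, hello_b: dict) -> dict:
    """Compute the session capability set from two 'hello' messages.

    A feature is enabled only if both runtimes report it; asset formats
    are the intersection, preserving hello_a's preference order.
    """
    feats_a = hello_a["runtime"]["features"]
    feats_b = hello_b["runtime"]["features"]
    features = {k: feats_a.get(k, False) and feats_b.get(k, False)
                for k in set(feats_a) | set(feats_b)}
    fmts_b = set(hello_b["formats"]["assets"])
    assets = [f for f in hello_a["formats"]["assets"] if f in fmts_b]
    return {"features": features, "assets": assets}
```

A client that advertises hand tracking but joins a session with a controller-only peer would simply see `handTracking: false` in the negotiated set and fall back gracefully.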

2) Asset manifest & content-addressing

// What the session will render; URIs may be HTTPS, S3, IPFS, etc.
{
  "type": "asset.manifest",
  "scene": "usd://designs/lockerroom.usd",
  "assets": [
    { "id": "chair01", "uri": "https://cdn.example.com/chair.glb",
      "hash": "sha256-5e3d…", "format": "glTF2.0" },
    { "id": "flooring", "uri": "s3://bucket/flooring.usdz",
      "hash": "sha256-8a70…", "format": "OpenUSD" }
  ]
}

Clients cache and verify assets by hash for integrity, falling back to an alternative format when a preferred one isn’t supported.
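The integrity check is simple to sketch. This assumes the manifest carries full `sha256-<hex>` digests (the examples above elide them), and deliberately rejects unknown algorithms rather than skipping verification:

```python
import hashlib

def verify_asset(data: bytes, manifest_hash: str) -> bool:
    """Verify downloaded asset bytes against a manifest hash
    of the form 'sha256-<hex digest>'."""
    algo, sep, expected = manifest_hash.partition("-")
    if not sep or algo != "sha256":
        raise ValueError(f"unsupported hash spec: {manifest_hash!r}")
    return hashlib.sha256(data).hexdigest() == expected
```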

3) Avatar description & presence

// Avatar descriptor (could also allow VRM or USD-based rigs)
{
  "type": "avatar.descriptor",
  "user": "alice@example.com",
  "humanoidRig": "humanoid-v1",
  "asset": { "uri": "https://cdn.example.com/avatars/alice.glb", "hash": "sha256-1c2d…" },
  "attachments": [{ "type": "badge", "label": "Architect" }]
}

4) Real-time pose & input (fast, unordered data channel)

{
  "type": "avatar.pose",
  "user": "alice@example.com",
  "t": 1726071584.234,
  "head": { "p": [0,1.62,0], "q": [0,0,0,1] },
  "hands": {
    "left": { "visible": true, "joints": [[…], […]] },
    "right": { "visible": true, "joints": [[…], […]] }
  }
}
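JSON is readable but wasteful on a high-rate pose channel. A plausible optimization — an assumption of this sketch, not spec text — is a fixed binary layout; packing just the head pose drops the message from a couple of hundred JSON bytes to 36:

```python
import struct

# Hypothetical compact encoding for the unordered pose channel:
# timestamp (f64) + head position (3 × f32) + head quaternion (4 × f32).
POSE_FMT = "<d3f4f"  # little-endian, 36 bytes total

def pack_head_pose(t: float, p: list, q: list) -> bytes:
    """Serialize one head-pose sample for the fast data channel."""
    return struct.pack(POSE_FMT, t, *p, *q)

def unpack_head_pose(buf: bytes):
    """Deserialize a head-pose sample back into (t, position, quaternion)."""
    vals = struct.unpack(POSE_FMT, buf)
    return vals[0], list(vals[1:4]), list(vals[4:8])
```

Hand joints would extend the same layout; the trade-off is float32 precision, which is ample for pose data.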

5) Shared scene edits (CRDT patch, reliable channel)

// Create a shared object
{
  "type": "scene.op",
  "doc": "scene-graph",
  "op_id": "d6c9-1",
  "crdt": {
    "op": "add",
    "path": "/objects/cube42",
    "value": { "kind": "mesh", "asset": "chair01", "transform": {
      "t": [2.0, 0.0, -1.5], "r": [0, 0.707, 0, 0.707], "s": [1,1,1]
    }}
  }
}
// Update transform concurrently — merges deterministically
{
  "type": "scene.op",
  "doc": "scene-graph",
  "op_id": "d6c9-2",
  "crdt": {
    "op": "set",
    "path": "/objects/cube42/transform/t",
    "value": [2.3, 0.0, -1.2]
  }
}
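The “merges deterministically” claim can be illustrated with a toy last-writer-wins register per path. This sketch assumes a hypothetical `lamport` field on each op for causal ordering, with `op_id` as the tie-breaker; real deployments would reach for a full CRDT library like Automerge or Yjs:

```python
def merge_lww(doc: dict, op: dict, clock: dict) -> None:
    """Apply a scene.op using last-writer-wins on (lamport, op_id).

    'clock' records the winning (lamport, op_id) pair per path; an op
    is applied only if it beats the recorded pair, so peers converge
    regardless of arrival order.
    """
    path = op["crdt"]["path"]
    stamp = (op["lamport"], op["op_id"])  # 'lamport' is hypothetical here
    if path in clock and clock[path] >= stamp:
        return  # an equal-or-newer write already won
    clock[path] = stamp
    doc[path] = op["crdt"]["value"]
```

Two peers that receive the same concurrent edits in opposite orders end up with identical documents — the property that makes optimistic local edits safe.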

6) Media & annotations

// Attach a live video note to an object (ties to MPEG-I SD ideas)
{
  "type": "annotation.media",
  "target": "/objects/cube42",
  "media": {
    "kind": "video-note",
    "uri": "webrtc://stream/uuid-77",
    "thumbnail": "https://cdn.example.com/notes/77.jpg"
  }
}

7) Permissions & export

{
  "type": "policy.update",
  "rules": [
    { "role": "viewer", "allow": ["read.pose", "read.scene"] },
    { "role": "editor", "allow": ["write.scene", "annotate"] }
  ]
}
// Export a point-in-time bundle for audit/share
{
  "type": "export.request",
  "format": "usd",
  "include": ["sceneGraph", "annotations", "provenance"],
  "destination": "s3://org-archive/design-review-2025-09-11.usdz"
}
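Enforcement on the receiving side can be as simple as a deny-by-default role check over a `policy.update` rule set. A toy sketch (the action names mirror the rules above; everything else is an assumption):

```python
RULES = [
    {"role": "viewer", "allow": ["read.pose", "read.scene"]},
    {"role": "editor", "allow": ["write.scene", "annotate"]},
]

def is_allowed(roles: list, action: str, rules: list) -> bool:
    """Deny by default; allow only if some held role grants the action."""
    return any(action in r["allow"] for r in rules if r["role"] in roles)
```

Because roles stack, an editor who should also see poses would simply hold both roles — the policy document stays flat and auditable.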

Why this beats “Teams-for-VR”

  • Freedom of client: any app that speaks the protocol can join—Unity, Unreal, a web client, native engines—on any headset/runtime. OpenXR helps with the runtime side. (The Khronos Group)
  • No asset traps: teams keep using glTF/OpenUSD pipelines and still render the same things together. (Khronos Registry)
  • Network & security done right: real-time over WebRTC data channels, E2EE group keys via MLS-style schemes as RCS is moving toward—so no one vendor is the gatekeeper of your collaboration data. (MDN Web Docs)
  • Ecosystem leverage: MPEG-I SD gives a cross-media reference model, and industry forums are already aligning the 3D stack. (ISO)

Implementation notes (for the adventurous)

  • Signaling: start with Matrix rooms for federation, invites, and presence; bridge to enterprise identity. (Matrix Specification)
  • Transport: one reliable control channel (ordered), one reliable scene channel (ordered), and one or more pose channels (unordered, partial reliability) over WebRTC. (IETF Datatracker)
  • Scene sync: use a CRDT library (e.g., Automerge/Yjs) to represent a shared scene document; apply diffs to the local engine’s scene graph.
  • Assets: prefer content-addressed URIs (hashes) + streaming decompression; support both glTF(.glb) and USD(.usd/.usdz). (Khronos Registry)
  • Interop testing: build conformance scenes (materials, skeletal rigs, IK, physics), much like glTF’s sample models. (Khronos Registry)
  • Privacy: org-scoped MLS groups with hardware-rooted key storage where available; auditable exports that don’t leak raw user biometrics.

Call to action

Vendors: compete on clients and UX, not on lock-in. Standards bodies: help converge on a minimal, testable VRCSP that rides OpenXR, glTF/OpenUSD, WebRTC, and MPEG-I SD. Enterprises: insist on protocol-level interoperability in RFPs—just as you did for messaging when the industry matured toward RCS.

If we get this right, collaborating in VR will feel like joining a meeting link—not switching phones just to get a blue bubble.