The Organization
Meta’s Reality Labs division — formerly Oculus — is the arm responsible for building the hardware and software that power Meta’s mixed-reality ecosystem: Quest headsets, Ray-Ban Meta smart glasses, Horizon Worlds, and the underlying Avatars platform that gives users a persistent digital identity across every surface.
Reality Labs
Meta’s dedicated division for VR, AR, and mixed-reality products. Encompasses hardware (Quest, Ray-Ban), software (Horizon), and core platform services like Avatars.
Avatars Team
Responsible for creating realistic, expressive digital humans. Avatars have natural gestures, idle animations, and playful micro-expressions — representing users across Quest, Horizon, Messenger, and Instagram.
My Role
ML Infrastructure Engineer architecting experimentation infrastructure supporting Meta Avatars and Reality Labs research platforms. Focused on debugging tooling, pipeline testing, observability, and compute orchestration used by ~300 engineers.
Why This Work Mattered
Meta Avatars are generated from a deep pipeline: face scans, body estimation, clothing selection, expression rigs, and rendering. With billions of potential configurations, edge cases surface constantly — avatars rendered too skinny, too tall, with broken textures, or mismatched proportions. Researchers and engineers needed fast, reliable tooling to find, diagnose, and fix these issues before they shipped to users.
Services & Applications Built
Five services, all engineered around one mission: give the Avatars team full visibility into their pipeline so that no rendering defect reaches production undetected.
Avatar Search & Chat Interface
A conversational search tool that let researchers query specific avatars across their account. It accepted natural-language input to locate, inspect, and triage avatar issues by ID, configuration, or visual anomaly type.
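To make the search behavior concrete, here is a minimal sketch of the structured filter that a parsed natural-language query might resolve to. The natural-language parsing step is omitted, and the `AvatarRecord` schema, field names, and sample values are hypothetical; the real data model is not described in this case study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class AvatarRecord:
    # Hypothetical schema, for illustration only.
    avatar_id: str
    body_config: str
    anomaly_type: Optional[str]  # e.g. "too_narrow"; None if no anomaly was flagged


def search_avatars(
    records: Iterable[AvatarRecord],
    avatar_ids: Optional[Set[str]] = None,
    body_config: Optional[str] = None,
    anomaly_type: Optional[str] = None,
) -> List[AvatarRecord]:
    """Return records matching every filter that was supplied (None means 'any')."""
    matches = []
    for rec in records:
        if avatar_ids is not None and rec.avatar_id not in avatar_ids:
            continue
        if body_config is not None and rec.body_config != body_config:
            continue
        if anomaly_type is not None and rec.anomaly_type != anomaly_type:
            continue
        matches.append(rec)
    return matches


# Made-up sample data.
RECORDS = [
    AvatarRecord("av-001", "slim", "too_narrow"),
    AvatarRecord("av-002", "slim", None),
    AvatarRecord("av-003", "tall", "broken_texture"),
]

flagged = search_avatars(RECORDS, body_config="slim", anomaly_type="too_narrow")
print([r.avatar_id for r in flagged])  # ['av-001']
```

Treating an omitted filter as "match anything" lets one function serve lookups by ID, by configuration, or by anomaly type, which mirrors the three triage paths described above.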
Debugging Dashboard
A standalone product platform for retrieving and displaying avatar debugging data. Visualized rendering parameters, mesh metrics, body proportions, and expression rig states so engineers could pinpoint exactly where the pipeline produced a defect.
Email Automation Service
Automated notification pipelines that alerted stakeholders when avatar quality regressions were detected. Digest reports, threshold-based alerts, and escalation routing so the right people knew about issues before users did.
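A threshold-based alert with escalation routing could look roughly like the sketch below. The 5% and 10% thresholds, the severity labels, and the `Alert` fields are assumptions chosen for illustration, not values taken from the production service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert payload.
    avatar_id: str
    metric: str
    relative_drift: float  # signed: negative means the observed value is below expected
    severity: str          # "digest" or "escalate" (assumed labels)


def check_metric(
    avatar_id: str,
    metric: str,
    observed: float,
    expected: float,
    warn_at: float = 0.05,      # assumed: drift at or above 5% goes into the digest
    escalate_at: float = 0.10,  # assumed: drift at or above 10% is escalated
) -> Optional[Alert]:
    """Return an Alert when relative drift reaches warn_at, otherwise None."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    drift = (observed - expected) / expected
    magnitude = abs(drift)
    if magnitude < warn_at:
        return None
    severity = "escalate" if magnitude >= escalate_at else "digest"
    return Alert(avatar_id, metric, drift, severity)


# A body width that is 15% narrower than expected is escalated.
alert = check_metric("av-001", "body_width", observed=0.85, expected=1.0)
print(alert.severity if alert else "ok")  # escalate
```

Splitting the decision into two thresholds keeps small drifts in the periodic digest and reserves immediate escalation for larger regressions.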
CI/CD Test Infrastructure
Built the merge-gate testing layer that enforced cross-team test suites. Code could not merge unless it passed validation from all dependent teams — preventing one team's change from breaking another team's avatar surface.
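The gating rule itself is simple to state in code. The sketch below assumes each dependent team reports a single pass/fail result per change. The team names and the `evaluate_merge_gate` function are illustrative, not the actual internal interface.

```python
from typing import Dict, List, Tuple


def evaluate_merge_gate(
    dependent_teams: List[str],
    suite_results: Dict[str, bool],
) -> Tuple[bool, List[str]]:
    """Allow the merge only if every dependent team's suite reported a pass.

    A team with no reported result is treated as blocking, so a missing
    or skipped suite cannot silently let a change through.
    """
    blocking = [team for team in dependent_teams if not suite_results.get(team, False)]
    return (len(blocking) == 0, blocking)


# Illustrative team names.
teams = ["rendering", "expressions", "body_proportions", "integration"]
results = {"rendering": True, "expressions": True, "body_proportions": False}

allowed, blocking = evaluate_merge_gate(teams, results)
print(allowed, blocking)  # False ['body_proportions', 'integration']
```

Defaulting an absent result to "fail" is a deliberate choice in this sketch: it makes the gate fail closed, which matches the stated goal that code cannot merge without validation from all dependent teams.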
Centralized Observability Platform
Developed a unified platform aggregating logs, debugging signals, and model outputs using Plog and internal infrastructure. Enabled real-time monitoring by research teams and Reality Labs leadership — giving end-to-end visibility into the experiment-to-render pipeline across GPU-intensive compute environments.
System Design
The services formed an integrated debugging ecosystem. The search interface, dashboard, and observability platform drew on data from the avatar pipeline, while the email service monitored quality signals and the CI/CD layer enforced standards at merge time.
┌──────────────────────────────────────────────────────────────────────────┐
│                META AVATARS: ML INFRASTRUCTURE ECOSYSTEM                 │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐        │
│  │  Chat / Search   │  │ Debug Dashboard  │  │  Observability   │        │
│  │  Interface       │  │ (Standalone)     │  │  Platform (Plog) │        │
│  └───────┬──────────┘  └────────┬─────────┘  └────────┬─────────┘        │
│          │                      │                     │                  │
│          ▼                      ▼                     ▼                  │
│  ┌──────────────────────────────────────────────────────────────┐        │
│  │              C++ Experimentation Infrastructure              │        │
│  │    Experiment Tracking · Debugging · Results Aggregation     │        │
│  │  ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌───────────────┐  │        │
│  │  │  Meshes  │ │   Rigs   │ │  Textures  │ │   Rendering   │  │        │
│  │  └──────────┘ └──────────┘ └────────────┘ └───────────────┘  │        │
│  └────────────────────────────┬─────────────────────────────────┘        │
│                               │                                          │
│               ┌───────────────┼───────────────┐                          │
│               ▼               ▼               ▼                          │
│  ┌──────────────────┐ ┌────────────────┐ ┌──────────────────┐            │
│  │ Email Automation │ │ CI/CD Test Gate│ │ GPU Compute      │            │
│  │ (Monitoring)     │ │ (Merge Block)  │ │ Orchestration    │            │
│  └──────────────────┘ └────────────────┘ │ 150+ cores       │            │
│           │                   │          │ ~2TB RAM         │            │
│           ▼                   ▼          │ ~55 min/render   │            │
│  Stakeholder Alerts   Cross-Team Tests   └──────────────────┘            │
│  & Digest Reports     Must pass ALL                                      │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
Results & Outcomes
These tools became core infrastructure for the Avatars team’s daily workflow, reducing time-to-diagnosis and preventing pipeline breakages across teams.
~300 engineers supported — The C++ experimentation infrastructure enabled experiment tracking, debugging, and results aggregation across ML workflows used daily by hundreds of researchers and engineers.
Faster issue triage — Researchers could search and locate defective avatars through the chat interface instead of manually querying databases, reducing diagnosis time from hours to minutes.
Visual debugging at a glance — The debugging dashboard surfaced mesh geometry, body proportions, expression states, and rendering parameters in a single view, eliminating the need to inspect raw pipeline data.
Real-time observability for leadership — Centralized platform aggregating logs, debugging signals, and model outputs via Plog, enabling monitoring by both research teams and Reality Labs leadership.
Proactive quality monitoring — The email automation service caught regressions early by alerting teams when quality metrics drifted outside acceptable thresholds, before users encountered the issue.
Zero pipeline regressions from unvetted merges — Distributed validation systems including end-to-end, regression, and integration testing frameworks prevented regressions across complex ML research pipelines.
GPU-scale compute orchestration — Engineered high-performance compute orchestration for systems utilizing 150+ GPU cores and ~2TB RAM, enabling avatar generation pipelines requiring ~55 minutes per render.
Tech Stack & Environment
The work ran on Meta’s internal infrastructure, combining the company’s proprietary tooling with industry-standard technologies.
Languages & Tooling
- C++ (experimentation infra)
- Python (pipeline automation)
- Bash / Shell scripting
- Jupyter Notebooks
Infrastructure & Compute
- ALA servers (GPU clusters)
- MTP developer platforms
- Internal cloud environments
- 150+ GPU core orchestration
Observability & Testing
- Plog (logging infrastructure)
- End-to-end test frameworks
- Regression & integration suites
- Cross-team merge gates
Domain
- ML experimentation pipelines
- Avatar generation (~55 min/render)
- 3D mesh & body estimation
- Expression rigging systems
How a Typical Bug Flowed Through the System
From detection to resolution, the alerting, search, dashboard, and merge-gate services formed a continuous loop that kept avatar quality high and turnaround fast.
1. Detection
Quality metrics drift outside threshold — an avatar body type renders 15% narrower than expected. The monitoring service fires an alert email to the owning team with the affected avatar IDs, configuration snapshot, and severity level.
2. Search & Triage
A researcher opens the chat tool, queries for the flagged avatar IDs, and filters by body configuration. The interface returns the matching avatars, their generation timestamps, and the pipeline stage where the anomaly was introduced.
3. Diagnosis
The engineer opens the standalone dashboard, loads the affected avatar, and inspects the mesh geometry, body proportion parameters, and expression rig state side by side. The engineer traces the issue to a body estimation weight that was applied incorrectly.
4. Fix & Validation
The engineer submits a fix. The merge gate runs the full cross-team test suite: avatar rendering tests, expression tests, body proportion tests, and integration tests from every dependent surface. All pass. Code merges. Pipeline stays healthy.
Case Study — Meta Reality Labs, Avatars Team · Prepared 2026