Meta
September 2025 – Present
Machine Learning Infrastructure Engineer
- Architected large-scale C++ experimentation infrastructure enabling experiment tracking, debugging, and results aggregation across ML workflows used by ~300 engineers.
- Built distributed validation systems, including end-to-end, regression, and integration testing frameworks, that prevent regressions across complex ML research pipelines.
- Developed centralized observability platform aggregating logs, debugging signals, and model outputs using Plog and internal infrastructure for real-time monitoring.
- Integrated compute infrastructure including ALA servers, internal cloud environments, Skills tooling, and MTP developer platforms supporting large-scale ML experimentation.
- Engineered high-performance compute orchestration for systems with 150+ GPU cores and ~2 TB of RAM, supporting avatar generation pipelines that require ~55 minutes per render.