I am a research scientist at Meta Superintelligence Labs, working on speech and audio. I am a core contributor to Meta's foundational audio generation models, including SAM Audio, MovieGen Audio, AudioBox, VoiceBox, and MMS. I obtained my Ph.D. from TTIC, where I worked on automatic sign language understanding, advised by Prof. Karen Livescu.
December 2025 — Launched SAM Audio, a foundation model that extends Segment Anything to audio, enabling general-purpose audio separation via multimodal prompts.
March 2025 — Our team released AudioBox-aesthetics, a unified framework for automatic quality assessment of arbitrary audio.