Publications

A System for Microserving of LLMs
WebLLM: A High-Performance In-Browser LLM Inference Engine
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
Local deployment of large-scale music AI models on commodity hardware
Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development
Coordinating Distributed Example Orders for Provably Accelerated Training