RouteLLM: I was LMSYS research lead and co-first author on a paper for LLM routing using preference data. We showed that our routers can reduce cost by up to 85% on academic benchmarks without compromising quality, beating several startups in the process. I also created an open-source framework to productionize this work and actively maintain it, with 2k+ stars on GitHub.
Efficient distributed inference: I wrote my Master’s research thesis on speeding up distributed transformer inference using a technique called “dynamic partitioning”, switching between different tensor parallel strategies at inference time depending on GPU and model characteristics. As part of this, I also explored adjacent ideas for speeding up inference related to prompt disaggregation and KV cache offloading.
Chatbot Arena: I added support for several models on Chatbot Arena and conducted analysis on understanding model performance through the lens of human preference.
Tensor Trust: I was co-author on an AI safety paper analyzing the adversarial robustness of LLMs using an online game where users could attack and defend against prompt injection attacks. In particular, I spent quite a bit of time trying to transfer the insights gathered from the game to real-world applications, jailbreaking Notion AI, Claude, Bing Chat, and ChatGPT :)
SkyPilot: I was part of the core dev team of ~10 on SkyPilot, a framework for seamlessly executing ML workloads across clouds (towards achieving the vision of the sky). I contributed towards several efforts to improve the robustness of SkyPilot’s multi-node provisioning and configuration setup.
Exoshuffle: I was co-author on a distributed systems paper introducing a new architecture for generalized, large-scale shuffle algorithms built on top of Ray and distributed futures.
Exoshuffle-Cloudsort: We demonstrated how Exoshuffle could match the performance of monolithic shuffle systems, creating the world's most cost-efficient sort system and breaking the previous record on the Cloudsort benchmark at $0.97/TB.
Reka: I optimized multi-modal inference, investigated video understanding, and worked on long-context modeling for Reka Core, a SOTA multimodal LLM rivaling GPT-4 performance; was the 1st external intern hired at the company.
Citadel Securities: I led the design and development of a new architecture for sending securities over a low-latency, distributed message bus and deployed this into production systems before I left; lots of C++, networking, and system design.
Monad Labs: I tackled 2 main projects in distributed, low-latency systems to parallelize the EVM: 1) I implemented lazy optimizations for gossip protocols (libp2p) as part of the consensus mechanism, reducing bandwidth requirements by over 50%; 2) I created the first prototype of Monad's mempool in Rust, achieving latency improvements of up to 6x using the Tokio runtime; one of the first few interns hired in a lean team of ~15.
Google: I saved tens of thousands of engineering hours and improved the efficiency of global Google Cloud networking deployments by creating a new, distributed service to identify and cluster flaky workflows; worked with C++, gRPC, and clustering algorithms.
Motional: I built mapping software infrastructure for self-driving vehicles deployed on Uber and Lyft; specifically, I created a new service to visualize mapping algorithms so that engineers could debug them more easily; I also built a backend service to index and search lidar, radar, and camera data collected from vehicles, processing terabytes of data each day.
Bot MD: I built an AI assistant used by doctors in the fight against COVID-19 across Southeast Asia; notably, I spearheaded the design and development of a new internal task orchestration platform for the entire company called Bach; I worked across Python / Django and Go, leveraging custom Docker Compose files and AWS images for scalable deployment.
GovTech: I was part of the team that built a new web application called OneCV used to streamline the delivery of social services to the underprivileged across Singapore; I was involved in the entire process, from user research and requirements gathering to design and development.
directed Cal Hacks, the world's largest collegiate hackathon
conducted cs + bio research at the Singapore University of Technology & Design, developing an Android application to analyze an athlete's running pace to determine the optimal music for running performance via auditory-motor synchronization (2016)
was the first Singaporean to win Google Code-in, winning a trip to Google’s Mountain View HQ in high school (2016)
worked on nlp research at the Defense Science & Technology Agency, building a web application that uses natural language understanding to intelligently categorize and visualize search engines (2014)
created CatAn Lab, an app to help students learn qualitative analysis in Chemistry, and won 2nd at a nationwide competition (2013)
dipped my toes in competitive programming, e.g. Project Euler (2012)
I am the proud recipient of 2 gold medals on S/O lol - looking at these questions is never not embarrassing but also a reminder of how much I've grown :')