Tony Z. Zhao

I am a first-year CS PhD student at Stanford, advised by Chelsea Finn. I am interested in robotics, machine learning, and NLP.

Previously, I was a resident at Google X (Intrinsic) in summer 2021. I did my undergrad at Berkeley (2017-2021), advised by Sergey Levine and Dan Klein. I worked closely with Anusha Nagabandi, Eric Wallace, and Abhishek Gupta.

Email  /  Twitter  /  LinkedIn  /  Scholar


[Mar 23, 2022] I will be interning at Tesla Autopilot in summer 2022.
[Feb 22, 2021] We released the code for contextual calibration: GitHub link.
[Jan 14, 2021] I will be an AI resident at Google X in summer 2021.
[Oct 29, 2020] We released the code for MELD: GitHub link.

Offline Meta-Reinforcement Learning for Industrial Insertion
Tony Z. Zhao*, Jianlan Luo*, Oleg Sushkov, Rugile Pevceviciute, Nicolas Heess, Jon Scholz, Stefan Schaal, Sergey Levine
ICRA, 2022  
arXiv / website

Combines offline meta-RL with online fine-tuning for industrial insertion. Our method solves 12 new tasks, including RAM and network card insertion, with a 100% success rate and an average of 6 minutes of online interaction.

Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
ICML, 2021   (Long talk, top 3%)
arXiv / code

Introduces contextual calibration, a data-free procedure that improves the accuracy of GPT-2 and GPT-3 (by up to 30% absolute) and reduces variance across different prompt designs.

Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention
Abhishek Gupta*, Justin Yu*, Tony Z. Zhao*, Vikash Kumar*, Aaron Rovinsky, Kelvin Xu, Thomas Devlin, Sergey Levine
ICRA, 2021
arXiv / website

Towards autonomous robot training: learns complex dexterous manipulation skills with a 16-DOF robotic hand and a 6-DOF Sawyer arm through 60 hours of uninterrupted training.

Concealed Data Poisoning Attacks on NLP Models
Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
NAACL, 2021
arXiv / blog / twitter / code

Demonstrates that the predictions of deep NLP models can be manipulated with concealed changes to the training data. Experiments cover widely used models (e.g., BERT, GPT-2) and tasks including text classification, language modeling, and machine translation.

MELD: Meta-Reinforcement Learning from Images via Latent State Models
Tony Z. Zhao*, Anusha Nagabandi*, Kate Rakelly*, Chelsea Finn, Sergey Levine
CoRL, 2020
arXiv / website / code

Bridges meta-RL for fast skill acquisition with latent state models for state estimation. The first meta-RL algorithm trained in a real-world robotic control setting from images.
