Tony Z. Zhao

I am a first-year CS PhD student at Stanford, advised by Chelsea Finn. I am interested in Robotics, Machine Learning, and NLP.

Previously, I was a resident at Google X (Intrinsic) in summer 2021. I did my undergrad at Berkeley from 2017 to 2021, advised by Sergey Levine and Dan Klein. I worked closely with Anusha Nagabandi, Eric Wallace, and Abhishek Gupta.

Email  /  Twitter  /  LinkedIn  /  Scholar


[Mar 23, 2022] I will be interning at Tesla Autopilot in summer 2022.
[Feb 22, 2021] We released the code for contextual calibration: Github link.
[Jan 14, 2021] I will be an AI resident at Google X in summer 2021.
[Oct 29, 2020] We released the code for MELD: Github link.

What Makes Representation Learning from Videos Hard for Control?
Tony Z. Zhao, Siddharth Karamcheti, Thomas Kollar, Chelsea Finn, Percy Liang
in submission
RSS 2022 Workshop on Scaling Robot Learning, Best Paper Award Finalist

A large-scale empirical study on pretrained visual representations, focusing on the distribution shift between pretraining videos and downstream control tasks.

Offline Meta-Reinforcement Learning for Industrial Insertion
Tony Z. Zhao*, Jianlan Luo*, Oleg Sushkov, Rugile Pevceviciute, Nicolas Heess, Jon Scholz, Stefan Schaal, Sergey Levine
ICRA, 2022  
arXiv / website

Combines offline meta-RL with online finetuning for industrial insertion. Our method solves 12 new tasks, including RAM and network card insertion, with a 100% success rate and an average of 6 minutes of online interaction.

Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
ICML, 2021   (Long talk, top 3%)
arXiv / code

Introduces contextual calibration, a data-free procedure that improves GPT-2/GPT-3’s accuracy (up to 30% absolute) and reduces variance across different prompt designs.

Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention
Abhishek Gupta*, Justin Yu*, Tony Z. Zhao*, Vikash Kumar*, Aaron Rovinsky, Kelvin Xu, Thomas Devlin, Sergey Levine
ICRA, 2021
arXiv / website

A step towards autonomous robot training: learning complex dexterous manipulation skills with a 16-DOF robotic hand and a 6-DOF Sawyer arm through 60 hours of uninterrupted training.

Concealed Data Poisoning Attacks on NLP Models
Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
NAACL, 2021
arXiv / blog / twitter / code

Demonstrates that the predictions of deep NLP models can be manipulated via concealed changes to the training data. Experiments cover widely used models (e.g., BERT, GPT-2) and tasks including text classification, language modeling, and machine translation.

MELD: Meta-Reinforcement Learning from Images via Latent State Models
Tony Z. Zhao*, Anusha Nagabandi*, Kate Rakelly*, Chelsea Finn, Sergey Levine
CoRL, 2020
arXiv / website / code

Bridges meta-RL for fast skill acquisition with latent state models for state estimation. MELD is the first meta-RL algorithm trained on a real-world robotic control setting from images.

website template