Tony Z. Zhao

I am a second-year CS PhD student at Stanford, advised by Chelsea Finn. I am supported by the Stanford Robotics Fellowship (2022-23).

Previously, I interned at Tesla Autopilot and Google X Intrinsic. I graduated from Berkeley in 2021, advised by Sergey Levine and Dan Klein.

I want to enable robots to perform complex fine manipulation tasks. I am also interested in startups and am currently a Deeptech Fellow at Clear Ventures.

Email  /  Twitter  /  LinkedIn  /  Scholar

[Feb 22, 2021] We released the code for contextual calibration: GitHub link.
[Oct 29, 2020] We released the code for MELD: GitHub link.

What Makes Representation Learning from Videos Hard for Control?
Tony Z. Zhao, Siddharth Karamcheti, Thomas Kollar, Chelsea Finn, Percy Liang
in submission
RSS 2022 Workshop on Scaling Robot Learning, Best Paper Award Finalist

A large-scale empirical study on pretrained visual representations, focusing on the distribution shift between pretraining videos and downstream control tasks.

Offline Meta-Reinforcement Learning for Industrial Insertion
Tony Z. Zhao*, Jianlan Luo*, Oleg Sushkov, Rugile Pevceviciute, Nicolas Heess, Jon Scholz, Stefan Schaal, Sergey Levine
ICRA, 2022  
arXiv / website

Combines offline meta-RL with online finetuning for industrial insertion. Our method solves 12 new tasks, including RAM and network card insertion, with a 100% success rate and an average of 6 minutes of online interaction.

Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
ICML, 2021 (Long talk, top 3%)
arXiv / code

Introduces contextual calibration, a data-free procedure that improves GPT-2/GPT-3’s accuracy (up to 30% absolute) and reduces variance across different prompt designs.

Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention
Abhishek Gupta*, Justin Yu*, Tony Z. Zhao*, Vikash Kumar*, Aaron Rovinsky, Kelvin Xu, Thomas Devlin, Sergey Levine
ICRA, 2021
arXiv / website

Towards autonomous robot training. Learns complex dexterous manipulation skills with a 16-DOF robotic hand and a 6-DOF Sawyer arm, through 60 hours of uninterrupted training.

Concealed Data Poisoning Attacks on NLP Models
Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
NAACL, 2021
arXiv / blog / twitter / code

Demonstrates that the predictions of deep NLP models can be manipulated with concealed changes to the training data. Experiments cover widely used models (e.g., BERT, GPT-2) and tasks including text classification, language modeling, and machine translation.

MELD: Meta-Reinforcement Learning from Images via Latent State Models
Tony Z. Zhao*, Anusha Nagabandi*, Kate Rakelly*, Chelsea Finn, Sergey Levine
CoRL, 2020
arXiv / website / code

Bridges meta-RL for fast skill acquisition and latent state models for state estimation. The first meta-RL algorithm trained from images in a real-world robotic control setting.
