Songlin Wei 魏松林

I'm a first-year PhD student in the Computer Science Department at the University of Southern California (USC), advised by Prof. Yue Wang.

Previously, I spent two truly wonderful years at Peking University.

I earned my Bachelor of Software Engineering degree from Xiamen University. Since then, my career has taken several turns: I have developed large social media websites, built robots, and started companies.

My current research directions:

  • Vision-language-action models for humanoids
  • Benchmark-oriented simulation
  • Humanoid motion tracking

Please reach out if you are interested in collaborating.

Email  /  GitHub  /  Google Scholar  /  WeChat


Publications

* denotes equal contribution, † denotes corresponding author(s)

Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation


Songlin Wei*, Hongyi Jing*, Boqian Li*, Zhenyu Zhao*, Jiageng Mao, Zhenhao Ni, Sicheng He, Jie Liu, Xiawei Liu, Kaidi Kang, Sheng Zang, Marco Pavone, Di Huang, Yue Wang†
arXiv preprint, 2026
arxiv / website

Ψ₀ is an open vision-language-action (VLA) model for dexterous humanoid loco-manipulation.


ICLR: In-Context Imitation Learning with Visual Reasoning


Toan Nguyen, Weiduo Yuan, Songlin Wei, Hui Li, Daniel Seita†, Yue Wang†
arXiv preprint, 2026
arxiv / website

We present In-Context Imitation Learning with Visual Reasoning (ICLR), a framework that augments demonstration prompts with structured visual reasoning traces representing anticipated future robot trajectories in image space.


Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation


Shaocong Xu, Songlin Wei, Qizhe Wei, Zheng Geng, Hong Li, Licheng Shen, Qianpu Sun, Shu Han, Bin Ma, Bohan Li, Chongjie Ye, Yuhang Zheng, Nan Wang, Saining Zhang, and Hao Zhao†
arXiv preprint, 2025
arxiv / website

“Diffusion knows transparency.” Generative video priors can be repurposed, efficiently and label-free, into robust, temporally coherent perception for challenging real-world manipulation.


GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data


Shengliang Deng*, Mi Yan*, Songlin Wei, Haixin Ma, Yuxin Yang, Jiayi Chen, Zhiqi Zhang, Taoyu Yang, Xuheng Zhang, Heming Cui, Zhizheng Zhang, He Wang†
arXiv preprint, 2025
arxiv / website

We present GraspVLA, a VLA model pretrained on large-scale synthetic action data as a foundational model for grasping tasks.


Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks


Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang†, He Wang†
arXiv preprint, 2024

We present Uni-NaVid, the first video-based vision-language-action (VLA) model designed to unify diverse embodied navigation tasks and enable seamless navigation for mixed long-horizon tasks in unseen real-world environments.


RoboHanger: Learning Generalizable Robotic Hanger Insertion for Diverse Garments


Yuxing Chen*, Songlin Wei*, Bowen Xiao, Jiangran Lyu, Jiayi Chen, Feng Zhu, He Wang†
arXiv preprint, 2024

In this work, we address the problem of inserting a hanger into various unseen garments that are initially laid out flat on a table.


GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation


Wenbo Cui*, Chengyang Zhao*, Songlin Wei*, Jiazhao Zhang, Haoran Geng, Yaran Chen, He Wang†
arXiv preprint, 2024
arxiv

We introduce a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses.


D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation


Songlin Wei, Haoran Geng, Jiayi Chen, Congyue Deng, Wenbo Cui, Chengyang Zhao, Xiaomeng Fang, Leonidas Guibas, He Wang†
CoRL 2024; Wild3D Workshop @ ECCV 2024
arxiv / website

We propose a diffusion model-based depth estimation framework on stereo image pairs for robotic manipulation.


Make a Donut🍩: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools


Yang You, Bokui Shen, Congyue Deng, Haoran Geng, Songlin Wei, He Wang, Leonidas Guibas†
arXiv preprint, 2024
arxiv

In this work, we introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks without requiring any training.


Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach


Yufei Ding*, Haoran Geng*, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang†
IROS, 2024
website

We present Open6DOR, a challenging and comprehensive benchmark for open-instruction 6-DoF object rearrangement tasks. Following this, we propose a zero-shot and robust method, Open6DORGPT, which proves effective in demanding simulation environments and real-world scenarios.


SAGE🌿: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects


Haoran Geng*, Songlin Wei*, Congyue Deng, Bokui Shen, He Wang†, Leonidas Guibas†
RSS, 2024
arxiv / website

We present SAGE🌿, a framework bridging the understanding of semantic and actionable parts for generalizable manipulation of articulated objects.


FG-NeRF: Flow-GAN based Probabilistic Neural Radiance Field for Independence-Assumption-Free Uncertainty Estimation


Songlin Wei*, Jiazhao Zhang*, Yang Wang, Fanbo Xiang, Hao Su, He Wang
arXiv preprint, 2023
arxiv

We propose an independence-assumption-free probabilistic neural radiance field based on Flow-GAN. By combining the generative capability of adversarial learning and the powerful expressivity of normalizing flow, our method explicitly models the density-radiance distribution of the whole scene.


3D Object Aided Self-Supervised Monocular Depth Estimation


Songlin Wei, Guodong Chen, Wenzheng Chi, Zhenhua Wang and Lining Sun
IROS, 2022
arxiv / video

Self-supervised depth estimation methods rely on the static-world assumption, which produces inaccurate depth for dynamic objects. In this work, we propose to address dynamic object motion through monocular 3D object detection.


Object Clustering with Dirichlet Process Mixture Model for Data Association in Monocular SLAM


Songlin Wei, Guodong Chen, Wenzheng Chi, Zhenhua Wang and Lining Sun
IEEE Transactions on Industrial Electronics, 2022
arxiv / video

We propose a novel data association method for cuboid landmarks based on a Dirichlet Process Mixture Model. By jointly considering object class, position, and size, our method performs data association robustly.






Forked from Leonid Keselman's website