Songlin Wei 魏松林

I'm a PhD student at the School of Computer Science, Peking University, advised by Prof. He Wang.

I earned my Bachelor of Software Engineering degree from Xiamen University. Over the years, my career has undergone various transformations. I developed large social media websites, built robots, and started companies.

I eventually found my passion in research and went to Soochow University, where I obtained a master's degree in Control Science and Technology and worked closely with Prof. Wenzheng Chi and Prof. Guodong Chen.

Email / GitHub / Google Scholar / WeChat

Publications

My research interests include 3D computer vision, robot learning, and embodied AI. I'm currently working on Vision-Language-Action models for robotics. Please reach out if you are interested in collaborating.

* denotes equal contribution, † denotes corresponding author(s)

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks
Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang†, He Wang†
arXiv preprint, 2024
We present Uni-NaVid, the first video-based vision-language-action (VLA) model designed to unify diverse embodied navigation tasks and enable seamless navigation for mixed long-horizon tasks in unseen real-world environments.

RoboHanger: Learning Generalizable Robotic Hanger Insertion for Diverse Garments
Yuxing Chen, Songlin Wei, Bowen Xiao, Jiangran Lyu, Jiayi Chen, Feng Zhu, He Wang†
arXiv preprint, 2024
In this work, we address the problem of inserting a hanger into various unseen garments that are initially laid out flat on a table.

GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
Wenbo Cui, Chengyang Zhao, Songlin Wei, Jiazhao Zhang, Haoran Geng, Yaran Chen, He Wang†
arXiv preprint, 2024
arXiv
We introduce a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomizations and detailed annotations of part-oriented, scene-level actionable interaction poses.

D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation
Songlin Wei, Haoran Geng, Jiayi Chen, Congyue Deng, Wenbo Cui, Chengyang Zhao, Xiaomeng Fang, Leonidas Guibas, He Wang†
CoRL 2024, Wild3D@ECCV 2024
arXiv / website
We propose a diffusion model-based depth estimation framework on stereo image pairs for robotic manipulation.

Make a Donut🍩: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools
Yang You, Bokui Shen, Congyue Deng, Haoran Geng, Songlin Wei, He Wang, Leonidas Guibas†
arXiv, 2024
arXiv
In this work, we introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks without necessitating any training.

Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach
Yufei Ding, Haoran Geng, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang†
IROS, 2024
website
We present Open6DOR, a challenging and comprehensive benchmark for open-instruction 6-DoF object rearrangement tasks. Following this, we propose a zero-shot and robust method, Open6DORGPT, which proves effective in demanding simulation environments and real-world scenarios.

SAGE🌿: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects
Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang†, Leonidas Guibas†
RSS, 2024
arXiv / website
We present SAGE🌿, a framework bridging the understanding of semantic and actionable parts for generalizable manipulation of articulated objects.

FG-NeRF: Flow-GAN based Probabilistic Neural Radiance Field for Independence-Assumption-Free Uncertainty Estimation
Songlin Wei, Jiazhao Zhang, Yang Wang, Fanbo Xiang, Hao Su, He Wang
arXiv, 2023
arXiv
We propose an independence-assumption-free probabilistic neural radiance field based on Flow-GAN. By combining the generative capability of adversarial learning and the powerful expressivity of normalizing flow, our method explicitly models the density-radiance distribution of the whole scene.

3D Object Aided Self-Supervised Monocular Depth Estimation
Songlin Wei, Guodong Chen, Wenzheng Chi, Zhenhua Wang and Lining Sun
IROS, 2022
arXiv / video
Self-supervised depth estimation methods rely on a static-world assumption, which produces inaccurate depths for dynamic objects. In this work, we propose to address dynamic object movements through monocular 3D object detection.

Object Clustering with Dirichlet Process Mixture Model for Data Association in Monocular SLAM
Songlin Wei, Guodong Chen, Wenzheng Chi, Zhenhua Wang and Lining Sun
IEEE Transactions on Industrial Electronics, 2022
arXiv / video
We propose a novel data association method for cuboid landmarks based on a Dirichlet Process Mixture Model. By jointly considering object class, position, and size, our method can perform data association robustly.

Forked from Leonid Keselman's website
