|
Han Yi
Hello! I am currently a second-year CS PhD student at the University of North Carolina at Chapel Hill (UNC), under the guidance of Prof. Gedas Bertasius. I completed my Master's degree at the National University of Singapore (NUS) in 2024. During my time at NUS, I also served as a research intern at the NExT++ Research Center, where I was advised by Prof. Tat-Seng Chua, Prof. Zhedong Zheng, and Prof. Xiangyu Xu.
I love basketball, football, rap music, and fitness.
Email /
CV /
Google Scholar /
Github
|
|
Research
I'm broadly interested in advanced Computer Vision and Multi-modal Learning, with a focus on both the fine-grained understanding and generation of complex human actions. I also work on leveraging foundation models (LLMs, VLMs, etc.) to solve multiple video understanding tasks.
|
|
SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence
Yulu Pan*, Han Yi*, Seongsu Ha*, Md Mohaiminul Islam*, Benjamin Zhang, Lorenzo Torresani, Gedas Bertasius
*Equal contribution
arXiv, 2026
Project Page
/
Paper
/
Extended Paper
/
Code
/
Data
SVI-Bench is a unified benchmark for evaluating the full cognitive stack of video intelligence — perception, reasoning, simulation, and agency. Built on 35,000 hours of broadcast basketball, soccer, and hockey footage, it shows that while models handle perception well, reasoning, simulation, and agency are where current systems break down.
|
|
ExAct: A Video-Language Benchmark for Expert Action Analysis
Han Yi, Yulu Pan, Feihong He, Xinyu Liu, Benjamin Zhang, Oluwatumininu Oguntola, Gedas Bertasius
NeurIPS, 2025
Project Page
/
Paper
/
Data
We introduce ExAct, a video-language benchmark for expert-level analysis of skilled human actions. It contains over 3,500 expert-curated video QA pairs across domains like sports, cooking, and music. Our benchmark reveals a significant performance gap between state-of-the-art VLMs and human experts, highlighting the need for models with a more nuanced understanding of complex human skills.
|
|
Progressive Text-to-3D Generation for Automatic 3D Prototyping
Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-seng Chua
ACM Transactions on Multimedia Computing, Communications and Applications (TOMM), 2026
A progressive strategy that learns text-to-3D in a coarse-to-fine manner.
|
|
Image Deblurring With Image Blurring
Ziyao Li, Zhi Gao, Han Yi, Yu Fu, Boan Chen
IEEE Transactions on Image Processing (TIP), 2023
Proposed a novel motion deblurring framework using Blur Space Disentangled Network (BSDNet) and Hierarchical Scale-recurrent Deblurring Network (HSDNet) to effectively address real-world blur, achieving state-of-the-art results.
|
|
Single image deraindrop leveraging luminance priors and context aggregation
Yi Liu, Zhi Gao, Tiancan Mei, Han Yi
Neurocomputing, 2024
Developed a recurrent single-image deraindrop approach utilizing luminance priors and contextual feature aggregation, achieving superior performance in restoring color and texture consistency.
|
Services
Reviewer: ICLR 2026, ICLR 2025, ACM MM 2025, ACM MM ASIA 2025, ACM MM 2024 (Outstanding Reviewer Award)
|
|