Takuma Yagi, Ph.D.

Publications

Google Scholar
Researchmap

Preprints

Journal papers (refereed)

[1-3] Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, and Yoichi Sato. FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation. International Journal of Computer Vision (IJCV). 2025. [preprint][paper][data & code]
[1-2] Takuma Yagi, Takumi Nishiyasu, Kunimasa Kawasaki, Moe Matsuki, and Yoichi Sato. GO-Finder: A Registration-Free Wearable System for Assisting Users in Finding Lost Hand-Held Objects. ACM Transactions on Interactive Intelligent Systems (TiiS'22). 2022. [paper]
[1-1] Takehiko Ohkawa, Takuma Yagi, Atsushi Hashimoto, Yoshitaka Ushiku, and Yoichi Sato. Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation. IEEE Access. 2021. [project page][paper][preprint][code]

Conference papers (refereed)

[2-11] Yue Qiu, Yanjun Sun, Takuma Yagi, Shusaku Egami, Natsuki Miyata, Ken Fukuda, Kensho Hara, and Ryusuke Sagawa. VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV'25). 2025. [paper]
[2-10] Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, and Yoichi Sato. Learning Multiple Object States from Actions via Large Language Models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'25). 2025. [project page][arxiv]
[2-9] Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, and Yoichi Sato. Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'25). 2025. [arxiv]
[2-8] Yue Qiu, Shusaku Egami, Ken Fukuda, Natsuki Miyata, Takuma Yagi, Kensho Hara, Kenji Iwata, and Ryusuke Sagawa. DailySTR: A Daily Human Activity Pattern Recognition Dataset for Spatio-temporal Reasoning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'24). 2024.
[2-7] Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, ..., Takuma Yagi, ..., Michael Wray. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'24). 2024. [project page][arxiv]
[2-6] Sota Miyamoto, Takuma Yagi, Yuto Makimoto, Mahiro Ukai, Yoshitaka Ushiku, Atsushi Hashimoto, and Nakamasa Inoue. PolarDB: Formula-Driven Dataset for Pre-Training Trajectory Encoders. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'24). 2024. [paper]
[2-5] Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, and Yoichi Sato. Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'23). 2023. [paper][preprint]
[2-4] Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, ..., Takuma Yagi, ..., Jitendra Malik. Ego4D: Around the World in 3,000 Hours of Egocentric Video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'22). 2022. [project page][preprint]
[2-3] Takuma Yagi, Md. Tasnimul Hasan, and Yoichi Sato. Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction. British Machine Vision Conference (BMVC'21). 2021. [preprint][code][talk]
[2-2] Takuma Yagi, Takumi Nishiyasu, Kunimasa Kawasaki, Moe Matsuki, and Yoichi Sato. GO-Finder: A Registration-Free Wearable System for Assisting Users in Finding Lost Objects via Hand-Held Object Discovery. In Proceedings of the 26th International Conference on Intelligent User Interfaces (IUI'21). 2021. [paper][preprint][preview][talk]
[2-1] Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, and Yoichi Sato. Future Person Localization in First-Person Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18). 2018. (Spotlight) [paper][preprint][bibtex][code]

Conference papers (non-refereed)

[3-6] Yue Qiu, Yanjun Sun, Takuma Yagi, Shusaku Egami, Natsuki Miyata, Ken Fukuda, Kensho Hara, and Ryusuke Sagawa. VideoSetBench: Identifying and Reasoning Similarities and Differences in Similar Videos. In CVPR 2025 Workshop Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo 2). 2025.
[3-5] Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, and Yoichi Sato. Learning Object States from Actions via Large Language Models. CVPR Workshop Learning from Procedural Videos and Language: What is Next?. 2024. [arxiv]
[3-4] Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, and Yoichi Sato. Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos. CVPR Workshop Learning from Procedural Videos and Language: What is Next?. 2024. [arxiv]
[3-3] Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, and Yoichi Sato. FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotations. Joint International Third Ego4D and Eleventh EPIC Workshop@CVPR23. Extended Abstracts. 2023.
[3-2] Takuma Yagi, Md. Tasnimul Hasan, and Yoichi Sato. Object Instance Identification in Dynamic Environments. In Tenth International Workshop on Egocentric Perception, Interaction and Computing (EPIC@CVPR22), Extended Abstracts. 2022. [paper][preprint][code]
[3-1] Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, and Yoichi Sato. Precise Affordance Annotation for Egocentric Action Video Datasets. In Tenth International Workshop on Egocentric Perception, Interaction and Computing (EPIC@CVPR22), Extended Abstracts. 2022. [preprint]

Domestic papers (non-refereed)

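[4-25] 田中僚真, 木林佑太, Takuma Yagi, Hirokatsu Kataoka, 青木義満, Kensho Hara. Step Detection in Task Videos Using Duration Estimation Based on Step Label Descriptions, 28th Meeting on Image Recognition and Understanding (MIRU2025, general paper). 2025.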
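[4-24] 加藤義道, Masatoshi Tateno, Kensho Hara, Hirokatsu Kataoka, 森島繁生, Takuma Yagi. Fine-Grained Egocentric HOI Understanding with Vision-Language Models That Account for Hand and Object Location Information. 28th Meeting on Image Recognition and Understanding (MIRU2025, general paper). 2025.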
[4-23] Masatoshi Tateno, Gido Kato, Kensho Hara, Hirokatsu Kataoka, Yoichi Sato, and Takuma Yagi. HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics. 28th Meeting on Image Recognition and Understanding (MIRU2025, oral presentation paper). 2025.
[4-22] Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, and Yoichi Sato, Learning Object States from Actions via Large Language Models, 27th Meeting on Image Recognition and Understanding (MIRU2024, long oral presentation). 2024.
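[4-21] Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, Yoichi Sato. FineBio: A Biological Experiment Video Dataset with Dense Hierarchical Annotations. 46th Annual Meeting of the Molecular Biology Society of Japan (MBSJ2023, oral presentation). 2023.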
[4-20] Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, and Yoichi Sato. FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation. 2023 Annual Conference of the Japanese Society for Bioinformatics and 12th Joint Conference on Informatics in Biology, Medicine and Pharmacology (IIBMP2023, poster presentation). 2023.
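[4-19] Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato, Open-Vocabulary Object State Recognition from Videos via Automatic Determination of Training Categories Using Large Language Models, 26th Meeting on Image Recognition and Understanding (MIRU2023, long oral presentation), 2023.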
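[4-18] Takuma Yagi, Taichi Nishimura, 清丸寛一, 唐井希, A Study of Script Prediction from Images Based on Knowledge Extraction from Large Language Models, 29th Annual Meeting of the Association for Natural Language Processing (NLP2023, poster presentation), 2023.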
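[4-17] Sota Miyamoto, Takuma Yagi, Yoshitaka Ushiku, Atsushi Hashimoto, Nakamasa Inoue, Fine-Grained Action Recognition in Egocentric Cooking Videos Using Hand Trajectory Features, Technical Committee on Pattern Recognition and Media Understanding (PRMU), 2022. (PRMU Monthly Best Presentation Award)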
[4-16] Takuma Yagi, Md. Tasnimul Hasan, Yoichi Sato, Object Instance Identification in Dynamic Environments, 25th Meeting on Image Recognition and Understanding (MIRU2022, poster presentation), 2022.
[4-15] Takehiko Ohkawa, Takuma Yagi, Atsushi Hashimoto, Yoshitaka Ushiku, and Yoichi Sato, Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation, 24th Meeting on Image Recognition and Understanding (oral presentation paper, long), 2021.
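[4-14] Takuma Yagi, Md. Tasnimul Hasan, Yoichi Sato, Hand-Object Contact Detection from Videos Based on Guided Progressive Label Correction, 24th Meeting on Image Recognition and Understanding (general paper), 2021. (Interactive Presentation Award)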
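[4-13] 福嶋稜, Takuma Yagi, 馬場惇, 岩本拓也, 遠藤大介, 大澤正彦, Proposal and Study of a Window Agent That Surfaces Cognitive Dissonance in Purchasing Behavior and Promotes Its Resolution, HAI Symposium, 2021.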
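[4-12] Takuma Yagi, Takumi Nishiyasu, Kunimasa Kawasaki, Moe Matsuki, Yoichi Sato, GO-Finder: A Registration-Free Wearable System for Assisting Users in Finding Objects Based on Hand-Held Object Discovery, Interaction, 2021. (premium presentation)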
[4-11] Takehiko Ohkawa, Takuma Yagi, and Yoichi Sato, Style Adapted DataBase: Generalizing Hand Segmentation via Semantics-aware Stylization, IEICE Technical Report (PRMU2020), 2020. [manuscript]
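[4-10] Takuma Yagi, Kunimasa Kawasaki, Moe Matsuki, Takumi Nishiyasu, Yoichi Sato, A Hand-Object Interaction Visualization System Based on Identification of Hand-Manipulated Objects, 23rd Meeting on Image Recognition and Understanding (general paper), 2020. [poster]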
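[4-9] Takuma Yagi, Yoichi Sato, Weakly Supervised Segmentation of Hands and Their Contacted Objects Using Motion Information, 23rd Meeting on Image Recognition and Understanding (general paper), 2020. [poster & slides]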
[4-8] Donghao Wu, Takuma Yagi, Yusuke Matsui, and Yoichi Sato, Egocentric Pedestrian Motion Forecasting for Separately Modelling Pose and Location, IEICE Technical Report (PRMU2019), 2020. [manuscript]
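[4-7] Takuma Yagi, Kunimasa Kawasaki, Yoichi Sato, A Wearable System for Predicting the Locations of Surrounding People, 22nd Meeting on Image Recognition and Understanding (demo presentation), 2019. [poster]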
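[4-6] Takuma Yagi, 品川政太朗, 秋山解, 加藤大貴, 島村僚, 又吉祐, [Invited Short Survey] HCI from the Perspective of User Evaluation: What CV Researchers Should Learn in Order to Build Good Systems, IEICE Technical Report, vol. 118, no. 260, PRMU2018-67, pp. 1-4, 2018.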
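[4-5] Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato, Future Person Localization in First-Person Videos, 21st Meeting on Image Recognition and Understanding (poster presentation), 2018. [poster]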
[4-4] Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato, Future Person Localization in First-Person Videos, 211th CVIM Meeting, 2018.
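[4-3] 大澤正彦, Kunimasa Kawasaki, Takuma Yagi, 長田茂美, 今井倫太, Anthropomorphic Characters as a Milestone for Artificial General Intelligence Research, 6th Meeting of the Special Interest Group on Artificial General Intelligence, 2017. [manuscript]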
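[4-2] Takuma Yagi, Automatic Extraction of "Motion Primitives" from Human Motion Sequences, 5th Science Inter-College, 2016. (oral presentation, received the Cooperating Companies and Organizations Award)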
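[4-1] 土井ゆりか, Takuma Yagi, 水口智仁, Development of a Sign Language Recognition System Using CNN-LSTM, 1st Meeting of the Special Interest Group on Artificial General Intelligence, 2015. [manuscript]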

Thesis

Takuma Yagi. Hand-Object Interaction Mining from First-Person Videos. Doctoral Thesis. The University of Tokyo. Advisor: Yoichi Sato. [thesis]
Takuma Yagi. Future Person Localization in First-Person Videos. Master's Thesis. The University of Tokyo. Advisor: Yoichi Sato. [pdf][slides]