This repository hosts the implementation and analysis details of CapImagine. It starts from systematic investigation on the internal mechanisms of latent-space visual reasoning methods through causal ...
Abstract: Recent success in legged robot locomotion is attributed to the integration of reinforcement learning and physical simulators. However, these policies often encounter challenges when deployed ...
Abstract: Accurate segmentation of 3D point clouds in indoor scenes remains a challenging task, often hindered by the labor-intensive nature of data annotation. While weakly supervised learning ...
We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving ...