We present ReconVLA, an implicit grounding paradigm for Vision-Language-Action models that reconstructs gaze regions to focus visual attention, achieving precise manipulation and strong generalization ...
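The snippet below is a minimal sketch of the general idea described above: training a policy with an auxiliary objective that reconstructs the gaze region, so the visual features are pushed to attend to task-relevant areas. It assumes a PyTorch-style setup; all class, module, and parameter names (ReconVLASketch, recon_head, alpha, etc.) are hypothetical illustrations and are not taken from the paper.

```python
# Minimal sketch, not the actual ReconVLA implementation.
# Hypothetical names throughout; the real architecture and losses
# are described in the paper, not in this snippet.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconVLASketch(nn.Module):
    def __init__(self, d_model=512, action_dim=7, patch_pixels=3 * 32 * 32):
        super().__init__()
        # Stand-in for a pretrained vision-language backbone producing fused tokens.
        self.backbone = nn.Linear(d_model, d_model)
        # Predicts low-level actions from the fused representation.
        self.action_head = nn.Linear(d_model, action_dim)
        # Reconstructs the pixels of the gaze region (the implicit grounding signal).
        self.recon_head = nn.Linear(d_model, patch_pixels)

    def forward(self, fused_tokens):
        h = self.backbone(fused_tokens)           # (B, d_model)
        return self.action_head(h), self.recon_head(h)

def training_step(model, fused_tokens, target_action, gaze_patch, alpha=0.5):
    """Combine an action-prediction loss with a gaze-region reconstruction loss."""
    pred_action, pred_patch = model(fused_tokens)
    action_loss = F.mse_loss(pred_action, target_action)
    recon_loss = F.mse_loss(pred_patch, gaze_patch.flatten(1))
    # alpha weights the auxiliary reconstruction term; its value here is arbitrary.
    return action_loss + alpha * recon_loss
```

The key design choice this illustrates is that grounding is learned implicitly through the reconstruction target rather than through explicit attention supervision or bounding boxes.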