CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM

arXiv:2601.03956v1

Abstract: Recent Vision-Language Models (VLMs) have demonstrated significant potential in robotic planning. However, they typically function as semantic reasoners, lacking an intrinsic understanding of the specific robot's physical capabilities. This limitation…