Teaching Metric Distance to Autoregressive Multimodal Foundational Models