ReFocus: the first interactive EM-NLQ framework, bringing human feedback into visual memory search
University of Utah
Abstract
Current Episodic Memory with Natural Language Query (EM-NLQ) models ignore a critical real-world factor: interactivity. Users naturally refine their queries and provide feedback when results are off, yet no existing method can leverage this signal. We introduce ReFocus, the first interactive EM-NLQ framework, built around a plug-and-play Feedback ALignment Module (FALM) that enables any base model to incorporate user feedback iteratively. We also present the EM-QnF task and dataset for feedback-driven interaction, together with a lightweight training scheme that requires no sequential optimization. ReFocus achieves state-of-the-art results on three benchmarks and strong gains in human-based evaluation.
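To make the interaction loop concrete, here is a minimal sketch of how a plug-and-play feedback module could sit between a base EM-NLQ model and the user. All names (`Prediction`, `base_model`, `falm`, `interactive_search`), the embedding fusion rule, and the toy base model are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    start_s: float  # start of the predicted temporal window (seconds)
    end_s: float    # end of the predicted temporal window (seconds)

def base_model(query_emb):
    """Stand-in for any EM-NLQ base model: maps a query embedding to a
    candidate temporal window. Here, a toy window centered at the
    embedding's mean value."""
    center = sum(query_emb) / len(query_emb)
    return Prediction(start_s=center - 2.0, end_s=center + 2.0)

def falm(query_emb, feedback_emb, alpha=0.5):
    """Toy feedback-alignment step: blend the user's feedback embedding
    into the query embedding. (The real FALM is a learned module.)"""
    return [(1 - alpha) * q + alpha * f
            for q, f in zip(query_emb, feedback_emb)]

def interactive_search(query_emb, feedback_rounds, accept):
    """Iterative loop: predict, show the user, and if they reject,
    fold their feedback into the query and predict again."""
    pred = base_model(query_emb)
    for feedback_emb in feedback_rounds:
        if accept(pred):  # user is satisfied; stop early
            break
        query_emb = falm(query_emb, feedback_emb)
        pred = base_model(query_emb)
    return pred
```

The key design point the abstract emphasizes is that the base model is untouched: the feedback module only rewrites the query representation between rounds, which is what makes the approach plug-and-play.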
Citation
If you find this work useful in your research, please consider citing:
@inproceedings{subedi2026refocus,
  title     = {Interactive Episodic Memory with User Feedback},
  author    = {Subedi, Nikesh and Bazzani, Loris and Al-Halah, Ziad},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}