ArXiv

Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models

Authors
Kaidi Jia, Yujie Lin, Chengyi Yang...
Categories
cs.CV
arXiv
https://arxiv.org/abs/2605.08031v1
PDF
https://arxiv.org/pdf/2605.08031v1

Brief

HFRU is a reinforcement unlearning method for VLMs that removes sensitive visual knowledge by modifying the vision encoder in two stages: alignment disruption, then GRPO optimization with a composite reward that includes an abstraction reward to curb object hallucination. On the object-recognition and face-identity tasks reported in the abstract, it achieves over 98% forgetting and retention performance with negligible hallucination. Summary based on abstract; full paper not reviewed.

Why it matters

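Existing VLM unlearning methods mostly fine-tune the language decoder. According to the abstract, this yields only superficial forgetting: the underlying visual representations survive, and the model often hallucinates objects. HFRU (Object Hallucination-Free Reinforcement Unlearning) instead operates on the vision encoder, using a two-stage pipeline of alignment disruption followed by GRPO-based optimization with a composite reward that includes an 'abstraction reward' to reduce hallucinations.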
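To make the GRPO step concrete, here is a minimal sketch of how a composite reward could feed GRPO's group-relative advantage. It is not the authors' implementation. The abstract names only the abstraction reward, so the `forget` and `retain` components, the equal weights, the function names, and the example scores are all assumptions made for illustration. The only part taken from GRPO itself is the standardization of rewards within a group of responses sampled for the same prompt.

```python
from statistics import mean, pstdev


def composite_reward(forget, retain, abstraction, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three scores in [0, 1].

    Hypothetical: only the abstraction term is named in the abstract;
    the other two terms and the weights are assumed for illustration.
    """
    w_f, w_r, w_a = weights
    return w_f * forget + w_r * retain + w_a * abstraction


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; using sample std is an equally valid choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical responses sampled for the same forget-set prompt.
group = [
    composite_reward(forget=1.0, retain=1.0, abstraction=1.0),  # forgets, valid substitute
    composite_reward(forget=1.0, retain=1.0, abstraction=0.0),  # forgets, but hallucinates
    composite_reward(forget=0.0, retain=1.0, abstraction=0.0),  # still names the target
]
advantages = group_relative_advantages(group)
print(advantages)
```

In this toy group, the response that forgets the target and offers a valid substitute gets a positive advantage, the one that still names the target gets a negative one, and the hallucinating response sits at the group mean. The policy update would then favor the first kind of response over the others.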

Key details

  • Results (from the abstract): on object-recognition and face-identity tasks, HFRU achieves over 98% forgetting and over 98% retention with negligible object hallucination, significantly outperforming prior methods.
  • Code: https://github.com/XMUDeepLIT/HFRU
  • Published: 2026-05-08

Source evidence

Abstract

Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. However, existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations and often introduces object hallucination. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal. Our two-stage approach combines alignment disruption with GRPO-based optimization using a composite reward, including an abstraction reward that encourages semantically valid substitutions and mitigates hallucinations. Experiments on object recognition and face identity tasks show that HFRU achieves over 98% forgetting and retention performance, while introducing negligible object hallucination, significantly outperforming prior methods. Our code and implementation details are available at https://github.com/XMUDeepLIT/HFRU.