SegDAC: Visual Generalization in Reinforcement Learning via Dynamic Object Tokens

Mila Quebec AI Institute, Université de Montréal
2025

SegDAC enables visual RL policies to reason over a dynamic set of object tokens instead of pixels, yielding both efficient learning and strong visual generalization.

Abstract

Visual reinforcement learning policies trained on pixel observations often struggle to generalize when visual conditions change at test time. Object-centric representations are a promising alternative, but most approaches use fixed-size slot representations, require image reconstruction, or need auxiliary losses to learn object decompositions. As a result, it remains unclear how to learn RL policies directly from object-level inputs without these constraints. We propose SegDAC, a Segmentation-Driven Actor-Critic that operates on a variable-length set of object token embeddings. At each timestep, text-grounded segmentation produces object masks from which spatially aware token embeddings are extracted. A transformer-based actor-critic processes these dynamic tokens, using segment positional encoding to preserve spatial information across objects. We ablate these design choices and show that both segment positional encoding and variable-length processing are individually necessary for strong performance. We evaluate SegDAC on 8 ManiSkill3 manipulation tasks under 12 visual perturbation types across 3 difficulty levels. SegDAC improves over prior visual generalization methods by 15% on the easy, 66% on the medium, and 88% on the hardest settings. SegDAC matches the sample efficiency of state-of-the-art visual RL methods while achieving improved generalization under visual changes.
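To make the idea of a variable-length set of object tokens more concrete, here is a minimal pure-Python sketch. It is not the SegDAC implementation. It assumes each segment contributes one feature vector, uses a normalized bounding box as a simplified stand-in for segment positional encoding, and replaces the transformer with a single-query attention pooling step. The function names bbox_position, pad_tokens, and masked_attention_pool are hypothetical and chosen for illustration only.

```python
import math


def bbox_position(mask):
    """Normalized bounding box (x_min, y_min, x_max, y_max) of a binary mask.

    Assumption: a simplified stand-in for a segment positional encoding.
    The mask must contain at least one nonzero pixel.
    """
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    return [cols[0] / width, rows[0] / height,
            (cols[-1] + 1) / width, (rows[-1] + 1) / height]


def pad_tokens(token_sets, dim):
    """Pad variable-length token lists to a common length.

    Returns the padded tokens and a validity mask (True = real token).
    """
    max_len = max(len(tokens) for tokens in token_sets)
    padded, mask = [], []
    for tokens in token_sets:
        n_pad = max_len - len(tokens)
        padded.append([list(t) for t in tokens] + [[0.0] * dim for _ in range(n_pad)])
        mask.append([True] * len(tokens) + [False] * n_pad)
    return padded, mask


def masked_attention_pool(query, tokens, mask):
    """Single-query scaled dot-product attention that ignores padded tokens.

    Assumes at least one token is valid.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, tok)) / math.sqrt(dim) if valid else -math.inf
        for tok, valid in zip(tokens, mask)
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # padded tokens get weight 0.0
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


# Two observations with a different number of segments (2 and 3 tokens).
token_sets = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.5, 0.5], [0.0, 2.0]],
]
padded, mask = pad_tokens(token_sets, dim=2)
pooled, weights = masked_attention_pool([1.0, 0.0], padded[0], mask[0])
```

The validity mask is what lets one batch hold observations with different segment counts: padded positions receive zero attention weight, so they never influence the pooled representation. In a fuller version, the output of bbox_position would be combined with each segment's feature vector before the tokens are padded.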

BibTeX

Please use the following BibTeX entry to cite this work:
@misc{brown2026segdac,
      title={SegDAC: Visual Generalization in Reinforcement Learning via Dynamic Object Tokens}, 
      author={Alexandre Brown and Glen Berseth},
      year={2026},
      eprint={2508.09325},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.09325},
}