An approach to building pure vision foundation models by prompting masked predictors with "counterfactual" visual inputs. - View it on GitHub
Star
25
Rank
772001