We characterize representations that are ``optimal'', in terms of test loss, for a given functional family (e.g., a 2-layer MLP) by proposing notions of sufficiency (functions in the family can predict the labels from the representation) and minimality (functions in the family cannot distinguish between examples that share the same label) with respect to that functional family.
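To make the two notions concrete, here is a toy sketch under strong simplifying assumptions. The functional family is a small set of decision stumps `z -> int(z >= t)` acting on a 1-D representation, with thresholds restricted to a fixed grid. Sufficiency and minimality are checked exactly (zero error on a finite sample) rather than through test loss. The names `stump_family`, `is_sufficient`, and `is_minimal` are hypothetical illustrations and do not come from the original work.

```python
from typing import Callable, List, Sequence

# A predictor maps a 1-D representation z to a binary label.
Predictor = Callable[[float], int]


def stump_family(thresholds: Sequence[float]) -> List[Predictor]:
    """Hypothetical functional family V: stumps z -> int(z >= t), t on a fixed grid."""
    # Bind t as a default argument so each lambda keeps its own threshold.
    return [lambda z, t=t: int(z >= t) for t in thresholds]


def is_sufficient(family: List[Predictor], reps: Sequence[float], labels: Sequence[int]) -> bool:
    """Toy V-sufficiency: some f in V predicts every label exactly."""
    return any(all(f(z) == y for z, y in zip(reps, labels)) for f in family)


def is_minimal(family: List[Predictor], reps: Sequence[float], labels: Sequence[int]) -> bool:
    """Toy V-minimality: no f in V separates two examples that share a label."""
    n = len(reps)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j] and any(f(reps[i]) != f(reps[j]) for f in family):
                return False
    return True


family = stump_family([0.0, 1.0, 2.0])
labels = [0, 0, 1, 1]
good = [0.2, 0.7, 1.3, 1.8]   # same-label points fall between the same grid thresholds
leaky = [0.2, 1.2, 2.3, 2.8]  # threshold t=1.0 splits the two label-0 points

print(is_sufficient(family, good, labels), is_minimal(family, good, labels))    # True True
print(is_sufficient(family, leaky, labels), is_minimal(family, leaky, labels))  # True False
```

Note that `good` is minimal for this coarse grid but not for a finer one (a stump at `t=0.5` separates `0.2` from `0.7`). In this sketch, both properties depend on the chosen family, not on the representation alone.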