InFormal is a formality style transfer dataset for four Indic languages: Hindi, Telugu, Kannada, and Bengali. Each entry is a pair of sentences with human-annotated labels that identify the more formal sentence of the pair and indicate the pair's semantic similarity. The dataset can be used as an evaluation set for style transfer tasks in Indic languages. The sentence pairs are drawn from the open-source Samanantar dataset and were annotated on Task Mate by native speakers.
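As a rough illustration of using the pairs as an evaluation set, the sketch below scores a formality model by how often it agrees with the human label for the more formal sentence. The record layout is an assumption, not the dataset's actual schema: the field names (`language`, `sentence_a`, `sentence_b`, `more_formal`, `similarity`), the `"a"`/`"b"` label encoding, and the numeric similarity scale are all hypothetical, as is the `scorer` callable standing in for whatever model is being evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class InFormalPair:
    """One annotated sentence pair (hypothetical field names)."""
    language: str      # e.g. "hi", "te", "kn", "bn" (assumed codes)
    sentence_a: str
    sentence_b: str
    more_formal: str   # "a" or "b": which sentence annotators judged more formal (assumed encoding)
    similarity: float  # annotated semantic similarity (assumed numeric scale)


def formality_accuracy(
    pairs: Iterable[InFormalPair],
    scorer: Callable[[str], float],
    min_similarity: float = 0.0,
) -> float:
    """Fraction of pairs where the scorer ranks the human-labelled formal sentence higher.

    Pairs whose similarity is below `min_similarity` are skipped, so that the
    comparison is restricted to pairs that say roughly the same thing.
    """
    kept = [p for p in pairs if p.similarity >= min_similarity]
    if not kept:
        raise ValueError("no pairs left after similarity filtering")

    correct = 0
    for p in kept:
        # Ties are resolved in favour of sentence A.
        predicted = "a" if scorer(p.sentence_a) >= scorer(p.sentence_b) else "b"
        correct += predicted == p.more_formal
    return correct / len(kept)
```

Filtering on similarity is a design choice rather than a requirement: a formality comparison is most meaningful when both sentences carry the same content, which is what the similarity annotation makes it possible to check.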