google-research-datasets/clse

google-research-datasets

Fetched on 2026/03/01 20:06

The Corpus of Linguistically Significant Entities (CLSE) is a dataset of named entities annotated by expert linguists. It covers 34 languages and 74 semantic types, supporting applications that range from airline ticketing to video games. The corpus aims to facilitate the creation of more linguistically diverse NLG datasets.

Rank: 1721505
