google-research-datasets/Hinglish-TOP-Dataset

google-research-datasets

Fetched on 2026/03/01 20:06

Contains the largest (10K) human-annotated code-switched semantic parsing dataset, plus 170K utterances generated with the CST5 augmentation technique. Queries are derived from TOPv2, a multi-domain, task-oriented semantic parsing dataset. Tests suggest that with CST5, up to 20x less labeled data can achieve the same semantic parsing performance.

Rank: 605366
