Automatic multi-metric evaluation of human-bot dialogues using LLMs (Claude, GPT-4o) across different datasets and settings. Built for the Artificial Intelligence course at the University of Salerno.