lanl/lost-ocr - Gitstar Ranking

Fetched on 2026/03/01 22:05

Code for ‘Lost in OCR Translation?’: robust document retrieval under degradation. Compares OCR-based, vision-only, and hybrid pipelines; includes SambaNova LLaMA Vision OCR, Nougat, and ViDoRe baselines. Provides QA data generation, RAG evaluation, and metrics (Levenshtein, nDCG@k, Recall@k, EM/F1) with reproducible scripts. Includes dataset guides
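For readers unfamiliar with the metrics named above, the following is a minimal sketch of three of them (Levenshtein distance, Recall@k, and nDCG@k) using their textbook definitions. It is not taken from the repository: the function names `levenshtein`, `recall_at_k`, and `ndcg_at_k`, their signatures, and the document IDs in the usage lines are all hypothetical, and the repository's own scripts may normalize or aggregate these quantities differently.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k with linear gain; `relevance` maps doc id -> graded relevance."""
    dcg = sum(relevance.get(doc, 0) / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical usage: compare OCR output to ground truth, then score a ranking.
ocr_errors = levenshtein("kitten", "sitting")
recall = recall_at_k(["d1", "d2", "d3"], ["d2", "d4"], k=2)
ndcg = ndcg_at_k(["d2", "d1"], {"d1": 1}, k=2)
```

In an OCR-degradation setting, the edit distance would typically measure how far the OCR text drifts from the reference text, while Recall@k and nDCG@k score the retrieval step that runs on top of it.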

Star

Rank: 4256943