Leaderboard
Performance Chart: overall accuracy across 46 models
Model Rankings
Detailed breakdown by category
| Rank | Model | Overall |
|---|---|---|
| 1 | | 100% |
| 2 | | 100% |
| 3 | | 100% |
| 4 | | 98.46% |
| 5 | | 98.46% |
| 6 | | 98.46% |
| 7 | | 96.92% |
| 8 | | 96.92% |
| 9 | | 95.38% |
| 10 | | 95.38% |
| 11 | | 95.38% |
| 12 | | 93.94% |
| 13 | | 93.85% |
| 14 | | 93.85% |
| 15 | | 93.85% |
| 16 | | 92.42% |
| 17 | | 92.31% |
| 18 | | 92.31% |
| 19 | | 92.31% |
| 20 | | 90.77% |
| 21 | | 90.77% |
| 22 | | 90.77% |
| 23 | | 89.23% |
| 24 | | 89.23% |
| 25 | | 87.69% |
| 26 | | 87.69% |
| 27 | | 87.69% |
| 28 | | 84.62% |
| 29 | | 84.62% |
| 30 | | 83.08% |
| 31 | | 83.08% |
| 32 | | 81.54% |
| 33 | | 81.54% |
| 34 | | 80.30% |
| 35 | allenai/olmocr-2-7b | 78.46% |
| 36 | | 78.46% |
| 37 | camel-doc-ocr-080125 | 76.92% |
| 38 | gliese-ocr-7b-post2.0-final | 73.85% |
| 39 | chandra-ocr | 73.85% |
| 40 | | 72.31% |
| 41 | | 69.23% |
| 42 | nanonets-ocr2-3b-aio | 69.23% |
| 43 | | 66.15% |
| 44 | | 46.15% |
| 45 | | 33.85% |
| 46 | ln | 33.85% |
The Hardcoded Subtitle Benchmark tests LLMs on their ability to extract text exactly as presented, including formatting, line breaks, and special characters.
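Since an output only counts if it reproduces the reference text exactly, the scoring reduces to verbatim string comparison. The benchmark's actual scoring code is not shown here; the following is a minimal sketch of how such exact-match accuracy could be computed, with hypothetical sample data.

```python
# Sketch of exact-match scoring for a hardcoded-subtitle benchmark.
# Assumption: the real benchmark's scoring implementation is not shown
# on this page; function and variable names here are illustrative.

def exact_match(expected: str, actual: str) -> bool:
    """A sample passes only if the model reproduces the reference text
    verbatim, including line breaks, whitespace, and special characters."""
    return expected == actual

def accuracy(pairs) -> float:
    """Fraction of (expected, actual) pairs matched exactly, in percent."""
    hits = sum(exact_match(e, a) for e, a in pairs)
    return 100.0 * hits / len(pairs)

# Hypothetical samples: the second loses a line break, so it gets no credit.
samples = [
    ("Ich sehe deine Welt durch Glas...", "Ich sehe deine Welt durch Glas..."),
    ("Zeile 1\nZeile 2", "Zeile 1 Zeile 2"),
]
print(f"{accuracy(samples):.2f}%")  # 50.00%
```

Under this all-or-nothing criterion, a single dropped line break or altered special character fails the whole sample, which is why the scores above spread so widely.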
Samples
View example outputs for each category
Formatting
Line Breaks

Model Output:
Wie wäre es dann, wenn ich dir eine Woche das Essen für die Pause mitbringe? ("How about I bring you food for your break for a week, then?")

Model Output:
Ich sehe deine Welt durch Glas... ("I see your world through glass...")

Model Output:
Licht blitzt auf. Ein Signal anzufangen? ("Light flashes. A signal to begin?")