Hardcoded Subtitle Benchmark

This benchmark measures how accurately large language models extract hardcoded (burned-in) subtitles, in both wording and formatting. We evaluate how well models preserve line breaks, special characters, and the original formatting structure.

Leaderboard

Performance Chart
Overall accuracy across 46 models

Model Rankings
Detailed breakdown by category
Rank  Model                                        Overall
1     Qwen 3 VL 8B Instruct (4 variants)           100%
2     google/gemini-3-pro-preview                  100%
3     anthropic/claude-opus-4.5                    100%
4     qwen/qwen-vl-max                             98.46%
5     anthropic/claude-sonnet-4.5                  98.46%
6     openai/gpt-5-mini                            98.46%
7     openai/gpt-5.1-codex-mini                    96.92%
8     perplexity/sonar                             96.92%
9     openai/gpt-5.1                               95.38%
10    deepcogito/cogito-v2-preview-llama-109b-moe  95.38%
11    qwen3-vl-32b-instruct                        95.38%
12    qwen/qwen3-vl-235b-a22b-instruct             93.94%
13    openai/gpt-5.2                               93.85%
14    anthropic/claude-haiku-4.5                   93.85%
15    qwen3-vl-4b-instruct                         93.85%
16    google/gemini-2.5-flash                      92.42%
17    z-ai/glm-4.5v                                92.31%
18    z-ai/glm-4.6v                                92.31%
19    openai/gpt-5.2-chat                          92.31%
20    google/gemini-2.0-flash-001                  90.77%
21    meta-llama/llama-4-maverick                  90.77%
22    qwen/qwen3-vl-8b-thinking                    90.77%
23    mistralai/ministral-14b-2512                 89.23%
24    openai/gpt-5.1-codex-max                     89.23%
25    google/gemini-2.5-flash-lite                 87.69%
26    Qwen 3 VL 30B Instruct (2 variants)          87.69%
27    qwen/qwen2.5-vl-32b-instruct                 87.69%
28    qwen3-vl-2b-instruct                         84.62%
29    google/gemma-3-27b-it                        84.62%
30    mistralai/pixtral-large-2411                 83.08%
31    openai/gpt-5-nano                            83.08%
32    Ministral 3B (2 variants)                    81.54%
33    mistralai/mistral-large-2512                 81.54%
34    qwen/qwen3-vl-30b                            80.3%
35    allenai/olmocr-2-7b                          78.46%
36    nvidia/nemotron-nano-12b-v2-vl:free          78.46%
37    camel-doc-ocr-080125                         76.92%
38    gliese-ocr-7b-post2.0-final                  73.85%
39    chandra-ocr                                  73.85%
40    qwen3-visioncaption-2b                       72.31%
41    baidu/ernie-4.5-vl-28b-a3b                   69.23%
42    nanonets-ocr2-3b-aio                         69.23%
43    google/gemma-3-4b-it                         66.15%
44    x-ai/grok-4.1-fast                           46.15%
45    tencent/HunyuanOCR                           33.85%
46    ln                                           33.85%

The Hardcoded Subtitle Benchmark tests LLMs on their ability to extract text exactly as presented, including formatting, line breaks, and special characters.
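
To make "exactly as presented" concrete, here is a minimal Python sketch of one way such a check could be scored. The page does not describe its actual scoring method, so everything here is an assumption: the strict per-sample pass/fail rule, the choice to forgive only line-ending style, trailing whitespace, and Unicode composition form, and the hypothetical helper names `normalize`, `is_exact`, and `accuracy`.

```python
import unicodedata


def normalize(text: str) -> str:
    """Forgive only differences that are not extraction errors (assumed policy):
    Unicode composition form, line-ending style, and trailing whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).strip("\n")


def is_exact(expected: str, output: str) -> bool:
    """A sample passes only if words, line breaks, and special characters match."""
    return normalize(expected) == normalize(output)


def accuracy(pairs) -> float:
    """Percentage of (expected, output) pairs that pass, rounded to two decimals."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    passed = sum(is_exact(expected, output) for expected, output in pairs)
    return round(100 * passed / len(pairs), 2)


# Reference text taken from the samples on this page.
two_line = "Wie wäre es dann, wenn ich dir eine Woche\ndas Essen für die Pause mitbringe?"
ellipsis = "Ich sehe deine Welt durch Glas..."

results = [
    (two_line, two_line),                      # identical: pass
    (two_line, two_line.replace("\n", " ")),   # line break lost: fail
    (ellipsis, "Ich sehe deine Welt durch Glas\u2026"),  # "..." became "…": fail
]
print(accuracy(results))
```

Under this assumed rule, a model that merges the two subtitle lines into one, or swaps three periods for a single ellipsis character, fails the sample even though every word is correct. NFC is used rather than NFKC on purpose: NFKC would fold the ellipsis character into three periods and hide that kind of substitution.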

Samples

Example outputs for each category.

Formatting

Line Breaks

Sample 1
Model Output:
Wie wäre es dann, wenn ich dir eine Woche
das Essen für die Pause mitbringe?
Sample 2
Model Output:
Ich sehe deine Welt durch Glas...
Sample 3
Model Output:
Licht blitzt auf. Ein Signal anzufangen?

Special Characters