Public GPU and NPU uploads

Find the fastest token generation for local LLMs

See which GPUs and NPUs sustain the highest token generation speed after the first token, the figure that matters most when streaming output and response speed are the priority.
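As a rough illustration of the metric described above, generation speed after the first token can be computed from per-token arrival times. This is a minimal sketch, not the method this page uses: the function name `decode_tokens_per_second` and the timestamp-list input are hypothetical, chosen only for the example.

```python
def decode_tokens_per_second(token_timestamps):
    """Return tokens per second measured after the first token.

    token_timestamps: arrival time in seconds of each generated token,
    in generation order (hypothetical input format for this sketch).
    """
    if len(token_timestamps) < 2:
        raise ValueError("need at least two tokens to measure generation speed")

    # Time spent generating everything after the first token.
    elapsed = token_timestamps[-1] - token_timestamps[0]
    if elapsed <= 0:
        raise ValueError("last token must arrive after the first token")

    # The first token is excluded from the count: the wait for it is
    # dominated by prompt processing, not by steady-state generation.
    return (len(token_timestamps) - 1) / elapsed
```

For example, four tokens arriving at 1.0, 1.5, 2.0 and 2.5 seconds give three tokens over 1.5 seconds of generation, or 2 tok/s, regardless of how long the first token took to appear.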

NVIDIA RTX 5090 32 GB: 264 tok/s
AMD RX 7900 XTX 24 GB: 191 tok/s
NVIDIA RTX 4090 24 GB: 188 tok/s
NVIDIA RTX 5080 16 GB: 185 tok/s
NVIDIA RTX 3090 24 GB: 160 tok/s
NVIDIA RTX 4080 SUPER 16 GB: 147 tok/s
NVIDIA RTX 3080 12 GB: 139 tok/s
NVIDIA RTX 3080 10 GB: 139 tok/s
AMD RX 9070 XT 16 GB: 137 tok/s
NVIDIA RTX 5070 Ti 16 GB: 136 tok/s
NVIDIA RTX 4070 Ti SUPER 16 GB: 129 tok/s
AMD RX 7900 XT 20 GB: 123 tok/s
AMD RX 9070 16 GB: 120 tok/s
AMD RX 7800 XT 16 GB: 118 tok/s
AMD RX 7900 GRE 16 GB: 116 tok/s
NVIDIA RTX 4070 Ti 12 GB: 111 tok/s
AMD RX 6900 XT 16 GB: 108 tok/s
AMD RX 6800 XT 16 GB: 100 tok/s
AMD RX 6800 16 GB: 96 tok/s
Apple M2 Ultra 76-core GPU 64 GB: 94 tok/s
Apple M2 Ultra 76-core GPU 128 GB: 94 tok/s
Apple M2 Ultra 76-core GPU 192 GB: 94 tok/s
NVIDIA RTX 5060 Ti 16 GB: 94 tok/s
NVIDIA RTX 5060 Ti 8 GB: 94 tok/s
NVIDIA RTX 4070 12 GB: 92 tok/s
Apple M3 Ultra 80-core GPU 256 GB: 92 tok/s
Apple M3 Ultra 80-core GPU 512 GB: 92 tok/s
Apple M2 Ultra 60-core GPU 64 GB: 89 tok/s
Apple M2 Ultra 60-core GPU 128 GB: 89 tok/s
Apple M2 Ultra 60-core GPU 192 GB: 89 tok/s
Apple M3 Ultra 60-core GPU 96 GB: 88 tok/s
AMD RX 6700 XT 12 GB: 84 tok/s
Apple M1 Ultra 64-core GPU 64 GB: 84 tok/s
Apple M1 Ultra 64-core GPU 128 GB: 84 tok/s
Apple M4 Max 40-core GPU 48 GB: 83 tok/s
Apple M4 Max 40-core GPU 64 GB: 83 tok/s
Apple M4 Max 40-core GPU 128 GB: 83 tok/s
AMD RX 6750 XT 12 GB: 82 tok/s
NVIDIA RTX 3070 8 GB: 79 tok/s
NVIDIA RTX 3060 12 GB: 76 tok/s
NVIDIA RTX 3060 8 GB: 76 tok/s
Apple M1 Ultra 48-core GPU 64 GB: 75 tok/s
Apple M1 Ultra 48-core GPU 128 GB: 75 tok/s
AMD RX 9060 XT 8 GB: 71 tok/s
AMD RX 9060 XT 16 GB: 71 tok/s
Intel Arc B580 12 GB: 70 tok/s
Apple M4 Max 32-core GPU 36 GB: 70 tok/s
Apple M3 Max 40-core GPU 48 GB: 66 tok/s
Apple M3 Max 40-core GPU 64 GB: 66 tok/s
Apple M3 Max 40-core GPU 128 GB: 66 tok/s
Apple M2 Max 38-core GPU 32 GB: 66 tok/s
Apple M2 Max 38-core GPU 64 GB: 66 tok/s
Apple M2 Max 38-core GPU 96 GB: 66 tok/s
AMD RX 6650 XT 8 GB: 62 tok/s
Apple M1 Max 32-core GPU 32 GB: 61 tok/s
Apple M1 Max 32-core GPU 64 GB: 61 tok/s
Apple M2 Max 30-core GPU 32 GB: 61 tok/s
Apple M2 Max 30-core GPU 64 GB: 61 tok/s
Apple M3 Max 30-core GPU 36 GB: 57 tok/s
Apple M3 Max 30-core GPU 96 GB: 57 tok/s
Apple M1 Max 24-core GPU 32 GB: 55 tok/s
Apple M1 Max 24-core GPU 64 GB: 55 tok/s
AMD RX 6600 XT 8 GB: 54 tok/s
AMD RX 7600 XT 16 GB: 53 tok/s
Intel Arc A770 8 GB: 53 tok/s
Intel Arc A770 16 GB: 53 tok/s
Apple M4 Pro 20-core GPU 24 GB: 51 tok/s
Apple M4 Pro 20-core GPU 48 GB: 51 tok/s
Apple M4 Pro 20-core GPU 64 GB: 51 tok/s
AMD RX 6600 8 GB: 51 tok/s
Apple M4 Pro 16-core GPU 24 GB: 50 tok/s
Apple M4 Pro 16-core GPU 48 GB: 50 tok/s
Intel Arc B570 10 GB: 50 tok/s
Intel Arc A750 8 GB: 44 tok/s
Apple M2 Pro 19-core GPU 16 GB: 39 tok/s
Apple M2 Pro 19-core GPU 32 GB: 39 tok/s
Apple M2 Pro 16-core GPU 32 GB: 38 tok/s
Apple M2 Pro 16-core GPU 16 GB: 38 tok/s
Apple M1 Pro 16-core GPU 16 GB: 36 tok/s
Apple M1 Pro 16-core GPU 32 GB: 36 tok/s
Apple M1 Pro 14-core GPU 16 GB: 36 tok/s
Apple M1 Pro 14-core GPU 32 GB: 36 tok/s
Apple M3 Pro 18-core GPU 18 GB: 31 tok/s
Apple M3 Pro 18-core GPU 36 GB: 31 tok/s
Apple M3 Pro 14-core GPU 18 GB: 31 tok/s
Apple M3 Pro 14-core GPU 36 GB: 31 tok/s
AMD RX 6500 XT 4 GB: 28 tok/s
Apple M4 10-core GPU 16 GB: 24 tok/s
Apple M4 10-core GPU 24 GB: 24 tok/s
Apple M4 10-core GPU 32 GB: 24 tok/s
Apple M2 10-core GPU 8 GB: 22 tok/s
Apple M2 10-core GPU 16 GB: 22 tok/s
Apple M2 10-core GPU 24 GB: 22 tok/s
Apple M3 10-core GPU 8 GB: 21 tok/s
Apple M3 10-core GPU 16 GB: 21 tok/s
Apple M3 10-core GPU 24 GB: 21 tok/s
Apple M1 7-core GPU 8 GB: 14 tok/s
Apple M1 7-core GPU 16 GB: 14 tok/s
Apple M1 8-core GPU 8 GB: 14 tok/s
Apple M1 8-core GPU 16 GB: 14 tok/s
AMD RX 6400 4 GB
AMD RX 6700 10 GB
AMD RX 6750 GRE 12 GB
AMD RX 6750 GRE 10 GB
AMD RX 6950 XT 16 GB
AMD RX 7600 8 GB
AMD RX 7700 XT 12 GB
AMD RX 9060 8 GB
Apple M1 7-core GPU 8 GB
Apple M2 8-core GPU 8 GB
Apple M2 8-core GPU 16 GB
Apple M2 8-core GPU 24 GB
Apple M3 8-core GPU 8 GB
Apple M3 8-core GPU 16 GB
Apple M3 8-core GPU 24 GB
Apple M4 8-core GPU 16 GB
Apple M4 8-core GPU 24 GB
Apple M4 8-core GPU 32 GB
Apple M4 9-core GPU 12 GB
Apple M5 8-core GPU 16 GB
Apple M5 8-core GPU 24 GB
Apple M5 8-core GPU 32 GB
Apple M5 10-core GPU 16 GB
Apple M5 10-core GPU 24 GB
Apple M5 10-core GPU 32 GB
Apple M5 Max 32-core GPU 36 GB
Apple M5 Max 40-core GPU 48 GB
Apple M5 Max 40-core GPU 64 GB
Apple M5 Max 40-core GPU 128 GB
Apple M5 Pro 20-core GPU 24 GB
Apple M5 Pro 20-core GPU 48 GB
Apple M5 Pro 20-core GPU 64 GB
Intel Arc A310 4 GB
Intel Arc A380 6 GB
Intel Arc A580 8 GB
NVIDIA RTX 3050 6 GB
NVIDIA RTX 3050 8 GB
NVIDIA RTX 3060 Ti 448 GB/s 8 GB
NVIDIA RTX 3060 Ti 608 GB/s 8 GB
NVIDIA RTX 3070 Ti 8 GB
NVIDIA RTX 3080 Ti 12 GB
NVIDIA RTX 3090 Ti 24 GB
NVIDIA RTX 4060 8 GB
NVIDIA RTX 4060 Ti 16 GB
NVIDIA RTX 4060 Ti 8 GB
NVIDIA RTX 4070 SUPER 12 GB
NVIDIA RTX 4080 16 GB
NVIDIA RTX 5050 8 GB
NVIDIA RTX 5060 8 GB
NVIDIA RTX 5070 12 GB