
Geekbench 6 released

Interesting to see the big bump in GB6 single-core score for Apple Silicon over Intel, where there was pretty much score parity in GB5, as this is the most critical category for marketing.
The new single-core baseline score is 2500 based on Intel Core i7-12700. M1 and M2 are still close to this.
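If that baseline holds, a GB6 single-core score can be read as a simple ratio against the i7-12700's 2500. A quick Python sketch (the 2400 example score is just an illustrative value in the ballpark of the Apple Silicon results quoted later in this thread, not a measured result):

```python
# Reading a GB6 single-core score as a ratio against the stated baseline
# (Intel Core i7-12700 = 2500). The 2400 example is only an illustrative
# value, not an official figure.
BASELINE_SCORE = 2500

def relative_to_baseline(score: float) -> float:
    """Return performance relative to the i7-12700 baseline."""
    return score / BASELINE_SCORE

print(f"{relative_to_baseline(2400):.0%}")  # -> 96%
```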
 
The new single-core baseline score is 2500 based on Intel Core i7-12700. M1 and M2 are still close to this.
Is there a lift for 12th-gen-and-later Intel and for Apple Silicon?

For example, let's say 11th gen and earlier have a weak or non-existent AI accelerator (I don't actually know, but imagine so for the sake of analysis), while 12th gen and Apple Silicon have substantial AI accelerators, and GB6 gives the AI metric much more weight in the composite score. Then any device from before the AI accelerator era would be depressed in the rankings.
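To make the effect concrete, here is a toy Python model of that scenario. It assumes, purely for illustration, that the composite is a weighted geometric mean of subtest scores; Primate Labs doesn't publish GB6's exact formula, and every subtest value and weight below is invented:

```python
# Toy model of the thought experiment above: a composite score built as a
# weighted geometric mean of subtests. NOT Geekbench's actual formula;
# all numbers are made up for illustration.
import math

def composite(subtests: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean of subtest scores."""
    total_w = sum(weights.values())
    return math.exp(sum(weights[k] * math.log(subtests[k]) for k in subtests) / total_w)

weights  = {"integer": 1.0, "float": 1.0, "ai": 1.0}      # AI subtest carries full weight
old_chip = {"integer": 2000, "float": 2000, "ai": 400}    # weak or missing AI accelerator
new_chip = {"integer": 2000, "float": 2000, "ai": 2400}   # substantial AI accelerator

print(round(composite(old_chip, weights)))  # ~1170: dragged down by the AI subtest
print(round(composite(new_chip, weights)))  # ~2125: same CPU subtests, better accelerator
```

Two chips with identical CPU subtests land roughly 80% apart once the accelerator-backed subtest carries full weight, which is exactly the "depressed in the rankings" effect.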

Continue the thought experiment: let's say Apple Silicon has a special unit that others don't, like a ProRes accelerator, and that GB evolves to give ProRes transcode significant weight in the score.

Is the inclusion of metrics informed by accelerators a peculiar form of multicore? Does it deserve its own category?

Now that the GPU is considered a critical accelerator in all designs and is integrated into all devices, should the GPU score be folded into the mainline score instead of being a separate metric, or should GB not only keep a separate GPU score, with selectable rendering libraries (OpenCL vs Metal), but also add separate scores for other accelerators?

One way to approach this would be to decide that GB is an Apple device metric, one that happens to allow comparison of Windows, Linux, and Android devices with Apple.

Let's say GB then adds an "Apple Continuity" metric to the main score, because all Apple devices now support this feature...

If Geekbench is evolving to become even geekier, that's great, but dear Primate Labs, please expose the orientation and the details.
 
Comparison between my sig build (Ventura 13.2.1 / Windows 11 Pro) and my M1 MBA. Single-core and Metal numbers have improved dramatically! My biggest hit is the Ventura multi-core score, ouch!

[Geekbench 6 result screenshots attached]
 
Compared to Geekbench 5 scores, the new CPU scores are around 10% higher.

GPU Metal score jumped from ~90,000 to ~113,000.
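For what it's worth, that Metal jump works out to roughly 26%; a quick check using the numbers quoted above:

```python
# Percent change implied by the GPU Metal scores quoted above.
def pct_change(old: float, new: float) -> float:
    return (new - old) / old * 100

print(f"{pct_change(90_000, 113_000):.1f}%")  # -> 25.6%
```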
 
Comparison between my sig build (Ventura 13.2.1 / Windows 11 Pro) and my M1 MBA. Single-core and Metal numbers have improved dramatically! My biggest hit is the Ventura multi-core score, ouch!

It's the scoring that's changed, not your kit!

Your PC is doing the same thing, the same way it was before.

Now you are comparing your system with itself and it's coming up wanting!

The first point of benchmarks is to allow comparison of systems. If the benchmark changes, maybe for very good reasons, then scores from the previous benchmark are not comparable!

Benchmarks can also assess suitability for a purpose, but with PCs it's always about bragging rights.

For example, what does it mean for a consumer to benchmark an H.265 codec in the video accelerator? The accelerator was designed to operate at a certain resolution, in a certain color format, at a certain frame rate that corresponds to video industry standards for content. You can test to see whether it meets that standard.
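As a sketch of what that standards-oriented test could look like, here's a minimal pass/fail check in Python; the decode time and the 4K60 target below are made-up example inputs, not real measurements:

```python
# Pass/fail check against a video-industry target rather than a bragging-rights
# score. `seconds_to_decode` is a hypothetical measurement you would get from
# whatever harness actually drives the hardware decoder.
def meets_standard(seconds_to_decode: float, clip_frames: int, target_fps: float) -> bool:
    """True if the decoder sustains at least the standard's frame rate."""
    achieved_fps = clip_frames / seconds_to_decode
    return achieved_fps >= target_fps

# e.g. a 10-second 4K60 clip (600 frames) decoded in 8.2 s -> ~73 fps -> pass
print(meets_standard(8.2, 600, 60.0))  # True; 15% faster or slower changes nothing
```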

Geekbench is not helping us understand performance any better and they're changing the metrics in a way that confuses and obfuscates technical details.

So what if GB6 weights video or ML workloads a little more in its scores?

If you run Zoom you expect it to work, and it does, because it follows some standards. Your meeting isn't going to go better if the video accelerator can push 15% more FPS. You will wait for an installer to unzip regardless of whether it goes 15% faster or slower; it doesn't make any difference. So the choices of weightings in the benchmark are somewhat arbitrary.

When you get an Apple device, you expect it to do Apple stuff the Apple way (unless you're a Hackintosher), where part of the Apple way is that they keep telling you what's possible in ways you would never have thought of on your own. What's a benchmark when the experience is magic?

The only thing that's changed with GB6 is Geekbench! Your kit is doing exactly what it was doing before. If that wasn't good enough it still isn't good enough.

What's weird is to think that if GB just gives you a different number, your kit is suddenly no longer good enough... which, in a world of devices for fashionable consumption, can be a very lucrative event for the purveyors.
 
Compared to Geekbench 5 scores, the new CPU scores are around 10% higher.

GPU Metal score jumped from ~90,000 to ~113,000.
The newer the higher, the older the lower.
 
My takes on this ArsT article...

—Beware of my troll—

• Poole on benchmarking: How hard can it be? I saw others used simple tests, so I made some of my own.

• Poole on understanding computer behavior: "Reverse engineering" is a synonym for "I examined what it did."

• Poole on evolution of metrics: Recently, I do a lot of video conferencing. Blurring backgrounds has become a distinct device-defining workload. Fortunately, I continue to include a Gaussian blur metric in the benchmark.

• Poole on users: As a group, they are easily confused and don't know what's in a CPU or how it works. They think the "crypto" metric is about cryptocurrency. So GB6 removes the "crypto" metric, and also removes or renames "integer" and other metrics. Just for fun, GB6 adds a metric called "camera" that actually has nothing to do with the device camera. "Oops!"

• Let it be known to all that there are some variables with Multiprocessing performance...

• Poole: The intent has always been to test the system as a whole, but GB must be kept simple to run, therefore it's not testing the system as a whole. ArsT: We can confirm, GB is not great at testing the system as a whole, but it helps people get a quick sense of the system as a whole.
—If there's one thing computers are not noted for, it's logically combining many sequential small steps into a seemingly concurrent, organized abstract whole that hides all the processing details. But maybe someday this will be figured out.

• In 2005, Poole wondered why a newer, more-advanced consumer product works worse than an older one.
—According to results posted here, GB6 vs GB5 single-core scores on Apple Silicon jumped from roughly 1800 to 2400, about 33%. 18 years of work and the problem is finally solved.

• Poole: My friend said his Mac felt slow, so he benchmarked it and Apple fixed it.
—I love this story!!

• ArsT: "...Tiny blips of performance..."



Easy to overlook in the article, Ars includes links to GB workload summaries:


 