How to Make a Good Built-in Game Benchmark

My job consists largely of running gaming benchmarks and then writing about the results. When I look in the benchmark results folders of some of the games I’ve used a lot, finding over 2,000 individual test results gives me pause. But what hurts more than realizing that I’ve spent over 200 hours in Shadow of the Tomb Raider (most of that time spent running benchmarks) is knowing just how pointless a lot of that testing was.

This article is less for our normal readers, though hopefully some of you will appreciate my ranting, and more for the game developers. Creating a built-in benchmark takes time and effort, but even after all that effort, some games just cause the people who are most likely to use the feature — people like me — extra pain for no good reason. So, here are the things I wish built-in benchmarks would get right, along with call-outs to some particularly good examples, as well as a public shaming of some egregious games.

1 – Repeatability Is Critical

The first requirement of any good benchmark is repeatability. If I run a benchmark five times in a row, using the same settings, the results should all fall in a narrow range. I always toss the first result since the graphics card hasn’t warmed up yet, game data files might not be cached, and there’s usually a lot more variability on that first run. The second and later runs, however, shouldn’t have more than about a 1% spread. Unfortunately, that’s often not the case.
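To make that check concrete, here is a minimal sketch of how I might sanity-check a batch of runs. The function name, the sample numbers, and the definition of "spread" as (max - min) divided by the mean are all my own assumptions for illustration, not something any particular game or tool provides.

```python
from statistics import mean


def summarize_runs(fps_results, max_spread_pct=1.0):
    """Discard the warm-up run, then report mean FPS and run-to-run spread.

    Spread is defined here (as an assumption) as (max - min) / mean, in percent.
    """
    if len(fps_results) < 3:
        raise ValueError("need at least three runs: one warm-up plus two measured")
    measured = fps_results[1:]  # toss the first (warm-up) result
    avg = mean(measured)
    spread_pct = (max(measured) - min(measured)) / avg * 100.0
    return {
        "mean_fps": avg,
        "spread_pct": spread_pct,
        "repeatable": spread_pct <= max_spread_pct,
    }


# Made-up average FPS from five back-to-back runs at identical settings.
runs = [92.0, 100.0, 100.5, 99.8, 100.2]
summary = summarize_runs(runs)
print(summary)
```

If the "repeatable" flag comes back false, that is my cue to run more passes or to suspect the benchmark itself.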

Take Dirt 5 as an example. All of the cars are simulated in the benchmark, and the weather seems to vary a bit as well. That means sometimes you get a “perfect” run, and performance might be 5% faster than a “bad” run. As a result, I have to repeat the test more times to ensure the results represent what the hardware can do, rather than showcasing whether a particular card got lucky.

Assassin’s Creed Odyssey is another terrible example, or at least it was at launch. Weather effects like rain and heavy clouds could drop performance by as much as 20%. The potential for rain was later removed from the benchmark, but the cloud cover still impacts framerates. A clear, sunny day can perform over 10% better than a heavily clouded day.

It’s not just weather and time of day effects, though. Gears 5 has an otherwise great built-in benchmark (the Microsoft Store notwithstanding), but it has issues with framerate caps at the beginning of the test about every second or third run. I’m not sure if that was ever fixed, but it was possible to get a 60 FPS cap for the first 10 seconds or so on some runs, which could skew performance downward by 15% or more.

So, skip the randomness for any built-in benchmark, even if that’s normally a big part of an actual game. The same cars should win every race, in the same order, like in Forza Horizon 4. The same people should get shot, in the same way, every time a test gets run. I don’t want extraneous people or effects that only show up a third of the time, periodic rainfall, etc. Just make it consistent, please.
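For developers wondering what "skip the randomness" might look like in practice, one common approach is to drive everything random in the benchmark from a fixed seed. The sketch below is purely illustrative: the seed value, the function name, and the idea of picking a grid order and weather from one generator are hypothetical, not how any game mentioned here actually works.

```python
import random

BENCHMARK_SEED = 12345  # hypothetical fixed seed, used only in benchmark mode


def build_benchmark_scene(num_cars, seed=BENCHMARK_SEED):
    """Return a starting grid and weather that are identical on every run."""
    # A private generator, so other code touching the global random state
    # cannot change what the benchmark sees.
    rng = random.Random(seed)
    grid = list(range(1, num_cars + 1))
    rng.shuffle(grid)  # same order every time for a given seed
    weather = rng.choice(["clear", "cloudy", "overcast"])
    return grid, weather


first = build_benchmark_scene(12)
second = build_benchmark_scene(12)
print(first == second)
```

A fixed seed alone is probably not enough in a real engine. If the simulation advances by however long the last frame took, faster hardware could still see different events, so a fixed simulation timestep in benchmark mode is likely needed as well.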

2 – Don’t Waste Time
