I’d put money on humans scoring even lower on subjects they’ve never heard of.
What they are testing is the ability to reason. The AI, or human, can still use the internet to find out the answer. Here's a sample question that illustrates the distinction.
Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.
I think the easiest way to explain this is to say they are testing the ability to reason your way to an answer to a question so unique that it doesn't exist anywhere on the internet.