While the existing benchmark is detailed and extensive, it is neither easy to use nor semantic in nature.