Findings from a new peer-reviewed study have sparked debate across the machine learning community, raising questions about current evaluation practices and the potential for inflated ...