Alibaba Group’s machine-learning technology is better at reading comprehension than humans, according to a well-known test built for the industry by Microsoft.
The Alibaba model topped human scores when tested on the Microsoft Machine Reading Comprehension dataset, one of the artificial-intelligence world’s most challenging tests for reading comprehension.
Developed by scientists at DAMO Academy, Alibaba’s global research program, the model scored 0.54 in the MS MARCO question-answering task, which evaluates a machine’s ability to use natural language – the way humans communicate – to answer real questions posed by humans. That topped the human score of 0.539, a benchmark provided by Microsoft.
To earn a winning score, machine-learning models must deliver answers to real queries posed to Microsoft’s search engine, Bing – such as “biggest cities in Illinois by population” and “how many carbohydrates in asparagus” – that best match the human answers in the dataset. Per its website, the MS MARCO dataset has a collection of more than three million web documents, 1,010,916 anonymized user queries and 182,669 real answers written by humans.
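For readers curious how "best match the human answers" can be turned into a number between 0 and 1, here is a minimal Python sketch. The article does not name the scoring metric, so the sketch assumes a word-overlap score based on the longest common subsequence, in the style of ROUGE-L. The function names (`lcs_length`, `overlap_f1`, `best_match_score`) are hypothetical, and this is an illustration rather than Microsoft's actual evaluation code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap_f1(candidate, reference):
    """LCS-based F-score between a machine answer and one human answer.

    Assumption: simple lowercase whitespace tokenization; a real
    benchmark may normalize text differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the machine answer that matches
    recall = lcs / len(ref)      # share of the human answer that is covered
    return 2 * precision * recall / (precision + recall)


def best_match_score(candidate, references):
    """Score a machine answer against its closest human answer."""
    return max((overlap_f1(candidate, r) for r in references), default=0.0)


if __name__ == "__main__":
    # Hypothetical answers for illustration only; not taken from the dataset.
    human_answers = ["chicago is the largest city in illinois"]
    print(best_match_score("the largest city in illinois is chicago", human_answers))
```

Under this assumption, a system's overall result would be the average of such per-question scores, so a difference like 0.54 versus 0.539 reflects a very small gap in average overlap with the human-written answers.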