If an AI machine or program matches or surpasses human intelligence, does that mean it can perfectly simulate humans? If so, then what about reasoning—our ability to apply logic and think logically before making decisions? How might we determine whether an AI program can reason? To try to answer this question, a team of researchers proposed a new framework that works like a psychological study for software.
“This test treats an ‘intelligent’ program as if it were participating in a psychological study and has three steps: (a) test the program in a series of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the program’s source code,” the researchers note.
They suggest that standard methods of assessing a machine’s intelligence, such as the Turing test, can only tell you whether the machine is good at processing information and imitating human answers. Current generations of AI programs, such as Google’s LaMDA and OpenAI’s ChatGPT, have come close to passing the Turing test, but these results do not imply that the programs can think and reason like humans.
This is why the Turing test may no longer be relevant, and there is a need for new methods that could effectively assess the intelligence of machines, according to the researchers. They claim that their framework could serve as an alternative to the Turing test. “We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason the way humans reason?” the authors of the study write.
What’s up with the Turing test?
During the Turing test, evaluators play different games involving text-based communication with real people and AI programs (machines or chatbots). It is a blind test, so evaluators don’t know whether they’re messaging a human or a machine. If the AI programs succeed in generating human-like responses—to the extent that evaluators struggle to distinguish between the humans and the AI programs—the AI is considered to have passed. However, since the Turing test relies on subjective interpretation, its results are also subjective.
The researchers suggest that the Turing test has several limitations. For example, all of the games played during the test are imitation games designed to check whether or not a machine can imitate a human. Evaluators make decisions based solely on the language or tone of the messages they receive. ChatGPT is excellent at mimicking human language, even in answers that contain incorrect information. Thus, the test clearly does not assess a machine’s logic and reasoning ability.
Turing test results also cannot tell you whether a machine is capable of introspection. We often think about our past actions and reflect on our lives and decisions, a critical exercise that keeps us from repeating the same mistakes. The same applies to artificial intelligence, according to a study from Stanford University, which suggests that machines capable of self-reflection are more practical for human use.
“Artificial intelligence agents that can draw on past experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from home robotics to personalized learning tools,” said Nick Haber, an assistant professor at Stanford University who was not involved in the study.
Furthermore, the Turing test fails to analyze an AI program’s ability to think. In a recent Turing test experiment, GPT-4 convinced evaluators that they were texting with a human over 40 percent of the time. However, this score fails to answer the key question: Can the AI program think?
Alan Turing, the famous British scientist who created the test, once said, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” His test, however, covers only one aspect of human intelligence: imitation. Although it is possible to fool someone using this one aspect, many experts believe that a machine can never achieve true human intelligence through imitation alone.
“It is not clear whether passing the Turing test is a meaningful milestone or not. It doesn’t tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence,” Mustafa Suleyman, an artificial intelligence expert and co-founder of DeepMind, told Bloomberg.