AI Needs Better Tests: Melanie Mitchell at NeurIPS
Key Takeaways from the Interview with Melanie Mitchell on AI Evaluation & Insights from Psychology
Here’s a breakdown of the key points from the interview, focusing on what developmental and comparative psychologists can teach AI researchers:
1. The Need for Rigorous Experimental Methodology in AI:
* AI researchers, particularly those from computer science backgrounds, often lack formal training in experimental methodology.
* Evaluating AI systems requires robust experimentation, not just demonstrating successes.
2. Lessons from Developmental & Comparative Psychology:
* Dealing with Non-Verbal Agents: These fields are experts at probing cognition in beings who can’t verbally explain their reasoning (like animals and babies). This is directly applicable to AI, where understanding how a system arrives at a conclusion is crucial.
* Careful Control Experiments: Psychologists emphasize meticulously designed control experiments and variations in stimuli to ensure results are robust and not due to unintended cues.
* Focus on Failure Modes: Analyzing why a system fails can be more insightful than celebrating successes. Failures reveal underlying limitations and biases.
* Skepticism & Alternative Explanations: A core principle is to be skeptical of initial hypotheses, even your own, and actively seek alternative explanations for observed behavior.
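To make the idea of a control experiment concrete, here is a minimal Python sketch. It is not taken from the interview: the `Item` type, the `control_gap` helper, and the toy `shortcut_system` are all hypothetical names invented for illustration. The sketch scores a system on an original stimulus set and on a control set in which a suspected unintended cue has been removed. A large drop in accuracy suggests the system was relying on the cue rather than on the ability being tested.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """One test stimulus and its expected response (hypothetical type)."""
    prompt: str
    answer: str


def accuracy(system: Callable[[str], str], items: Sequence[Item]) -> float:
    """Fraction of items the system answers correctly."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(system(item.prompt) == item.answer for item in items)
    return correct / len(items)


def control_gap(
    system: Callable[[str], str],
    original: Sequence[Item],
    control: Sequence[Item],
) -> float:
    """Accuracy on the original stimuli minus accuracy on the controls."""
    return accuracy(system, original) - accuracy(system, control)


def shortcut_system(prompt: str) -> str:
    """Toy stand-in for a system under test (an assumption, not a real model).

    It never does arithmetic; it only copies a leaked 'hint=' token.
    """
    if "hint=" in prompt:
        return prompt.split("hint=")[1].split()[0]
    return "0"


# Original stimuli accidentally leak the answer through a hint token.
original = [Item("2+3 hint=5", "5"), Item("4+4 hint=8", "8"), Item("1+6 hint=7", "7")]
# Control stimuli pose the same questions with the suspected cue removed.
control = [Item("2+3", "5"), Item("4+4", "8"), Item("1+6", "7")]

gap = control_gap(shortcut_system, original, control)
print(f"original accuracy: {accuracy(shortcut_system, original):.2f}")
print(f"control accuracy:  {accuracy(shortcut_system, control):.2f}")
print(f"gap:               {gap:.2f}")
```

In this toy setup the system looks perfect on the original items and fails on every control item, which is the pattern a Clever Hans style shortcut would produce. In a real evaluation the control set would need many more items and several different kinds of variation, since a single control only rules out a single alternative explanation.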
3. Concrete Examples:
* Clever Hans the Horse: This classic case demonstrates the importance of controlling for unintended cues. The horse wasn’t doing arithmetic; it was reading subtle facial expressions from the questioner. This highlights the need to rule out simpler explanations before attributing complex cognitive abilities.
* Babies & Moral Sense: Initial research suggested babies have an innate preference for “helpers” over “hinderers.” However, further examination revealed that the videos themselves contained cues (e.g., movement patterns) that could explain the babies’ preferences without any genuine moral judgment. This illustrates the importance of carefully scrutinizing the stimuli used in experiments.
In essence, the interview argues that AI researchers need to adopt a more critical and experimentally rigorous approach to evaluation, drawing on the well-established methodologies of psychology to avoid misinterpreting AI behavior and making unwarranted claims about intelligence.
