The Hidden Weakness in AI Safety Tests

New research reveals fundamental limitations in how AI systems are evaluated: even rigorous testing can fail when training conditions differ subtly from real-world ones.