
Google DeepMind reveals cognitive framework to measure progress toward AGI

Google DeepMind unveiled a comprehensive framework Monday to measure progress toward Artificial General Intelligence, breaking intelligence into 10 core cognitive abilities and launching a $200,000 Kaggle competition to develop new AI benchmarks. The initiative, running through April 16, invites researchers worldwide to create evaluation tools for underassessed areas like metacognition and social cognition, marking a shift from task-based to theory-driven AI assessment.

The framework represents a fundamental departure from existing AI benchmarks like MMLU, BIG-bench, and HELM by grounding evaluation in formal cognitive science rather than collecting vast arrays of tasks, according to Google DeepMind. The new approach introduces a three-stage evaluation protocol that measures whether AI systems exhibit human-like problem-solving patterns, match average human capabilities, and eventually surpass top human experts in specific domains.
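To make the protocol's structure concrete, here is a minimal Python sketch of how the three stages could be chained, with a system advancing only after passing the prior stage. The stage names, threshold, and gating logic are our own illustration based on the announcement's wording, not DeepMind's published implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Stage(Enum):
    """The three evaluation stages described by DeepMind (identifier names are illustrative)."""
    HUMAN_LIKE_PATTERNS = auto()   # does the system exhibit human-like problem-solving patterns?
    AVERAGE_HUMAN_PARITY = auto()  # does it match average human capabilities?
    EXPERT_SUPERIORITY = auto()    # does it surpass top human experts in a specific domain?

@dataclass
class StageResult:
    stage: Stage
    score: float   # normalized 0..1 against a human reference group (our assumption)
    passed: bool

def evaluate_stages(scores: dict[Stage, float], threshold: float = 0.5) -> list[StageResult]:
    """Walk the stages in order; a system only advances once the prior stage has passed."""
    results, unlocked = [], True
    for stage in Stage:
        score = scores.get(stage, 0.0)
        passed = unlocked and score >= threshold
        results.append(StageResult(stage, score, passed))
        unlocked = passed
    return results
```

The sequential gate mirrors the "eventually surpass" framing of the announcement; whether DeepMind scores the stages independently or as a progression is not specified.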

Breaking Down Intelligence Into Core Abilities

The taxonomy identifies 10 fundamental cognitive abilities essential for AGI, ranging from basic perception and attention to complex metacognition and social cognition. According to the framework published on Google’s blog, these include perception for processing sensory information, attention for filtering distractions, memory for information storage and retrieval, and learning for acquiring new knowledge.

Higher-level abilities encompass executive functions for planning and decision-making, reasoning for logical thinking and problem-solving, and metacognition for awareness of one’s own thought processes. The framework also recognizes language understanding, action for interacting with physical or virtual worlds, and social cognition for understanding other agents.
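For reference, the full taxonomy can be written down as a simple enumeration. The descriptions below are taken directly from the framework summaries above; the identifier names are our own shorthand.

```python
from enum import Enum

class CognitiveAbility(Enum):
    """The 10 core abilities named in DeepMind's framework."""
    PERCEPTION = "processing sensory information"
    ATTENTION = "filtering distractions"
    MEMORY = "information storage and retrieval"
    LEARNING = "acquiring new knowledge"
    EXECUTIVE_FUNCTIONS = "planning and decision-making"
    REASONING = "logical thinking and problem-solving"
    METACOGNITION = "awareness of one's own thought processes"
    LANGUAGE = "language understanding"
    ACTION = "interacting with physical or virtual worlds"
    SOCIAL_COGNITION = "understanding other agents"
```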

The Kaggle hackathon, running from March 17 to April 16, 2026, specifically targets the five abilities with the largest evaluation gaps: learning, metacognition, attention, executive functions, and social cognition. Hosted on Kaggle’s Community Benchmarks platform, the competition offers participants the chance to develop new assessment tools that will be integrated into DeepMind’s evaluation suite.
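As one illustration of what an assessment tool for an underassessed ability might measure (our example, not an official competition template): a metacognition benchmark could ask a model for an answer together with a self-reported confidence, then score how well that confidence tracks actual correctness, for instance with a Brier score.

```python
def brier_score(confidences: list[float], correct: list[bool]) -> float:
    """Mean squared gap between stated confidence and actual correctness.
    0.0 is perfect calibration; an uninformative flat 50% guess earns 0.25."""
    assert len(confidences) == len(correct)
    return sum((c - float(k)) ** 2 for c, k in zip(confidences, correct)) / len(correct)

# Example: overconfidence on a wrong answer dominates the penalty.
stated = [0.9, 0.8, 0.95, 0.6]
actual = [True, False, True, True]
print(brier_score(stated, actual))  # ~0.20, dragged up by the single confident miss
```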

Critical Gaps and Industry Response


Despite the comprehensive approach, the framework notably lacks measures to prevent benchmark gaming, where models optimize for metrics without genuine capability improvement. According to DeepMind’s documentation, the initial announcement also focuses exclusively on measuring AI capabilities, with no discussion of evaluating AI safety or alignment with human values.

Early reactions from within Google’s ecosystem and the broader AI community have been positive, with endorsements appearing on LinkedIn from Isabelle Hau and Erin Mote. However, prominent researchers at competing AI labs, including OpenAI and Anthropic, have not yet offered public commentary on the framework’s methodology or potential limitations.

The framework also lacks a governance model for maintaining and updating benchmarks, raising questions about its long-term viability: as AI capabilities advance, static benchmarks can become obsolete within months of deployment.

Sources

  • blog.google
  • kaggle.com