---
abstract: |
  Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet they often do not seem to agree on its exact definition. One common definition of AGI is an AI that can do everything a human can do, but are humans truly general? In this paper, we address what is wrong with our conception of AGI, and why, even in its most coherent formulation, it is a flawed concept for describing the future of AI. We examine whether the most widely accepted definitions are plausible, useful, and truly general. We argue that AI must embrace specialization rather than strive for generality, and that within its specializations it should strive for superhuman performance. To this end, we introduce Superhuman Adaptable Intelligence (SAI), defined as intelligence that can learn to exceed humans at anything important that we can do, and that can fill in the skill gaps where humans are incapable. We then show how SAI can sharpen a discussion of AI that has been blurred by an overloaded definition of AGI, and extrapolate the implications of using it as a guide for the future.
bibliography:
- bibliography.bib
---

\newcommand{\theHalgorithm}{\arabic{algorithm}}
\newcommand{\ToDo}[1]{\textcolor{red}{\textbf{<ToDo: #1>}}}
\twocolumn[
  \icmltitle{AI Must Embrace Specialization via Superhuman Adaptable Intelligence}

  \icmlsetsymbol{equal}{*}
\begin{icmlauthorlist}
  \icmlauthor{Judah Goldfeder}{equal,col}
  \icmlauthor{Philippe Wyder}{equal,distyl}
  \icmlauthor{Yann LeCun}{nyu}
  \icmlauthor{Ravid Shwartz-Ziv}{nyu}
\end{icmlauthorlist}

\icmlaffiliation{col}{Columbia University, New York, NY, USA}
\icmlaffiliation{distyl}{Distyl, New York, NY, USA}
\icmlaffiliation{nyu}{New York University, New York, NY, USA}

\icmlcorrespondingauthor{Judah Goldfeder}{jag2396@columbia.edu}

  \icmlkeywords{Machine Learning, ICML}

  \vskip 0.3in
]

`\printAffiliationsAndNotice{}`{=latex}

# Introduction

The AI community has become increasingly fractured over where the field is headed. On one side are "doomers," who argue we are headed towards a gruesome societal endgame---mass unemployment, loss of human agency, and a future in which humanity becomes subordinate to artificial overlords. On the other side are those who expect advanced artificial intelligence to bring something close to utopia, ending hunger, suffering, and scarcity. A third camp frames AI as a "normal technology," forecasting major impacts but rejecting extreme narratives [@narayanan2025ai].

Central to all of these views is the concept of Artificial General Intelligence, or AGI. Yet, as is often the case in highly public debates, much of the disagreement stems less from evidence than from terminology: AGI is invoked constantly but rarely defined precisely, and the resulting ambiguity has made the debate far more confusing, and far more polarized, than it needs to be.

Much of the discourse uses human intelligence as a paradigm of generality, but we argue that this notion is fundamentally misguided. As humans, we struggle to perceive our own blind spots, and this creates the illusion of generality. In truth, we are good only at the specific subset of tasks that are important to our existence, and are completely incapable of performing tasks outside this narrow range. Awareness of human limitation leads to a critical realization: humans may be specialized creatures, but they are nonetheless capable of accomplishing, or quickly learning, a wide range of incredible things. We argue that the current focus on AGI and generality as the North Star of the field should be replaced with an emphasis on adaptability, including both the time it takes to learn a new task and the range of tasks that can be learned. We refer to this as **Superhuman Adaptable Intelligence (SAI)**. A natural corollary of an emphasis on adaptability is the need for a model with strong assumptions about the world. This suggests **self-supervised learning (SSL)** as a promising way to acquire generic knowledge, and **world models** as a useful mechanism for planning and zero-shot task transfer. We believe that recentering the discourse around SAI will lead to better communication, clearer goals, and more rapid progress.

\begin{tcolorbox}[
enhanced,
colback=gray!3,
colframe=black!60,
arc=2mm,
boxrule=0.6pt,
left=6pt,right=6pt,top=6pt,bottom=6pt
]\setlength{\parskip}{4pt}

{\color{teal!70!black}\textbf{[Pos.\ \#1]} Human intelligence is not general in any meaningful way}

{\color{blue!70!black}\textbf{[Pos.\ \#2]} Generality is not a requirement for an intelligence to be extremely useful}

{\color{violet!75!black}\textbf{[Pos.\ \#3]} There is no consensus on the meaning of the term AGI in industry or academia}

{\color{orange!85!black}\textbf{[Pos.\ \#4]} Existing definitions are insufficient}

{\color{red!70!black}\textbf{[Pos.\ \#5]} We should instead focus on Superhuman Adaptable Intelligence, which points toward SSL and world models}

\end{tcolorbox}

# Human Intelligence is Specialized {#sec_HumanIntelIsSpecialized}

While the idea of human intelligence as the paradigm of generality is ubiquitous in the literature, two related but distinct notions of generality are often conflated:

1.  The average, educated human is capable of a wide range of tasks that are very "general" in nature, and enable a wide range of objectives to be accomplished. This includes things like complex planning and locomotion, fine motor skills, abstract thinking, self-simulation, spatial reasoning, and visual understanding.

2.  Human intelligence as a whole is "general" because it can be specialized to "any" given task, whether it be medicine, advanced mathematics, plumbing, or playing chess.

Both of these claims make the same error: **circularly defining generality in human terms, and then asserting humanity as its paradigm.**

Evolution has honed humanity over time to be highly specialized in the domain of skills necessary for survival in the physical world. The abilities most innate to us are not necessarily the simplest, but rather those most critical to our survival. This observation underlies Moravec's Paradox: the tasks we find easiest, like locomotion, are difficult for computers, while tasks that we find difficult and that are not essential to our survival, like playing chess, turn out to be much easier for computers. This clearly illustrates the illusion of our generality. While the average abilities of an educated human are truly remarkable, one need only ask such a person to play chess like a grandmaster, or to compose a symphony like Beethoven, to realize the hubris of calling such an intelligence general.

The above argument serves to dispel the first notion of generality. The second notion is more subtle in its error. By identifying specialization and adaptation as the core components of generality, it comes closer to the definition of SAI that we argue for. Our point of contention, however, is with calling human adaptation general. While we are excellent at adapting to tasks that were of high evolutionary importance, we are simply incapable of adapting, at a high level, to many tasks outside this range. Take chess as an example. Magnus Carlsen is widely regarded as the greatest chess player of all time, and as such represents the pinnacle of human adaptation when it comes to playing chess. But this raises the question: is Magnus actually any good at chess? Compared with the best computers, the answer is clearly no. Even more damning, with modern-day computers, creating a program that plays chess at a much higher level than Magnus *is not particularly difficult*. Our perception of his ability is colored by the limitations of humanity. Humans in general are bad at chess, and Magnus is much better than most humans; concluding from this that Magnus is good at chess is a perfect illustration of our human-centric bias. Magnus Carlsen is not objectively good at chess; he is good at chess with respect to human performance levels. Playing chess at a much higher level is not difficult from a computational perspective, yet it is something humans are incapable of. Relatedly, many animals perform tasks that humans cannot do at a high level, such as echolocation.

So what are humans then, if not a paradigm of general intelligence? The evidence points to specialized adaptation. We have an incredible ability to adapt and specialize, within the range of tasks that we evolved to address [@russell_norvig_2010_AIModernApproach]. This is the main contention of [**\[Pos. #1\]**]{style="color: teal!70!black"}.

## Alternative Views {#alternative-views .unnumbered}

Several objections have been raised to our assertion that humans are not general. Elon Musk and Demis Hassabis have claimed that our argument conflates General Intelligence with Universal Intelligence. They further argue that the human brain is indeed general in the Turing Machine sense, capable of learning anything computable given enough time, memory, and data. They therefore claim that \"brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general\" [@Hassabis2025XUniversalIntelligence; @Musk2025DemisIsRight].

In response, it is indeed important to clarify terms. Universal Intelligence refers to the ability to act intelligently over all computable environments [@legg2007universal]. General Intelligence, as Hassabis uses it, seemingly refers to adaptation to any computable task given enough time and resources. Far from conflating the two terms, we argue that humans are capable of neither.

Whether approximate Turing-completeness under idealized conditions matters for defining intelligence is a legitimate question, but it misses the point. Even if we grant that human brains are approximately Turing-complete (far from obvious), under real constraints such as finite memory, finite time, and finite attention we handle only a tiny sliver of possible problems. The space of possible functions is unimaginably vast, and we can represent only an infinitesimal fraction of it. We feel general because we cannot perceive our blind spots, not because we lack them.
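
A simple counting argument makes the last point concrete. As an illustrative assumption, consider Boolean functions on $n$ input bits and a system whose entire configuration is described by $B$ bits. Each configuration implements at most one function, so the representable fraction of the function space is bounded by

$$
\frac{\#\{\text{functions representable with } B \text{ bits}\}}{\#\{f : \{0,1\}^{n} \to \{0,1\}\}} \;\le\; \frac{2^{B}}{2^{2^{n}}} \;=\; 2^{\,B - 2^{n}}.
$$

With only $n = 64$ input bits, $2^{64} \approx 1.8 \times 10^{19}$, so even a configuration of $B = 10^{18}$ bits covers at most a $2^{-1.7 \times 10^{19}}$ fraction of these functions. The values of $n$ and $B$ are arbitrary and are not estimates for any particular system.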

# Implications for AI North Star Terminology

<figure id="fig_axesOfAGI" data-latex-placement="ht">
<div class="center">
<img src="Figures/Fig1_AxesOfAGI.png" />
</div>
<figcaption>A two-dimensional semantic map organizing prominent definitions of AGI and other North Star measures of artificial intelligence along two axes. The vertical axis represents the source of intelligence, ranging from performance-based capabilities (DO, bottom) to learning and adaptability (LEARN, top). The horizontal axis represents the scope of tasks, from universal or open-ended domains (left) to human-centric and economically focused domains (right). Definitions cluster into three categories: Adaptive Generalists (teal) emphasize learning efficiency and generalization in open environments; Cognitive Mirrors (violet) focus on replicating human-level cognitive capabilities across broad task domains; Economic Engines (orange) prioritize practical utility and economic value in human-relevant tasks. <em>Superhuman Adaptable Intelligence (SAI)</em> falls in the region of adaptable AI that can learn to do anything important, both inside and outside the human domain. </figcaption>
</figure>

## A Survey of Definitions

Measuring machine intelligence is non-trivial. Language-based tests, such as the Turing Test, in which a machine must imitate a human well enough to convince a human judge that it is human [@turing1950_TuringTest], and the Winograd Schema Challenge, which tests common-sense reasoning and natural language understanding [@Levesque_WinogradSchemaChallenge], are useful for measuring aspects of intelligence, but are not a true measure of whether AGI has been achieved. Steve Wozniak's *Coffee Test*, which asks whether a machine could make a cup of coffee if sent into a random kitchen, draws attention to the fact that, despite claims to the contrary [@chen_belkin_bergen_danks_2026], language alone is not sufficient for intelligence, and that human intelligence *adapts* well to unseen environments [@Wozniak2010_CoffeeTest]. AGI definitions commonly fall into categories along two axes: the first defines which capabilities are meant, and the second defines the required scope of those capabilities:

1.  Axis 1 (capability): (A) AI that can **learn** to do tasks vs (B) AI that can **do** tasks out of the box

2.  Axis 2 (scope): (I) Anything, (II) Anything important, (III) Anything humans can do, (IV) Anything humans can do that is important

We visualized popular definitions of AGI in accordance with this two-dimensional framework in Fig. `\ref{fig_axesOfAGI}`{=latex}.

There is a reasonable argument to be made for a third axis that spans the space from observable capability to subjective understanding [@Searle_1980_sentience], thereby capturing the *internalist* view. According to this view, an AI could meet any performance benchmark for AGI, yet if it lacks subjective experience (qualia), it remains merely a simulation of intelligence rather than the genuine article. While this dimension raises profound questions, we consider it outside the scope of this work. Our focus is on operational definitions, that is, metrics that can be observed and measured, whereas the internalist objection currently resides in the realm of metaphysics and philosophy of mind and offers no falsifiable test for engineering progress.

Regardless, one thing is clear: AGI as a term is overloaded with varying definitions from high-impact sources. This confusion has even led to claims that AGI has already arrived [@aguerra_y_arcas_norvig_2023; @chen_belkin_bergen_danks_2026]. The varying definitions plotted in Fig. `\ref{fig_axesOfAGI}`{=latex}, together with the imprecise public discussion of AGI among high-profile individuals shown in the previous section, clearly demonstrate [**\[Pos. #3\]**]{style="color: violet!75!black"}.

## Why Existing Definitions are Insufficient

[**\[Pos. #1\]**]{style="color: teal!70!black"} has the following implications for these definitions:

1.  Humanity is still *quite useful*, so AI does not need to be general to be groundbreaking and powerful ([**\[Pos. #2\]**]{style="color: blue!70!black"}).

2.  Any definition focused exclusively on humanity as a goal cannot claim to be general.

3.  Focusing exclusively on humans is also not ideal, since there are many tasks we cannot do that are nonetheless important and of high utility.

In addition, for a definition to be useful, it must meet the following criteria:

1.  It must be feasible. If a goal cannot be realized even in principle, its value as a goal is questionable.

2.  It must be internally consistent. If a definition claims to be *general*, it must actually *be general* in a meaningful way.

3.  It must be assessable. The goal as presented should lead to clear subgoals and strategies, and there must be a clear metric with which to measure progress.

Having established these criteria, we can now demonstrate [**\[Pos. #4\]**]{style="color: orange!85!black"}, namely that existing definitions come up short.

\begin{table*}[t]

\small
\setlength{\tabcolsep}{6pt}
\renewcommand{\arraystretch}{1.25}

\begin{tabularx}{\linewidth}{%
  >{\arraybackslash}X
  >{\arraybackslash}p{0.11\linewidth}
  >{\arraybackslash}p{0.39\linewidth}
}
\toprule
\textbf{AGI Definition} & \textbf{Failure mode} & \textbf{Explanation} \\
\midrule

\textit{``Match or exceed the cognitive versatility and proficiency of a well-educated adult.''} \cite{hendrycks2025definitionagi}
& Not Consistent & Human cognition is not general in any meaningful way. This definition is also unnecessarily narrow.\\

\textit{``Highly autonomous systems that \textbf{outperform humans at most economically valuable work}.'' \cite{Morris_LevelsOfAGI}}
& Not Consistent & The focus here is explicitly on a subset of tasks that are of economic worth, which is clearly not general.\\

\textit{``A system that should be able to do \textbf{pretty much any cognitive task that humans can do}.'' --- \textbf{Demis Hassabis (DeepMind CEO)} \cite{Mitchell2024_DebatesOnTheNatureOfAGI}}
& Not Consistent & This definition is not general, both because of its focus on humans and because of its focus on ``cognitive tasks,'' which appears to exclude physical tasks such as locomotion. \\

\textit{``We need precise, quantitative definitions and measures of intelligence -- in particular human-like general intelligence. \ldots{} The intelligence of a system is a measure of its \textbf{skill-acquisition efficiency} over a scope of tasks, with respect to priors, experience, and generalization difficulty.''} \cite{chollet2019measureintelligence}
& Not Consistent & Chollet himself concedes that human cognition is ``only `general' in a limited sense,'' a contradiction in terms. \\


\textit{``We define AGI as a system that demonstrates \textbf{broad generality} (performing a wide range of tasks) and \textbf{high performance} (matching or exceeding human levels).'' \cite{Morris_LevelsOfAGI}}
& Not Feasible & While they acknowledge that generality requires matching or exceeding human levels, their focus on direct performance rather than adaptation describes a system that is not realizable with finite resources. \\



``Intelligence measures an agent’s ability to \textbf{achieve goals in a wide range of environments}.'' \cite{legg2007universal}
& Not Feasible & Legg and Hutter take the domain of environments to be all computable environments, and emphasize ability over adaptability. Strong ability on such a vast set of tasks is not realizable with finite resources.\\


\textit{``Highly autonomous systems that outperform humans at most economically valuable work.''} \cite{openai_charter} & Not Assessable & The focus on performance means that any evaluation would have to benchmark against an ever-growing set of tasks.\\

\bottomrule
\end{tabularx}
\caption{Failure modes of prominent AGI definitions. Some definitions fail for multiple reasons; we highlight only one per definition.}
\label{tab:definitions}
\end{table*}

First, definitions of AGI that claim true generality fall prey to the \"No Free Lunch\" theorem: no single, general-purpose machine learning algorithm or optimization strategy works best for every problem [@Wolpert_NoFreeLunch]. To frame it differently: given finite energy, an approach that directs its available energy toward learning a finite set of tasks can reasonably be expected to outperform an approach that distributes the same energy over an infinite number of tasks. In the limit, the energy dedicated to each of the infinitely many tasks approaches zero. Thus, any definition that sets the scope as *literally anything computable* fails our criteria by *not being feasible*.
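
The energy argument can be stated slightly more formally, under the simplifying assumption (ours, for illustration) that a fixed budget $E$ is split evenly across the tasks an approach covers. Covering $k$ tasks allots $E/k$ to each, so for any finite $k < n$,

$$
\frac{E}{k} \;>\; \frac{E}{n}, \qquad \lim_{n \to \infty} \frac{E}{n} = 0 .
$$

Even splitting is not essential: if nonnegative allocations $e_1, e_2, \dots$ over infinitely many tasks satisfy $\sum_i e_i \le E$, then for every $\varepsilon > 0$ at most $E/\varepsilon$ tasks receive more than $\varepsilon$, so all but finitely many tasks receive arbitrarily little.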

Second, any definition of AGI that focuses on a subset of tasks, or that emphasizes specialization and adaptation as key metrics, cannot truly be said to be general. Similarly, AGI measured against the supposedly \"general\" nature of humans is not truly general. Chollet acknowledges this problem, stating that human intelligence \"is only 'general' in a limited sense\" [@chollet2019measureintelligence], but we contend that this is simply a contradiction in terms. For this reason, Shane Legg and Marcus Hutter speak of \"Universal Intelligence\" rather than AGI, because human intelligence is \"far too limited\" [@legg2007universal]; indeed, a human-centric definition of AGI excludes the infinite space of non-human intelligence [@Wang2019_OnDefiningArtificialIntelligence].

Despite these objections, AGI defined specifically as the ability to match human cognitive breadth is quite popular: Hendrycks et al. and Morris et al. treat human cognition as the only available example of general intelligence [@hendrycks2025definitionagi; @Morris_LevelsOfAGI]. While these definitions emphasize the ability to do *anything* humans can do, others argue that AGI must be able to do, or learn to do, anything *important* that humans can do. These arguments acknowledge that the domain of human intelligence is finite, and that it is desirable for AI to perform, or learn to perform, a subset of important tasks: those that generate economic value. In the words of Nilsson, \"Systems with true human-level intelligence should be able to perform the tasks for which humans get paid\" [@Nilsson_EmploymentTest],[^1] an idea echoed in the OpenAI Charter [@openai_charter].

We suspect that this widespread conflation of human intelligence with generality stems from an urge toward self-flattery and from the difficulty of truly conceiving of our own limitations. Regardless, all the definitions in this category fail our criteria by *not being internally consistent*.

One might object that our contention here is merely semantic, and that these definitions can still be valuable North Stars for the field even if they misuse the term \"general\". In response, we argue that when defining the end goal of an entire field, semantics are extremely important. A misapprehension of generality is dangerous for several reasons. It obscures how such an intelligence can actually be realized, which violates our feasibility criterion. Further, it can lead to unnecessarily narrow conceptions of what the end goal should be. For example, the belief that humans are general has led to several definitions of AGI as mimicking humans, which is certainly far too limited a goal for what AI is and can become.

Third, definitions of AGI that cannot be assessed or evaluated are neither practical nor useful. Legg and Hutter acknowledge this issue in their paper: \"various practical challenges will need to be addressed before universal intelligence can be used to construct an effective intelligence test\" [@legg2007universal]. The ability to measure progress is critical for several reasons. An enormous body of evidence suggests that the ability to measure progress precisely is one of the strongest catalysts of *progress itself* [@wyder2025common]. Relatedly, clear metrics usually suggest which subgoals and strategies are useful. Even more fundamentally, a definition that is not measurable is hardly a definition at all, and it often signals imprecision or hand-waving.

This criterion highlights a key difference between the two categories of definitions on our first axis (capability). Any definition that focuses on *learning* or *adapting* implicitly comes with a clear metric for evaluating intelligence: speed of adaptation to new tasks. Conversely, definitions focused on *doing* and *performing* often lack any obvious measure other than benchmarking the AI's ability to do everything, an ever-expanding and ill-defined set of benchmarks. Table `\ref{tab:definitions}`{=latex} elaborates on our issues with many popular AGI definitions ([**\[Pos. #4\]**]{style="color: orange!85!black"}).

# Why Specialization Wins

To motivate [**\[Pos. #5\]**]{style="color: red!70!black"}, it behooves us to explore the importance of specialization. Specialization is not an accident of biology; it is a predictable consequence of limited resources, competing objectives, and environments that reward performance on a small subset of evolutionarily relevant challenges. Forister et al. state that a generalist organism carries genetic traits suited to various environments, but never the ideal combination for thriving in any one of them [@Forister_RevisitingEvoOfEcoSpec]. Organisms face persistent trade-offs: improving performance in one niche often reduces performance elsewhere, and selection therefore tends to favor designs that are sharply tuned to the local payoff landscape rather than uniformly competent across all possible conditions [@Futuyma1988_EvoOfEcoSpec]. In markets and organizations, the same logic appears under a different name: entities that fail to meet the performance threshold disappear, so competition acts as a selection mechanism that amplifies effective strategies and eliminates ineffective ones [@Hannan_Freeman_1977_EcologyOfOrganizations; @Loasby_1983_EvoTheoEconChange]. AI systems are not exempt from this pressure: models that are too costly, too unreliable, or insufficiently accurate in the domains that matter will be neglected in favor of systems better matched to those domains.

In machine learning, the core mathematical point is that performance gains require assumptions about the problem class (i.e., the target distribution). This is, again, the No Free Lunch theorem: an algorithm wins by being a good fit for the target problem. As AI improves, specialized systems can improve too: whenever higher performance on a task is attainable, a system that concentrates its capacity on that narrower task can typically realize larger gains than a system that must spend capacity and compute covering additional, unrelated tasks.
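
The dependence on assumptions can be checked exhaustively in a toy setting. The sketch below rests entirely on our own illustrative assumptions (a four-point domain, binary labels, two training points, and three arbitrary hypothetical learners). It averages each learner's accuracy on the two unseen points over every possible labeling of the domain, weighted uniformly. Under that uniform prior, every learner averages exactly 0.5 off the training set, which is the No Free Lunch point in miniature: a learner can beat chance only when the true targets are not uniformly distributed and its assumptions match them.

```python
from itertools import product

# Toy setting (an illustrative assumption): binary labels on a four-point domain.
DOMAIN = [0, 1, 2, 3]
TRAIN = [0, 1]  # inputs whose labels the learner observes
TEST = [2, 3]   # off-training-set inputs


def majority_learner(train_labels):
    """Predict the majority training label everywhere (ties go to 1)."""
    guess = int(2 * sum(train_labels) >= len(train_labels))
    return lambda x: guess


def constant_zero_learner(train_labels):
    """Ignore the data and always predict 0."""
    return lambda x: 0


def parity_learner(train_labels):
    """An arbitrary structured-looking rule: alternate labels, seeded by the first example."""
    return lambda x: (x + train_labels[0]) % 2


def mean_off_training_accuracy(learner):
    """Average test accuracy over every target f: DOMAIN -> {0, 1}, weighted uniformly."""
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    total = 0.0
    for f in targets:
        h = learner([f[x] for x in TRAIN])
        total += sum(h(x) == f[x] for x in TEST) / len(TEST)
    return total / len(targets)


for learner in (majority_learner, constant_zero_learner, parity_learner):
    print(learner.__name__, mean_off_training_accuracy(learner))
```

All three learners print 0.5. Swapping in any other learner cannot change this, because under the uniform prior the unseen labels are independent of the seen ones.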

Practically, this means that generality is intractable. Although multi-task learning can improve performance when tasks share an underlying structure, it can also lead to \"negative transfer\" when tasks compete for representational capacity or impose conflicting gradients, thereby harming task performance [@ruder2017overviewmultitasklearningdeep]. Models that route inputs to specialized subsets of their parameters are a technological acknowledgment of this limitation: these systems attain breadth and scale through repeated, modular specialization rather than through uniformly shared parameters for all inputs [@Fedus2022_MixtureOfExperts]. Although seemingly \"general\", these models achieve their best performance through internal specialization.
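
The conflicting-gradients mechanism can be illustrated at first order. In the sketch below, the two 2-D vectors are hypothetical stand-ins for the per-task gradients of a shared model. The diagnostic (cosine similarity between task gradients, plus the first-order loss change $\Delta L \approx -\eta \langle g_{\text{task}}, g_{\text{update}} \rangle$ for a step of size $\eta$) is one common way to describe negative transfer, not the method of the cited survey. Here the joint update still helps task B but makes task A worse, while an update on task A alone improves it.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; negative values mean the two gradients point in conflicting directions."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def first_order_loss_change(task_grad, update, lr=1.0):
    """First-order change in a task's loss after the step theta <- theta - lr * update.

    By a first-order Taylor expansion, delta L_task ~= -lr * <grad_task, update>.
    """
    return -lr * dot(task_grad, update)


# Hypothetical per-task gradients at the current shared parameters.
g_a = [1.0, 0.0]
g_b = [-2.0, 1.0]

# A joint multi-task step follows the gradient of L_a + L_b.
shared_update = [a + b for a, b in zip(g_a, g_b)]

print(cosine(g_a, g_b))                             # negative: the tasks conflict
print(first_order_loss_change(g_a, shared_update))  # positive: task A gets worse
print(first_order_loss_change(g_b, shared_update))  # negative: task B improves
print(first_order_loss_change(g_a, g_a))            # negative: a task-A specialist improves
```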

Universal generality is a theoretical concept, but in practical terms it is a myth. A large fraction of what we intuitively mean by "do anything" reduces to planning and decision-making under uncertainty. Classical planning problems quickly become intractable in the worst case (e.g., propositional STRIPS variants) [@Bylander1994], and probabilistic planning inherits similarly severe complexity barriers [@LittmanGoldsmithMundhenk1998]. This does not mean planning is impossible in practice; it means that broad generality across arbitrary environments has no reason to be computationally cheap. A specialized agent that restricts the space of environments, goals, and action models it must handle can leverage structure and avoid worst-case blowups. The same holds for humans: our biases, genetic makeup, and environment naturally drive us toward "human things," a mere sliver of universal generality.
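
As a rough illustration of the worst-case blowup, assume a propositional STRIPS-style domain with $n$ Boolean state variables, so the state space has $2^{n}$ states. A specialist that only needs to track $k \ll n$ relevant variables (an assumption about the task, made for illustration) faces $2^{k}$ states:

$$
|S_{\text{general}}| = 2^{n}, \qquad |S_{\text{specialist}}| = 2^{k}, \qquad \frac{|S_{\text{general}}|}{|S_{\text{specialist}}|} = 2^{\,n-k}.
$$

For arbitrary example values $n = 100$ and $k = 20$, this is roughly $1.27 \times 10^{30}$ states versus $1{,}048{,}576$. State count is only a proxy for difficulty; the formal result behind the cited intractability is that deciding plan existence for propositional STRIPS is PSPACE-complete in general [@Bylander1994].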

Empirically, specialized AI systems repeatedly demonstrate the advantage of concentrating model design, data curation, and evaluation around a single domain objective. Protein structure prediction is an archetypal example: AlphaFold achieved dramatic gains by targeting a specific scientific task with task-specific training and architectural choices, and it set a new bar for accuracy and usefulness in that domain [@Jumper2021]. It is therefore plausible, and indeed expected under both the No Free Lunch framing and negative-transfer dynamics, that an AI system asked to "fold proteins *and* fold laundry" will not match a protein-folding specialist on protein-folding performance unless it internally recovers specialization (e.g., via routing, modularity, or dedicated submodels) [@Wolpert_NoFreeLunch; @ruder2017overviewmultitasklearningdeep; @Fedus2022_MixtureOfExperts; @Jumper2021].
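
Internal specialization via routing can be sketched in a few lines. The toy below is a minimal top-1 gating scheme in the spirit of switch-style mixture-of-experts routing, not the cited implementation: a hypothetical linear gate scores each expert, the input is dispatched only to the highest-scoring one, and that expert's output is scaled by its gate probability. The two experts and the gate weights are made-up stand-ins; in a real model both would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top1_route(x, gate_weights, experts):
    """Send input x to the single highest-scoring expert.

    gate_weights[i] is a linear scoring vector for expert i; the chosen expert's
    output is scaled by its gate probability, as in top-1 ("switch") routing.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in gate_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best] * experts[best](x)


# Hypothetical specialists and gate; in a real model both are learned.
experts = [
    lambda x: sum(x),   # stand-in for a "protein" specialist
    lambda x: -sum(x),  # stand-in for a "laundry" specialist
]
gate_weights = [[2.0, 0.0], [0.0, 2.0]]

print(top1_route([1.0, 0.0], gate_weights, experts))  # routed to expert 0
print(top1_route([0.0, 1.0], gate_weights, experts))  # routed to expert 1
```

Only one expert runs per input, so each specialist's parameters serve the inputs it is suited to rather than every input.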

Specialization also clarifies why AI can be uniquely valuable: it can target precisely the domains where human cognition is systematically miscalibrated. Humans exhibit stable biases and heuristics that are often sensible under ancestral constraints but error-prone in modern settings [@TverskyKahneman1974]. More broadly, the evolutionary mismatch hypothesis argues that many psychological mechanisms were tuned for past selection regimes and can therefore produce maladaptive outputs in contemporary environments [@Li2018Mismatch]. This creates an opportunity: specialized AI systems can be designed to excel exactly where humans are weak but where correctness now matters (e.g., high-dimensional statistical inference, optimization under constraints, complex mechanistic modeling) [@TverskyKahneman1974; @Domingos2012].

Finally, none of this implies that generality is "bad". It implies a narrower, more operational claim: we must embrace specialization rather than fight it. Even in domains that feel like demonstrations of "general intelligence," the history of AI milestones frequently reflects intense domain targeting rather than broad competence, while newer "general" methods still succeed by exploiting strong structure in the task family [@Silver2018_AlphaGo]. For high-stakes applications (e.g., scientific discovery, medicine), the correct aspiration is not to preserve the romance of a single generalist mind, but to build the strongest available specialists and, where needed, compose them into systems whose coordination is engineered rather than assumed.

We should also note that this claim does not dispute the bitter lesson [@sutton2019bitter]. The bitter lesson is the observation that approaches that scale with computational power tend to outperform ones based on domain knowledge, a claim that we agree with. The diminishing usefulness of domain knowledge is distinct from the usefulness of domain specialization. As scaling progresses, we will need to know less about proteins to build a system that does protein folding; however, such a system still benefits from focusing *specifically* on proteins.

<figure id="fig_Uni_AI_Hum_domain" data-latex-placement="ht">
<img src="Figures/Fig3_task_spaces.png" style="width:100.0%" />
<figcaption>Illustration of the task space overlap between the human domain and the AI domain within the universal task space.</figcaption>
</figure>

Awareness of the \"narrowness\" of humans and of the benefits of specialization allows us to exploit the complementary nature of AI: it can fill gaps within the human domain, matching or eclipsing human performance there, while also performing tasks outside the human domain (see Fig. `\ref{fig_Uni_AI_Hum_domain}`{=latex}).

# Towards Superhuman Adaptable Intelligence

Given the utility of specialization, we propose **Superhuman Adaptable Intelligence (SAI)** as the North Star of AI research ([**\[Pos. #5\]**]{style="color: red!70!black"}). Unlike the earlier AI North Star terminologies that we challenged, our definition of SAI sidesteps issues of feasibility by focusing on adaptation to tasks that have utility, as opposed to the performance of simply doing those tasks. We embrace the necessity of specialization and avoid the pitfall of claiming generality. Further, we broaden the task domain beyond the human task domain, without requiring that the AI master the human task domain as a whole. Finally, adaptation speed, the speed with which an agent can acquire new skills and learn new tasks, can be measured, which makes our approach assessable in practice.

\begin{tcolorbox}[
  enhanced,
  colback=blue!2,
  colframe=blue!55!black,
  arc=2mm,
  boxrule=0.8pt,
  left=8pt,right=8pt,top=8pt,bottom=8pt
]{\color{blue!70!black}\large\bfseries Definition}\par

\noindent
\textbf{Superhuman Adaptable Intelligence (SAI)} is capable of adapting to exceed humans at any task humans can do, while also being able to adapt to tasks outside the human domain that have utility.
\end{tcolorbox}

Our definition is most similar to Chollet's [@chollet2019measureintelligence], except that we object to calling such a definition general, and also reject his view that we \"should benchmark progress specifically against human intelligence\". While human performance can be a useful reference point during early development, we argue that anchoring benchmarks to human baselines is ultimately orthogonal to the route to superhuman capability. AI models and systems that optimize well-defined objectives and improve through self-play, evolutionary search, or large-scale exploration in simulation can surpass human performance without imitation [@zhao2025absolutezeroreinforcedselfplay]. We believe that over-indexing on \"human-level\" metrics risks misspecifying the target and limiting evaluation to anthropocentric tasks and constraints.

More broadly, any evaluation scheme that treats intelligence as a checklist of fixed competencies, whether anchored to human baselines or to an ever-growing catalog of tasks, misses the point of SAI. The space of possible skills is effectively unbounded, so testing skills one by one is a Sisyphean endeavor. Instead, evaluation should focus on minimizing adaptation time.

\begin{tcolorbox}[
  enhanced,
  colback=teal!2,
  colframe=teal!55!black,
  arc=2mm,
  boxrule=0.8pt,
  left=8pt,right=8pt,top=8pt,bottom=8pt
]{\color{teal!70!black}\large\bfseries Metric}\par

\noindent
\textbf{SAI} is measured by the \textbf{speed} with which an agent acquires new skills and learns new tasks.
\end{tcolorbox}
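
The metric deliberately leaves the measurement protocol open. One possible operationalization, sketched below under our own assumptions, records a learning curve for each new task and counts how many learning steps an agent needs before its score first reaches a utility-relevant threshold; faster adaptation means fewer steps. The threshold, the inverse-steps aggregation, the function names (`steps_to_threshold`, `adaptation_speed`), and the toy learning curves are all hypothetical and are not part of the SAI definition.

```python
import math
from statistics import mean


def steps_to_threshold(curve, threshold):
    """Return the first step at which performance reaches `threshold`.

    `curve` lists scores recorded after each learning step (index 0 means
    before any task-specific learning). Returns math.inf if the threshold is
    never reached within the recorded budget.
    """
    for step, score in enumerate(curve):
        if score >= threshold:
            return step
    return math.inf


def adaptation_speed(curves, threshold):
    """Hypothetical aggregate: mean inverse steps-to-threshold over tasks.

    Higher is better. A zero-shot success (step 0) is counted as one step so
    the score stays finite; tasks never solved contribute 0.
    """
    scores = []
    for curve in curves.values():
        steps = steps_to_threshold(curve, threshold)
        scores.append(0.0 if steps == math.inf else 1.0 / max(steps, 1))
    return mean(scores)


# Toy, synthetic learning curves (not real measurements) for two agents.
agent_a = {"task1": [0.2, 0.6, 0.9], "task2": [0.5, 0.95], "task3": [0.1, 0.3, 0.5, 0.8]}
agent_b = {"task1": [0.9], "task2": [0.4, 0.7, 0.85, 0.92], "task3": [0.1, 0.2, 0.3]}

print(adaptation_speed(agent_a, threshold=0.8))
print(adaptation_speed(agent_b, threshold=0.8))
```

Using inverse steps rather than raw step counts keeps a single unsolved task from dominating the average; other aggregations, such as the area under each learning curve, would be equally defensible.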

Our vision of SAI as a North Star may be realizable via self-supervised learning (SSL). We believe that learning in an embedding space, as opposed to the token space, may drive performance gains, and that world models may help us advance toward SAI. At the same time, we reject the idea of a single model or architecture as the \"one paradigm to rule them all,\" since it would imply that the evolution of artificial intelligence comes to a halt once that architecture has been discovered.

It is also important to note that our definition emphasizes tasks outside the human domain that have *utility*. The purpose of this clause is to exclude a potentially infinite set of useless tasks, but we have not yet precisely defined utility or how task importance is determined. Many definitions have been proposed, such as economic value or societal agreement. The exact definition one prefers is largely orthogonal to our arguments here, and we leave the question of which is most appropriate to other work.

<figure id="fig_divergence" data-latex-placement="t">
<img src="Figures/tree_clean.png" style="width:90.0%" />
<figcaption>Illustration of autoregressive model divergence</figcaption>
</figure>

## Why Self-Supervised Learning

By shifting the focus from performance to adaptation, SAI points to SSL as a potential pathway. Specializing to a wide range of tasks requires the ability to learn generic knowledge. In many real-world settings, supervised learning (SL) is not feasible because it presupposes access to large, reliably labeled datasets [@LeCun2015DeepLearning], an assumption that often fails outside carefully curated benchmarks. In contrast, SSL can be applied to any data that contain exploitable internal structure [@balestriero2023cookbookselfsupervisedlearning]. More strikingly, SSL has been shown to match and even exceed SL *even when supervision is abundant* [@he2020momentum; @grill2020bootstrap; @chen2020simple; @he2022masked]. SSL fueled the rise of GPTs and has reached state-of-the-art performance in most domains.
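
To make \"exploitable internal structure\" concrete, the sketch below turns an unlabeled token sequence into a supervised (input, target) pair by hiding a random subset of positions, a token-level analogue of masked autoencoding [@he2022masked]. The mask symbol, masking ratio, and function name are illustrative assumptions, not details of any cited method.

```python
import random

MASK = "<mask>"  # hypothetical placeholder symbol for hidden positions


def make_masked_example(tokens, mask_ratio=0.3, seed=None):
    """Create a self-supervised training pair from unlabeled tokens.

    Returns (masked_input, targets), where `targets` maps each hidden
    position to its original token. The "labels" come from the data itself,
    so no human annotation is required.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked_input = list(tokens)  # copy so the original sequence is untouched
    targets = {}
    for i in positions:
        targets[i] = masked_input[i]
        masked_input[i] = MASK
    return masked_input, targets


sentence = "the cat sat on the mat".split()
x, y = make_masked_example(sentence, mask_ratio=0.34, seed=0)
print(x)
print(y)
```

A model trained to recover `targets` from `masked_input` must learn the co-occurrence structure of the data, which is the kind of generic knowledge that later supports fast specialization.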

## The substrate for fast adaptation

Adaptation and specialization can be produced by many architectures and paradigms, and which architecture is most performant remains an open research question. Designing maximally adaptable algorithms is a central pursuit of meta-learning [@finn2017model]. The brain is not a monolith but a system of systems, which suggests that no single system will be able to adapt the way humans do. Thus, we believe that adaptation requires a hierarchy and diversity of models and modalities.
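
As a minimal illustration of learning to adapt, the sketch below applies first-order MAML [@finn2017model] to a toy family of 1-D regression tasks $y = a x$ with a task-specific slope $a$. The task distribution, step sizes, and iteration counts are arbitrary assumptions, and the first-order variant drops MAML's second-derivative term for simplicity. The meta-learned initialization is chosen so that a single inner gradient step on a few support examples already reduces error on a new task.

```python
import random


def loss_and_grad(w, xs, ys):
    """Mean squared error of the model y_hat = w * x, and its derivative in w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def sample_task(rng, k=5):
    """Toy task family (assumption): y = a * x with slope a drawn per task."""
    a = rng.uniform(1.0, 3.0)
    support_x = [rng.gauss(0, 1) for _ in range(k)]
    query_x = [rng.gauss(0, 1) for _ in range(k)]
    return (support_x, [a * x for x in support_x]), (query_x, [a * x for x in query_x])


def adapt(w, support, inner_lr):
    """Inner loop: one gradient step on the task's support set."""
    _, g = loss_and_grad(w, *support)
    return w - inner_lr * g


def fomaml(meta_steps=2000, inner_lr=0.1, outer_lr=0.01, seed=0):
    """First-order MAML: update the initialization with the query-set gradient
    evaluated at the adapted parameter (second-order term ignored)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        support, query = sample_task(rng)
        w_adapted = adapt(w, support, inner_lr)
        _, g_query = loss_and_grad(w_adapted, *query)
        w -= outer_lr * g_query
    return w


w0 = fomaml()
rng = random.Random(123)
support, query = sample_task(rng)
before, _ = loss_and_grad(w0, *query)
after, _ = loss_and_grad(adapt(w0, support, 0.1), *query)
print(f"meta-init w0={w0:.2f}, query loss before={before:.3f}, after one step={after:.3f}")
```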

Specifically, we believe that adaptation benefits from a world model and from moving from token-level prediction to latent prediction architectures such as Dreamer 4, Genie 2, or the Joint Embedding Predictive Architecture (JEPA) [@van2025joint; @Assran2023_I_JEPA; @hafner2025_Dreamer4; @bruce2024_genie_generativeinteractiveenvironments]. Pixels are not state. The physical world is too rich and too stochastic for pixel-level prediction to be a meaningful objective; what matters is learning and forecasting a compact representation that captures the system's dynamics. It has long been posited that humans and animals make heavy use of world models in their cognition [@craik1967nature]. A world model allows for simulation and therefore planning [@Schrittwieser2020]. As such, it is the hallmark of zero-shot and few-shot adaptation [@lecun2022path]. Although we find this argument for a particular family of architectures persuasive, SAI does not dictate a specific architecture.
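
The contrast between pixel-level and latent prediction can be made concrete with a deliberately simplified, JEPA-style objective. In the sketch below, NumPy linear maps stand in for neural networks, and all shapes and names are assumptions. A context encoder embeds the visible part of an input, a predictor forecasts the embedding of a hidden target part, and the loss is computed between embeddings rather than raw inputs. As in I-JEPA [@Assran2023_I_JEPA], the target encoder is not trained by gradient descent but tracks an exponential moving average (EMA) of the context encoder's weights. No training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_LATENT = 64, 8  # hypothetical sizes: a flattened input region and its embedding

# Linear maps standing in for the context encoder, target encoder, and predictor.
W_context = rng.normal(scale=0.1, size=(D_LATENT, D_IN))
W_target = W_context.copy()  # target encoder starts as a copy of the context encoder
W_pred = rng.normal(scale=0.1, size=(D_LATENT, D_LATENT))


def latent_prediction_loss(x_context, x_target):
    """Mean squared error between predicted and target *embeddings*.

    The target embedding is treated as a constant (stop-gradient); in training,
    only the context encoder and predictor would receive gradients.
    """
    s_context = W_context @ x_context
    s_target = W_target @ x_target
    s_pred = W_pred @ s_context
    return float(np.mean((s_pred - s_target) ** 2))


def ema_update(target, online, decay=0.996):
    """Move the target encoder weights slowly toward the context encoder's."""
    return decay * target + (1.0 - decay) * online


x_ctx = rng.normal(size=D_IN)  # visible region of an observation
x_tgt = rng.normal(size=D_IN)  # hidden region whose embedding is predicted
print(latent_prediction_loss(x_ctx, x_tgt))

# After an optimizer step changes W_context (simulated here by a small
# perturbation), the target encoder follows it via EMA.
W_context = W_context + rng.normal(scale=0.01, size=W_context.shape)
W_target = ema_update(W_target, W_context)
```

The loss lives in an 8-dimensional space even though the input is 64-dimensional: the objective never asks the model to reproduce every detail of the input, only what is predictable in the learned representation. Practical JEPA-style methods also need safeguards against representational collapse, which this sketch omits.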

## On the importance of diversity

Homogeneity kills research. Autoregressive LLMs and LMMs have become the dominant architecture in the state-of-the-art \"general\" AI space [@Huang2024_LLM2LMM; @su2025largelanguagemodelsreally]. This concentration is understandable, since shared tooling and benchmarks create momentum, but it also narrows the search space. Progress is most rapid when a greater diversity of solutions is explored.

In addition to slowing progress, these homogeneous solutions are often only local optima. GPTs and similar autoregressive models are no exception: they have many flaws [@lin2021limitations], among them that their errors compound, so the probability of staying on a correct path decays exponentially with prediction length [@lecun_objective_driven_ai_harvard_2024], as illustrated in figure `\ref{fig_divergence}`{=latex}. In practice, compounding prediction error makes long-horizon interaction brittle. SAI counters homogenization and drives diversity in AI development: it provides a more coherent and reasonable target that fosters a diversity of specialization profiles. Embracing specialization counteracts incentives that lead to fast convergence toward the mean.
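
The compounding-error argument can be stated as a back-of-the-envelope calculation. Under the simplifying and strong assumptions that each generated token independently has a probability $e$ of leaving the set of acceptable continuations, and that such a departure is never recovered from, the probability that an $n$-token output remains acceptable is $(1-e)^n$, which decays exponentially in $n$. The sketch below evaluates this for an illustrative per-token error rate; the rate is an assumption, not a measured value.

```python
def p_sequence_ok(per_token_error, length):
    """Probability that all `length` tokens stay on an acceptable path,
    assuming independent per-token errors that are never corrected."""
    return (1.0 - per_token_error) ** length


e = 0.01  # illustrative per-token error rate (assumption, not a measurement)
for n in (10, 100, 1000):
    print(n, round(p_sequence_ok(e, n), 4))
```

Even a 1% per-token error rate leaves only about a 37% chance that a 100-token output stays acceptable. Both assumptions are strong, so the formula is best read as an intuition for why long-horizon generation can be brittle rather than as a quantitative model.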

# Discussion

The AGI discourse is often framed as a single destination, benchmarked against an ill-defined notion of \"human-level\" generality. We argue that this framing is both scientifically unhelpful and operationally misleading. Human intelligence is not a universal competence engine; it is a collection of specialized capabilities shaped by constraints and selective pressures. There is no reason to expect the most capable artificial systems to mirror the human task distribution, nor to treat human performance as the natural reference point for progress.

We propose *Superhuman Adaptable Intelligence (SAI)* as a more concrete and productive North Star: the ability to rapidly adapt to *important* tasks inside and outside the human domain. The central quantity is not a checklist of skills, but the speed and efficiency with which new skills are acquired under realistic resource constraints. This reframes evaluation away from human-centric benchmarks and toward measurable adaptation dynamics.

\begin{tcolorbox}[
  enhanced,
  colback=red!2,
  colframe=red!55!black,
  arc=2mm,
  boxrule=0.8pt,
  left=8pt,right=8pt,top=8pt,bottom=8pt
]{\color{red!70!black}\large\bfseries Key Insight}\par

\noindent
The AI that folds our proteins should not be the AI that folds our laundry!
\end{tcolorbox}

Finally, SAI's specialization focus fosters an environment that promotes diverse engineering approaches. Progress won't come from a single architecture optimized for next-token prediction. We believe instead that systems that learn general latent structure from unlabeled data, build world models that support planning, and compose specialized modules are better suited to fast adaptation. Put another way: it is highly unlikely that an AI tasked to fold both proteins and laundry will exceed a protein-folding specialist at protein folding or a laundry-folding specialist at laundry folding. Given limited resources, capability should be allocated to the tasks that carry utility rather than to an anthropocentric notion of universal competence. One promising path forward is therefore to emphasize self-supervised learning approaches, predictive world models, and modularity---and to judge advances by how quickly and reliably they produce new competence, rather than by how closely they imitate human behavior.

\bibliographystyle{icml2026}
\newpage
\appendix
\onecolumn

[^1]: Nilsson does not use the term AGI; he speaks of \"strong AI\" or \"human-level artificial intelligence\" (the term AGI was popularized later by Shane Legg and Ben Goertzel), but he still promotes the idea that humans are \"more-or-less\" general purpose.
