Scientific discovery is one of humanity’s most complex endeavors. First, researchers must understand the existing body of knowledge and identify a significant gap in it.
Next, they must formulate a precise research question, then design and run an experiment that can answer it.
Finally, they must rigorously analyze and interpret the results of the experiment, which often raises new questions in turn.
Could such a complex process be automated? Last week, Sakana AI Labs announced an “AI scientist” – an artificial intelligence system it claims can make scientific discoveries in the field of machine learning, fully autonomously.
Using generative large language models (LLMs) like those behind sophisticated AI chatbots, the system can brainstorm ideas, select the most promising ones, design new algorithms, plot the results of experiments, and write a paper documenting the investigation and its findings, complete with citations.
Sakana claims its AI tool can run the entire lifecycle of a scientific experiment at an estimated cost of just US$15 per paper – less than the cost of a scientist’s lunch.
These are substantial claims. Do they stand up to scrutiny? And even if they do, would a flood of AI-generated scientific papers, produced at an unprecedented pace, truly benefit science?
The Mechanics of Computational Scientific Inquiry
Much of science is done in the open, and almost all scientific knowledge has been written down somewhere, so it can be shared and understood. Millions of scholarly articles are readily available through digital archives such as arXiv and PubMed.
Large language models trained on this vast corpus absorb the language and patterns of scientific discourse. It is therefore perhaps not surprising that a generative LLM can produce something that looks like a credible scientific paper – it has assimilated many examples to copy.
The more pertinent question lies in whether an AI system can generate a paper of genuine scientific merit—one that is truly *insightful*. A critical prerequisite for groundbreaking science is originality.
Assessing the Novelty and Significance
Scientists are not interested in being told things that are already known. Rather, they want to learn new things, especially new things that diverge significantly from established paradigms. This requires discerning judgment about the scope and value of a contribution.
The Sakana system tries to address “interestingness” in two ways. Firstly, it scores new research ideas for similarity to existing work, indexed in repositories such as Semantic Scholar. Any idea that is too similar to what has come before is discarded.
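Sakana has not published the exact mechanism behind this filter, but a minimal sketch of similarity-based novelty screening might look like the following. The bag-of-words `embed` function and the `0.8` threshold are illustrative assumptions standing in for a real embedding model and a tuned cutoff:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def is_novel(idea: str, prior_abstracts: list[str], threshold: float = 0.8) -> bool:
    """Keep an idea only if it is not too similar to any prior abstract."""
    idea_vec = embed(idea)
    return all(cosine_similarity(idea_vec, embed(p)) < threshold
               for p in prior_abstracts)

prior = ["adam optimizer with decoupled weight decay for transformers"]
print(is_novel("adam optimizer with decoupled weight decay for transformers", prior))  # False
print(is_novel("curriculum learning schedules for sparse mixture models", prior))      # True
```

A production system would compare dense embeddings of full abstracts rather than word counts, but the filtering logic – reject anything above a similarity threshold – is the same.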
Secondly, Sakana’s system includes a simulated peer-review step: it uses a separate LLM to judge the quality and novelty of the generated paper. There are many examples of peer review available online, on sites such as openreview.net, that can serve as a framework for critiquing a paper – and LLMs have ingested these too.
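Sakana’s actual review prompts are not reproduced here, so the sketch below is purely illustrative: `review_paper` is a hypothetical stub standing in for the LLM call, and the criteria and acceptance threshold are assumptions modelled loosely on conference review forms.

```python
def review_paper(paper_text: str) -> dict[str, int]:
    # Hypothetical stub: a real implementation would prompt a reviewer LLM
    # with the paper and a structured review form, then parse scored answers.
    return {"soundness": 3, "novelty": 2, "clarity": 4}

def decide(scores: dict[str, int], accept_threshold: float = 3.0) -> str:
    """Aggregate per-criterion scores into an accept/reject decision."""
    mean = sum(scores.values()) / len(scores)
    return "accept" if mean >= accept_threshold else "reject"

scores = review_paper("...generated paper text...")
print(decide(scores))  # mean of 3, 2, 4 is 3.0 -> "accept"
```

The important point is that the gatekeeper is itself an LLM: whatever biases or blind spots it has in scoring soundness and novelty propagate directly into which papers the system deems publishable.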
Potential Limitations in AI’s Self-Assessment Capabilities
Evaluations of Sakana AI’s generated output have been varied. Some critics have characterized the results as producing “a relentless stream of uninspired scientific content.”
Even the system’s internal evaluations of its own research papers often rate them as mediocre at best. While enhancements are anticipated as the technology matures, the fundamental question of the utility of automated scientific publications persists.
The proficiency of LLMs in accurately assessing research quality remains an open area of investigation. My own research, slated for forthcoming publication in Research Synthesis Methods, indicates that LLMs exhibit limitations in evaluating the risk of bias within medical research studies, although this capability may improve over time.
Sakana’s system automates discovery in computational research, which is much easier to automate than disciplines that rely on physical experimentation. Sakana’s experiments are carried out in code, which is itself a structured form of text that LLMs can readily be trained to generate.
AI as a Supportive Instrument for Researchers, Not a Replacement
For several decades, artificial intelligence researchers have been developing tools designed to augment scientific endeavors. The sheer volume of published research often makes it a formidable task for scientists to identify relevant literature pertaining to a specific research question.
Specialized search engines, powered by AI, assist researchers in locating and consolidating existing studies. These include not only the aforementioned Semantic Scholar but also more contemporary platforms such as Elicit, Research Rabbit, scite, and Consensus.
Text mining utilities, like PubTator, delve into research papers to extract pivotal information, such as specific genetic mutations, associated diseases, and their established interrelationships. This functionality is particularly advantageous for the curation and organization of scientific data.
Machine learning techniques have also been applied to facilitate the synthesis and analysis of medical evidence through tools like Robot Reviewer. Furthermore, summaries provided by systems such as Scholarcy, which compare and contrast arguments presented in various papers, are instrumental in conducting literature reviews.
The overarching objective of these tools is to enhance the efficiency and effectiveness of scientists’ work, rather than to supersede their roles.
Potential for AI-Generated Research to Exacerbate Existing Issues
While Sakana AI says it does not expect the role of human scientists to diminish, its vision of a “fully AI-driven scientific ecosystem” would have major implications for science.
One significant concern is the potential for AI-generated publications to proliferate unchecked within scientific literature. This could lead to future AI systems being trained on AI-generated content, a phenomenon known as model collapse. Such a scenario might result in a progressive decline in their capacity for genuine innovation.
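A toy illustration of the mechanism (my own sketch, not drawn from the research on model collapse): if each “generation” of a model is fitted only to samples produced by the previous generation, small estimation errors compound and the fitted distribution degenerates. Here, the variance of a repeatedly refitted Gaussian collapses toward zero:

```python
import random
import statistics

random.seed(0)

mean, var = 0.0, 1.0               # generation 0: the "real" data distribution
n_samples, generations = 50, 1000

for _ in range(generations):
    # Train the next generation only on data produced by the current one.
    data = [random.gauss(mean, var ** 0.5) for _ in range(n_samples)]
    mean = statistics.fmean(data)
    var = statistics.pvariance(data)   # each refit slightly underestimates variance

print(f"variance after {generations} generations: {var:.2e}")
```

The analogy is loose – real model collapse concerns far richer distributions over text – but it captures the core worry: a system trained on its own outputs loses diversity with each generation.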
However, the ramifications for the scientific enterprise extend beyond the internal dynamics of AI research systems.
The scientific community already grapples with bad actors, such as “paper mills” that churn out fabricated papers. This problem will only get worse when a scientific paper can be produced for US$15 and a vague initial prompt.
The need to check a flood of automatically generated research for errors could quickly overwhelm the capacity of actual scientists. The peer-review system is, by many accounts, already broken, and dumping more research of questionable quality into it is unlikely to fix it.
Science relies on trust. Scientists emphasise the integrity of the scientific process so we can be confident that our understanding of the world – and, increasingly, of sophisticated machine intelligence – remains valid and continues to improve.
An ecosystem in which artificial intelligence systems assume a central role raises fundamental questions concerning the meaning and value of scientific inquiry, and the appropriate level of reliance we should place on AI-driven scientific agents. Is this the trajectory we wish for the future of scientific exploration?

