In the summer of 1990, three trucks sprayed a yellow liquid at different sites in and around Tokyo, including two US Naval bases, Narita Airport, and the imperial palace. The attackers belonged to a group called Aum Shinrikyo, a Japanese cult that aimed to cause the collapse of civilization, making space for the rise of a new society ordered according to their religious ideals. Five years later, Aum would gain notoriety by carrying out sarin gas attacks on the Tokyo subway, killing 13 and injuring thousands.
Aum intended for the yellow liquid dispersed in the summer of 1990 to contain botulinum toxin, one of the most poisonous biological substances known to human beings. However, no one was killed in the attacks that summer. One possible factor in their failure is that Aum lacked a crucial bit of knowledge: the difference between disseminating the bacterium Clostridium botulinum and disseminating the highly deadly botulinum toxin it produces. It is unclear whether Aum even managed to acquire a toxin-producing form of the bacterium, and there are other possible reasons why the attack failed.
But if it had access to contemporary artificial intelligence tools, Aum Shinrikyo, or a similarly malign group, might not have made this and other mistakes. ChatGPT is very good at answering questions and providing knowledge, including on the production of botulinum toxin. If Aum had had access to ChatGPT, would the attacks of the summer of 1990 be remembered as possibly the worst bioterrorism event in history?
Advances in artificial intelligence have tremendous potential to benefit science and health. Tools like ChatGPT are revolutionizing how society works and learns, and artificial intelligence applied to biology has led to solving the decades-old protein folding problem and is transforming drug discovery. However, as artificial intelligence raises the ceiling of biological engineering and helps distribute these powers to a tremendous number of individuals, there is a serious risk that it will enable ill-intentioned actors like Aum Shinrikyo, to potentially devastating effect. As I have discussed in a recent preprint paper, large language models (LLMs) like ChatGPT, as well as novel AI-powered biological design tools, may significantly increase the risks from biological weapons and bioterrorism.
How AI language models are a threat multiplier for bioweapons
Large language models — which are very good at answering questions and teaching about dual-use knowledge — may in particular increase the accessibility of biological weapons. In a recent exercise at MIT, it took just one hour for ChatGPT to instruct non-scientist students about four potential pandemic pathogens, including options for how they could be acquired by anyone lacking the skills to create them in the lab, and how to avoid detection by obtaining genetic material from providers who do not screen orders.
At the same time, the story of Aum Shinrikyo’s lack of knowledge about the difference between Clostridium botulinum and botulinum toxin is not an isolated example. Past biological weapons programs have frequently been bottlenecked by not having the right staff, with the required knowledge and expertise, to create an effective bioweapon. Al-Qaeda’s exploration of bioterrorism was led by Rauf Ahmed, who had originally studied microbes related to food production, and thus tried to quickly learn about anthrax and other pathogens. Over the course of 2001, Rauf used his scientific credentials to make headway toward acquiring anthrax. It is not publicly known how far he got; he was arrested that December.
Despite having access to the relevant equipment, Saddam Hussein’s Iraq never turned its anthrax weapon from a less potent liquid form into a more dangerous powder form, which can be stored and released at much higher and more predictable concentrations. That’s likely because its scientists lacked knowledge of the relevant process for drying and milling anthrax. As chatbots become more sophisticated, however, they may inadvertently help individuals with malicious intent to upskill on topics that empower them to do harm.
But how much can you learn from an AI-powered lab assistant alone? After all, to make a pathogen or a bioweapon, you don’t just need instructional knowledge of the sort that can be dished out by an LLM; you also need hands-on, tacit knowledge. Tacit knowledge describes all knowledge that cannot be verbalized and can only be acquired through direct experience. Think of how to ride a bike, or for that matter, how to perform molecular biology procedures, which might require knowing how to hold a pipette, shake a flask, or treat your cells. It is difficult to define the extent of this tacit knowledge barrier and how much impact LLMs like ChatGPT may have on lowering it. However, one fact seems clear: If chatbots and AI-powered lab assistants make the creation and modification of biological agents seem more accessible, then it is likely that more individuals will try their hand. And the more who try, the more who will eventually succeed.
Additionally, ChatGPT is just the beginning of language models and related forms of artificial intelligence. Already, language models are revolutionizing the way scientists can instruct lab robots on what work to perform. Soon, artificial intelligence systems will be able to perform ideation and design of experimental strategies. Thus, artificial intelligence will enable and accelerate the increasing automation of science, reducing the number of scientists required to advance large-scale projects. This will make it easier to develop biological weapons covertly.
Biological design tools could simplify bioweapons
While large language models may eventually push the ceiling of biological design capabilities, more specialized AI tools are already doing so. Such biological design tools (BDTs) include protein folding models like AlphaFold2 and protein design tools like RFdiffusion. These artificial intelligence tools are usually trained on biological data, such as genetic sequences. They are developed by many different companies and academics to help with important biological design challenges, such as developing therapeutic antibodies. As biological design tools become more powerful, they will enable many beneficial advances like the creation of new medications based on novel proteins or designer viruses.
But such powerful design capabilities may also exacerbate biological risks. At the extreme, biological design tools could allow the design of biological agents with unprecedented properties. It has been hypothesized that natural pathogens feature a trade-off between how transmissible and how deadly they are; designed pathogens might not feature such evolutionary constraints. A group like Aum Shinrikyo could potentially create a pandemic virus much worse than anything nature could produce. Biological design tools could thus turn pandemics from the catastrophic risks they are now into true existential threats. Biological design tools could also enable the creation of biological agents targeted at specific geographies or populations.
In the short term, new design capabilities may challenge existing measures to control access to dangerous toxins and pathogens. Existing security measures tend to focus on proscribed lists of dangerous organisms or screening for known threatening genetic sequences. But design tools may simply generate other agents with similar dangerous properties that such measures wouldn’t catch.
The good news is that — at least initially — new cutting-edge possibilities enabled by biological design tools will likely remain only accessible to a manageable number of existing experts who will use these facilities for legitimate and beneficial purposes. However, this access barrier will fall as biological design tools become so proficient that their outputs require little additional laboratory testing; in particular, as AI language models learn to interface effectively with the tools. Language models are already being linked up to specialized science tools to help with specific tasks and then automatically apply the right tool for the task at hand. Thus, the heights of biological design could quickly become accessible to a very large number of individuals, including ill-intentioned actors.
Why we need mandatory gene synthesis rules
What can be done to mitigate risks emerging from the intersection of AI and biology? There are two important angles: strengthening general biosecurity measures and advancing risk mitigation approaches specific to new artificial intelligence systems.
In the face of increasingly powerful and accessible biological design capabilities, one crucial biosecurity measure is universal gene synthesis screening. The production of the genetic building blocks for a protein or organism is the crucial step in turning digital designs into physical agents. A range of companies specializes in producing such DNA or RNA building blocks. Since 2010, the US government has recommended that such gene synthesis companies screen orders and customers to ensure only legitimate researchers are accessing genetic material for controlled agents. Many cutting-edge gene synthesis companies perform such screening voluntarily and have formed the International Gene Synthesis Consortium to coordinate these activities. However, a significant number of gene synthesis providers still do not screen. Indeed, as the MIT exercise demonstrates, ChatGPT is very adept at pointing out this fact and giving instructions on how to exploit such weaknesses in supply chain security.
What is needed is a mandatory baseline for screening synthetic DNA products. Requiring such baseline screening does not go against the interests of companies: Industry leaders across the US and UK have been screening orders voluntarily and are actively calling for a regulatory baseline to prevent competitors from skimping on safety. Measures to make gene synthesis screening mandatory should capture increasingly common benchtop gene synthesis devices and need to be future-proof to include screening for functional equivalents of concerning agents. Similar customer screening baselines are also needed for other crucial service providers at the boundary of the digital-to-physical, such as contract research organizations providing services to synthesize organisms.
Advancing governance of artificial intelligence
In addition to general biosecurity measures, we also need artificial intelligence-specific interventions. The first focus should be mitigating risks from large language models because not only are these models likely already lowering barriers to biological misuse, but also because their capabilities may increase quickly and unpredictably. One crucial challenge that applies across the whole range of risks posed by large language models is that new and dangerous capabilities may only become clear after the release of the model.
A particularly crucial role in mitigating risks from LLMs may be played by pre-release evaluations of model capabilities. Such pre-release evaluations are necessary to ensure that new models do not contain dangerous capabilities on public release. If conducted by a third party, they could also ensure that companies have taken appropriate steps during training and fine-tuning to reduce the chance that these models could enable biological misuse. Releasing models through structured access methods, such as the web ChatGPT interface, can ensure that safeguards can be continuously updated. In contrast, open-sourcing a powerful LLM has significant risks because fine-tuning and safeguards may be easily removed, and if new dangerous capabilities are discovered, it would be impossible to retract a model or update its safeguards.
Generally, the potential impact of artificial intelligence tools on the risk of biological misuse raises a profound question: Who should be able to access dual-use scientific capabilities? For policymakers trying to answer this question, it will be vital to consider diverse voices from across different disciplines, demographics, and geographies. This will require difficult trade-offs between the openness of scientific areas relating to pathogens, law enforcement and monitoring of data streams for illicit activities, and increasing risk of misuse.
One sensible position might be that language models like ChatGPT do not need to provide anyone with detailed step-by-step instructions to create a dangerous strain of pandemic flu. Therefore, it might on balance be preferable if public versions of such models do not give detailed answers to questions on this and other dual-use topics. Notably, Anthropic’s recently released cutting-edge language model Claude 2 features a higher barrier than GPT-4 for handing its users detailed instructions for dangerous experiments.
At the same time, it is important that these tools enable scientists with appropriate training and approval to develop new medications and vaccines. Thus, differentiated access methods are needed for AI-powered lab assistants and biological design tools. This might require advancing ways for legitimate scientists to authenticate themselves online. For instance, to access model capabilities for predicting immune evasion variants of influenza virus to inform vaccine design, a scientist might need to authenticate and provide appropriate documentation of biosafety and dual-use review.
Beyond exacerbating biosecurity risks, advances in artificial intelligence also present an opportunity. As progress in AI spurs more rigorous gene synthesis screening, this will strengthen biosecurity more broadly. And as biological risks drive AI governance measures like pre-release evaluations of large language models, this will mitigate a wider array of artificial intelligence risks. Swift action by policymakers will not only enhance safety but also pave the way for reaping the many benefits of artificial intelligence.
Jonas Sandbrink is a biosecurity researcher at the University of Oxford and a biological security adviser at the UK Cabinet Office. This article is based on his recently published preprint titled “Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools.” This article reflects only the opinion of the author and not of the organizations with which he is affiliated.