Network Links All Known Chemical Compounds, Reactions
Northwestern University scientists have connected 250 years of organic chemical knowledge into one giant computer network – a chemical Google on steroids. This “immortal chemist” will never retire and take away its knowledge but instead will continue to learn, grow and share.
A decade in the making, the software optimizes syntheses of drug molecules and other important compounds, combines long (and expensive) syntheses of compounds into shorter and more economical routes and identifies suspicious chemical recipes that could lead to chemical weapons.
“I realized that if we could link all the known chemical compounds and reactions between them into one giant network, we could create not only a new repository of chemical methods but an entirely new knowledge platform where each chemical reaction ever performed and each compound ever made would give rise to a collective ‘chemical brain,’” says Bartosz Grzybowski, who led the work. “The brain then could be searched and analyzed with algorithms akin to those used in Google or telecom networks.”
Called Chematica, the network comprises some seven million chemicals connected by a similar number of reactions. A family of algorithms that searches and analyzes the network allows the chemist at his or her computer to easily tap into this vast compendium of chemical knowledge. And the system learns from experience, as more data and algorithms are added to its knowledge base.
Details and demonstrations of the system are published in three back-to-back papers in the journal Angewandte Chemie. Grzybowski is the senior author of all three papers.
In the Angewandte papers, the researchers have demonstrated algorithms that find optimal syntheses leading to drug molecules and other industrially important chemicals.
“The way we coded our algorithms allows us to search within a fraction of a second billions of chemical syntheses leading to a desired molecule,” Grzybowski says. “This is very important since within even a few synthetic steps from a desired target the number of possible syntheses is astronomical and clearly beyond the search capabilities of any human chemist.”
Chematica can test and evaluate every possible synthesis in the network, not only the few a particular chemist might have an interest in. In this way, the algorithms find truly optimal ways of making desired chemicals.
The software already has been used in industrial settings, Grzybowski says, to design more economical syntheses of companies’ products. Synthesis can be optimized with various constraints, such as avoiding reactions involving environmentally dangerous compounds. Using the Chematica software, such green chemistry optimizations are just one click away.
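A minimal sketch of how a cost-optimal route search with a "green" constraint might look, assuming a toy network. The compound names, prices, step costs and the `cheapest_costs` function are all hypothetical illustrations, not Chematica's actual data or algorithms, which the article does not describe in detail. The sketch treats each reaction as a link from its reactants to its product and repeatedly relaxes costs until nothing improves.

```python
import math

# Hypothetical toy data: each reaction is (product, reactants, step_cost).
REACTIONS = [
    ("C", ("A", "B"), 5.0),
    ("D", ("C",), 4.0),
    ("D", ("A", "E"), 2.0),
    ("T", ("D",), 3.0),
]
# Hypothetical purchase prices of starting materials.
PRICES = {"A": 1.0, "B": 2.0, "E": 10.0}


def cheapest_costs(reactions, prices, banned=frozenset()):
    """Return the minimal cost of obtaining every reachable compound.

    cost(x) = min(price of x,
                  min over reactions making x of
                  step_cost + sum of reactant costs)
    Compounds in `banned` may be neither bought nor made as intermediates.
    Assumes all prices and step costs are non-negative.
    """
    cost = {c: p for c, p in prices.items() if c not in banned}
    changed = True
    while changed:  # relax until no cost improves (fixed point)
        changed = False
        for product, reactants, step_cost in reactions:
            if product in banned or any(r not in cost for r in reactants):
                continue
            total = step_cost + sum(cost[r] for r in reactants)
            if total < cost.get(product, math.inf):
                cost[product] = total
                changed = True
    return cost


unconstrained = cheapest_costs(REACTIONS, PRICES)["T"]
avoiding_c = cheapest_costs(REACTIONS, PRICES, banned={"C"})["T"]
```

In this toy network the cheapest route to the target `T` passes through `C`. Banning `C`, as one might for an environmentally dangerous intermediate, forces the search onto the costlier route through `E`. A real system would need a far more efficient search than this repeated sweep over seven million reactions.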
Another important area of application is the shortening of synthetic pathways into so-called “one-pot” reactions. One of the holy grails of organic chemistry has been to design methods in which all the starting materials can be combined at the very beginning and the process then proceeds in one pot, much like cooking a stew, all the way to the final product.
The chemists have taught their network some 86,000 chemical rules that check – again, in a fraction of a second – whether a sequence of individual reactions can be combined into a one-pot procedure. Thirty predictions of one-pot syntheses were tested and fully validated. Each synthesis proceeded as predicted and had excellent yields.
In one striking example, Grzybowski and his team synthesized an anti-asthma drug using the one-pot method. Making the drug typically would take four consecutive synthesis and purification steps.
“Our algorithms told us this sequence could be combined into just one step, and we were naturally curious to check it out in a flask,” Grzybowski says. “We performed the one-pot reaction and obtained the drug in excellent yield and at a fraction of the cost the individual steps otherwise would have accrued.”
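The kind of rule check described above, which tests whether a sequence of reactions can share one flask, might be sketched as follows. This is only an illustration under stated assumptions: the three incompatibility rules, the label names and the functions `one_pot_conflicts` and `is_one_pot` are hypothetical stand-ins for the roughly 86,000 rules the researchers report, whose actual form the article does not give.

```python
from itertools import combinations

# Hypothetical incompatibility rules: unordered pairs of labels that
# should not share a flask. The real rule set is not given in the article.
INCOMPATIBLE = {
    frozenset({"strong_acid", "strong_base"}),
    frozenset({"oxidant", "reductant"}),
    frozenset({"water", "organolithium"}),
}


def one_pot_conflicts(steps, rules=INCOMPATIBLE):
    """List rule violations between different steps of a sequence.

    `steps` is a list of sets of labels (reagents, solvents, conditions),
    one set per reaction step. Returns (i, j, rule) tuples for every rule
    broken by combining step i with step j.
    """
    conflicts = []
    for (i, a), (j, b) in combinations(enumerate(steps), 2):
        for rule in rules:
            # A rule fires when its two labels are split across the steps.
            x, y = tuple(rule)
            if (x in a and y in b) or (y in a and x in b):
                conflicts.append((i, j, rule))
    return conflicts


def is_one_pot(steps, rules=INCOMPATIBLE):
    """True when no pair of steps violates any rule."""
    return not one_pot_conflicts(steps, rules)
```

Under these assumptions, a four-step sequence passes only if every pair of its steps is free of conflicts, so the check grows with the square of the number of steps and linearly with the number of rules.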
The third area of application is the use of the Chematica network approach for predicting and monitoring syntheses leading to chemical weapons.
“Since we now have this unique ability to scrutinize all possible synthetic strategies, we also can identify the ones that a potential terrorist might use to make a nerve gas, an explosive or another toxic agent,” Grzybowski says.
Algorithms known from game theory are first applied to identify the strategies that would be hardest for the federal government to detect: for example, the use of substances such as kitchen salt, clarifiers, grain alcohol and a fertilizer, all freely available from a local convenience store. Characteristic combinations of seemingly innocuous chemicals, such as this one, are red flags.
This strategy is very different from the government’s current approach of monitoring and regulating individual substances, Grzybowski says. Chematica can be used to monitor patterns of chemicals that together become suspicious, instead of monitoring individual compounds. Grzybowski is working with the federal government to implement the software.
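The difference between monitoring individual substances and monitoring combinations can be illustrated with a short sketch. Everything here is hypothetical: the watch list uses placeholder labels rather than real substances, and `flagged_patterns` is an invented name. It shows only the pattern-matching step, not the game-theory analysis that would select the patterns.

```python
# Hypothetical watch list: each pattern is a set of items that are
# individually innocuous but flagged when acquired together.
# Placeholder labels only; no real recipes are encoded here.
WATCH_PATTERNS = {
    "pattern_1": frozenset({"item_a", "item_b", "item_c"}),
    "pattern_2": frozenset({"item_c", "item_d"}),
}


def flagged_patterns(purchases, patterns=WATCH_PATTERNS):
    """Return the sorted names of patterns fully covered by `purchases`."""
    bought = set(purchases)
    return sorted(name for name, items in patterns.items() if items <= bought)
```

No single item triggers a flag in this sketch. A pattern is reported only when every one of its members appears in the same set of purchases.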
Chematica now is being commercialized. “We chose this name,” Grzybowski says, “because networks will do to chemistry what Mathematica did to scientific computing. Our approach will accelerate synthetic design and discovery and will optimize synthetic practice at large.”