Supercomputers find molecular needles in virtual haystacks
Two new studies from SciLifeLab researchers showcase different strategies for using supercomputers to rapidly identify new drug candidates.
One of the biggest challenges in any drug discovery campaign is finding the most promising molecules among a vast number of possible candidates. Traditional approaches test molecules from physical compound collections in the lab, which is not only costly but also overlooks potential drugs that could be found in much larger virtual collections. Two recent publications in Nature Computational Science and Nature Communications demonstrate how computer algorithms can find promising drug candidates in virtual databases.
“Our computer models can search through databases containing billions of virtual molecules, which can speed up the costly drug development process,” says SciLifeLab Fellow Jens Carlsson (UU), who co-led the studies.
Machine learning hits two targets with one arrow
In their search for promising molecules, the researchers sifted through virtual databases containing billions of compounds that can be purchased from Enamine, a Ukrainian company that builds molecules on demand.
“These databases are enormous and even access to the strongest Swedish supercomputers would not allow us to computationally dock all molecules against our proteins of interest,” says DDLS Fellow Andreas Luttens (KI), first author of the studies.
To address this challenge, the researchers trained machine learning models to identify promising molecules for a particular protein. They demonstrated that, out of billions of candidates, these models could rapidly pinpoint the best-scoring molecules. In collaboration with the University of Santiago de Compostela, they also showed that their computer models could discover a novel molecule that simultaneously binds to two proteins implicated in Parkinson’s disease: the adenosine A2A receptor (A2AR) and the dopamine D2 receptor (D2R).
“It’s amazing that we can now design these complex molecules and show that they actually work exactly as we hoped,” says Jens Carlsson.
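The idea behind a machine learning-guided docking screen can be illustrated with a toy Python example. It is a minimal sketch under stated assumptions, not the authors' pipeline: each "molecule" is a random three-number descriptor, `dock` is a made-up scoring function standing in for a real docking program, and a simple k-nearest-neighbours regressor (`knn_predict`) replaces the machine learning models used in the study. All function names are hypothetical. The sketch docks a random sample, lets the cheap model rank the whole library, and then docks only the top-ranked molecules.

```python
import heapq
import random


def make_library(n, seed=0):
    """Hypothetical virtual library: each 'molecule' is a 3-number descriptor."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(n)]


def dock(mol):
    """Made-up stand-in for an expensive docking run; lower scores are better."""
    x, y, z = mol
    return (x - 0.3) ** 2 + (y + 0.5) ** 2 + 0.5 * z


def knn_predict(train, mol, k=5):
    """Cheap surrogate: average docking score of the k most similar docked molecules."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], mol))
    nearest = heapq.nsmallest(k, train, key=sq_dist)
    return sum(score for _, score in nearest) / len(nearest)


def ml_guided_screen(library, n_train=300, n_dock=200, seed=0):
    rng = random.Random(seed)
    # Step 1: dock a small random sample to create training data.
    train = [(mol, dock(mol)) for mol in rng.sample(library, n_train)]
    # Step 2: rank the whole library with the fast surrogate instead of docking it.
    predicted = sorted(library, key=lambda mol: knn_predict(train, mol))
    # Step 3: dock only the most promising predictions and sort them by real score.
    shortlist = predicted[:n_dock]
    return sorted(shortlist, key=dock)


if __name__ == "__main__":
    library = make_library(5000)
    hits = ml_guided_screen(library)
    true_top = set(sorted(library, key=dock)[:200])
    recovered = sum(mol in true_top for mol in hits)
    print(f"docked at most {300 + 200} of {len(library)} molecules; "
          f"recovered {recovered} of the true top 200")
```

Because the surrogate model is far cheaper to evaluate than docking, only a few hundred of the 5,000 toy molecules are ever docked. In a real screen, the balance between training-set size, model quality and the number of predictions sent to docking determines how many of the true top-scoring molecules are recovered.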
Molecular jigsaw puzzles
In the second study, the researchers, in collaboration with Karolinska Institutet and Stockholm University, investigated the DNA repair enzyme OGG1, a protein linked to cancer and inflammatory diseases. In the search for molecules that bind to this protein and inhibit its activity, they first used computer models of the protein to identify very small molecules, or fragments, that bind to it. The researchers then grew the fragments step by step into larger molecules, which showed promising anti-inflammatory effects in experiments.
“It’s like solving a molecular jigsaw puzzle, where one piece is attached to new pieces until it forms a drug molecule that perfectly fits the target protein,” says Jens Carlsson.
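The stepwise growth described above can be sketched as a beam search over a toy set of building blocks. This is a hypothetical illustration, not the procedure from the paper: the blocks `A` to `F`, their contributions and the `PAIR_BONUS` table are invented, and `score` merely stands in for a docking calculation. In each round, only the best few molecules (the "beam") are kept and extended by one more piece.

```python
import itertools

# Hypothetical building blocks with made-up contributions to binding (lower is better).
BLOCKS = {"A": -1.0, "B": -0.2, "C": -0.8, "D": 0.3, "E": -0.5, "F": -0.1}
# Made-up extra reward when two blocks are joined in this order (a crude "fit" term).
PAIR_BONUS = {("A", "C"): -1.5, ("C", "E"): -1.0, ("B", "F"): -2.0}


def score(mol):
    """Toy stand-in for a docking score of a molecule given as a tuple of blocks."""
    total = sum(BLOCKS[b] for b in mol)
    total += sum(PAIR_BONUS.get(pair, 0.0) for pair in zip(mol, mol[1:]))
    return total


def rank(mols):
    """Sort by score, breaking ties by name so the result is reproducible."""
    return sorted(mols, key=lambda m: (score(m), m))


def grow(beam_width=2, max_size=3):
    # Step 1: screen single fragments and keep only the best few.
    beam = rank((b,) for b in BLOCKS)[:beam_width]
    for _ in range(max_size - 1):
        # Step 2: attach one new block to each kept molecule in every possible way.
        candidates = {mol + (b,) for mol in beam for b in BLOCKS if b not in mol}
        # Step 3: keep the best grown molecules as starting points for the next round.
        beam = rank(candidates)[:beam_width]
    return beam


def exhaustive_best(size=3):
    """Brute-force reference: score every ordered combination of distinct blocks."""
    return rank(itertools.permutations(BLOCKS, size))[0]


if __name__ == "__main__":
    print("stepwise growth:", grow()[0])
    print("exhaustive search:", exhaustive_best())
```

With these invented numbers, the growth loop scores 24 candidates (6 fragments, then 10 and 8 grown molecules), while the brute-force search scores all 120 ordered three-block combinations; both end up at ("A", "C", "E"). The saving grows rapidly with the number of building blocks, but a narrow beam can also miss molecules whose pieces only score well in combination, such as B and F here.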
A new algorithm developed by Andreas Luttens, called UniverseGenerator, showed that their stepwise approach would work not only for the billions of molecules currently available for purchase but also for tens of sextillions (1 followed by 22 zeros) of theoretical molecules.
“This strategy allows us to efficiently search through the largest databases. However, the true breakthrough will come when we can reliably predict which of these molecules can also be synthesized in the lab,” the researchers explain.
Read the articles:
Luttens, A. et al. Rapid traversal of vast chemical space using machine learning-guided docking screens. Nature Computational Science (2025). DOI: 10.1038/s43588-025-00777-x
Luttens, A. et al. Virtual fragment screening for DNA repair inhibitors in vast chemical space. Nature Communications 16, 1741 (2025). DOI: 10.1038/s41467-025-56893-9