Generation of graph structures through deep reinforcement learning: application to molecular chemistry

When: 12/06/2023 (all day)

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: LERIA – Université d'Angers
Duration: 3 years
Contact: nicolas.gutowski@univ-angers.fr
Publication deadline: 2023-06-12

Context:
In many areas of chemistry, the discovery of new molecules often involves building upon an existing effective compound through chemical reactions (addition, substitution, etc.) to enhance its properties. The emergence of genuinely new molecules is rarer, but can pave the way for profound transformations in the field. It is precisely with this goal in mind that research on the de novo generation of molecules with desired properties has developed, particularly for drug and material discovery. Challenges in this research domain include the size of the search space and the difficulty of generating molecules that can actually be synthesized.

Molecules can be represented as graphs, where the vertices are labeled by atom type and the edges by bond type. De novo design is thus a problem of generating graph structures, whose objective combines one or more functions to optimize with constraints to satisfy. To address our application in chemistry, we recently proposed an evolutionary algorithm for molecule generation called EvoMol [1], which can freely explore the chemical space and tackle diverse problems. This generator has achieved reference results in multi-property optimization and on applied problems. It can incorporate synthesizability constraints [2] and promote diversity in the generated molecules [3]. However, significant challenges remain, and two of them are at the core of the proposed topic.
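To make the action-based view concrete, here is a minimal sketch of one graph mutation action on a molecule using RDKit; the action and its validity check are illustrative simplifications of what a generator like EvoMol performs, not its actual implementation.

```python
# Minimal sketch of one graph mutation action on a molecule, using RDKit.
# Illustrative simplification; not EvoMol's actual action set.
import random
from typing import Optional

from rdkit import Chem

def add_atom(mol: Chem.Mol, symbol: str = "C") -> Optional[Chem.Mol]:
    """Attach a new atom to a randomly chosen heavy atom via a single bond."""
    rw = Chem.RWMol(mol)
    anchor = random.randrange(rw.GetNumAtoms())
    new_idx = rw.AddAtom(Chem.Atom(symbol))
    rw.AddBond(anchor, new_idx, Chem.BondType.SINGLE)
    try:
        Chem.SanitizeMol(rw)  # reject chemically invalid results
    except Exception:
        return None
    return rw.GetMol()

mol = Chem.MolFromSmiles("c1ccccc1")  # benzene as a starting point
mutant = add_atom(mol, "N")           # e.g., may yield aniline
if mutant is not None:
    print(Chem.MolToSmiles(mutant))
```

A full generator would combine several such actions (removal, substitution, bond changes) and keep only mutants that pass validity and constraint checks.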

The first area of improvement is the selection of actions on the graph, which is currently random in EvoMol. One would expect an intelligent method to apply a policy that favors actions which have been successful in the past, much as a chemist adds a known chemical function to enhance a target property. Preliminary work by N. Gutowski and B. Da Mota has shown the potential of reinforcement learning using bandit algorithms and Q-learning on certain problems, motivating the investigation of more advanced methods. The second area of improvement relates to synthesizability, a crucial objective for real-world applications. We have proposed synthesizability constraints [2] that make the generated molecules likely to be synthesizable. Other works propose heuristic scores [4,5], sometimes based on retrosynthesis [6]. A method capable of constructing a molecule with the desired properties, along with the steps for its synthesis, would have many advantages. Although some work has emerged in this area [7], it remains limited to simple problems.
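As an illustration of the first direction (action selection), a bandit can treat each mutation action type as an arm and favor actions that improved the target property in the past. The sketch below implements standard UCB1; the action names and reward definition are assumptions made for illustration.

```python
# Illustrative UCB1 bandit over mutation action types, replacing uniform
# random selection. Action names and rewards are placeholder assumptions.
import math
import random

class ActionBandit:
    def __init__(self, actions):
        self.actions = actions
        self.counts = {a: 0 for a in actions}
        self.values = {a: 0.0 for a in actions}  # running mean reward
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.actions:          # play every arm once first
            if self.counts[a] == 0:
                return a
        # UCB1: mean reward plus an exploration bonus
        return max(self.actions, key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n

bandit = ActionBandit(["add_atom", "remove_atom", "substitute", "add_bond"])
action = bandit.select()
# reward = improvement of the target property after applying `action`
bandit.update(action, reward=random.random())  # placeholder reward
```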

Machine learning on graphs is a promising theme with numerous applications that has benefited from major advances in deep learning, such as Graph Neural Networks (GNN [8]), Graph Convolutional Networks (GCN [9]), and more recently Graph Transformer Networks (GTN [10]). These approaches have been quickly adapted to various applications, including molecular generation [11,12]. However, these powerful architectures remain comparatively unexplored in the context of constrained exploration and diverse optimization objectives, such as those of interest in our project (e.g., organic solar cells). For these complex objectives of sequentially constructing useful and realistic molecules, reinforcement learning appears to be a promising alternative to the meta-heuristics and latent-space manipulation approaches [13] (such as variational autoencoders for molecules [14]) employed thus far. The latter face the well-known challenge of posterior collapse and are limited to optimizing simple (i.e., differentiable) properties.

While reinforcement learning is primarily used in games and robotics to learn optimal sequences of actions, its application to the controlled generation of complex data has recently shown diverse promising developments, particularly in natural language generation [15,16]. In protein chemistry, deep learning has achieved widespread success through tools like AlphaFold [17]. Some works do address molecular optimization problems [11,18,19], but the issue of synthesizability is almost always minimized or neglected, leaving this task to retrosynthesis tools. Moreover, even if such tools were applied to molecules produced by current generative models, there is no guarantee that the reagents needed to produce those molecules can themselves be obtained. This top-down, a posteriori approach is computationally expensive and fragile: it does not allow setting objectives on the construction cost of the structure or on other criteria that would help narrow down the search space. In chemistry, for example, one might want to minimize synthesis costs, the number of steps, or the production of hazardous or difficult-to-recycle waste. Jointly integrating the bottom-up graph construction process with the optimization of these properties would be an elegant, effective, and original approach.
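To illustrate how a policy could consume molecular graphs directly, the following sketch implements a single GCN-style message-passing round in plain PyTorch (deliberately avoiding any specific GNN library); the feature sizes, action space, and mean pooling are illustrative choices, not a prescribed architecture.

```python
# Sketch of a GCN-style policy head over a molecular graph, in plain PyTorch.
# Shapes and the action space are assumptions made for illustration.
import torch
import torch.nn as nn

class GraphPolicy(nn.Module):
    def __init__(self, node_dim: int, hidden: int, n_actions: int):
        super().__init__()
        self.msg = nn.Linear(node_dim, hidden)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, node_dim) atom features; adj: (n_nodes, n_nodes)
        # normalized adjacency. One round of neighborhood aggregation,
        # then mean-pool into a graph embedding and score the actions.
        h = torch.relu(adj @ self.msg(x))
        g = h.mean(dim=0)
        return torch.log_softmax(self.out(g), dim=-1)

policy = GraphPolicy(node_dim=16, hidden=64, n_actions=8)
x = torch.randn(5, 16)     # 5 atoms, 16 features each (placeholder)
adj = torch.eye(5)         # placeholder adjacency matrix
log_probs = policy(x, adj) # one log-probability per graph action
```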

Applying deep reinforcement learning to the discovery of molecules that are stable, synthesizable, and exhibit properties of interest in the target domains raises several scientific challenges, which the supervising team's expertise in sequential learning will help address: the policies take graphs as input, the actions are non-trivial in the case of reaction patterns, and the underlying problem is a multi-objective optimization problem. However, this type of application lends itself well to transfer or progressive learning (curriculum learning [20]): a policy can first be learned on a simpler problem, such as constraint optimization or synthesizability heuristics, and then improved to optimize more complex objectives that include these synthesizability concerns.
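A minimal sketch of such a progressive scheme, assuming a generic training loop: the policy is first trained against a synthesizability heuristic alone, then transferred to the full multi-objective reward. All names below are placeholders.

```python
# Two-stage curriculum sketch (assumed setup): pre-train on an easy reward,
# then fine-tune on the full multi-objective reward. Everything here is a
# placeholder for the actual RL machinery.
def train(policy, reward_fn, n_steps):
    """Placeholder for the usual RL update loop (e.g., policy gradient)."""
    for _ in range(n_steps):
        pass  # sample a trajectory, score it with reward_fn, update policy
    return policy

def heuristic_reward(mol):
    return 0.0  # e.g., a synthetic-accessibility heuristic alone

def full_reward(mol):
    return 0.0  # target property + synthesizability + cost constraints

policy = object()  # stands in for a policy such as the one sketched above
policy = train(policy, heuristic_reward, n_steps=10_000)  # stage 1: easy
policy = train(policy, full_reward, n_steps=100_000)      # stage 2: full task
```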

Beyond the intended application, developing reinforcement learning (RL) techniques for policies conditioned on goal graphs, as suggested by the synthesis of molecules using available reactions, is an important theme for the machine learning community. While multi-task RL, and specifically goal-conditioned RL, is expanding in the literature, very few approaches deal with complex graph structures. In this context, the search for invariants in the manipulated structures will be a key lever for establishing effective policies. Additionally, automatic curriculum learning, which has seen numerous recent developments [20,21,22,23] by dynamically determining task distributions adapted to the current level of the learning agent, has to our knowledge not yet been deployed in environments with known dynamics, as is the case in our application. Exploiting and adapting Monte Carlo Tree Search (MCTS) planning algorithms to this framework seems a particularly promising research direction that we intend to develop in this thesis.
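Since the dynamics are known (applying an action to a graph is deterministic), planning can be sketched with a textbook UCT loop; the environment interface (legal_actions, step, reward) is an assumption of this sketch, not an API from the project.

```python
# Minimal UCT (MCTS) sketch for a deterministic environment with known
# dynamics, e.g., sequencing reactions on a molecular graph. The `env`
# interface below is a placeholder assumption.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # running mean return

def uct_search(env, root_state, n_iters=1000, c=1.4, horizon=10):
    root = Node(root_state)
    for _ in range(n_iters):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(env.legal_actions(node.state)):
            node = max(node.children.values(), key=lambda ch:
                       ch.value + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: try one unexplored action.
        untried = [a for a in env.legal_actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            child = Node(env.step(node.state, a), parent=node)
            node.children[a] = child
            node = child
        # 3. Rollout: random actions for a few steps.
        state = node.state
        for _ in range(horizon):
            actions = env.legal_actions(state)
            if not actions:
                break
            state = env.step(state, random.choice(actions))
        ret = env.reward(state)
        # 4. Backpropagation: update running means up to the root.
        while node is not None:
            node.visits += 1
            node.value += (ret - node.value) / node.visits
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)
```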

References
[1] J. Leguy, T. Cauchy, M. Glavatskikh, B. Duval, and B. Da Mota. EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation. Journal of Cheminformatics, 2020.
[2] T. Cauchy, J. Leguy, and B. Da Mota. Definition and exploration of realistic chemical spaces using the connectivity and cyclic features of ChEMBL and ZINC. Digital Discovery, Royal Society of Chemistry, under review.
[3] J. Leguy, M. Glavatskikh, T. Cauchy, and B. Da Mota. Scalable Estimator of the Diversity for De Novo Molecular Generation Resulting in a More Robust QM Dataset (OD9) and a More Efficient Molecular Optimization. Journal of Cheminformatics, 2021.
[4] Voršilák et al. SYBA: Bayesian estimation of synthetic accessibility of organic compounds. Journal of Cheminformatics, 2020.
[5] Bühlmann et al. ChEMBL-Likeness Score and Database GDBChEMBL. Frontiers in Chemistry, 2020.
[6] Thakkar et al. Retrosynthetic accessibility score (RAscore) – rapid machine learned synthesizability classification from AI driven retrosynthetic planning. Chemical Science, 2021.
[7] Bradshaw et al. Generating Molecules via Chemical Reactions. DeepGenStruct Workshop, ICLR, 2019.
[8] Scarselli et al. The Graph Neural Network Model. IEEE Transactions on Neural Networks, 2009.
[9] Kipf et al. Semi-Supervised Classification with Graph Convolutional Networks. 5th International Conference on Learning Representations (ICLR), 2017.
[10] Yun et al. Graph Transformer Networks. Advances in Neural Information Processing Systems (NeurIPS), 2019.
[11] J. Leguy, T. Cauchy, B. Duval, and B. Da Mota. Goal-directed generation of new molecules by AI methods. Chapter in Computational and Data-Driven Chemistry Using Artificial Intelligence, Elsevier, 2022.
[12] Thölke et al. TorchMD-NET: Equivariant Transformers for Neural Network based Molecular Potentials. arXiv preprint arXiv:2202.02541, 2022.
[13] Zhang et al. Comparative Study of Deep Generative Models on Chemical Space Coverage. Journal of Chemical Information and Modeling, 2021.
[14] Liu et al. Constrained Graph Variational Autoencoders for Molecule Design. Advances in Neural Information Processing Systems, 2018.
[15] S. Lamprier, T. Scialom, A. Chaffin, V. Claveau, E. Kijak, J. Staiano, and B. Piwowarski. Generative Cooperative Networks for Natural Language Generation. ICML, 2022.
[16] T. Scialom, S. Lamprier, B. Piwowarski, and J. Staiano. Answers Unite! Unsupervised Metrics for Reinforced Summarization Models. EMNLP-IJCNLP, 2019.
[17] Jumper et al. Highly accurate protein structure prediction with AlphaFold. Nature, 2021.
[18] Khemchandani et al. DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. Journal of Cheminformatics, 2020.
[19] Zhou et al. Optimization of Molecules via Deep Reinforcement Learning. Scientific Reports, 2019.
[20] N. Castanet, S. Lamprier, and O. Sigaud. Stein Variational Goal Generation For Reinforcement Learning in Hard Exploration Problems. CoRR, 2022.
[21] Andrychowicz et al. Hindsight Experience Replay. arXiv preprint arXiv:1707.01495, 2017.
[22] Florensa et al. Automatic goal generation for reinforcement learning agents. International Conference on Machine Learning (ICML), PMLR, 2018.
[23] P.-A. Kamienny, J. Tarbouriech, S. Lamprier, A. Lazaric, and L. Denoyer. Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching. ICLR, 2022.

Subject:
The first objective of the thesis will be to propose and implement reinforcement learning methods adapted to the problem, and then to conduct a methodological study on toy problems, domain benchmarks, and realistic applications proposed by our chemistry partner. Learning a policy for generating molecules can be studied within the framework of classical reinforcement learning algorithms. Since the inputs to the policies are molecular graphs, it will be possible to use the descriptors proposed in our previous work, or graph neural networks from recent research, adapted to the characteristics of molecular graphs.
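For the descriptor-based option, a fixed-length state vector can be computed with standard RDKit descriptors; the particular descriptor set below is an illustrative choice, not the one from our previous work.

```python
# Sketch of a descriptor-based state representation for a policy input.
# The descriptor selection is an illustrative assumption.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors

def state_features(mol: Chem.Mol) -> np.ndarray:
    """Fixed-length feature vector usable by a tabular or MLP policy."""
    return np.array([
        Descriptors.MolWt(mol),          # molecular weight
        Descriptors.MolLogP(mol),        # lipophilicity
        Descriptors.NumHDonors(mol),     # H-bond donors
        Descriptors.NumHAcceptors(mol),  # H-bond acceptors
        Descriptors.TPSA(mol),           # topological polar surface area
    ])

print(state_features(Chem.MolFromSmiles("CCO")))  # ethanol
```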

The second objective of the thesis will be to transform a list of known reactions into actions applicable to molecular graphs, and then to learn to sequence these chemical reactions to synthesize a target molecule (i.e., goal-conditioned RL, with a view to innovative conditioned bottom-up generation rather than the usual top-down approaches, which involve complex and poorly generalizable retrosynthesis calculations). It will also be possible to derive a fine-grained estimator of synthesizability. This objective will be the main focus of the research effort in machine learning, with valuable contributions to the statistical learning community (e.g., adaptation of planning approaches to the automatic curriculum framework).
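Turning a known reaction into a graph action can be sketched with RDKit reaction SMARTS; the amide-coupling template below is a standard textbook example, not a reaction from the project's actual reaction set.

```python
# Sketch of a chemical reaction as an action on molecular graphs, using
# RDKit reaction SMARTS. The template is a textbook example (assumption).
from rdkit import Chem
from rdkit.Chem import AllChem

# Carboxylic acid + amine -> amide (schematic; ignores conditions/reagents)
rxn = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OH].[N:3]>>[C:1](=[O:2])[N:3]")

acid = Chem.MolFromSmiles("CC(=O)O")  # acetic acid
amine = Chem.MolFromSmiles("NCC")     # ethylamine
for products in rxn.RunReactants((acid, amine)):
    prod = products[0]
    Chem.SanitizeMol(prod)            # validate the generated product
    print(Chem.MolToSmiles(prod))     # e.g., CCNC(C)=O
```

A policy would then choose among such reaction actions, so that the final molecule comes with the reaction sequence that builds it.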

The third and final objective will be to apply the sequencing of chemical reactions within the algorithms developed in the first part of the thesis, and then to study the use of these actions in terms of performance and synthesizability criteria. A secondary benefit of this method is the possibility of not only proposing an optimized target molecule but also justifying the proposal through the sequence of reactions that led to its construction.

Candidate profile:
– M/F
– Master's degree (M2) or engineering school diploma

Required education and skills:
– Five years of higher education (Master's level)
– Reinforcement learning
– Deep learning / machine learning
– Python

Job location:
UFR Sciences, 2 Bd de Lavoisier, 49000 Angers