CoRNAGaT : Algorithmes basés sur la théorie de jeux pour la prédiction de complexes ARN-ARN et A

When:
17/05/2021 – 18/05/2021 all-day
2021-05-17T02:00:00+02:00
2021-05-18T02:00:00+02:00

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : IBISC
Durée : 3 ans
Contact : fariza.tahi@univ-evry.fr
Date limite de publication : 2021-05-17

Contexte :
The objective of the doctoral project will consist, from the initial work carried out in the IBISC and DAVID laboratories, to define, theoretically validate and then experimentally validate an approach for the prediction of the 3D structure of RNA-RNA and RNA-Protein complexes.
Most of the work in structural biology concerns protein molecules. But in recent years, RNAs, which constitute another type of molecules having, like proteins, a 3D conformation, have aroused growing interest. They have various functions, supposedly related to their shape and physicochemical properties, and also to their interactions with other molecules (proteins, as well as RNAs). Awareness of the existence of these non-coding RNAs during the last decade resulted in a renewed interest in studying their structure. For example, they are now being considered as possible therapeutic targets, as are various classes of proteins. Moreover, the determination of the complexes they form when they interact with proteins or with other RNAs would help to better understand their role in diseases like cancer. Many of such complexes, whose structures have been determined by experimental methods like crystallography or NMR, are available in the PDB database (https://www.rcsb.org). However, because of the important cost of these methods, computational methods are needed to make faster the discovery of complexes of RNAs and proteins, by proposing models allowing to predict potential structures that could be validated in a second step by experimental methods.
The computational methods that are proposed in the literature, like GARN2 [9] or RNAcomposer [19], are mostly interested in predicting the 3D structure of an RNA (often based on the energy which must be as low as possible) without taking into account its environment, i.e. its interactions with other RNAs or proteins. However, RNAs are very flexible molecules, and their 3D structure can vary under the effect of these interactions. Therefore, the 3D structure of a given RNA involved in a complex is not always that of minimum energy, since the stability of the complex (i.e. the global energy) is also required. Some methods are therefore proposed for taking into account these interactions, but in our knowledge, all are limited to predicting interaction between two RNAs such as Rascal [23], or between an RNA and a protein such as ICM [1].

Sujet :
In this thesis project we propose to develop methods based on game theory, which would take into account the interdependence between the 3D structure of RNA molecules and the interactions they have with each other and with proteins. We will consider that for each RNA, different possible 3D structures are predicted upstream by existing tools. We will also consider, as a first step, that potential interaction regions are known, eventually predicted. The problem will be modeled as a game in a graph G(V,E), where the vertices (the players) represent the 3D structures of RNAs or proteins, and the edges the possible interactions between the associated 3D structures. Each vertex, or agent, will represent a different RNA, and for each vertex v of V, we know a set of possible 3D structures S_v, and for each 3D structure we know a set of potential interactions areas that may be involved if the RNA represented by v interacts with other RNAs. Each player will have at his disposal 3 sets of actions: he will have to choose a configuration among a (large) set of configurations, a subset of adjacent edges to indicate with which other RNAs he decides to have an interaction, and for each selected edge, its potential interaction area among a set of potential interaction areas calculated beforehand. In order to better guide the search, it will be possible to introduce a distance on the set of 3D structures S_v, similarly we can know in advance that some edges of the graph G will never be used because the corresponding interactions are too weak.
We will look for complexes which are Nash equilibria. To compute Nash equilibria (stable solutions predicting complexes) in such a game theory approach, reinforcement learning and online learning techniques will be used. This approach has been previously used for the computation of 3D RNA structures [8,9]. For this, several algorithms exist,) [2], [3]. The main challenge, compared to classical models, is the definition of the utility functions for the players, that have on one hand to be calculated very quickly, and on the other hand to be sufficiently complex so that the Nash equilibria found are of good quality with respect to a global objective function (energy for example) that is too expensive, in computing time, to optimize. As a first approximation the utility function of a player could be equal to the energy of the configuration he has chosen plus the energies of the interactions with the other RNAs with which he has decided to interact (or a score function related to docking). Another possibility could be to use a multi-criteria approach along with game theory [13]. Indeed, we recently have shown that additional criteria (based on experimental data like SHAPE or based on the satisfaction of user constraints) could be used to improve the predictions of structures (see also BiORSEO [7]), considering insertion of 3D motifs as an additional criterion. The utility functions to be defined must also take into account, for each molecule, any spatial congestion of all of its neighboring molecules, for example by checking intersections of spheres approximating each 3D configuration. Another difficulty is that the actions that players take should have to be symmetrical (an interaction is considered only if both involved molecules choose it). This gives rise to generalized Nash equilibria [10], and the search for decentralized learning algorithms in such a context is a subject of research [24].
The first step of the doctoral project will therefore to finalize the game model described above. Then, it will be a question of correctly identifying and setting up the right online learning approach combined with local optimization methods, in order to make this distributed system converge towards equilibria close to realistic complexes. Finally, the approach finally retained and experimentally validated will be used to treat real cases of RNA-RNA and RNA-protein complexes available in PDB database (https://www.rcsb.org).
Références :
[1] Arnautova Y.A., Abagyan R., Totrov M., Protein-RNA docking using ICM, J. Chem. Theory Comput. 2018
[2] Auer P, Cesa-Bianchi N, Fischer P., Finite-time analysis of the multiarmed bandit problem, Machine learning, 2002; 47(23):235-256.
[3] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE, The nonstochastic multiarmed bandit problem, SIAM Journal on Computing, 2002; 32(1):48–77.
[4] Barth D., Bougueroua S., Gaigeot M.-P., Quessette F., Spezia R., et al. Graph theory for automatic structural recognition in molecular dynamics simulations. The Journal of chemical physics 149 (18), 2018.
[5] de Beauchene IC, de Vries SJ, Zacharias M. Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins, Nucleic Acids Res. 2016;44(10):4565-4580.
[6] Becquey L, Angel E, Tahi F., RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures, Bioinformatics. 2020; btaa944.
[7] Becquey L., Angel E., Tahi F., BiORSEO: A bi-objective method to predict RNA secondary structures with pseudoknots using RNA 3D modules, Bioinformatics, 2020 btz962.
[8] Boudard M, Bernauer J, Barth D, Cohen J, Denise A., GARN: Sampling RNA 3D Structure Space with Game Theory and Knowledge-Based Scoring Strategies, PLoS One. 2015;10(8):e0136444. Published 2015 Aug 27.
[9] Boudard M, Barth D, Bernauer J, Denise A, Cohen J., GARN2: coarse-grained prediction of 3D structure of large RNA molecules by regret minimization, Bioinformatics (Oxford, England). 2017 Aug;33(16):2479-2486.
[10] Dutang C., Existence theorems for generalized Nash equilibrium problems: an analysis of assumptions, Journal of Nonlinear Analysis and Optimization, Sompong Dhompongsa and SomyotPlubtieng, 2013, 4 (2), pp.115-126.
[11] Engelen S, Tahi F. Tfold: efficient in silico prediction of non-coding RNA secondary structures, Nucleic Acids Res. 2010;38(7):2453-2466.
[12] Fortunel N.O., Chadli L, Coutier J, Lemaître G, Auvré F, Domingues S, Bouissou-Cadio E, Vaigot P, Cavallero S, Deleuze JF, Roméo PH, and Martin MT, KLF4 inhibition promotes expansion of adult human epidermal precursors and embryonic stem-cell-derived keratinocytes, Nature Biomed Eng, 2019, Dec;3(12): 985-997.
[13] A. Kagrecha et al., Constrained regret minimization for multi-criterion multi-armed bandits, arXiv:2006.09649, juin 2020
[14] Lamiable A., Barth D., Denise A., Quessette F., Vial S., Westhof E.: Automated prediction of three-way junction topological families in RNA secondary structures, Comput. Biol. Chem. 37: 1-5 (2012)
[15] Lamiable A., Quessette F., Vial S., Barth D., Denise A.: An Algorithmic Game-Theory Approach for Coarse-Grain Prediction of RNA 3D Structure, IEEE ACM Trans. Comput. Biol. Bioinform. 10(1): 193-199 (2013)
[16] Legendre A, Angel E, Tahi F., Bi-objective integer programming for RNA secondary structure prediction with pseudoknots, BMC Bioinformatics. Jan 15;19(1) :13. 2018.
[17] Legendre, A., E. Angel, and F. Tahi., RCPred: RNA Complex Prediction as a constrained maximum weight clique problem, BMC Bioinformatics, 2019 Mar 29;20(Suppl 3):128.
[18] Legendre, A., Ibéné M., E. Angel, and F. Tahi, C-RCPred: A multi-objective algorithm for interactive prediction of RNA complexes integrating user knowledge and probing data, to be submitted to ISMB’2021
[19] Popenda, M., Szachniuk, M., Antczak, M., Purzycka, K.J., Lukasiak, P., Bartol, N., Blazewicz, J., Adamiak, R.W., Automated 3D structure composition for large RNAs, Nucleic Acids Research, 2012, 40(14):e112
[20] Tav C, Tempel S, Poligny L, Tahi F. miRNAFold: a web server for fast miRNA precursor prediction in genomes. Nucleic
Acids Res. 2016;44(W1):W181-W184.
[21] Tempel S, Tahi F., A fast ab-initio method for predicting miRNA precursors in genomes, Nucleic Acids Res. 2012;40(11):e80.
[22] A Vulin, M Sedkaoui, S Moratille, N Sevenet, P Soularue, O Rigaud, L Guibbal, J Dulong, P Jeggo, JF Deleuze, J Lamartine and MT Martin. Severe PATCHED1 deficiency in cancer-prone Gorlin patient cells results in intrinsic radiosensitivity. Int J Radiat Oncol Biol Phys.2018,1;102(2):417-425.
[23] Yamasaki S, Hirokawa T, Asai K, Fukui K., Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method, J Chem Inf Model. 2014;54:672–682
[24] C. Yu, M. Van der Schaar and A. H. Sayed, Distributed Learning for Stochastic Generalized Nash Equilibrium Problems, IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 3893-3908, 1 Aug.1, 2017

Profil du candidat :
Candidats avec un niveau Master 2 ou équivalent (3eme année d’ingénieur).

Formation et compétences requises :
Formation de niveau M2 en Informatique (avec une certaine formation en biologie), ou en Bioinformatique / Biologie Computationnelle. Le candidat doit avoir une solide formation en algorithmique et en optimisation combinatoire. Une certaine expérience en bioinformatique structurale serait appréciée.

Adresse d’emploi :
Bâtiment IBGBI. 23 bv. de France. 91000 Evry, France