Research
Projects
Discovery of molecules using machine learning and combinatorial optimization
The main purpose of this research project is to develop methods for the automatic generation of molecules that satisfy desired properties, with a focus on the chemistry of organic molecular materials. I have proposed a generic and interpretable evolutionary algorithm named EvoMol, which I have shown to successfully optimize a variety of molecular properties [2]. I have also been working on a filter-based approach designed to steer the generation toward realistic molecules [10], and I published a review of the state of the art in de novo molecular generation [9].
Most properties of interest in the field of organic molecular materials depend on costly quantum chemistry computations (DFT calculations). This motivates the use of machine learning algorithms as fast estimators of these properties. I worked on machine learning methods to predict the DFT-optimized geometry of molecules, which is closely related to the target electronic properties [5, 8]. I also worked with a postdoctoral researcher to measure the importance of chemical diversity in the training datasets of machine learning models of molecular properties. We showed that models trained on a commonly used synthetic dataset suffer from a lack of diversity [3]. We further proposed an efficient method based on EvoMol to maximize various measures of chemical diversity, which we used to obtain a large and diverse dataset of molecules [1].
I have also proposed combining an optimization method with a machine learning model, in the form of a surrogate-based black-box optimization method. The surrogate function is a machine learning model that estimates the values of a costly molecular property and is used to select solutions in the search space. I showed that this approach is more efficient than an evolutionary search for the optimization of a costly electronic property [6]. Finally, the use of ML models for molecular chemistry raises questions about their interpretability. I proposed an approach based on EvoMol to generate counterfactual explanations for any binary classification model of molecules [7].
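As an illustration of the surrogate-based idea described above, here is a minimal sketch. It is not the method of [6]: it assumes a toy search space of real-valued vectors instead of molecules, a hypothetical `costly_property` function standing in for a DFT calculation, and a simple k-nearest-neighbour average as the surrogate model. All names are invented for this example.

```python
import random


def costly_property(x):
    """Hypothetical stand-in for an expensive evaluation (e.g. a DFT run)."""
    return -sum((xi - 0.3) ** 2 for xi in x)


def surrogate_predict(x, evaluated, k=3):
    """Cheap estimate: mean score of the k evaluated points closest to x."""
    nearest = sorted(
        evaluated,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)


def mutate(x, rng, step=0.1):
    """Perturb each coordinate slightly, clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, xi + rng.uniform(-step, step))) for xi in x)


def surrogate_search(n_iterations=30, n_candidates=50, dim=4, seed=0):
    rng = random.Random(seed)
    # Initial design: a handful of costly evaluations to seed the surrogate.
    evaluated = []
    for _ in range(5):
        x = tuple(rng.random() for _ in range(dim))
        evaluated.append((x, costly_property(x)))
    for _ in range(n_iterations):
        # Generate many cheap candidates around the best known solution.
        best_x, _ = max(evaluated, key=lambda item: item[1])
        candidates = [mutate(best_x, rng) for _ in range(n_candidates)]
        # The surrogate selects the single candidate worth a costly evaluation.
        chosen = max(candidates, key=lambda c: surrogate_predict(c, evaluated))
        evaluated.append((chosen, costly_property(chosen)))
    return evaluated


history = surrogate_search()
best_x, best_score = max(history, key=lambda item: item[1])
print(len(history), best_score)
```

In this sketch each iteration spends one costly evaluation, while the surrogate screens the fifty candidates. This is the budget trade-off that motivates surrogate-based search when every true evaluation is expensive.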
Dereplication in plant-based chemistry
During my MSc studies, I worked with a group of scientists in plant-based chemistry. The aim was to improve a pre-existing tool that uses NMR spectra to identify the compounds in a mixture. I refactored the source code, formalized the matching algorithm, and improved its efficiency. The tool was later made public [4].
Publications
Publications in scientific journals
[1] Jules Leguy, Marta Glavatskikh, Thomas Cauchy, and Benoit Da Mota. “Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization”. In: Journal of Cheminformatics 13.1 (Oct. 2021). DOI: 10.1186/s13321-021-00554-8
[2] Jules Leguy, Thomas Cauchy, Marta Glavatskikh, Béatrice Duval, and Benoit Da Mota. “EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation”. In: Journal of Cheminformatics 12.1 (Sept. 2020). DOI: 10.1186/s13321-020-00458-z
[3] Marta Glavatskikh, Jules Leguy, Gilles Hunault, Thomas Cauchy, and Benoit Da Mota. “Dataset’s chemical diversity limits the generalizability of machine learning predictions”. In: Journal of Cheminformatics 11.1 (Dec. 2019). DOI: 10.1186/s13321-019-0391-2
[4] Antoine Bruguière, Séverine Derbré, Joël Dietsch, Jules Leguy, Valentine Rahier, Quentin Pottier, Dimitri Bréard, Sorphon Suor‑Cherer, Guillaume Viault, Anne‑Marie Le Ray, Frédéric Saubion, and Pascal Richomme. “MixONat, a Software for the Dereplication of Mixtures Based on 13C NMR Spectroscopy”. In: Analytical Chemistry 92.13 (July 2020). DOI: 10.1021/acs.analchem.0c00193
[5] Jules Leguy, Thomas Cauchy, Béatrice Duval, and Benoit Da Mota. “Predicting Interatomic Distances of Molecular Quantum Chemistry Calculations”. In: Advances in Knowledge Discovery and Management: Volume 9. Studies in Computational Intelligence. Submitted: 2019. Springer International Publishing, 2022. DOI: 10.1007/978-3-030-90287-2_8
Publications in scientific conferences
[6] Jules Leguy, Béatrice Duval, Benoit Da Mota, and Thomas Cauchy. “Surrogate‑Based Black‑Box Optimization Method for Costly Molecular Properties”. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI). Nov. 2021. DOI: 10.1109/ICTAI52525.2021.00124
[7] Jules Leguy, Bryan Garreau, Thomas Cauchy, Benoit Da Mota, and Béatrice Duval. “Génération d’explications contre‑factuelles pour la chimie moléculaire”. In: Workshop EXPLAIN’AI hosted at EGC 2022. French‑speaking. Extraction et Gestion des connaissances, EGC 2022, Blois, France, Jan. 2022
[8] Jules Leguy, Thomas Cauchy, Béatrice Duval, and Benoit Da Mota. “Des réseaux de neurones pour prédire des distances interatomiques extraites d’une base de données ouverte de calculs en chimie quantique”. In: Extraction et Gestion des connaissances, EGC 2019, Metz, France, January 21‑25, 2019. French‑speaking.
Publications in scientific books
[9] Jules Leguy, Thomas Cauchy, Béatrice Duval, and Benoit Da Mota. “Chapter 2 ‑ Goal‑directed generation of new molecules by AI methods”. In: Computational and Data‑Driven Chemistry Using Artificial Intelligence. Elsevier, Jan. 2022. DOI: 10.1016/B978-0-12-822249-2.00004-9
Preprints
[10] Thomas Cauchy, Jules Leguy, and Benoit Da Mota. “Definition and exploration of realistic chemical spaces using the connectivity and cyclic features of ChEMBL and ZINC”. In: ChemRxiv (Dec. 2022). DOI: 10.26434/chemrxiv-2022-2b41l