Predicting scalar coupling constants via machine learning

FFI-Report 2021

About the publication

Report number: 21/02531

ISBN: 978-82-464-3382-0

Format: PDF-document

Size: 549.9 KB

Language: English

Fredrik Bakken, Lars L Sandberg, Dennis Christensen, Thor Engøy, Hallvar Gisnås, Lars Aurdal
Over the preceding decade, machine learning techniques have been successfully applied in several fields of research, including the prediction of chemical properties of atoms and molecules. Whereas conventional quantum chemical methods can be very computationally expensive, machine learning algorithms can give fast and accurate predictions beyond the known data set, provided that they have been trained on a sufficient amount of quality data.

Online platforms, such as Kaggle (kaggle.com), host machine learning competitions with well-defined problem descriptions and a substantial amount of accompanying data. These competitions provide a clear objective and a fixed deadline, making them ideal for short-term, focused research work. In addition, the Kaggle website serves as an interactive learning environment, with a continually updated scoreboard and a public discussion forum.

During the summer of 2019, a team of students and scientists at the Norwegian Defence Research Establishment (FFI) participated in the Kaggle competition Predicting Molecular Properties, where the task was to predict the scalar coupling constant via machine learning. The scalar coupling constant is an expression of the magnetic interactions between atoms in a molecule and depends on the molecule's atomic composition and geometry.

We investigated several mathematical representations of molecular data as inputs to various supervised learning algorithms, including deep neural networks and gradient boosting trees. Combining the molecules’ distance matrices with angular information provided a flexible data representation, enabling accurate predictions. Our most successful model was an ensemble of deep neural networks and gradient boosting trees, which placed 308th among the 2,737 competing teams. A key factor in the team’s success was the mixture of relevant domain expertise and machine learning experience.
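To illustrate what combining a distance matrix with angular information can look like in practice, the following is a minimal sketch in Python with NumPy. It is not the team's actual feature pipeline: the function names `distance_matrix` and `angle`, the three-atom example geometry, and the choice of which features to concatenate are all assumptions made for illustration.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between atoms.

    coords: array of shape (n_atoms, 3) with Cartesian coordinates.
    Returns a symmetric (n_atoms, n_atoms) array with a zero diagonal.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def angle(coords, i, j, k):
    """Angle in radians at atom j, formed by the vectors j->i and j->k."""
    u = coords[i] - coords[j]
    v = coords[k] - coords[j]
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


# Hypothetical water-like geometry (illustrative coordinates, in angstrom).
coords = np.array([
    [0.00, 0.00, 0.0],   # O
    [0.96, 0.00, 0.0],   # H
    [-0.24, 0.93, 0.0],  # H
])

D = distance_matrix(coords)
theta = angle(coords, 1, 0, 2)  # H-O-H angle at the oxygen atom

# One possible (assumed) feature vector for the H-H atom pair:
# the pair distance, both distances to the shared neighbour, and the angle.
features = np.array([D[1, 2], D[0, 1], D[0, 2], theta])
```

A vector such as `features` could then be passed, one row per atom pair, to any supervised regressor. Distances and angles are unchanged when the molecule is rotated or translated, which is one reason such representations are convenient inputs.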
