Machine Learning for bioinformatics and systems biology

Coordinator: Perry Moerland

Programme: Bioinformatics and Systems Biology Research School (BioSB); PhD students

Lecturers

  • Marcel Reinders (Delft University of Technology)
  • Lodewyk Wessels (Netherlands Cancer Institute)
  • Perry Moerland (Amsterdam UMC, location: Academic Medical Center)

Course credits

1.5 ECTS for attending the course; 3 ECTS upon successful completion of a final assignment

Course material

Find all course materials for the September 2022 edition here. The course reached the maximum number of participants, but you can still be added to the waiting list or pre-register for the 2023 edition.

Course overview

Modern biology is a data-rich science, driven by our ability to measure the detailed molecular characteristics of cells, organs, and individuals at many different levels. Interpretation of these large-scale biological data requires the detection of statistical dependencies and patterns in order to establish useful models of complex biological systems. Techniques from machine learning are key in this endeavour. Typical examples are the visualization of single-cell RNA-seq data using dimensionality reduction methods, base calling for nanopore sequencing data using hidden Markov models and (recurrent) neural networks, and classification of high-throughput microscopy image data using convolutional neural networks. In this one-week course, the foundations of machine learning will be laid out and commonly used methods for unsupervised (clustering, dimensionality reduction, visualization) and supervised (mainly classification) learning will be explained in detail. Methods will be illustrated using recent examples from the fields of systems biology and bioinformatics. Methods discussed in the morning lectures will be put into practice during the afternoon computer lab sessions.

Topics include:

  • Density estimation, including histograms, nearest neighbour, Parzen
  • Evaluation, including ROC, cross-validation
  • Parametric and non-parametric classifiers, including linear discriminant analysis, k-nearest neighbours, logistic regression, decision trees and random forests
  • Feature selection, including search algorithms (forward, backward, branch & bound) and sparse classifiers (ridge, lasso, elastic net)
  • Dimensionality reduction, including principal component analysis, multi-dimensional scaling, t-SNE
  • Clustering, including hierarchical clustering, k-means, Gaussian mixture models
  • Hidden Markov models
  • (Deep) neural networks
  • Kernel-based methods, including support vector machines
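
To give a flavour of two of the topics listed above (k-nearest neighbours and cross-validation), here is a minimal sketch in plain Python. It is not taken from the course materials: the toy data, the function names `knn_predict` and `leave_one_out_accuracy`, and the choice of k = 3 are all illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Estimate accuracy by holding out each sample once and predicting it from the rest."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: two well-separated groups in a 2-D feature space.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
     (2.0, 2.1), (2.2, 1.9), (1.9, 2.3), (2.3, 2.2)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

accuracy = leave_one_out_accuracy(X, y, k=3)
print(f"Leave-one-out accuracy (k=3): {accuracy:.2f}")
```

Leave-one-out is the extreme case of cross-validation in which every fold contains a single sample. Real biological data sets are usually far larger and noisier than this toy example, so in practice one would typically use a library implementation and fewer, larger folds.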

After completing this course, students will have a good understanding of a wide range of machine learning techniques and will be able to recognize which method is most applicable to the data analysis problems they encounter in bioinformatics and systems biology applications.

Target audience

The course is aimed at PhD students with a background in bioinformatics, systems biology, computer science, the life sciences, or a related field. Participants from the private sector are also welcome. A working knowledge of basic statistics and linear algebra is assumed. Preparation material on statistics and linear algebra will be distributed before the course for students who lack the required background.