Machine Learning (BioSB)

Machine Learning for bioinformatics and systems biology

Programme: Bioinformatics and Systems Biology Research School (BioSB); PhD students

Lecturers

Marcel Reinders (Delft University of Technology)
Lodewyk Wessels (Netherlands Cancer Institute)
Perry Moerland (Amsterdam UMC, location: Academic Medical Center)

Course credits

1.5 ECTS for following the course, 3 ECTS when successfully completing a final assignment

Course material

All course materials for the January 2025 edition can be found here. Pre-registration is open for the 2026 edition (probably in May/June).

Course overview

Modern biology is a data-rich science, driven by our ability to measure the detailed molecular characteristics of cells, organs, and individuals at many different levels. Interpretation of these large-scale biological data requires the detection of statistical dependencies and patterns in order to establish useful models of complex biological systems. Techniques from machine learning are key in this endeavour. Typical examples are the visualization of single-cell RNA-seq data using dimensionality reduction methods, base calling for nanopore sequencing data using hidden Markov models and (recurrent) neural networks, and classification of high-throughput microscopy image data using convolutional neural networks. In this one-week course, the foundations of machine learning will be laid out and commonly used methods for unsupervised (clustering, dimensionality reduction, visualization) and supervised (mainly classification) learning will be explained in detail. Methods will be illustrated using recent examples from the fields of systems biology and bioinformatics. Methods discussed in the morning lectures will be put into practice during the afternoon computer lab sessions.

Topics include:

Density estimation, including nearest-neighbour and Parzen methods
Performance evaluation, including ROC analysis and cross-validation
Parametric and non-parametric classifiers, including linear discriminant analysis, k-nearest neighbours, logistic regression, decision trees and random forests
Feature selection, including search algorithms (forward, backward, branch & bound) and regularized classifiers (ridge, lasso)
Dimensionality reduction, including principal component analysis, multi-dimensional scaling, t-SNE
Clustering, including hierarchical clustering, k-means, Gaussian mixture models
(Deep) Neural networks
Kernel-based methods, including support vector machines
Recent developments: variational autoencoders, diffusion models
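To illustrate how two of the listed topics fit together, a k-nearest-neighbour classifier evaluated by cross-validation, here is a minimal pure-Python sketch. It is not taken from the course materials: the function names, the choice of k = 3, five folds, squared Euclidean distance, and the two-cluster toy dataset are all hypothetical assumptions made for illustration.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Hypothetical helper: label x by majority vote among its k nearest
    training points, using squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(xs, ys, n_folds=5, k=3, seed=0):
    """Hypothetical helper: estimate k-NN accuracy with n-fold cross-validation.
    Each sample is held out exactly once and predicted from the other folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # disjoint folds covering all samples
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in idx if i not in held_out]
        train_y = [ys[i] for i in idx if i not in held_out]
        for i in fold:
            if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
                correct += 1
    return correct / len(xs)

# Assumed toy data: two well-separated 2-D Gaussian clusters labelled "A" and "B".
rng = random.Random(42)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)] + \
     [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(20)]
ys = ["A"] * 20 + ["B"] * 20
print(cross_val_accuracy(xs, ys))
```

Because every sample is predicted by a model that never saw it during training, the printed accuracy is an estimate of performance on unseen data rather than on the training set.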

After completing this course, students have a good understanding of a wide range of machine learning techniques and are able to recognize which method is most applicable to the data analysis problems they encounter in bioinformatics and systems biology applications.

Target audience

The course is aimed at PhD students with a background in bioinformatics, systems biology, computer science, the life sciences, or a related field. Participants from the private sector are also welcome. A working knowledge of basic statistics and linear algebra is assumed. Preparation material on statistics and linear algebra will be distributed before the course for students who lack this background to study in advance.
