In bioinformatics, the comparison and clustering of biological molecules serve as essential methodologies for deciphering the intricate complexities of biological systems. These processes play a pivotal role in elucidating evolutionary relationships, functional similarities, and structural motifs across diverse organisms and biomolecules. Through sequence alignment, structural superposition, and clustering techniques, researchers can discern patterns, similarities, and differences within vast datasets of DNA, RNA, proteins, and small molecules. Such analyses not only aid in understanding the fundamental mechanisms governing cellular processes but also pave the way for advancements in drug discovery, protein engineering, and personalized medicine. Moreover, by organizing biological data into meaningful clusters based on shared characteristics, these methodologies streamline data interpretation, enabling researchers to extract valuable insights and make informed decisions in their quest to unravel the mysteries of life at the molecular level.
In this seminar, we will look at two groups of tools. The first group computes clusters directly while the second group computes pairwise inter-sample similarities or distances that can then be used by classical clustering algorithms such as spectral clustering or agglomerative clustering. The goal for this seminar is to equip participants with a comprehensive understanding of the principles, methodologies, and practical applications of comparing and clustering biological molecules using cutting-edge tools such as FoldSeek, MASH, and CD-HIT. By the end of the seminar, participants should be able to proficiently utilize these tools to perform sequence alignment, structural comparison, and clustering analyses on various biological datasets. Moreover, they should gain insights into how these analyses contribute to advancements in genomics, proteomics, and drug discovery.
This (pro-)seminar has no formal requirements.
We will check every submission for plagiarism with TurnItIn, an online tool that automatically checks submissions for plagiarism. You are free (and encouraged) to use it before submitting your final report. Following the link above, you can log in with your UdS credentials (the same ones you use for your student email) and use TurnItIn for free. By attending this seminar, you agree that we upload your report to TurnItIn.
If we detect plagiarism in your work, you will have the chance to explain yourself. Ultimately, you will fail this seminar if your explanation is not convincing.
Please register for this seminar by writing an email to Roman Joeres before 19.04.2024 23:59. Please also attach your transcript of records, which can be downloaded from LSF/HISPOS. We will distribute the topics among students in the mandatory kickoff meeting on 23.04.2024 at 12 PM in E2.1 SR 007.
Comparison means a tool computes a matrix of pairwise distances or similarities between samples.
Clustering means a tool computes clusters of samples without reporting how similar or distant the samples are.
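To illustrate how the two groups relate, here is a minimal sketch of how the output of a comparison tool (a pairwise distance matrix) could be fed into a classical agglomerative clustering step. The distance values, the function name `single_linkage`, and the threshold are hypothetical and chosen only for illustration. They are not output of any tool presented in this seminar, and single linkage is just one of several possible linkage criteria.

```python
from itertools import combinations


def cluster_gap(dist, left, right):
    """Single-linkage distance: the smallest distance between any two members."""
    return min(dist[a][b] for a in left for b in right)


def single_linkage(dist, threshold):
    """Agglomerative clustering on a symmetric pairwise distance matrix.

    Starts with one cluster per sample and repeatedly merges the two
    closest clusters until the smallest gap exceeds `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage gap.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_gap(dist, clusters[p[0]], clusters[p[1]]),
        )
        if cluster_gap(dist, clusters[i], clusters[j]) > threshold:
            break
        # i < j is guaranteed by combinations, so deleting j keeps i valid.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Hypothetical distance matrix for four samples (0 = identical).
distances = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.70, 0.85],
    [0.80, 0.70, 0.00, 0.20],
    [0.90, 0.85, 0.20, 0.00],
]

print(single_linkage(distances, threshold=0.3))  # [[0, 1], [2, 3]]
```

A tool from the first group would return only the final cluster assignment, whereas a tool from the second group would return the matrix and leave the choice of clustering algorithm and threshold to the user.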
Tuesday, Oct. 8
09:00 - Welcome & Opening words
09:05 - TMalign (Johanna Straub)
09:50 - CD-HIT (Maximilian Bähr, Proseminar)
10:25 - MMSeqs (Johanna Bechher)
11:10 - Break
11:15 - FoldSeek (Zyad Ahmed)
12:00 - Weisfeiler Lehman Graph Kernel (Varvara Kotelnikova, Proseminar)
12:35 - MCES (Katja Räda, Proseminar)
13:05 - projected end
Wednesday, Oct. 9
09:00 - DIAMOND (Anastasia Lesnikov)
09:45 - MashMap (Pranjali Jain)
10:30 - MCL (Masbah Sayeeda Musheer)
11:15 - Break
11:20 - MASH (Max Asenow)
12:05 - GClust (Sarnsh Shiva Nair)
12:50 - SANS (Vishak Kadamalithaya)
13:30 - projected end