the bioinformatics chat, a podcast about bioinformatics

The bioinformatics chat is a podcast about computational biology, bioinformatics, and next generation sequencing.

The bioinformatics chat is produced and hosted by Roman Cheplyaka. Several awesome machine learning-themed episodes have been hosted by Jacob Schreiber.

Subscribe to the bioinformatics chat on Apple Podcasts, Pocket Casts, Spotify, or any other podcasting app via the RSS feed link. You can also follow the podcast on Mastodon and Twitter and support it on Patreon.

Prioritizing drug target genes with Marie Sadler (#70)

December 21, 2023

In this episode, Marie Sadler talks about her recent Cell Genomics paper, Multi-layered genetic approaches to identify approved drug targets.

Previous studies have found that the drugs that target a gene linked to the disease are more likely to be approved. Yet there are many ways to define what it means for a gene to be linked to the disease. Perhaps the most straightforward approach is to rely on the genome-wide association studies (GWAS) data, but that data can also be integrated with quantitative trait loci (eQTL or pQTL) information to establish less obvious links between genetic variants (which often lie outside of genes) and genes. Finally, there’s exome sequencing, which, unlike GWAS, captures rare genetic variants. So in this paper, Marie and her colleagues set out to benchmark these different methods against one another.

Listen to the episode to find out how these methods work, which ones work better, and how network propagation can improve the prediction accuracy.

Download MP3

Suffix arrays in optimal compressed space and δ-SA with Tomasz Kociumaka and Dominik Kempa (#69)

September 29, 2023

Today on the podcast we have Tomasz Kociumaka and Dominik Kempa, the authors of the preprint Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space.

The suffix array is one of the foundational data structures in bioinformatics, serving as an index that allows fast substring searches in a large text. However, in its raw form, the suffix array occupies the space proportional to (and several times larger than) the original text.

In their paper, Tomasz and Dominik construct a new index, δ-SA, which on the one hand can be used in the same way (answer the same queries) as the suffix array and the inverse suffix array, and on the other hand, occupies the space roughly proportional to the gzip’ed text (or, more precisely, to the measure δ that they define — hence the name).

Moreover, they mathematically prove that this index is optimal, in the sense that any index that supports these queries — or even much weaker queries, such as simply accessing the i-th character of the text — cannot be significantly smaller (as a function of δ) than δ-SA.

Download MP3

Phylogenetic inference from raw reads and Read2Tree with David Dylus (#68)

August 28, 2023

In this episode, David Dylus talks about Read2Tree, a tool that builds alignment matrices and phylogenetic trees from raw sequencing reads. By leveraging the database of orthologous genes called OMA, Read2Tree bypasses traditional, time-consuming steps such as genome assembly, annotation and all-versus-all sequence comparisons.