The bioinformatics chat is a podcast about computational biology, bioinformatics, and next generation sequencing.
The bioinformatics chat is produced and hosted by Roman Cheplyaka.
Several awesome machine learning-themed episodes have been hosted by Jacob Schreiber.
Subscribe to the bioinformatics chat in any podcasting app via the RSS feed link.
You can also follow the podcast on Mastodon and Twitter and
support it on Patreon.
September 29, 2023
Today on the podcast we have Tomasz Kociumaka and Dominik Kempa,
the authors of the preprint
Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space.
The suffix array is one of the foundational data structures in bioinformatics,
serving as an index that allows fast substring searches in a large text.
However, in its raw form, the suffix array occupies space proportional to (and
several times larger than) the original text.
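To make the idea concrete, here is a minimal Python sketch of a plain (uncompressed) suffix array and a substring search on top of it. It is only an illustration of the classic data structure, not the δ-SA index from the preprint; the function names `build_suffix_array` and `find_occurrences` are made up for this example, and the construction is deliberately naive.

```python
def build_suffix_array(text):
    # Naive construction: sort the suffix start positions by the suffix
    # that begins at each position. Fine for illustration, far too slow
    # for genome-sized inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # All suffixes that start with `pattern` form one contiguous range
    # of the suffix array, so two binary searches find its boundaries.
    m = len(pattern)

    # Lower boundary: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper boundary: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Return the matching text positions in increasing order.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)
print(find_occurrences(text, sa, "ana"))
```

Note that `sa` stores one integer per character of the text, which is where the "several times larger than the original text" overhead comes from.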
In their paper, Tomasz and Dominik construct a new index, δ-SA, which on the
one hand can be used in the same way (answer the same queries) as the suffix
array and the inverse suffix array, and on the other hand occupies space
roughly proportional to the gzip’ed text (or, more precisely, to the measure δ
that they define, hence the name).
Moreover, they mathematically prove that this index is optimal, in the sense
that any index that supports these queries — or even much weaker queries, such
as simply accessing the i-th character of the text — cannot be significantly
smaller (as a function of δ) than δ-SA.
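The episode notes do not spell out how δ is computed. As a rough illustration, the sketch below assumes the usual "substring complexity" definition from the literature on repetitive texts: δ is the maximum, over all lengths k, of the number of distinct length-k substrings divided by k. The function name `substring_complexity` is hypothetical, and the brute-force loop is only meant for small examples.

```python
def substring_complexity(text):
    # Assumed definition: delta = max over k of d_k / k, where d_k is the
    # number of distinct substrings of length k in the text.
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        distinct = len({text[i:i + k] for i in range(n - k + 1)})
        best = max(best, distinct / k)
    return best


# A highly repetitive text has a small delta relative to its length.
print(substring_complexity("aaaa"))
print(substring_complexity("banana"))
```

Under this assumed definition, repetitive inputs such as collections of similar genomes have a δ much smaller than their length, which is what makes an index whose size depends on δ attractive.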
August 28, 2023
In this episode,
David Dylus talks about Read2Tree,
a tool that builds alignment matrices and phylogenetic trees from raw
sequencing reads.
By leveraging the database of orthologous genes called OMA, Read2Tree bypasses traditional, time-consuming steps such as genome assembly, annotation and all-versus-all sequence comparisons.
July 29, 2023
This is the third and final episode in the AlphaFold series, originally recorded on February 23, 2022,
with Amelie Stein, now an associate professor at the University of Copenhagen.
In the episode, Amelie explains what 𝛥𝛥G is, how it informs us
whether a particular protein mutation affects its stability, and how AlphaFold 2
helps in this analysis.
AlphaFold and shape-mers with Janani Durairaj (#66)
AlphaFold and protein interactions with Pedro Beltrao (#65)
Enformer: predicting gene expression from sequence with Žiga Avsec (#64)
Bioinformatics Contest 2021 with Maksym Kovalchuk and James Matthew Holt (#63)
Steady states of metabolic networks and Dingo with Apostolos Chalkis (#62)
3D genome organization and GRiNCH with Da-Inn Erika Lee (#61)
Differential gene expression and DESeq2 with Michael Love (#60)
Proteomics calibration with Lindsay Pino (#59)
B cell maturation and class switching with Hamish King (#58)
Enhancers with Molly Gasperini (#57)
Polygenic risk scores in admixed populations with Bárbara Bitarello (#56)
Phylogenetics and the likelihood gradient with Xiang Ji (#55)
Seeding methods for read alignment with Markus Schmidt (#54)
Real-time quantitative proteomics with Devin Schweppe (#53)
How 23andMe finds identical-by-descent segments with William Freyman (#52)
Basset and Basenji with David Kelley (#51)
ENCODE3 with Jill Moore (#50)
Most Permissive Boolean Networks with Loïc Paulevé (#49)
Machine learning for drug development with Marinka Zitnik (#48)
Reproducible pipelines and NGLess with Luis Pedro Coelho (#47)
HiFi reads and HiCanu with Sergey Nurk and Sergey Koren (#46)
Genome assembly and Canu with Sergey Koren and Sergey Nurk (#45)
DNA tagging and Porcupine with Kathryn Doroschak (#44)
Generalized PCA for single-cell data with William Townes (#43)
Spectrum-preserving string sets and simplitigs with Amatur Rahman and Karel Břinda (#42)
Epidemic models with Kris Parag (#41)
Plasmid classification and binning with Sergio Arredondo-Alonso and Anita Schürch (#40)
Amplicon sequence variants and bias with Benjamin Callahan (#39)
Issues in legacy genomes with Luke Anderson-Trocmé (#38)
Causality and potential outcomes with Irineo Cabreros (#37)
scVI with Romain Lopez and Gabriel Misrachi (#36)
The role of the DNA shape in transcription factor binding with Hassan Samee (#35)
Power laws and T-cell receptors with Kristina Grigaityte (#34)
Genome assembly from long reads and Flye with Mikhail Kolmogorov (#33)
Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber (#32)
Bioinformatics Contest 2019 with Alexey Sergushichev and Gennady Korotkevich (#31)
Bayesian inference of chromatin structure from Hi-C data with Simeon Carstens (#30)
Haplotype-aware genotyping from long reads with Trevor Pesout (#29)
Space-efficient variable-order Markov models with Fabio Cunial (#28)
Classification of CRISPR-induced mutations and CRISPRpic with HoJoon Lee and Seung Woo Cho (#27)
Feature selection, Relief and STIR with Trang Lê (#26)
Transposons and repeats with Kaushik Panda and Keith Slotkin (#25)
Read correction and Bcool with Antoine Limasset (#24)
RNA design, EteRNA and NEMO with Fernando Portela (#23)
smCounter2: somatic variant calling and UMIs with Chang Xu (#22)
Linear mixed models, GWAS, and lme4qtl with Andrey Ziyatdinov (#21)
B cell receptor substitution profile prediction and SPURF with Kristian Davidsen and Amrit Dhar (#20)
Genome fingerprints with Gustavo Glusman (#19)
Bioinformatics Contest 2018 with Alexey Sergushichev and Ekaterina Vyahhi (#18)
Rarefaction, alpha diversity, and statistics with Amy Willis (#17)
Javier Quilez on what makes large sequencing projects successful (#16)
Optimal transport for single-cell expression data with Geoffrey Schiebinger (#15)
Generating functions for read mapping with Guillaume Filion (#14)
Bracken with Jennifer Lu (#13)
Modelling the immune system and C-ImmSim with Filippo Castiglione (#12)
Collective cell migration with Linus Schumacher (#11)
Spatially variable genes and SpatialDE with Valentine Svensson (#10)
Michael Tessler and Christopher Mason on 16S amplicon vs shotgun sequencing (#9)
Perfect k-mer hashing in Sailfish (#8)
Metagenomics and Kraken (#7)
Allele-specific expression (#6)
Relative data analysis and propr with Thom Quinn (#5)
ChIP-seq and GenoGAM with Georg Stricker and Julien Gagneur (#4)
miRNA target site prediction and seedVicious with Antonio Marco (#3)
Single-cell RNA sequencing with Aleksandra Kolodziejczyk (#2)
Transcriptome assembly and Scallop with Mingfu Shao (#1)