The bioinformatics chat is a podcast about computational biology, bioinformatics, and next-generation sequencing.
The bioinformatics chat is produced by Roman Cheplyaka.
Don't miss the next episode! Subscribe on Apple Podcasts, Google Podcasts, Spotify, or via the RSS feed. You can also follow the podcast on Twitter and Mastodon.
#39 Amplicon sequence variants and bias with Benjamin Callahan
November 29, 2019
In this episode Benjamin Callahan talks about some of the issues faced by microbiologists when conducting amplicon sequencing and metagenomic studies. The two main themes are:
- Why one should probably avoid using OTUs (operational taxonomic units) and use exact sequence variants (also called amplicon sequence variants, or ASVs), and how DADA2 manages to deduce the exact sequences present in the sample.
- Why abundances inferred from community sequencing data are biased, and how we can model and correct this bias.
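To make the second theme concrete, here is a minimal sketch of one way such bias can be modeled. It assumes, and this is an assumption rather than something stated in the summary above, that each taxon is detected with its own fixed multiplicative efficiency, so observed proportions are the true proportions scaled by those efficiencies and renormalized. The function names and the numbers are hypothetical and purely illustrative.

```python
# Illustrative sketch only: a multiplicative bias model with made-up numbers.
# Assumption: each taxon has a fixed detection efficiency, and the observed
# composition is the true composition scaled by efficiency, then renormalized.

def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def observe(actual, efficiency):
    """Apply per-taxon efficiencies to true proportions."""
    return normalize([a * e for a, e in zip(actual, efficiency)])

def correct(observed, efficiency):
    """Undo the bias when the efficiencies are known (or estimated)."""
    return normalize([o / e for o, e in zip(observed, efficiency)])

actual = [0.5, 0.3, 0.2]       # hypothetical true proportions of three taxa
efficiency = [1.0, 4.0, 0.5]   # hypothetical relative efficiencies

observed = observe(actual, efficiency)
recovered = correct(observed, efficiency)

print("observed: ", [round(x, 3) for x in observed])
print("recovered:", [round(x, 3) for x in recovered])
```

Under this assumption the second taxon looks far more abundant than it is, and dividing by the efficiencies before renormalizing recovers the true composition. In practice the efficiencies are unknown and would have to be estimated, for example from control samples of known composition.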

#38 Issues in legacy genomes with Luke Anderson-Trocmé
October 22, 2019
In this episode Luke Anderson-Trocmé talks about his findings from the 1000 Genomes Project. Namely, some of the early sequenced genomes contain specific mutational signatures that haven’t been replicated by other sources and that can be found via their association with lower base quality scores. Listen to Luke tell the story of how he stumbled upon and investigated these fake variants, and what their impact is.

#37 Causality and potential outcomes with Irineo Cabreros
September 27, 2019
In this episode I talk with Irineo Cabreros about causality. We discuss why causality matters, what does and does not imply causality, and two different mathematical formalizations of causality: potential outcomes and directed acyclic graphs (DAGs). Causal models are usually considered external to and separate from statistical models, whereas Irineo’s new paper shows how causality can be viewed as a relationship between specially chosen random variables (potential outcomes).

Previous episodes
#36 scVI with Romain Lopez and Gabriel Misrachi
#35 The role of the DNA shape in transcription factor binding with Hassan Samee
#34 Power laws and T-cell receptors with Kristina Grigaityte
#33 Genome assembly from long reads and Flye with Mikhail Kolmogorov
#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber
#31 Bioinformatics Contest 2019 with Alexey Sergushichev and Gennady Korotkevich
#30 Bayesian inference of chromatin structure from Hi-C data with Simeon Carstens
#29 Haplotype-aware genotyping from long reads with Trevor Pesout
#28 Space-efficient variable-order Markov models with Fabio Cunial
#27 Classification of CRISPR-induced mutations and CRISPRpic with HoJoon Lee and Seung Woo Cho
#26 Feature selection, Relief and STIR with Trang Lê
#25 Transposons and repeats with Kaushik Panda and Keith Slotkin
#24 Read correction and Bcool with Antoine Limasset
#23 RNA design, EteRNA and NEMO with Fernando Portela
#22 smCounter2: somatic variant calling and UMIs with Chang Xu
#21 Linear mixed models, GWAS, and lme4qtl with Andrey Ziyatdinov
#20 B cell receptor substitution profile prediction and SPURF with Kristian Davidsen and Amrit Dhar
#19 Genome fingerprints with Gustavo Glusman
#18 Bioinformatics Contest 2018 with Alexey Sergushichev and Ekaterina Vyahhi
#17 Rarefaction, alpha diversity, and statistics with Amy Willis
#16 Javier Quilez on what makes large sequencing projects successful
#15 Optimal transport for single-cell expression data with Geoffrey Schiebinger
#14 Generating functions for read mapping with Guillaume Filion
#12 Modelling the immune system and C-ImmSim with Filippo Castiglione
#11 Collective cell migration with Linus Schumacher
#10 Spatially variable genes and SpatialDE with Valentine Svensson
#9 Michael Tessler and Christopher Mason on 16S amplicon vs shotgun sequencing
#8 Perfect k-mer hashing in Sailfish
#5 Relative data analysis and propr with Thom Quinn
#4 ChIP-seq and GenoGAM with Georg Stricker and Julien Gagneur
#3 miRNA target site prediction and seedVicious with Antonio Marco