bioinformatics chat

The bioinformatics chat is a podcast about computational biology, bioinformatics, and next-generation sequencing.

The bioinformatics chat is produced by Roman Cheplyaka.

#33 Genome assembly from long reads and Flye with Mikhail Kolmogorov

May 31, 2019

Modern genome assembly projects often rely on long reads to bridge longer repeats. However, because current long-read sequencers have higher error rates than short-read technologies, assemblers based on de Bruijn graphs do not work well in this setting, and the approaches that do work are slower.
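As a rough illustration of why sequencing errors hurt de Bruijn graph assemblers: a single substitution changes every k-mer that overlaps it, so one error can add up to k k-mers that do not occur in the genome. With a per-base error rate e, the chance that a k-mer is error-free falls off as (1 - e)^k. The Python sketch below assumes independent per-base substitution errors (real long-read errors are often insertions and deletions), and its toy sequence, error rates, and function names are hypothetical. It is not how Flye works.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def spurious_kmers(genome: str, read: str, k: int) -> set[str]:
    """Return the k-mers that occur in the read but nowhere in the genome."""
    return kmers(read, k) - kmers(genome, k)


def error_free_kmer_fraction(error_rate: float, k: int) -> float:
    """Expected fraction of k-mers with no error, assuming independent per-base errors."""
    return (1.0 - error_rate) ** k


# Hypothetical toy genome; the read copies it with one substitution (A -> T at index 10).
genome = "ACGTTGCATGACCTAGGATC"
read = genome[:10] + "T" + genome[11:]

print(len(spurious_kmers(genome, read, k=5)))         # 5: every 5-mer covering the error
print(round(error_free_kmer_fraction(0.001, 31), 3))  # 0.969 at an illustrative 0.1% error rate
print(round(error_free_kmer_fraction(0.10, 31), 3))   # 0.038 at an illustrative 10% error rate
```

With k = 31, the illustrative 10% error rate leaves only about 4% of k-mers error-free, compared with about 97% at 0.1%, which is why exact k-mer matching breaks down on noisy long reads.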

In this episode, Mikhail Kolmogorov from Pavel Pevzner’s lab joins us to talk about some of the ideas developed in the lab that made it possible to build a de Bruijn-like assembly graph from noisy reads. These ideas are now implemented in the Flye assembler, which runs much faster than existing long-read assemblers without sacrificing assembly quality.

#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber

April 29, 2019

In this episode, we hear from Jacob Schreiber about his algorithm, Avocado.

Avocado uses deep tensor factorization to decompose a three-dimensional tensor of epigenomic data whose axes correspond to cell types, assay types, and genomic loci. Avocado can extract a low-dimensional, information-rich latent representation from the wealth of experimental data produced by projects such as the Roadmap Epigenomics Consortium and ENCODE. This representation can be used to impute the results of genome-wide epigenomic experiments that have not yet been performed.

Jacob also talks about a pitfall he discovered when trying to predict gene expression from a mix of genomic and epigenomic data. As you increase the complexity of a machine learning model, its performance may improve for the wrong reason: instead of learning something biologically interesting, the model may simply use the nucleotide sequence to memorize each gene’s average expression across the training cell types.
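One simple way to check for this pitfall is to compare a model with a baseline that ignores the cell type altogether and predicts each gene’s mean expression across the training cell types. The NumPy sketch below uses a tiny hypothetical expression matrix and hypothetical function names; it is not the analysis discussed in the episode, only an illustration of how such a baseline can score well on a held-out cell type.

```python
import numpy as np

# Hypothetical training data: rows are genes, columns are training cell types.
train = np.array([
    [10.0, 12.0, 11.0],  # gene 0: highly expressed everywhere
    [1.0, 0.5, 1.5],     # gene 1: lowly expressed everywhere
    [5.0, 7.0, 6.0],     # gene 2: intermediate
])
# Hypothetical observed expression of the same genes in a held-out cell type.
held_out = np.array([11.5, 0.8, 6.2])


def per_gene_mean_baseline(train_expr: np.ndarray) -> np.ndarray:
    """Predict every gene's expression as its mean over the training cell types."""
    return train_expr.mean(axis=1)


def mse(pred: np.ndarray, truth: np.ndarray) -> float:
    """Mean squared error between predictions and observations."""
    return float(np.mean((pred - truth) ** 2))


baseline = per_gene_mean_baseline(train)
r = float(np.corrcoef(baseline, held_out)[0, 1])  # Pearson correlation across genes

print("baseline predictions:", baseline)
print("MSE:", round(mse(baseline, held_out), 3))
print("Pearson r across genes:", round(r, 4))
```

Because expression differs far more between genes than between cell types in this toy example, the baseline correlates almost perfectly with the held-out values even though it carries no cell-type-specific information. A more complex model is better judged by how much it improves on such a baseline than by its raw score.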

#31 Bioinformatics Contest 2019 with Alexey Sergushichev and Gennady Korotkevich

March 24, 2019

The third Bioinformatics Contest took place in February 2019.

Alexey Sergushichev, one of the contest organizers, and Gennady Korotkevich, the first-prize winner, join me to discuss this year’s problems.

