Perfect k-mer hashing in Sailfish (#8)
August 5, 2017
The original version of Sailfish, an RNA-Seq quantification tool, used minimal perfect hash functions to replace k-mers with unique integers. (The current version appears to be using a Cuckoo hashmap instead.)
This is my attempt to explain how a minimal perfect hash function could be built. The algorithm described here is not exactly the same as the one Sailfish used, but it follows the same idea.
Sections:
- Sailfish and perfect hashing (1:15)
- Perfect hashing based on binary search or hash tables (4:34)
- Random hash functions (7:34)
- Perfect hash function based on an acyclic graph (12:16)
Links:
- The Sailfish paper
- The paper describing the perfect hashing algorithm
- The birthday paradox
- Integrative Biology & Medicine
Music: Eric Skiff — Come and Find Me (modified, licensed under CC BY 4.0).
Subscribe to the bioinformatics chat on Apple Podcasts, Pocket Casts, Spotify, or any other podcasting app via the RSS feed link. You can also follow the podcast on Mastodon and Twitter and support it on Patreon.