#8 Perfect k-mer hashing in Sailfish
August 5, 2017
The original version of Sailfish, an RNA-Seq quantification tool, used minimal perfect hash functions to replace k-mers with unique integers. (The current version appears to be using a Cuckoo hashmap instead.)
This is my attempt to explain how a minimal perfect hash function could be built. The algorithm described here is not exactly the same as the one Sailfish used, but it follows the same idea.
Sections:
- Sailfish and perfect hashing (1:15)
- Perfect hashing based on binary search or hash tables (4:34)
- Random hash functions (7:34)
- Perfect hash function based on an acyclic graph (12:16)
Links:
- The Sailfish paper
- The paper describing the perfect hashing algorithm
- The birthday paradox
- Integrative Biology & Medicine
Don't miss the next episode! Subscribe on Apple Podcasts, Google Podcasts, Spotify, or via an RSS feed link. You can also follow the podcast on Twitter and Mastodon.
Music: Eric Skiff — Come and Find Me (modified, licensed under CC BY 4.0).