Oleg Zabluda's blog
Saturday, May 20, 2017
 
Clustering by compression (2003) Rudi Cilibrasi, Paul Vitanyi
"""
we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method.
[..]
The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors.
"""
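To make the quoted definition concrete, here is a minimal Python sketch of the distance, not the authors' public software. It uses the formula from the paper, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The names `COMPRESSORS`, `compressed_len`, `ncd` and `distance_matrix` are my own for illustration, and the three standard-library compressors are only stand-ins for the dictionary, block sorting and statistical compressors mentioned in the abstract.

```python
import bz2
import lzma
import zlib

# Illustrative choice of compressors (an assumption, not the paper's setup):
# zlib is dictionary-based, bz2 is block sorting, lzma is dictionary plus range coding.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lzma.compress,
}


def compressed_len(data: bytes, compressor: str = "zlib") -> int:
    """Length in bytes of `data` after compression, i.e. C(data)."""
    return len(COMPRESSORS[compressor](data))


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_len(x, compressor)
    cy = compressed_len(y, compressor)
    cxy = compressed_len(x + y, compressor)  # the pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items, compressor: str = "zlib"):
    """All pairwise NCD values, as input for a clustering step."""
    return [[ncd(a, b, compressor) for b in items] for a in items]
```

Values near 0 suggest that the two inputs share most of their structure, values near 1 that they share little. With real compressors the result is only approximately symmetric and can slightly exceed 1. The second step, hierarchical clustering over the resulting matrix, is not sketched here.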
https://arxiv.org/abs/cs/0312044

http://homepages.cwi.nl/~paulv/papers/cluster.pdf
