I'm trying to understand the intuition behind the featurize method that Aaron used in one of his demos. I believe this feature is only useful for detecting the character set (i.e., the language used).

Can someone help?


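// Assumed import: the snippet as posted does not show where Vector comes from.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Map a string to a sparse vector of hashed character-bigram frequencies.
def featurize(s: String): Vector = {
  val n = 1000                          // number of hash buckets (vector size)
  val result = new Array[Double](n)
  val bigrams = s.sliding(2).toArray    // overlapping two-character windows

  // Each bigram adds its share of the total to the bucket its hash falls in.
  // The hashCode of a string of at most two characters is non-negative,
  // so h is always a valid index here.
  for (h <- bigrams.map(_.hashCode % n)) {
    result(h) += 1.0 / bigrams.length
  }

  // Keep only the non-zero buckets, as (index, value) pairs.
  Vectors.sparse(n, result.zipWithIndex.filter(_._1 != 0).map(_.swap))
}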
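
To see which buckets a given string fills, here is a minimal Spark-free sketch of the same bigram hashing. The object BigramBuckets and its buckets helper are hypothetical names made up for illustration and are not part of the demo; the example strings are arbitrary.

```scala
// Hypothetical helper, not from the original demo: the same bigram hashing,
// returning a plain Map from bucket index to relative frequency.
object BigramBuckets {
  def buckets(s: String, n: Int = 1000): Map[Int, Double] = {
    val bigrams = s.sliding(2).toArray
    bigrams
      .groupBy(b => b.hashCode % n) // bucket index for each bigram
      .map { case (bucket, hits) =>
        bucket -> hits.length.toDouble / bigrams.length
      }
  }

  def main(args: Array[String]): Unit = {
    val english = buckets("hello world")
    val russian = buckets("привет мир")
    // Bucket indices that both strings happen to fill.
    val shared = english.keySet.intersect(russian.keySet)
    println(s"english: ${english.size} buckets, russian: ${russian.size} buckets, shared: ${shared.size}")
  }
}
```

Comparing the bucket sets for strings in different scripts, and for different languages in the same script, should show how much the vectors overlap in each case.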