spark-user mailing list archives

From Alexis Gillain
Subject MLlib Prefixspan implementation
Date Thu, 20 Aug 2015 09:00:26 GMT
I want to use PrefixSpan, so I had a look at the MLlib code and at the paper
it cites: "Distributed PrefixSpan Algorithm Based on MapReduce".

There is a result in the paper that I didn't really understand, and I
couldn't find where it is used in the code. Quoting the paper:

Suppose a sequence database S = {1,2...n}, a sequence <a...> is a
length-(L-1) (2≤L≤n) sequential pattern, in projected databases which is a
prefix of a length-(L-1) sequential pattern <a...a>, when the support count
of <a> is not less than min_support, it is equal to obtaining a length-L
sequential pattern < a ... a > from projected databases that obtaining a
length-L sequential pattern < a ... a > from a sequence database S.

According to the paper, this result is supposed to add a pruning step to the
reduce function, but I couldn't find that step in the code.
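
To make the terms in the quoted result concrete ("projected database",
"support count", "min_support"), here is a minimal sketch of plain,
single-machine PrefixSpan. It is only an illustration under assumptions: every
element of a sequence is a single item, and the names project and prefixspan
are hypothetical. It is neither the MLlib implementation nor the MapReduce
version from the paper, and the only pruning it shows is the standard
min_support check, not the extra step the paper describes.

```python
from collections import Counter


def project(db, item):
    """Build the projected database for `item` (hypothetical helper).

    For every sequence that contains `item`, keep only the suffix that
    follows its first occurrence.
    """
    projected = []
    for seq in db:
        if item in seq:
            projected.append(seq[seq.index(item) + 1:])
    return projected


def prefixspan(db, min_support, prefix=()):
    """Return {pattern: support count} for frequent patterns extending `prefix`."""
    # Support count: number of sequences containing the item at least once.
    counts = Counter()
    for seq in db:
        counts.update(set(seq))

    patterns = {}
    for item, support in counts.items():
        if support < min_support:
            continue  # standard pruning: infrequent items cannot extend the prefix
        pattern = prefix + (item,)
        patterns[pattern] = support
        # Longer patterns are mined only from the projected database.
        patterns.update(prefixspan(project(db, item), min_support, pattern))
    return patterns


# Toy sequence database, made up for this example.
db = [list("abcb"), list("acb"), list("bab"), list("ac")]
result = prefixspan(db, min_support=3)
```

Here the length-2 patterns are found by counting items inside the projected
database of each frequent length-1 prefix rather than by rescanning the whole
database.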

This result seems to come from an earlier paper: "Wang Linlin, Fan Jun.
Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan [J].
Computer Engineering, 2009, 35(23): 56-61", but reading it didn't help me
understand the result or how it can improve the algorithm.
