spark-user mailing list archives

From Jorge Machado <jom...@me.com>
Subject Re: [MLlib] What is the best way to forecast the next month page visit?
Date Mon, 01 Feb 2016 13:17:50 GMT
Hi Guru,

First, transform your page names with OneHotEncoder (https://spark.apache.org/docs/latest/ml-features.html#onehotencoder),
then do the same for the months.

You will end up with something like:
	(the first three entries are the page name, the other three the month)
	(0,0,1,0,0,1)

Then you have your label, which is the value you want to predict. In the end you will have a LabeledPoint
such as (10000 -> (0,0,1,0,0,1)), which represents (10000 -> (PageA, UV_NOV)).
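
As a rough illustration of that encoding, here is a plain Scala sketch (no Spark required) that turns the sample rows from the question into (label -> features) pairs. The object name, the helper names and the index order (pageA, pageB, pageZ -> 0, 1, 2 and NOV, DEC, JAN -> 0, 1, 2) are assumptions made for this example; which position carries the 1 depends entirely on how the indices are assigned, so the vectors will not necessarily match the (0,0,1,0,0,1) example above. Spark's own OneHotEncoder may also lay out its output differently.

```scala
// Plain Scala sketch: one-hot encode (page, month) pairs by hand.
object OneHotSketch {
  // Sample data from the original question: page -> unique views for NOV, DEC, JAN.
  val pages: Vector[String]  = Vector("pageA", "pageB", "pageZ")
  val months: Vector[String] = Vector("UV_NOV", "UV_DEC", "UV_JAN")
  val views: Map[String, Vector[Double]] = Map(
    "pageA" -> Vector(10000.0, 9989.0, 11000.0),
    "pageB" -> Vector(999.0, 500.0, 700.0),
    "pageZ" -> Vector(200.0, 50.0, 34.0)
  )

  // A vector of length `size` with a single 1.0 at position `index`.
  def oneHot(index: Int, size: Int): Vector[Double] =
    Vector.tabulate(size)(i => if (i == index) 1.0 else 0.0)

  // Features = one-hot page followed by one-hot month.
  def encode(page: String, month: String): Vector[Double] =
    oneHot(pages.indexOf(page), pages.size) ++
      oneHot(months.indexOf(month), months.size)

  // One (label, features) pair per page and month.
  val rows: Seq[(Double, Vector[Double])] =
    for {
      page       <- pages
      (month, m) <- months.zipWithIndex
    } yield (views(page)(m), encode(page, month))

  def main(args: Array[String]): Unit =
    rows.foreach { case (label, features) =>
      println(s"$label -> ${features.mkString("(", ",", ")")}")
    }
}
```

Each wide row of the input file (one page, three months) becomes three training rows here, one per month.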
After that, try a regression tree with:

val model = DecisionTree.trainRegressor(trainingData, categoricalFeaturesInfo, impurity, maxDepth, maxBins)
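
To make the call above concrete, here is a minimal sketch using the RDD-based spark.mllib API. The parameter values (an empty categoricalFeaturesInfo so that the 0/1 features are treated as continuous, impurity "variance", maxDepth 5, maxBins 32) are assumptions taken as typical starting points, not tuned settings, and the object name, the `point` helper and the index order are made up for this example. It needs Spark MLlib on the classpath and runs in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

object PageViewTreeSketch {
  // Hypothetical helper: 3 pages + 3 months -> 6 one-hot features.
  // Assumed indices: pageA, pageB, pageZ -> 0, 1, 2; NOV, DEC, JAN -> 0, 1, 2.
  def point(label: Double, page: Int, month: Int): LabeledPoint = {
    val features = Array.fill(6)(0.0)
    features(page) = 1.0
    features(3 + month) = 1.0
    LabeledPoint(label, Vectors.dense(features))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PageViewTreeSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sample rows from the question, one LabeledPoint per (page, month).
    val trainingData = sc.parallelize(Seq(
      point(10000.0, 0, 0), point(9989.0, 0, 1), point(11000.0, 0, 2),
      point(999.0, 1, 0),   point(500.0, 1, 1),  point(700.0, 1, 2),
      point(200.0, 2, 0),   point(50.0, 2, 1),   point(34.0, 2, 2)
    ))

    // Assumed settings: no categorical features declared, variance impurity.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "variance"
    val maxDepth = 5
    val maxBins = 32

    val model = DecisionTree.trainRegressor(
      trainingData, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

    // Predict for an encoded row, here (pageA, JAN).
    val prediction = model.predict(point(0.0, 0, 2).features)
    println(s"Predicted unique views: $prediction")

    sc.stop()
  }
}
```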


Regards
Jorge

> On 01/02/2016, at 12:29, diplomatic Guru <diplomaticguru@gmail.com> wrote:
> 
> Any suggestions please?
> 
> 
> On 29 January 2016 at 22:31, diplomatic Guru <diplomaticguru@gmail.com> wrote:
> Hello guys,
> 
> I'm trying to understand how I could predict next month's page views based on the previous
> access pattern.
> 
> For example, I've collected statistics on page views:
> 
> e.g.
> Page,UniqueView
> -------------------------
> pageA, 10000
> pageB, 999
> ...
> pageZ,200
> 
> I aggregate the statistics monthly.
> 
> I've prepared a file containing the last 3 months, like this:
> 
> e.g.
> Page,UV_NOV, UV_DEC, UV_JAN
> ---------------------------------------------------
> pageA, 10000,9989,11000
> pageB, 999,500,700
> ...
> pageZ,200,50,34
> 
> 
> Based on the above information, I want to predict the next month (FEB).
> 
> Which algorithm do you think would suit best? I think linear regression is the safe bet.
> However, I'm struggling to prepare this data for LR ML, especially how to prepare the X,Y
> relationship.
> 
> The Y is easy (unique visitors), but I'm not sure about the X (it should be Page, right?).
> However, how do I plot those three months of data?
> 
> Could you give me an example based on the above example data?
> 
> 
> 
> Page,UV_NOV, UV_DEC, UV_JAN
> ---------------------------------------------------
> 1, 10000,9989,11000
> 2, 999,500,700
> ...
> 26,200,50,34
> 
> 
> 
> 
> 

