mahout-user mailing list archives

From Raghuveer
Subject Analysing NDT data for POC
Date Fri, 22 May 2015 11:42:34 GMT
I am doing a POC with a dataset of the format (client_ip, timestamp, bytes_transferred).
The use case is: "Predict bytes_transferred for a particular client for a given
timestamp". For example, one record in the dataset is a client_ip that downloaded 234 bytes
at timestamp 1432292516696. Similarly, let's say I have data for the morning of the 22nd,
the evening of the 23rd and the afternoon of the 24th. To apply the use case, I want to pass
the individual bytes_transferred values to a classifier that categorizes them into
low_download (<2500), medium_download (>2500 and <5000) and high_download (>5000).
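
To make the three categories concrete, here is a minimal sketch in Python (not Mahout code)
of the labelling step. The function name and the sample IP address are made up for
illustration. The thresholds above do not say where exactly 2500 and exactly 5000 belong,
so the sketch assumes they fall into the higher bucket.

```python
def download_category(bytes_transferred):
    """Map a byte count to one of the three labels described above."""
    # Assumption: exactly 2500 and exactly 5000 are not covered by the
    # stated thresholds; here they are placed in the higher bucket.
    if bytes_transferred < 2500:
        return "low_download"
    if bytes_transferred < 5000:
        return "medium_download"
    return "high_download"


# Example record in the (client_ip, timestamp, bytes_transferred) format.
# The IP address is a placeholder, not real data.
record = ("192.0.2.1", 1432292516696, 234)
label = download_category(record[2])
print(label)
```

Each record would then carry a label that a classifier can be trained on.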

How can I pass this dataset to a classification algorithm like cBayes to categorize the
client_ips based on timestamp?
Since the data file has 3 columns, should I pass the file as is to the sequence file
conversion and then to vectors, or is some pre-processing required? Since this is time
series data, are there any specific algorithms that can do the job?
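
As an illustration of the kind of pre-processing I am asking about, the sketch below (plain
Python, not Mahout code) turns a raw timestamp into coarser time features. It assumes the
timestamp is in milliseconds since the Unix epoch and is interpreted in UTC. The function
name and the choice of features are hypothetical, and I do not know whether this is the
right approach.

```python
from datetime import datetime, timezone


def time_features(timestamp_ms):
    """Derive coarse time features from an epoch timestamp.

    Assumption: timestamp_ms is milliseconds since the Unix epoch, in UTC.
    """
    dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return {
        "hour_of_day": dt.hour,       # 0..23
        "day_of_week": dt.weekday(),  # Monday = 0 ... Sunday = 6
    }


features = time_features(1432292516696)
print(features)
```

Features like these, rather than the raw timestamp value, could then be written out
alongside client_ip before the vector conversion.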

I need your help. Kindly suggest an approach.

