mahout-user mailing list archives

From Andrew Butkus <>
Subject Re: Information
Date Tue, 15 Oct 2013 12:21:59 GMT

After giving it some more thought, you could do something like this:



Then build up a bigger model that extracts, for each road on the route, the timestamp and the
time it takes to run that road, and calculate an average on a per-day basis. For example,
if you travel this route every Monday at 9am, extract the timestamps that match every
Monday at 9am and average the time_to_run_road data you have collected on Mondays for that
road. If you want to see how long it takes to run a road every Monday at 9am in January,
extract all timestamps for that road that fall on a Monday at 9am in January.
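
The averaging described above can be sketched in plain Python (this is not Mahout code). It is a minimal sketch, assuming each recorded observation is a hypothetical `(road_id, timestamp, seconds)` tuple where `seconds` plays the role of the time_to_run_road value; the function name, data layout, and sample values are illustrative only.

```python
from datetime import datetime
from statistics import mean


def average_time_to_run_road(observations, road, weekday, hour, month=None):
    """Average seconds taken to run `road` over observations recorded on
    `weekday` (0 = Monday) at `hour`, optionally only in `month` (1-12).
    Each observation is a hypothetical (road_id, timestamp, seconds) tuple.
    Returns None when nothing matches."""
    matching = [
        seconds
        for road_id, ts, seconds in observations
        if road_id == road
        and ts.weekday() == weekday
        and ts.hour == hour
        and (month is None or ts.month == month)
    ]
    return mean(matching) if matching else None


# Illustrative data only: (road_id, timestamp, seconds to run the road).
observations = [
    ("road_a", datetime(2013, 1, 7, 9, 5), 300.0),    # Monday, January
    ("road_a", datetime(2013, 1, 14, 9, 20), 360.0),  # Monday, January
    ("road_a", datetime(2013, 2, 4, 9, 10), 240.0),   # Monday, February
    ("road_a", datetime(2013, 1, 8, 9, 0), 600.0),    # Tuesday, ignored below
]

print(average_time_to_run_road(observations, "road_a", weekday=0, hour=9))           # 300.0
print(average_time_to_run_road(observations, "road_a", weekday=0, hour=9, month=1))  # 330.0
```

Filtering a flat list is enough for a sketch; with large volumes of data you would more likely precompute one average per (road, weekday, hour) bucket ahead of time.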

Not entirely sure where Mahout fits in here, but this could be a potential way forward for
you (assuming you have, or can collect, data about the road).

Hope that helps


On 15 Oct 2013, at 13:09, Andrew Butkus <> wrote:

> Also, to add to this: you probably wouldn't want to do it by route, but
> rather break it down by road; this gives more coverage and greater
> granularity.
> Sent from my Windows Phone
> From: Andrew Butkus
> Sent: 15/10/2013 13:07
> To: Bertrand Dechoux;
> Subject: RE: Information
> I'm not sure; I think the last two (accidents and bad weather) can be
> predicted. For example, in January in the UK we get bad weather, which
> causes delays, so on average it will take longer to run a route in that month.
> Considering weather as a variable is probably not scalable; recording
> the time to run a route with a timestamp should be good enough.
> Also consider that once a year there is a festival in Reading, so over that
> weekend routes through Reading will always take longer.
> I'm not sure where Mahout fits this problem, but if you can record route
> times for training and add a timestamp to each, this would give you something
> scalable. Then figure out how long, on average, it takes to run a route
> at a similar timestamp, for example the same minute, hour, week, month, or year.
> Sent from my Windows Phone
> From: Bertrand Dechoux
> Sent: 15/10/2013 08:33
> To:
> Subject: Re: Information
> The biggest questions are what data you have and what exactly your problem is.
> The maximum speed of the route is easily known, and in the best case
> that would be your speed. From a very broad point of view, there are three
> reasons for a slowdown:
> 1) traffic jams
> 2) accidents
> 3) bad weather
> But without up-to-date observations, those three are non-trivial to
> predict (especially the last two). Doing simple statistics (like averages)
> can be a good start to see the variations and understand which factors
> should be taken into account.
> In the end, you want to do a regression, but classification and clustering
> might help before that. It is hard to say more without knowing why the average
> speed is important, for which area, at which time...
> Bertrand
> On Tue, Oct 15, 2013 at 9:14 AM, Pavan K Narayanan <> wrote:
>> Based on the information you have provided, street routing is potentially a
>> Vehicle Routing Problem, which generalizes the Travelling Salesman Problem
>> (TSP). You can check out the link below:
>> Secondly, if you want to use Mahout for forecasting, that is not possible yet,
>> as the solution methodology for forecasting (LWR) is still an open problem.
>> Bottom line: IMHO, you cannot use Mahout for forecasting at the moment; good
>> luck with your project.
>> Also, you can explore parallel computing paradigms if you have relatively
>> high volumes of data.
>> On 15 October 2013 12:19, Angelo Immediata <> wrote:
>>> Hi there,
>>> I'm pretty new to machine learning and to Apache Mahout as well, so pardon me
>>> if this question is not quite correct :)
>>> I'm working on a street routing project where, besides other functionality, we
>>> have to make forecasts. Precisely, we should be able to forecast the
>>> average speed on a street in a well-known period or season (e.g. we should be
>>> able to answer this kind of question: on the American Route 66, what
>>> will the average speed be in spring 2015?)
>>> As far as I know, in order to offer this functionality we should use some
>>> machine learning; this is the reason I'm checking out Mahout (moreover, we need
>>> to guarantee high performance, and since Mahout is based on Apache Hadoop
>>> and uses Map/Reduce, it looks very appealing to me).
>>> The first question I'd like to ask is: can I use Apache Mahout to
>>> implement the functionality described above?
>>> If I can use it, I'll surely need some data in order to "train"
>>> Mahout... can I train Mahout at a different time from when I need the forecast?
>>> I mean: can I run the training, let's say, every week at 10pm and then offer
>>> the forecasting functionality only when a user is interested in it? Should I
>>> store the training result in some way?
>>> And last, but not least :), assuming I can use Mahout... which
>>> algorithm should I use to implement my scenario?
>>> Thank you for the help, and pardon me if I was not entirely correct.
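
Angelo's question about training at a different time from forecasting, together with Andrew's suggestion to break routes down by road, can be illustrated with a minimal plain-Python sketch (this is not Mahout's API). It assumes hypothetical `(road_id, timestamp, seconds)` observations: an offline step averages them per (road, weekday, hour) bucket and stores the result as JSON, and an online step loads the stored result and sums the per-road averages along a route. The bucket choice, JSON layout, function names, and sample data are all assumptions for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict
from datetime import datetime


def bucket_key(road, ts):
    # Hypothetical bucket: one entry per road, weekday (0 = Monday) and hour.
    return f"{road}|{ts.weekday()}|{ts.hour}"


def train(observations):
    """Offline step (e.g. run weekly at 10pm by a scheduler): average the
    seconds taken per (road, weekday, hour) bucket."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for road, ts, seconds in observations:
        key = bucket_key(road, ts)
        sums[key] += seconds
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def save_model(model, path):
    # Store the training result so answering a forecast needs no retraining.
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    with open(path) as f:
        return json.load(f)


def estimate_route_seconds(model, roads, departure):
    """Online step: sum the stored averages of each road on the route.
    Simplification: every road is looked up at the departure hour.
    Returns None if any road has no data for that bucket."""
    total = 0.0
    for road in roads:
        avg = model.get(bucket_key(road, departure))
        if avg is None:
            return None
        total += avg
    return total


# Illustrative data only; road ids and times are made up.
observations = [
    ("road_a", datetime(2013, 1, 7, 9, 5), 300.0),    # Monday
    ("road_a", datetime(2013, 1, 14, 9, 20), 360.0),  # Monday
    ("road_b", datetime(2013, 1, 7, 9, 15), 900.0),   # Monday
]
path = os.path.join(tempfile.gettempdir(), "road_model.json")
save_model(train(observations), path)
model = load_model(path)
print(estimate_route_seconds(model, ["road_a", "road_b"], datetime(2013, 10, 21, 9, 0)))  # 1230.0
```

Keeping training and lookup separate means the expensive aggregation can run on a schedule, while a forecast request only reads a small stored table.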
