mahout-user mailing list archives

From Angelo Immediata <>
Subject Re: Information
Date Tue, 15 Oct 2013 20:01:45 GMT
hi All

First of all, thank you for the great suggestions you gave me; you are
simply great :)
Anyway, returning to my problem, I'll try to be as clear as possible. As far
as I know (we are still collecting requirements and working out which kind
of data we will have), we should have a situation of this type:
on street XYZ in Spring without any events (an event can be a demonstration,
parade etc.) the average speed is 50 km/h
on street XYZ in Spring with an event the average speed is 20 km/h
on street XYZ in Autumn without any events the average speed is 40 km/h
on street XYZ in Autumn with an event the average speed is 15 km/h

and so on for all the streets of interest (basically using the OpenStreetMap
data); note that we are not interested in the worst case, that is, the
case with an accident (at least as far as I know).

Now my customer would like to offer this kind of functionality to clients:
a client connects to the site (or downloads an app) and wants to go by car
to restaurant W; he/she would like to know whether it's a good idea to take
that street or to look for a different one. So, knowing the season (Spring,
Summer, Autumn or Winter) and whether there are any events (demonstrations,
parades etc.), I should tell him/her: if you take street XYZ you will
probably travel at 50 km/h or 20 km/h (the best would be if I could suggest
a different way... but this is another topic :) )
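
To make the idea concrete, here is a minimal sketch in plain Python (not
Mahout) of the lookup I have in mind: average the historical speeds per
(street, season, event) and answer a client's query from that table. The
observations, field layout and function names are made up for illustration;
the real data will depend on what we end up collecting.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical observations: (street, season, has_event, speed in km/h).
observations = [
    ("XYZ", "spring", False, 52.0),
    ("XYZ", "spring", False, 48.0),
    ("XYZ", "spring", True, 20.0),
    ("XYZ", "autumn", False, 40.0),
    ("XYZ", "autumn", True, 15.0),
]


def build_speed_table(rows):
    """Average the observed speeds for each (street, season, has_event) key."""
    grouped = defaultdict(list)
    for street, season, has_event, speed in rows:
        grouped[(street, season, has_event)].append(speed)
    return {key: mean(speeds) for key, speeds in grouped.items()}


def expected_speed(table, street, season, has_event):
    """Return the average speed for the key, or None if there is no history."""
    return table.get((street, season, has_event))


table = build_speed_table(observations)
print(expected_speed(table, "XYZ", "spring", False))
print(expected_speed(table, "XYZ", "spring", True))
```

A combination with no history (for example Winter here) returns None, so the
application has to decide what to tell the client in that case.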

So, since I should use old data to suggest to the client the speed he/she
may have on street XYZ, I was thinking of using Mahout... but maybe I was
wrong (sadly I'm really new to this kind of world... though I'm finding it
amazing)

Now by using the "old" data (the one I listed previously)

2013/10/15 Andrew Butkus <>

> After giving it some more thought, you could do something like this:
> Store:
> route
> {
>         road
>         {
>                 timestamp,
>                 time_to_run_road,
>         }
> }
> then build up a bigger model, which extracts the timestamp from the road on
> the route and the time it takes to run that road, and calculates an average
> on a per-day basis. For example, if you travel this route every Monday at
> 9am, extract the timestamps that match every Monday at 9am and average the
> time_to_run_road data you have collected on a Monday for that road. If you
> want to see how long it takes to run a road every Monday at 9am in January,
> extract all timestamps that match that road for January at 9am on Monday.
> Not entirely sure where Mahout fits in here, but this could be a potential
> way forward for you (assuming you can collect/have data about the road).
> Hope that helps
> Andy
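
If I understand the suggestion above correctly, it could be sketched roughly
like this in plain Python (not Mahout). The records, the choice of seconds as
the unit for time_to_run_road, and the function name average_time are my own
assumptions for illustration:

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: (road, timestamp, time_to_run_road in seconds).
records = [
    ("XYZ", datetime(2013, 1, 7, 9, 5), 300.0),    # a Monday in January
    ("XYZ", datetime(2013, 1, 14, 9, 40), 340.0),  # a Monday in January
    ("XYZ", datetime(2013, 1, 8, 9, 10), 200.0),   # a Tuesday in January
    ("XYZ", datetime(2013, 2, 4, 9, 15), 260.0),   # a Monday in February
]


def average_time(rows, road, weekday, hour, month=None):
    """Average time_to_run_road for a road at a given weekday and hour.

    weekday follows datetime.weekday(): Monday is 0. If month is given,
    only records from that month are used. Returns None if nothing matches.
    """
    matches = [
        seconds
        for r, ts, seconds in rows
        if r == road
        and ts.weekday() == weekday
        and ts.hour == hour
        and (month is None or ts.month == month)
    ]
    return mean(matches) if matches else None


# Every Monday at 9am, then only the Mondays at 9am in January.
print(average_time(records, "XYZ", weekday=0, hour=9))
print(average_time(records, "XYZ", weekday=0, hour=9, month=1))
```

Here the "model" is just a filter plus an average, so adding another
granularity (week, year) only means adding another condition to the filter.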
> On 15 Oct 2013, at 13:09, Andrew Butkus <> wrote:
> > Also to add to this you probably wouldn't want to do it by route, but
> > maybe break it down by road, this gives more coverage and greater
> > granularity
> >
> > Sent from my Windows Phone
> > From: Andrew Butkus
> > Sent: 15/10/2013 13:07
> > To: Bertrand Dechoux;
> > Subject: RE: Information
> > I'm not sure; I think the last two can be predicted. For example, in
> > January in the UK we get bad weather which causes delays, and on average
> > it will take longer to run a route in this month because of that.
> >
> > Considering weather as a variable is probably not scalable; recording
> > the time to run a route with a timestamp should be good enough.
> >
> > Also consider that once a year there is a festival in Reading, so over
> > that weekend routes through Reading will always take longer.
> >
> > I'm not sure where Mahout fits this problem, but if you can train on route
> > time and add a timestamp, this would give you something scalable. Then
> > figure out how long it takes on average to run a route at a similar
> > timestamp, for example by minute, hour, week, month, year.
> >
> > Sent from my Windows Phone
> > From: Bertrand Dechoux
> > Sent: 15/10/2013 08:33
> > To:
> > Subject: Re: Information
> > The biggest point is what data you have and what exactly your problem is.
> >
> > The maximum speed of the route can be easily known, and in the best case
> > that would be your speed. From a very broad point of view, there are three
> > reasons for a slowdown.
> > 1) traffic jam
> > 2) accident
> > 3) bad weather
> >
> > But without up-to-date observations, those three points are non-trivial to
> > predict (especially the last two). Doing simple statistics (like an average)
> > can be a good start to see the variations and understand what factors
> > should be taken into account.
> >
> > At the end, you want to do a regression but classification and clustering
> > might help before that. Hard to say more without knowing why the medium
> > speed is important, for which area, at which time...
> >
> > Bertrand
> >
> > On Tue, Oct 15, 2013 at 9:14 AM, Pavan K Narayanan <
> >> wrote:
> >
> >> Based on the information you have provided, street routing is potentially
> >> a Vehicle Routing Problem, which is based on TSPs. You can check out the
> >> link below:
> >>
> >> Secondly, if you want to use Mahout for forecasting, it is not possible
> >> yet, as the solution methodology for forecasting (LWR) is still an open
> >> problem.
> >>
> >>
> >> Bottom line: IMHO, you cannot use Mahout for forecasting at the moment;
> >> good luck with your project.
> >>
> >> Also, you can explore parallel computing paradigms if you have relatively
> >> high volumes of data.
> >>
> >>
> >> On 15 October 2013 12:19, Angelo Immediata <> wrote:
> >>
> >>> Hi there
> >>>
> >>> I'm pretty new to machine learning and Apache Mahout as well, so pardon
> >>> me if this question is not quite right :)
> >>>
> >>> I'm on a street routing project where, besides other functionality, we
> >>> have to make forecasts. Precisely, we should be able to forecast the
> >>> average speed on a street in a well-known period or season (e.g. we
> >>> should be able to answer this kind of question: on the American Route 66,
> >>> what will the average speed be in spring 2015?)
> >>> As far as I know, in order to offer this functionality we should use some
> >>> machine learning; this is the reason I'm checking Mahout (moreover, we
> >>> need to guarantee high performance, and since Mahout is based on Apache
> >>> Hadoop and uses Map/Reduce, it seems amazing to me)
> >>> The first question I'd love to ask is: can I use Apache Mahout to
> >>> implement the functionality described above?
> >>> If I can use it, I'll surely need some data in order to "train" Mahout...
> >>> can I train Mahout at a different time from when I need the prediction? I
> >>> mean: can I run the training, let's say, every week at 10pm and then
> >>> offer the forecasting functionality only when a user is interested in it?
> >>> Should I store the training result in some way?
> >>> And last but not least :), again if I can use Mahout... which algorithm
> >>> should I use to implement my scenario?
> >>>
> >>> Thank you for the help and pardon me if I was not entirely correct
> >>>
> >>
