mahout-user mailing list archives

From Dhruv Kumar <dhru...@gmail.com>
Subject Re: HMM - baum welch and hmmpredict
Date Sun, 06 Jan 2013 21:06:52 GMT
Hi Simon,

Are you using the standalone HMM trainer or are you running with the MapReduce variant using
the patch available at https://issues.apache.org/jira/browse/MAHOUT-627?

As Ted mentioned, these trainers can suffer arithmetic underflow when the number of hidden
states is large. Did you try the log-scaled APIs for the Baum-Welch trainer? The log-scaled
versions are much less prone to underflow.
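
To illustrate why log scaling helps, here is a minimal sketch of the forward pass computed entirely in log space. It is not Mahout's code: the class name LogSpaceForward and the methods logAdd and logLikelihood are hypothetical names made up for this example, and the model is passed as plain arrays of probabilities. Sums of probabilities are done with a log-sum-exp step, and a zero probability is carried as -Infinity rather than turning into NaN.

```java
// Hypothetical illustration only; not part of Mahout's HMM API.
final class LogSpaceForward {
    private LogSpaceForward() {}

    /** Returns log(exp(a) + exp(b)), treating -Infinity as log(0). */
    static double logAdd(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        // Factor out the larger term so the exponent is never positive.
        return max + Math.log1p(Math.exp(Math.min(a, b) - max));
    }

    /**
     * Log-likelihood of an observation sequence under an HMM.
     * initial[i]       = P(first state is i)
     * transition[i][j] = P(next state is j | current state is i)
     * emission[i][k]   = P(symbol k | state i)
     */
    static double logLikelihood(double[] initial, double[][] transition,
                                double[][] emission, int[] observations) {
        int n = initial.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            // Math.log(0.0) is -Infinity, which propagates safely below.
            alpha[i] = Math.log(initial[i]) + Math.log(emission[i][observations[0]]);
        }
        for (int t = 1; t < observations.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    sum = logAdd(sum, alpha[i] + Math.log(transition[i][j]));
                }
                next[j] = sum + Math.log(emission[j][observations[t]]);
            }
            alpha = next;
        }
        double total = Double.NEGATIVE_INFINITY;
        for (double a : alpha) {
            total = logAdd(total, a);
        }
        return total;
    }
}
```

With a sequence of a few thousand observations, the plain product of probabilities would fall below the smallest representable double and become 0, while the log-space value stays finite. A sequence that is impossible under the model yields -Infinity rather than NaN.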

-Dhruv

On Jan 6, 2013, at 12:34 PM, simon.2.thompson@bt.com wrote:

> Hi Ted, 
> 
> thanks very much for the response, very helpful to hear these thoughts. 
> 
> What I will do is look at the data set issue and report back as to what I find out. I'll
> prod round the code and see if I can get a clue as to how it produces infinities and so on.
> 
> I think that one of the Mahout algorithms (DF) does use NaN for "undecidable" 
> 
> (ref) http://mail-archives.apache.org/mod_mbox/mahout-dev/201206.mbox/%3C824188178.43658.1340361882497.JavaMail.jiratomcat@issues-vm%3E
> 
> So perhaps there is a long term need to think through the output semantics of the library?
> 
> I ran an open source project (Zeus Agents - still on SourceForge! but antique) for many
> years before it faded, so I know that random suggestions with no technical input are fairly
> unhelpful, but give me some time and I'll try and come back with something more useful!
> 
> Best,
> 
> Simon
> 
> ----
> Dr. Simon Thompson
> Chief Researcher, Customer Experience.
> BT Research.
> BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath.
> IP5 3RE
> 
> Note :
> 
> This email contains BT information, which may be privileged or confidential. It's meant
> only for the individual(s) or entity named above. If you're not the intended recipient, note
> that disclosing, copying, distributing or using this information is prohibited. If you've
> received this email in error, please let me know immediately on the email address above.
> Thank you.
> We monitor our email system, and may record your emails.
> British Telecommunications plc
> Registered office: 81 Newgate Street London EC1A 7AJ
> Registered in England no: 1800000
> ________________________________________
> From: Ted Dunning [ted.dunning@gmail.com]
> Sent: 06 January 2013 20:16
> To: user@mahout.apache.org
> Subject: Re: HMM - baum welch and hmmpredict
> 
> It sounds like you are getting some numerical stability issues with the
> training program.  With HMMs, the most common problem that leads to this
> is numerical underflow.  I haven't looked at this in detail, however, so I
> can't comment very knowledgeably.  It is possible that the current
> implementation has no regularization which might lead to problems for
> synthetic data-sets such as your counting example because there are no
> observations for some transitions and the trainer may try to represent this
> as -Inf in log space.
> 
> I can say that the Mahout HMM implementations are a student project and
> have not seen much run-time or critical review.  That means that the
> probability of serious bugs in the implementation is much higher than in code
> that is heavily used such as the recommender or the math library.  The
> student who did the work is good, but that doesn't take the place of wide
> usage.
> 
> On Sat, Jan 5, 2013 at 11:44 AM, <simon.2.thompson@bt.com> wrote:
> 
>> Hi there,
>> 
>> I've got a couple of questions about the HMM elements of Mahout.
>> 
>> - when I get models that are made of NaN I guess this is telling me that
>> the algorithm can't make a prediction?
>> - I can train models with 1 hidden state, or 2 hidden states and once or
>> twice with 3 hidden states.. but when I try to train anything more complex
>> it always seems to come back with NaNs - even with data sets like 1 2 3 4
>> 5 1 2 3 4 5 1 2... which in my simple minded view should work well for 4
>> or 5 hidden states : what am I doing wrong?
>> - I have used hmmpredict to produce some... predictions! but how can I
>> give it a sequence and then ask for the next state? Or should I simply use
>> the code to create a custom predictor of my own?
>> 
>> All the best,
>> 
>> Simon
>> 

