mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Test naivebayes task running really slowly and not in distributed mode
Date Sun, 01 Dec 2013 08:35:53 GMT
Did the training run use both machines?

How large is the input for the test run?

Is it contained in a single file?
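
These questions matter because the number of map tasks in a job generally follows the number of input splits, so a small input held in a single file can end up as one task on one machine. Below is a rough sketch of that arithmetic, loosely modelled on how Hadoop's FileInputFormat divides a splittable file. It is an illustration, not Hadoop or Mahout code: the function names, the 64 MB block size, and the 10% slop factor are assumptions to check against your own version and configuration.

```python
# Illustrative sketch only: estimates how many input splits (and so map
# tasks) a single splittable file would produce. All names and defaults
# here are assumptions, not Hadoop or Mahout API.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: the last split may be up to 10% oversized


def compute_split_size(block_size, min_size=1, max_size=float("inf")):
    """Split size is the block size clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits produced for one file of file_size bytes."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left over becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    block = 64 * MB  # assumed HDFS block size
    for size_mb in (50, 200):
        n = count_splits(size_mb * MB, compute_split_size(block))
        print(f"{size_mb} MB file, 64 MB splits: {n} map task(s)")
    # Lowering the maximum split size yields more, smaller splits.
    small = compute_split_size(block, max_size=16 * MB)
    print(f"50 MB file, 16 MB splits: {count_splits(50 * MB, small)} map task(s)")
```

Under these assumptions a 50 MB test file yields a single split, which would match the one job, one task behaviour described below. Capping the maximum split size or splitting the input across several files are the usual ways to get more tasks, if that is indeed the cause.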




On Sat, Nov 30, 2013 at 11:22 AM, Fernando Santos <
fernandoleandro1991@gmail.com> wrote:

> Hello everyone,
>
> I'm trying to do a text classification task. My dataset is not that big; I
> have around 700,000 short comments.
>
> Following the 20newsgroups example, I created the vectors from the text,
> split them, and trained the model. Now I'm trying to test it, but it is
> really slow and I cannot get it to run on the cluster. Whatever I do,
> it always runs on just one machine. And I think the testnb algorithm is
> supposed to run using MapReduce, right?
>
> I also tried this example here (
>
> http://chimpler.wordpress.com/2013/06/24/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages-part-2-distribute-classification-with-hadoop/
> )
> but there too, the other box in the cluster does not execute any task. In
> fact, whether I run testnb or the MapReduceClassifier proposed in the
> tutorial above, I get one job executing one task, and this task runs really
> slowly (about 6 minutes to reach 0.13% of the task).
>
> I think I must be doing something wrong, since the cluster is not working
> the way it is supposed to.
>
> I have a cluster of 2 boxes configured with Hadoop 0.20.205.0 and using
> Mahout 0.8.
>
> I also tried versions 0.7 and 0.6 of Mahout, but nothing changed.
>
> Any help would be appreciated.
>
>
> The logs I have from this task:
>
>
> *stdout logs*
>
> Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library
> /usr/local/hadoop/lib/libhadoop.so which might have disabled stack
> guard. The VM will try to fix the stack guard now.
> It's highly recommended that you fix the library with 'execstack -c
> <libfile>', or link it with '-z noexecstack'.
>
>
> *syslog logs*
>
> 2013-11-30 17:09:19,191 WARN org.apache.hadoop.util.NativeCodeLoader:
> Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 2013-11-30 17:09:19,400 WARN
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
> already exists!
> 2013-11-30 17:09:19,472 INFO org.apache.hadoop.util.ProcessTree:
> setsid exited with exit code 0
> 2013-11-30 17:09:19,474 INFO org.apache.hadoop.mapred.Task:  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5810d963
> 2013-11-30 17:09:19,543 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb
> = 100
> 2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: data
> buffer = 79691776/99614720
> 2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: record
> buffer = 262144/327680
>
>
>
>
>
> --
> Fernando Santos
> +55 61 8129 8505
>
