mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: ItemSimilarityJob creates no output
Date Wed, 06 Jun 2012 05:59:14 GMT
Is your input very small? It is probably getting mostly pruned, since
most of it looks like low-count data, and then there is almost no
information left from which to compute similarity.
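A co-occurrence-based item similarity can only be computed for item pairs that share at least one user, so one way to gauge how little is left to work with is to count those pairs directly. This is a minimal sketch in plain Java. It is not part of Mahout and assumes the whitespace-separated `userID itemID preference` lines shown in the original question. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CooccurrenceCheck {

    /** For each unordered item pair "a,b" (a < b), counts the users who have both items. */
    public static Map<String, Integer> countCooccurrences(List<String> lines) {
        // Group distinct item IDs by user ID; lines look like "userID itemID preference".
        Map<String, TreeSet<Long>> itemsByUser = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            itemsByUser.computeIfAbsent(fields[0], k -> new TreeSet<>())
                       .add(Long.parseLong(fields[1]));
        }
        // Each pair of items held by the same user co-occurs once for that user.
        Map<String, Integer> pairCounts = new TreeMap<>();
        for (TreeSet<Long> itemSet : itemsByUser.values()) {
            List<Long> items = new ArrayList<>(itemSet); // ascending item IDs
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    pairCounts.merge(items.get(i) + "," + items.get(j), 1, Integer::sum);
                }
            }
        }
        return pairCounts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "-1000000397028994738\t1\t1",
                "-1000000397028994738\t12872\t1",
                "-100000083781885705\t1\t1",
                "-100000083781885705\t12872\t1",
                "-100000083781885705\t32837\t1");
        countCooccurrences(sample).forEach((pair, n) -> System.out.println(pair + " -> " + n));
    }
}
```

If nearly every pair shows a count of 1, there is very little evidence for any similarity score.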

On Tue, Jun 5, 2012 at 7:13 PM, Something Something
<mailinglists19@gmail.com> wrote:
> One thing I noticed is that in step 4 of this process
> (RowSimilarityJob-VectorNormMapper-Reducer), the record counts are:
>
> Mapper input:  6,925
> Mapper output: 3
>
> Reducer input: 3
> Reducer output: 0
>
> Most of the values going into the RowSimilarityJob are defaults.  Here's
> what I see in the code:
>
>    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>      int numberOfUsers = HadoopUtil.readInt(
>          new Path(prepPath, PreparePreferenceMatrixJob.NUM_USERS), getConf());
>
>      ToolRunner.run(getConf(), new RowSimilarityJob(), new String[] {
>          "--input", new Path(prepPath, PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
>          "--output", similarityMatrixPath.toString(),
>          "--numberOfColumns", String.valueOf(numberOfUsers),
>          "--similarityClassname", similarityClassName,
>          "--maxSimilaritiesPerRow", String.valueOf(maxSimilarItemsPerItem),
>          "--excludeSelfSimilarity", String.valueOf(Boolean.TRUE),
>          "--threshold", String.valueOf(threshold),
>          "--tempDir", getTempPath().toString() });
>    }
>
>
> Any ideas?
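The values forwarded to RowSimilarityJob (such as `maxSimilarItemsPerItem` and `threshold` above) come from ItemSimilarityJob's own options. One way to avoid relying on defaults is to spell those options out when launching the job. This sketch only builds the argument array, so it runs without Mahout on the classpath. The option names `--input`, `--output`, `--similarityClassname` and `--tempDir` appear in this thread. `--maxSimilaritiesPerItem`, `--threshold` and the enum-style name `SIMILARITY_PEARSON_CORRELATION` are assumptions to check against the Mahout version in use. The paths and values are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class SimilarityJobArgs {

    /**
     * Builds an explicit argument list for ItemSimilarityJob. The option names are
     * assumed from this thread and the Mahout version in use; verify them first.
     */
    public static String[] buildArgs(String input, String output, String tempDir,
                                     String similarityClassname,
                                     int maxSimilaritiesPerItem, double threshold) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);
        args.add("--output");
        args.add(output);
        args.add("--similarityClassname");
        args.add(similarityClassname);
        args.add("--maxSimilaritiesPerItem");
        args.add(String.valueOf(maxSimilaritiesPerItem));
        args.add("--threshold");
        args.add(String.valueOf(threshold));
        args.add("--tempDir");
        args.add(tempDir);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Hypothetical paths and values, for illustration only.
        String[] args = buildArgs("/data/prefs.tsv", "/data/item-similarities", "/tmp/mahout-tmp",
                "SIMILARITY_PEARSON_CORRELATION", 100, 0.1);
        // With Mahout and Hadoop on the classpath, these would be handed to the job, e.g.
        //   ToolRunner.run(conf, new ItemSimilarityJob(), args);
        System.out.println(String.join(" ", args));
    }
}
```

Passing `--input`/`--output` explicitly also makes it obvious in the job logs which paths were actually used.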
>
>
> On Mon, Jun 4, 2012 at 7:36 PM, Something Something <mailinglists19@gmail.com> wrote:
>
>> My job setup is really simple.  It looks like this:
>>
>>     public int run(String[] args) throws Exception {
>>         String datasetDate = args[0];   // not used below
>>         String inputPath = args[1];
>>         String configFile = args[2];
>>         String outputLocation = args[3];
>>
>>         // Load extra Hadoop settings from the given resource file.
>>         Configuration config = getConf();
>>         config.addResource(new Path(configFile));
>>         logger.info("config: " + config);
>>
>>         File inputFile = new File(inputPath);
>>         File outputDir = new File(outputLocation);
>>         // Note: File.delete() only removes the directory if it is empty.
>>         outputDir.delete();
>>         File tmpDir = new File("/tmp");
>>
>>         ItemSimilarityJob similarityJob = new ItemSimilarityJob();
>>
>>         // Copy the loaded configuration so the settings from configFile are kept;
>>         // a fresh new Configuration() would silently drop them.
>>         Configuration conf = new Configuration(config);
>>         // Input/output are given through the mapred.* keys instead of --input/--output.
>>         conf.set("mapred.input.dir", inputFile.getAbsolutePath());
>>         conf.set("mapred.output.dir", outputDir.getAbsolutePath());
>>         conf.setBoolean("mapred.output.compress", false);
>>
>>         similarityJob.setConf(conf);
>>
>>         // Propagate the job's exit code instead of always returning 0.
>>         return similarityJob.run(new String[] {
>>                 "--tempDir", tmpDir.getAbsolutePath(),
>>                 "--similarityClassname", PearsonCorrelationSimilarity.class.getName()});
>>     }
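One detail in the setup above: `java.io.File.delete()` only removes a directory if it is empty. An output directory left over from an earlier run is therefore kept, and the call just returns `false`. If the intent is to clear it before each run, a recursive delete is needed. The following is a minimal sketch for a local path using `java.nio.file`; the helper name is hypothetical. For a path on HDFS, Hadoop's `FileSystem.delete(path, true)` serves the same purpose.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputDirCleaner {

    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        // Files.walk lists dir and all descendants; reverse order visits children before parents.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a leftover output directory containing a part file.
        Path out = Files.createTempDirectory("similarity-output");
        Path part = Files.createDirectories(out.resolve("run1")).resolve("part-r-00000");
        Files.write(part, new byte[0]);
        deleteRecursively(out);
        System.out.println("exists after delete: " + Files.exists(out));
    }
}
```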
>>
>>
>> The input file is sorted by UserId, ItemId & Preference.  Preference is
>> always '1'.  A few lines from the file look like this:
>>
>> -1000000334008648908    1    1
>> -1000000334008648908    70    1
>> -1000000334008648908    2090    1
>> -1000000334008648908    12872    1
>> -1000000334008648908    32790    1
>> -1000000334008648908    32799    1
>> -1000000334008648908    32969    1
>> -1000000397028994738    1    1
>> -1000000397028994738    12872    1
>> -1000000397028994738    32790    1
>> -1000000397028994738    32796    1
>> -1000000397028994738    32939    1
>> -100000083781885705    1    1
>> -100000083781885705    12872    1
>> -100000083781885705    32790    1
>> -100000083781885705    32837    1
>> -100000083781885705    33723    1
>> -1000001014586220418    1    1
>> -1000001014586220418    12872    1
>> -1000001014586220418    32790    1
>> & so on...
>>
>> (UserId is created using MemoryIDMigrator)
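Given this format, a quick way to check the point about low-count data made in the reply above is to count how many distinct users each item has before running the job. This is a small sketch in plain Java. The class name is hypothetical, and the cutoff of 2 users is only an illustration, not a Mahout setting:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ItemCountCheck {

    /** Maps each item ID to the number of distinct users with a preference for it. */
    public static Map<Long, Integer> usersPerItem(List<String> lines) {
        Map<Long, Set<String>> usersByItem = new HashMap<>();
        for (String line : lines) {
            // Expected layout: userID, itemID, preference separated by whitespace.
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            usersByItem.computeIfAbsent(Long.parseLong(fields[1]), k -> new HashSet<>())
                       .add(fields[0]);
        }
        Map<Long, Integer> counts = new TreeMap<>(); // sorted by item ID
        usersByItem.forEach((item, users) -> counts.put(item, users.size()));
        return counts;
    }

    /** Counts the items that have fewer than minUsers distinct users. */
    public static long itemsBelow(Map<Long, Integer> counts, int minUsers) {
        return counts.values().stream().filter(n -> n < minUsers).count();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "-1000000334008648908\t1\t1",
                "-1000000334008648908\t70\t1",
                "-1000000397028994738\t1\t1",
                "-1000000397028994738\t12872\t1",
                "-100000083781885705\t1\t1",
                "-100000083781885705\t12872\t1");
        Map<Long, Integer> counts = usersPerItem(sample);
        System.out.println(counts);
        System.out.println("items with fewer than 2 users: " + itemsBelow(counts, 2));
    }
}
```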
>>
>>
>> The job internally runs the following 7 Hadoop jobs, all of which complete successfully:
>>
>> PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer
>> PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer
>> PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer
>> RowSimilarityJob-VectorNormMapper-Reducer
>> RowSimilarityJob-CooccurrencesMapper-Reducer
>> RowSimilarityJob-UnsymmetrifyMapper-Reducer
>> ItemSimilarityJob-MostSimilarItemPairsMapper-Reducer
>>
>>
>> The problem is that the output file is empty!  What am I missing?  Please
>> help.  Thanks.
>>
>>
