Can you gist up a patch and/or post it to a JIRA so we can take a look?
On Fri, Apr 26, 2013 at 12:30 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
> So as mentioned I'm currently trying out adding Avro Trevni support to
> Crunch. I think I've gotten everything working with the exception that my
> output is not being copied to the correct directory upon completion.
>
> I'm extending the FileTargetImpl and have the following in my
> implementation:
>
> @Override
> public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
>   .....
>   configureForMapReduce(job, AvroKey.class, NullWritable.class,
>       AvroTrevniKeyOutputFormat.class, outputPath, name);
>
>   // AvroTrevniKeyOutputFormat writes content directly to the configured
>   // output path, so reset that path to include the named output.
>   if (name != null) {
>     FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
>   }
> }
>
> This produces the following in the crunch tmp directory:
>
> $ pwd
>
> /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
> $ ls
> _SUCCESS part-m-00000
> $ cd part-m-00000/
> $ ls -l
> total 8
> -rwxrwxrwx 1 mw010351 staff 493 Apr 26 13:52 part-0.trv
> -rw-r--r-- 1 mw010351 staff 0 Apr 26 13:52 part-m-00000
>
> The part-0.trv file is the one of most interest, and ideally I'd be able to
> avoid the extra part-m-00000 directory (but I can work on that
> configuration myself, because I think it is inside Trevni).
>
> Unfortunately the directory from the Crunch tmp dir isn't getting copied
> to the expected output directory, because the CrunchJobHooks for completion
> expect folders to be of the form "out#-*", and the directory that is
> getting created does not have the "-" or take the form of the others
> ("out0-m-00000"). Am I missing some configuration in my target that would
> cause the directory to be created like that? Or should the pattern for
> finding directories to copy be loosened to not require the final "-"?
>
> Thoughts?
>
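
The naming mismatch described above can be reproduced in isolation. The following is only a rough sketch, not the code in CrunchJobHooks: the class name, both regular expressions, and the helper methods are hypothetical, and the only thing taken from the description is the "out#-*" form and the two directory names.

```java
import java.util.regex.Pattern;

public class OutputNameMatch {
    // Hypothetical stand-in for the "out#-*" form: "out", an index, a dash, then anything.
    static final Pattern WITH_DASH = Pattern.compile("out\\d+-.*");

    // Hypothetical relaxed form: the dash and everything after it are optional.
    static final Pattern RELAXED = Pattern.compile("out\\d+(-.*)?");

    static boolean matchesWithDash(String name) {
        return WITH_DASH.matcher(name).matches();
    }

    static boolean matchesRelaxed(String name) {
        return RELAXED.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "out0-m-00000" is the usual form; "out0" is what the Trevni target produced.
        for (String name : new String[] {"out0-m-00000", "out0"}) {
            System.out.println(name
                + " withDash=" + matchesWithDash(name)
                + " relaxed=" + matchesRelaxed(name));
        }
    }
}
```

In this sketch the dash-requiring pattern accepts "out0-m-00000" and rejects "out0", while the relaxed one accepts both. Keeping the index anchored as a whole number is a deliberate choice here: a plain prefix match on "out1" would also pick up "out10".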
--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>