>> Can you gist up a patch and/or post it to a JIRA so we can take a look?

I'll work on cleaning up my code a bit and attach it to a JIRA.


On Fri, Apr 26, 2013 at 3:19 PM, Josh Wills <jwills@cloudera.com> wrote:
Can you gist up a patch and/or post it to a JIRA so we can take a look?


On Fri, Apr 26, 2013 at 12:30 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
As mentioned, I'm currently trying to add Avro Trevni support to Crunch. I think I have everything working, except that my output is not being copied to the correct directory upon completion.

I'm extending FileTargetImpl and have the following in my implementation:

    @Override
    public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
         .....
        configureForMapReduce(job, AvroKey.class, NullWritable.class, AvroTrevniKeyOutputFormat.class,
                outputPath, name);

        // AvroTrevniKeyOutputFormat writes its content directly to the configured output path,
        // so reset that path to include the named output.
        if (name != null) {
            FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
        }
    }

This produces the following in the crunch tmp directory:

$ pwd
/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
$ ls
_SUCCESS part-m-00000
$ cd part-m-00000/
$ ls -l
total 8
-rwxrwxrwx  1 mw010351  staff  493 Apr 26 13:52 part-0.trv
-rw-r--r--  1 mw010351  staff    0 Apr 26 13:52 part-m-00000

The part-0.trv file is the one of most interest. Ideally I'd also be able to avoid the extra part-m-00000 directory, but I can work on that configuration separately because I think it is set inside Trevni.

Unfortunately, the directories in the Crunch tmp dir aren't getting copied to the expected output directory, because the CrunchJobHooks completion hook expects folders to be of the form "out#-*", and the directory that is getting created (out0 in the listing above) does not have the "-" or take the form of the others ("out0-m-00000"). Am I missing some configuration in my target that would cause the directory to be created like that? Or should the pattern for finding directories to copy be relaxed so that it does not require the final "-"?
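
To make the mismatch concrete, here is a minimal sketch of the two matching rules in question. The regexes are only stand-ins for the "out#-*" form described above and are not copied from CrunchJobHooks; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: these patterns approximate the "out#-*"
// form described in the email, not the actual CrunchJobHooks code.
class OutputNameMatch {

    // Strict form: "out<index>-" followed by anything, e.g. "out0-m-00000".
    static boolean matchesStrict(int index, String name) {
        return Pattern.matches("out" + index + "-.*", name);
    }

    // Relaxed form: the trailing "-" is no longer required.
    static boolean matchesRelaxed(int index, String name) {
        return Pattern.matches("out" + index + ".*", name);
    }

    public static void main(String[] args) {
        System.out.println(matchesStrict(0, "out0-m-00000")); // true
        System.out.println(matchesStrict(0, "out0"));         // false
        System.out.println(matchesRelaxed(0, "out0"));        // true
        System.out.println(matchesRelaxed(1, "out10"));       // true
    }
}
```

The last case is the reason I'm hesitant about simply dropping the "-": a plain prefix match for output 1 would also pick up "out10", so a relaxed rule would probably need to accept either the exact name or the "out#-" prefix.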

Thoughts?



--
Director of Data Science
Twitter: @josh_wills