crunch-user mailing list archives

From Micah Whitacre <>
Subject CrunchJobHooks.handleMultiPaths(..) file pattern expectations
Date Fri, 26 Apr 2013 19:30:56 GMT
So as mentioned I'm currently trying out adding Avro Trevni support to
Crunch.  I think I've gotten everything working with the exception that my
output is not being copied to the correct directory upon completion.

I'm extending FileTargetImpl and have the following in my subclass:

    public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
        configureForMapReduce(job, AvroKey.class, NullWritable.class,
                outputPath, name);

        // AvroTrevniKeyOutputFormat uses this set value to write content
        // directly to this path. Therefore resetting the value with the
        // named value.
        if (name != null) {
            FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
        }
    }

This produces the following in the crunch tmp directory:

$ pwd
$ ls
_SUCCESS part-m-00000
$ cd part-m-00000/
$ ls -l
total 8
-rwxrwxrwx  1 mw010351  staff  493 Apr 26 13:52
-rw-r--r--  1 mw010351  staff    0 Apr 26 13:52 part-m-00000

The 493-byte file listed first is the one of most interest, and ideally
I'd be able to avoid the extra part-m-00000 directory (but I can work on
that configuration, because I think it lives inside Trevni).

Unfortunately, the directories in the Crunch tmp directory aren't getting
copied to the expected output directory, because the CrunchJobHooks
completion hook expects folders of the form "out#-*", and the directory
that is getting created neither contains the "-" nor takes the same form as
the others ("out0-m-00000").  Am I missing some configuration in my target
that would cause the directory to be created like that?  Or should the
pattern for finding directories to copy be relaxed so that it does not
require the final "-"?
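
To make the mismatch concrete, here is a small sketch of how an "out#-*" glob
treats the names involved. It uses java.nio's glob matcher purely as a
stand-in for whatever matching the completion hook actually does, and the
class and method names (GlobPatternDemo, matches) are made up for this
illustration; they are not part of Crunch.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class, not Crunch code: shows which directory names
// a glob of the form "out<N>-*" accepts, using java.nio as a stand-in.
public class GlobPatternDemo {

    // Returns true if the single path component `name` matches `glob`.
    static boolean matches(String glob, String name) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        String[] names = {"out0-m-00000", "out0", "part-m-00000"};

        // Pattern as described in the message: requires the trailing "-".
        for (String name : names) {
            System.out.println("out0-* vs " + name + " : "
                    + matches("out0-*", name));
        }

        // A relaxed pattern without the "-" for comparison.
        for (String name : names) {
            System.out.println("out0*  vs " + name + " : "
                    + matches("out0*", name));
        }

        // Caveat of relaxing the pattern: index 1 would also pick up
        // directories that belong to index 10.
        System.out.println("out1*  vs out10-m-00000 : "
                + matches("out1*", "out10-m-00000"));
    }
}
```

With the "-" required, only names like "out0-m-00000" are picked up, so a
directory named just "out0" or "part-m-00000" is skipped. Dropping the "-"
would accept "out0", but it would also let the pattern for output 1 match
directories belonging to output 10, so a relaxed pattern would probably need
some other delimiter or an exact-name check as well.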

