As mentioned, I'm currently trying to add Avro Trevni support to Crunch.
I think I've gotten everything working, except that my output is not being
copied to the correct directory upon completion.
I'm extending FileTargetImpl and have the following in my implementation:
@Override
public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
    .....
    configureForMapReduce(job, AvroKey.class, NullWritable.class,
        AvroTrevniKeyOutputFormat.class, outputPath, name);
    // AvroTrevniKeyOutputFormat writes its content directly to the configured
    // output path, so reset that path to include the named output.
    if (name != null) {
        FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
    }
}
This produces the following in the crunch tmp directory:
$ pwd
/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
$ ls
_SUCCESS part-m-00000
$ cd part-m-00000/
$ ls -l
total 8
-rwxrwxrwx 1 mw010351 staff 493 Apr 26 13:52 part-0.trv
-rw-r--r-- 1 mw010351 staff 0 Apr 26 13:52 part-m-00000
The part-0.trv file is the one of most interest, and ideally I'd be able to
avoid the extra part-m-00000 directory (but I can work on that
configuration separately, because I think it is inside of Trevni).
Unfortunately, the directories in the Crunch tmp dir aren't getting copied
to the expected output directory, because the completion hook in
CrunchJobHooks expects folders of the form "out#-*", and the directory that
is getting created ("out0") does not have the "-" or take the form that
others do ("out0-m-00000"). Am I missing some configuration in my target
that would cause the directory to be created like that? Or should the
pattern for finding directories to copy be relaxed so that it does not
require the final "-"?
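
To make the mismatch concrete, here is a small standalone sketch. It is not
Crunch's actual code: the class and method names are made up for
illustration, and I'm approximating the "out#-*" form with a regex rather
than whatever matching CrunchJobHooks really does. It only shows why a
directory named "out0" is skipped while "out0-m-00000" is picked up, and
what a relaxed pattern might accept.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Crunch.
public class OutputDirPatternSketch {
    // Approximates "out#-*": "out", one or more digits, a "-", then anything.
    private static final Pattern STRICT = Pattern.compile("out\\d+-.*");

    // Relaxed variant: the trailing "-..." part is optional.
    private static final Pattern RELAXED = Pattern.compile("out\\d+(-.*)?");

    public static boolean matchesStrict(String dirName) {
        return STRICT.matcher(dirName).matches();
    }

    public static boolean matchesRelaxed(String dirName) {
        return RELAXED.matcher(dirName).matches();
    }

    public static void main(String[] args) {
        // "out0" is the directory my target produces; "out0-m-00000" is the
        // form the other targets produce.
        for (String name : new String[] {"out0-m-00000", "out0"}) {
            System.out.println(name
                + " strict=" + matchesStrict(name)
                + " relaxed=" + matchesRelaxed(name));
        }
    }
}
```

One thing to watch if the pattern is relaxed: if the lookup is a glob built
per output index, simply dropping the "-" (for example "out1*" instead of
"out1-*") would also match "out10", so the "-" may be there on purpose.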
Thoughts?