hadoop-mapreduce-dev mailing list archives

From "Liu Xiao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives
Date Tue, 10 Feb 2015 04:00:41 GMT
Liu Xiao created MAPREDUCE-6249:

             Summary: Streaming task will not untar tgz uploaded with -archives
                 Key: MAPREDUCE-6249
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 2.5.2
         Environment: hadoop-2.5.2
            Reporter: Liu Xiao

When writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine
to the HDFS task working directory, but it was not untarred as the documentation says it should
be. I have searched extensively without finding an explanation.

Here is the command used to start the Hadoop streaming task with hadoop-2.5.2:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
    -files mapper.sh \
    -archives /home/hadoop/tmp/test.tgz#test \
    -D mapreduce.job.maps=1 \
    -D mapreduce.job.reduces=1 \
    -input "/test/test.txt" \
    -output "/res/" \
    -mapper "sh mapper.sh" \
    -reducer "cat"

The contents of "mapper.sh":

cat > /dev/null
ls -l test
exit 0

"test.tgz" contains two files, "test.1.txt" and "test.2.txt", created as follows:

echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt

The output from the above task:

lrwxrwxrwx 1 hadoop hadoop     71 Feb  8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

The desired output would look like this:

-rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.2.txt

So why was test.tgz not untarred automatically as the documentation says, and is there
another way to get the tgz untarred?
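
One thing that may be worth ruling out: "ls -l test" prints only the symlink itself, so its output looks the same whether the link target is an archive file or a directory that happens to be named test.tgz. The sketch below is a purely local illustration with no Hadoop involved. The temporary paths and the "cache" directory are hypothetical stand-ins, and it does not establish what the NodeManager actually did in this report.

```shell
#!/bin/sh
# Local sketch only; the paths below are hypothetical and Hadoop is not involved.
set -e
work=$(mktemp -d)
cd "$work"

# Recreate the archive described in the report.
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt

# Assumption for illustration: a cache entry that is a DIRECTORY named
# test.tgz and holds the unpacked files.
mkdir -p cache/test.tgz
tar zxf test.tgz -C cache/test.tgz
ln -s "$work/cache/test.tgz" test

# With -l, ls does not follow a symlink given on the command line,
# so this prints a single "test -> ..." line.
ls -l test

# A trailing slash makes ls follow the link and list the directory contents.
ls -l test/

# Report directly whether the link target is a directory or a regular file.
if [ -d test ]; then
    echo "test points to a directory"
else
    echo "test points to a regular file"
fi
```

In mapper.sh, replacing "ls -l test" with "ls -l test/" (or adding a "[ -d test ]" check) would show which of the two cases applies on the cluster.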

