hadoop-mapreduce-dev mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives
Date Tue, 10 Feb 2015 14:46:12 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved MAPREDUCE-6249.
-----------------------------------
    Resolution: Not a Problem

This is something better sent to the [Hadoop User mailing list|http://hadoop.apache.org/mailing_lists.html#User]
rather than JIRA.

The archive was untarred as requested, but it was untarred into a directory (named "test"
per the '#test' URI fragment in the archive argument).  An archive is always unpacked into
a directory specific to that archive, and the distributed cache does not support unpacking
directly into the task's working directory.  If you need files placed in the task working
directory then you will need to specify them separately (e.g.: via the "-files" directive).
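
To illustrate the layout described above, here is a minimal local simulation. It does not invoke Hadoop; the "filecache/116" and "workdir" directories are stand-ins made up for this sketch, modeled on the path in the reported output. It also shows one likely reason the reported "ls -l test" printed only a symlink: with -l, ls does not follow a symbolic link named on the command line unless a trailing slash is added.

```shell
# Local simulation only: no Hadoop involved, directory names are illustrative.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# Build the same archive as in the report.
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt
rm test.1.txt test.2.txt

# Stand-in for the local file cache: the archive is unpacked into a
# directory dedicated to that archive.
mkdir -p filecache/116/test.tgz
tar zxf test.tgz -C filecache/116/test.tgz

# Stand-in for the task working directory: a link named after the
# '#test' fragment points at the unpacked directory.
mkdir workdir
ln -s "$scratch/filecache/116/test.tgz" workdir/test
cd workdir

ls -l test           # shows only the symlink itself
ls -l test/          # follows the link and lists both files
cat test/test.1.txt  # prints "abcd"
```

Under these assumptions, a mapper script would read the files as test/test.1.txt and test/test.2.txt rather than expecting them directly in the working directory.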

> Streaming task will not untar tgz uploaded with -archives
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6249
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 2.5.2
>         Environment: hadoop-2.5.2
> hadoop-streaming-2.5.2.jar
>            Reporter: Liu Xiao
>
> When writing a Hadoop streaming task, I used -archives to upload a tgz from the local
> machine to the task working directory, but it was not untarred as the documentation says.
> I've searched a lot without any luck.
> Here is the command that starts the streaming task with hadoop-2.5.2:
> hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
>     -files mapper.sh \
>     -archives /home/hadoop/tmp/test.tgz#test \
>     -D mapreduce.job.maps=1 \
>     -D mapreduce.job.reduces=1 \
>     -input "/test/test.txt" \
>     -output "/res/" \
>     -mapper "sh mapper.sh" \
>     -reducer "cat"
> and "mapper.sh" is:
> cat > /dev/null
> ls -l test
> exit 0
> "test.tgz" contains two files, "test.1.txt" and "test.2.txt", created with:
> echo "abcd" > test.1.txt
> echo "efgh" > test.2.txt
> tar zcvf test.tgz test.1.txt test.2.txt
> The output from the above task is:
> lrwxrwxrwx 1 hadoop hadoop     71 Feb  8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz
> but the desired output would be something like this:
> -rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.1.txt
> -rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.2.txt
> So why has test.tgz not been untarred automatically as the documentation says, and is
> there another way to get the tgz untarred?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
