spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: input file from tar.gz
Date Tue, 29 Sep 2015 19:18:37 GMT
The syntax using '#' is not supported by hdfs natively.

YARN resource localization supports such a notion. See
http://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html

Not sure about Spark.
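
One possible workaround, given only as a rough sketch: read the archive as raw bytes and pull out the single member with Python's standard tarfile module. The helper name read_member_lines is hypothetical, and the sketch assumes the member is UTF-8 text and that the whole archive fits in the memory of one task, which may not hold for a truly huge file because gzip is not splittable.

```python
import io
import tarfile


def read_member_lines(archive_bytes, member_name, encoding="utf-8"):
    """Return the text lines of one member of a gzip-compressed tar archive.

    Hypothetical helper for illustration; assumes the member is text in
    the given encoding and that the archive fits in memory.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # extractfile raises KeyError if no member has this name.
        member = tar.extractfile(member_name)
        if member is None:
            # Directories and special files carry no data stream.
            return []
        return member.read().decode(encoding).splitlines()
```

With PySpark this could probably be combined with sc.binaryFiles, which yields (path, content) pairs, along the lines of sc.binaryFiles("hdfs:///data/huge.tar.gz").flatMap(lambda kv: read_member_lines(kv[1], "input.txt")). I have not tried this on a large archive, so treat it as untested.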

On Tue, Sep 29, 2015 at 11:39 AM, Peter Rudenko <petro.rudenko@gmail.com>
wrote:

> Hi, I have a huge tar.gz file on dfs. This file contains several files,
> but I want to use only one of them as input. Is it possible to select
> a single file inside a tar.gz archive, something like this:
> sc.textFile("hdfs:///data/huge.tar.gz#input.txt")
>
> Thanks,
> Peter Rudenko
