spark-dev mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: SparkSQL 1.3.0 cannot read parquet files from different file system
Date Mon, 16 Mar 2015 10:43:23 GMT
Oh sorry, I misread your question. I thought you were trying something 
like |parquetFile("s3n://file1,hdfs://file2")|. Yeah, it’s a valid bug. 
Thanks for opening the JIRA ticket and the PR!


Cheng

On 3/16/15 6:39 PM, Cheng Lian wrote:

> Hi Pei-Lun,
>
> We intentionally disallowed passing multiple comma-separated paths in 
> 1.3.0. One of the reasons is that users reported that this fails when a 
> file path contains an actual comma. In your case, you may do 
> something like this:
>
> |val s3nDF = parquetFile("s3n://...")
> val hdfsDF = parquetFile("hdfs://...")
> val finalDF = s3nDF.unionAll(hdfsDF)
> |
>
> Cheng
>
> On 3/16/15 4:03 PM, Pei-Lun Lee wrote:
>
>> Hi,
>>
>> I am using Spark 1.3.0, where I cannot load Parquet files from more than
>> one file system (say, one from s3n://... and another from hdfs://...).
>> This worked in older versions, and still works in 1.3 if I set
>> spark.sql.parquet.useDataSourceApi=false.
>>
>> One way to fix this is, instead of getting a single FileSystem from the
>> default configuration in ParquetRelation2, to call Path.getFileSystem for
>> each path.
>>
>> Here's the JIRA link and pull request:
>> https://issues.apache.org/jira/browse/SPARK-6351
>> https://github.com/apache/spark/pull/5039
>>
>> Thanks,
>> --
>> Pei-Lun
>>
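
The fix proposed in the original message (calling Path.getFileSystem for each path rather than taking one FileSystem from the default configuration) could be sketched roughly as follows. This is only an illustration using the Hadoop API, not the code from the pull request; the object name PerPathFileSystems and the helper resolveFileSystems are hypothetical, and how ParquetRelation2 actually consumes the result is not shown.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PerPathFileSystems {
  // Hypothetical helper, not taken from the actual patch: pairs each input
  // path with the FileSystem selected by that path's own URI.
  def resolveFileSystems(paths: Seq[String], conf: Configuration): Seq[(Path, FileSystem)] =
    paths.map { p =>
      val path = new Path(p)
      // Path.getFileSystem inspects the path's scheme and authority, so an
      // s3n:// path and an hdfs:// path can resolve to different FileSystem
      // instances. FileSystem.get(conf) would instead hand back the default
      // file system regardless of the path.
      (path, path.getFileSystem(conf))
    }
}
```

With paths such as "s3n://..." and "hdfs://...", each entry would carry its own FileSystem, provided the corresponding file system implementations and credentials are available in the Hadoop configuration.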