spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: Pyspark dataframe read
Date Tue, 06 Oct 2015 08:55:33 GMT
I personally find the comma-separated-paths feature much more important
than supporting commas in paths (which, one could argue, you should avoid anyway).

But assuming people want to keep commas as legitimate characters in paths:
https://issues.apache.org/jira/browse/SPARK-10185
https://github.com/apache/spark/pull/8416
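
To make the ambiguity concrete, here is a minimal sketch in plain Python (a hypothetical helper, not Spark's actual implementation): once paths are joined into a single comma-separated string, a comma inside a filename is indistinguishable from a path separator.

```python
def split_paths(path_spec):
    """Split a comma-separated path spec into individual paths.

    Hypothetical helper illustrating the comma-splitting behavior
    under discussion: a filename that itself contains a comma
    cannot survive the split intact.
    """
    return [p for p in path_spec.split(",") if p]

# Two distinct files joined the comma-separated way split cleanly:
two_files = split_paths("data/file1.json,data/file2.json")

# But a single file whose name contains a comma is split into
# two bogus paths, neither of which exists:
one_file = split_paths("data/report,v2.json")
```

This is why supporting both features at once requires either escaping or an API that accepts a list of paths instead of one delimited string.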



On Tue, Oct 6, 2015 at 4:31 AM, Reynold Xin <rxin@databricks.com> wrote:

> I think the problem is that a comma is actually a legitimate character in a
> file name, and as a result ...
>
>
> On Tuesday, October 6, 2015, Josh Rosen <rosenville@gmail.com> wrote:
>
>> Could someone please file a JIRA to track this?
>> https://issues.apache.org/jira/browse/SPARK
>>
>> On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> I ran into the same thing in the Scala API. We depend heavily on
>>> comma-separated paths, and it no longer works.
>>>
>>>
>>> On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl <snuderl@gmail.com> wrote:
>>>
>>>> Hello everyone.
>>>>
>>>> It seems PySpark DataFrame read is broken for reading multiple files.
>>>>
>>>> sql.read.json("file1,file2") fails with java.io.IOException: No input
>>>> paths specified in job.
>>>>
>>>> This used to work in Spark 1.4, and it still works with sc.textFile.
>>>>
>>>> Blaž
>>>>
>>>
>>>
>>
