spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: lower&upperBound not working/spark 1.3
Date Sun, 22 Mar 2015 20:02:27 GMT
I went over JDBCRelation#columnPartition() but didn't find an obvious clue
(you can add more logging to confirm that the partitions were generated
correctly).

Looks like the issue may be somewhere else.
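
For context, the partition-generation logic can be sketched roughly as
follows. This is a simplified re-implementation for illustration, not the
actual Spark source, and the name columnPartitionClauses is made up. The
key point is that the first partition gets no lower bound and the last gets
no upper bound, so the union of partitions always covers the whole table --
the bounds control the stride, they don't filter rows:

```scala
// Simplified sketch (assumption: mirrors the shape of Spark 1.3's
// JDBCRelation#columnPartition, not the actual source). Returns the
// WHERE-clause fragment used for each partition's query.
def columnPartitionClauses(column: String,
                           lowerBound: Long,
                           upperBound: Long,
                           numPartitions: Int): Seq[String] = {
  // Integer stride between consecutive partition boundaries.
  val stride = upperBound / numPartitions - lowerBound / numPartitions
  var currentValue = lowerBound
  (0 until numPartitions).map { i =>
    // First partition is open below, last is open above.
    val lo = if (i != 0) Some(s"$column >= $currentValue") else None
    currentValue += stride
    val hi = if (i != numPartitions - 1) Some(s"$column < $currentValue") else None
    Seq(lo, hi).flatten.mkString(" AND ")
  }
}

// With the parameters from the example below this generates 12 clauses:
columnPartitionClauses("cs_id", 1L, 10000L, 12).foreach(println)
// The last clause is "cs_id >= 9164" with no upper bound, so every row
// with cs_id above 10000 still lands in the final partition.
```

That would explain getting all rows back regardless of the bounds: rows
outside [lowerBound, upperBound] simply pile into the first and last
partitions instead of being excluded.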

Cheers

On Sun, Mar 22, 2015 at 12:47 PM, Marek Wiewiorka <marek.wiewiorka@gmail.com
> wrote:

> ...I even tried setting the upper/lower bounds to the same value, like 1 or
> 10, with the same result.
> cs_id is a column with cardinality ~5*10^6,
> so this is not the case here.
>
> Regards,
> Marek
>
> 2015-03-22 20:30 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:
>
>> From javadoc of JDBCRelation#columnPartition():
>>    * Given a partitioning schematic (a column of integral type, a number
>> of
>>    * partitions, and upper and lower bounds on the column's value),
>> generate
>>
>> In your example, 1 and 10000 are bounds on the value of the cs_id column.
>>
>> Looks like all the values in that column fall within the range of 1 to
>> 10000.
>>
>> Cheers
>>
>> On Sun, Mar 22, 2015 at 8:44 AM, Marek Wiewiorka <
>> marek.wiewiorka@gmail.com> wrote:
>>
>>> Hi All - I'm trying to use the new SQLContext API to populate a DataFrame
>>> from a JDBC data source,
>>> like this:
>>>
>>> val jdbcDF = sqlContext.jdbc(url =
>>> "jdbc:postgresql://localhost:5430/dbname?user=user&password=111", table =
>>> "se_staging.exp_table3", columnName = "cs_id", lowerBound = 1, upperBound =
>>> 10000, numPartitions = 12)
>>>
>>> No matter how I set the lower and upper bounds, I always get all the rows
>>> from my table.
>>> The API is marked as experimental, so I assume there might be some bugs
>>> in it, but has anybody come across a similar issue?
>>>
>>> Thanks!
>>>
>>
>>
>
