spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: filter by dict() key in pySpark
Date Wed, 16 Mar 2016 00:31:23 GMT
Another solution could be using a left-semi join:

# `dict` is your Python dict; wrap each key in a 1-tuple so a schema can be inferred
keys = sqlContext.createDataFrame([(k,) for k in dict.keys()], ["k"])
DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")
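
For reference, a self-contained sketch of the same idea; the SparkContext
`sc`, the sample data, and the names `d`, `a` and `k` are placeholders, not
from the thread:

    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)  # assuming an existing SparkContext `sc`

    # toy data: column `a` holds the values to match against the dict keys
    DF1 = sqlContext.createDataFrame([("name", 1), ("other", 2)], ["a", "n"])
    d = {"name": 42}

    # wrap each key in a 1-tuple so createDataFrame can infer a schema
    keys = sqlContext.createDataFrame([(k,) for k in d.keys()], ["k"])

    # keep only the rows of DF1 whose `a` appears among the dict keys
    DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")
    DF2.show()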

On Wed, Feb 24, 2016 at 2:14 AM, Franc Carter <franc.carter@gmail.com> wrote:
>
> A colleague found out how to do this; the approach was to use a udf()
> (a sketch follows after the quoted thread below)
>
> cheers
>
> On 21 February 2016 at 22:41, Franc Carter <franc.carter@gmail.com> wrote:
>>
>>
>> I have a DataFrame that has a Python dict() as one of the columns. I'd
>> like to filter the DataFrame for those Rows where the dict() contains a
>> specific key, e.g. something like this:
>>
>>     DF2 = DF1.filter('name' in DF1.params)
>>
>> but that gives me this error
>>
>> ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
>> for 'or', '~' for 'not' when building DataFrame boolean expressions.
>>
>> How do I express this correctly?
>>
>> thanks
>>
>> --
>> Franc
>
> --
> Franc
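
The udf() approach mentioned above is not spelled out in the thread. A
minimal sketch, assuming DF1.params is the dict/map column and the target
key is 'name' (both taken from the original question); the sample data and
the helper name `has_name` are illustrative:

    from pyspark.sql import SQLContext
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    sqlContext = SQLContext(sc)  # assuming an existing SparkContext `sc`

    # toy data: `params` is a dict column, as in the question
    DF1 = sqlContext.createDataFrame(
        [(1, {"name": "a"}), (2, {"city": "b"})], ["id", "params"])

    # True when the row's dict contains the key 'name'
    has_name = udf(lambda params: params is not None and "name" in params,
                   BooleanType())

    DF2 = DF1.filter(has_name(DF1.params))
    DF2.show()

On recent Spark versions the same filter can also be written without a udf,
e.g. DF1.filter(DF1.params.getItem('name').isNotNull()), since getItem() on
a map column returns null for missing keys.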

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

