spark-user mailing list archives

From AlexanderModestov <AlexanderModes...@yandex.ru>
Subject filter function works incorrectly (Python)
Date Wed, 23 Aug 2017 15:40:16 GMT
Hello All!
I'm trying to filter some rows in my DataFrame.
I created a list of ids and used the following construction:
df_new = df.filter(df.user.isin(list_users))
The original DataFrame (df) has 29711562 rows, but the new one has
5394805.
So I decided to try another method:
df_new = df.join(df_ids, df.user == df_ids.user, how='inner')
df_ids is a DataFrame whose rows are the ids (the ids are unique). I wanted
to find the ids common to both DataFrames with this method, but again I got
a new DataFrame, and it is larger than the previous one.
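For comparison, here is a minimal plain-Python sketch (not Spark code) of how the two approaches count rows. The toy data and the helper names filter_isin and inner_join are hypothetical and only model the semantics: a membership filter keeps each row at most once, while an inner join emits one row per matching pair, so any id repeated on the join side multiplies the matching rows. It is only an illustration of one possible cause, assuming the id DataFrame might contain duplicates after all.

```python
from collections import Counter

# Hypothetical toy data standing in for df and df_ids (not from the post).
rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]  # (user, payload)
list_users = ["a", "b"]                # unique ids
ids_with_duplicate = ["a", "b", "a"]   # "a" appears twice


def filter_isin(rows, users):
    """Model of df.filter(df.user.isin(users)): each row is kept at most once."""
    wanted = set(users)
    return [row for row in rows if row[0] in wanted]


def inner_join(rows, ids):
    """Model of an inner join on user: one output row per matching pair."""
    multiplicity = Counter(ids)  # how many times each id occurs on the join side
    return [row for row in rows for _ in range(multiplicity[row[0]])]


filtered = filter_isin(rows, list_users)
joined_unique = inner_join(rows, list_users)
joined_dup = inner_join(rows, ids_with_duplicate)

# With unique ids the two methods agree; a duplicated id inflates the join.
print(len(rows), len(filtered), len(joined_unique), len(joined_dup))
```

In PySpark, grouping df_ids by user, counting, and filtering for counts above 1 would be one way to check whether the ids really are unique.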
Maybe someone knows the right way to implement this?
Thank you!



