spark-user mailing list archives

From AlexanderModestov <>
Subject filter function works incorrectly (Python)
Date Wed, 23 Aug 2017 15:40:16 GMT
Hello All!
I'm trying to filter some rows in my DataFrame.
I created a list of ids and use the following construction:
df_new = df.filter(df.user.isin(list_users))
The original DataFrame (df) consists of 29711562 rows, but the new one -
OK, I decided to try another method:
df_new = df.join(df_ids, df.user==df_ids.user, how='inner').
df_ids is a DataFrame whose rows contain the ids (the ids are unique). I wanted
to keep only the rows with matching ids using this method, but again I got a new
DataFrame that is bigger than the previous one.
Maybe someone knows how to implement this correctly?
Thank you!
