spark-user mailing list archives

From Jörn Franke <>
Subject Re: [Pyspark 2.4] Best way to define activity within different time window
Date Sun, 09 Jun 2019 17:17:12 GMT
Depending on the accuracy you need, HyperLogLog sketches can be an interesting alternative.
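
To illustrate the trade-off, here is a minimal pure-Python sketch of a HyperLogLog counter. It is not Spark's implementation: the class name `HyperLogLog`, the precision parameter `p`, and the integer user IDs are all hypothetical, chosen only for illustration. The point it shows is that per-day sketches can be merged into a longer period by taking the register-wise maximum, much like OR-ing daily bitmaps, at the cost of an approximate count. In PySpark itself, `approx_count_distinct` from `pyspark.sql.functions` provides approximate distinct counts.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter (hypothetical helper, for illustration only)."""

    def __init__(self, p=12):
        self.p = p                    # number of index bits
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical data: two days of active user IDs with 5,000 users in common.
day1 = HyperLogLog()
for user_id in range(0, 10_000):
    day1.add(user_id)

day2 = HyperLogLog()
for user_id in range(5_000, 15_000):
    day2.add(user_id)

# Merge the daily sketches to estimate distinct users over both days.
period = day1.merge(day2)
estimate = period.count()
```

With `p=12` the sketch holds 4,096 small registers regardless of how many users are added, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Whether that is acceptable depends on the accuracy the reports need.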

> On 09.06.2019 at 15:59, big data <> wrote:
> In my opinion, a bitmap is the best solution for calculating active users. Most other solutions are based on a count(distinct) calculation, which is slower.
> If you have already implemented a bitmap solution, including how to build and load the bitmaps, then a bitmap is the best choice.
>> On 2019/6/5 at 6:49 PM, Rishi Shah wrote:
>> Hi All,
>> Is there a best practice around calculating daily, weekly, monthly, quarterly, yearly active users?
>> One approach is to create a window of daily bitmaps and aggregate them by period later. However, I was wondering if anyone has a better approach to tackling this problem.

>> -- 
>> Regards,
>> Rishi Shah
