From: Aakash Basu <aakash.spark....@gmail.com>
Subject: Data growth vs Cluster Size planning
Date: Mon, 11 Feb 2019 09:40:32 GMT
Hi,

I ran a dataset of *200 columns and 0.2M records* on a cluster of *1 master (18 GB)* and *2 slaves (32 GB each, 16 cores/slave)*; a *very large ML tuning job* (training) took around *772 minutes*.

Now, my requirement is to run the *same operation on 3M records*. Any idea on how we should proceed? Should we go for vertical scaling or horizontal scaling? How should this problem be approached in a stepwise/systematic manner?

Thanks in advance.

Regards,
Aakash.
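[Editor's note: one systematic way to attack this kind of sizing question, before buying any hardware, is to time the same job on growing samples of the data and see how runtime scales. The sketch below is not from the original thread; the parquet path, the train_model stub, and the sample fractions are all placeholder assumptions standing in for the actual ML tuning pipeline.]

    import time
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("scaling-probe").getOrCreate()

    # Hypothetical path: load the full 3M-record dataset.
    df = spark.read.parquet("/data/records_3m.parquet")

    def train_model(sample_df):
        """Placeholder for the real ML tuning/training job."""
        sample_df.count()  # stand-in action; swap in the actual training call

    # Time the job on increasing fractions of the data. If runtime grows
    # roughly linearly, horizontal scaling (more executors) is usually the
    # cheaper lever; super-linear growth points to memory pressure and may
    # call for vertical scaling (bigger executors) first.
    for fraction in (0.05, 0.10, 0.25, 0.50):
        sample = df.sample(withReplacement=False, fraction=fraction, seed=42)
        start = time.time()
        train_model(sample)
        print(f"fraction={fraction}: {time.time() - start:.1f}s")

[Extrapolating from these points gives a rough runtime estimate for the full 3M records under each cluster shape, which turns the vertical-vs-horizontal question into a measured comparison rather than a guess.]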