spark-user mailing list archives

From johnzeng <jo...@fossil.com>
Subject Looking for help about stackoverflow in spark
Date Fri, 01 Jul 2016 02:03:58 GMT
I am trying to load a 1 TB collection from MongoDB into a Spark cluster, but I
keep getting a stack overflow error after the job has been running for a while.

I have posted a question on stackoverflow.com and tried all the advice given
there, but nothing works:

how to load large database into spark
<http://stackoverflow.com/questions/38096502/how-to-load-large-table-in-spark>  

I have tried:
1. Using persist with the MEMORY_AND_DISK storage level: same error after the
same running time.
2. Adding more instances: same error after the same running time.
3. Running the script on another, much smaller collection: everything works,
so I think my code is correct.
4. Removing the reduce step: same error after the same running time.
5. Removing the map step: same error after the same running time.
6. Changing the SQL I use: it is faster, but I get the same error after a
shorter running time.
7. Retrieving "_id" instead of "u_at" and "c_at": same error after the same
running time.

Does anyone know how many resources I need to handle this 1 TB database? I
only retrieve two fields from it, and these fields are only 1% of a document
(because each document contains an array of about 90+ embedded documents).
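
To put a rough number on that, here is a back-of-the-envelope sketch of how much data the two fields amount to. It assumes 1 TB = 1024 GB and that the 1% figure holds uniformly across documents; the constant and function names are made up for illustration.

```python
# Back-of-the-envelope estimate only; both figures come from the message above
# and the names used here are hypothetical.
COLLECTION_SIZE_GB = 1024   # "1 TB collection", assuming 1 TB = 1024 GB
PROJECTED_FRACTION = 0.01   # the two retrieved fields are "only 1% of a document"


def projected_size_gb(collection_gb: float, fraction: float) -> float:
    """Return the approximate size of the retrieved fields in GB."""
    return collection_gb * fraction


estimate_gb = projected_size_gb(COLLECTION_SIZE_GB, PROJECTED_FRACTION)
print(f"Approximate size of the two retrieved fields: {estimate_gb:.2f} GB")
```

This only covers the raw size of the two fields. It does not account for Spark's in-memory overhead, and it only applies if the field selection is actually pushed down to MongoDB; otherwise the full documents may still be read.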



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Looking-for-help-about-stackoverflow-in-spark-tp27255.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
