spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Column width limits?
Date Thu, 07 Aug 2014 04:31:40 GMT
Spark splits data across machines at the partition level, and a single
record is never split across partitions, so the realistic limit is on the
partition size.
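
A rough sketch of what the question describes, for illustration only. It assumes a recent PySpark (SparkSession, wholeTextFiles, createOrReplaceTempView), not the 2014 release this thread was written against. The function name inspect_whole_files, the table name docs, and the column names url and content are made up for the example. It reads each file as one (URL, contents) pair, reports the largest value in bytes as a hint of per-record memory needs, and then queries the contents with substr().

```python
def utf8_size(text):
    """Size in bytes of a string when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def inspect_whole_files(path):
    """Hypothetical helper: load whole files as pairs and query them with SQL."""
    # Imported here so utf8_size can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("whole-files").getOrCreate()
    sc = spark.sparkContext

    # Each record is (file URL, entire file contents as one string).
    files = sc.wholeTextFiles(path)

    # A record lives entirely inside one partition, so the largest file
    # gives a lower bound on the memory a single task must handle.
    url, size = files.mapValues(utf8_size).max(key=lambda kv: kv[1])
    print(url, size)

    # Register the pairs as a table and pick out pieces with substr().
    files.toDF(["url", "content"]).createOrReplaceTempView("docs")
    spark.sql("SELECT url, substr(content, 1, 100) AS head FROM docs").show()

    spark.stop()
```

Calling inspect_whole_files with a directory or bucket prefix that Spark can read would run the whole sketch; files too large to fit in one executor's memory would still fail at the read step.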


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Aug 7, 2014 at 8:41 AM, Daniel, Ronald (ELS-SDG) <
R.Daniel@elsevier.com> wrote:

>  Assume I want to make a PairRDD whose keys are S3 URLs and whose values
> are Strings holding the contents of those (UTF-8) files, but NOT split into
> lines. Are there length limits on those files/Strings? 1 MB? 16 MB? 4 GB? 1
> TB?
>
> Similarly, can such a thing be registered as a table so that I can use
> substr() to pick out pieces of the string?
>
>
>
> Thanks,
>
> Ron
>
>
>
