spark-user mailing list archives

From Nicolas Paris <nicolas.pa...@riseup.net>
Subject Re: Avro large binary read memory problem
Date Tue, 23 Jul 2019 17:19:01 GMT

On Tue, Jul 23, 2019 at 05:10:19PM +0000, Mario Amatucci wrote:
> https://spark.apache.org/docs/2.2.0/configuration.html#memory-management

Thanks for the pointer. However, I have tried almost every configuration,
and the behavior suggests that Spark keeps the data in memory instead of
releasing it.
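For reference, the "where id in (...)" query from the quoted original message and the memory-management settings from the linked configuration page can be combined roughly as follows. This is a minimal sketch under stated assumptions, not a fix for the problem reported in this thread. It assumes PySpark 2.4.x with the external spark-avro module on the classpath. The helper names (build_session, filter_by_ids) are hypothetical. The values for spark.memory.fraction and spark.memory.storageFraction are the Spark 2.x defaults, used only as placeholders.

```python
# Hypothetical sketch, not a fix: assumes PySpark 2.4.x with the external
# spark-avro module on the classpath (for example via
# --packages org.apache.spark:spark-avro_2.11:2.4.3).

# Memory-management keys from the Spark configuration page. The values are
# the Spark 2.x defaults, used here only as placeholders.
MEMORY_CONF = {
    "spark.memory.fraction": "0.6",
    "spark.memory.storageFraction": "0.5",
}


def build_session(app_name="avro-id-filter", conf=None):
    """Create (or reuse) a SparkSession with the given memory settings."""
    # pyspark is imported lazily so this module loads without it installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in (conf or MEMORY_CONF).items():
        builder = builder.config(key, value)
    # These are SparkContext-level settings, so they are ignored if a
    # SparkContext is already running when getOrCreate() is called.
    return builder.getOrCreate()


def filter_by_ids(spark, path, ids):
    """Return the rows of the Avro files at `path` whose id is in `ids`."""
    from pyspark.sql.functions import col

    # DataFrame-API equivalent of the SQL predicate "where id in (...)".
    df = spark.read.format("avro").load(path)
    return df.where(col("id").isin(list(ids)))
```

Usage would look like filter_by_ids(build_session(), "/path/to/avro/files", [1, 2, 3]).count(), where the path and ids are placeholders. The sketch only shows where the query and settings go; it makes no claim about how much memory the Avro reader uses for the large content column.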


> -----Original Message-----
> From: Nicolas Paris <nicolas.paris@riseup.net> 
> Sent: Tuesday, July 23, 2019 6:56 PM
> To: user@spark.apache.org
> Subject: Avro large binary read memory problem
> 
> Hi
> 
> I have Avro files with the schema id: Long, content: Binary.
> 
> The binaries are large images, up to 2 GB each.
> 
> I'd like to select a subset of rows with "where id in (...)".
> 
> Unfortunately, I get memory errors even when the subset is empty (0 rows). It looks like the
> reader keeps the binary data in memory until it runs out of heap or the container is killed by YARN.
> 
> Any idea how to tune memory management to avoid these memory problems?
> 
> Thanks
> 
> -- Spark 2.4.3
> 
> --
> nicolas
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 

-- 
nicolas
