hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: File descriptor leak, possibly new in CDH5.7.0
Date Mon, 23 May 2016 18:03:49 GMT
On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> Hey everyone,
> We are noticing a file descriptor leak that is only affecting nodes in our
> cluster running 5.7.0, not those still running 5.3.8.

Translation: roughly hbase-1.2.0+hadoop-2.6.0 vs hbase-0.98.6+hadoop-2.5.0.

> I ran an lsof against
> an affected regionserver, and noticed that there were 10k+ unix sockets
> that are just called "socket", as well as another 10k+ of the form
> "/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-<int>_1_<int>". The
> 2 seem related based on how closely the counts match.
> We are in the middle of a rolling upgrade from CDH5.3.8 to CDH5.7.0 (we
> handled the namenode upgrade separately).  The 5.3.8 nodes *do not*
> experience this issue. The 5.7.0 nodes *do*. We are holding off upgrading
> more regionservers until we can figure this out. I'm not sure if any
> intermediate versions between the 2 have the issue.
> We traced the root cause to a hadoop job running against a basic table:
> 'my-table-1', {TABLE_ATTRIBUTES => {MAX_FILESIZE => '107374182400',
> MEMSTORE_FLUSHSIZE => '67108864'}, {NAME => '0', VERSIONS => '50',
> This is very similar to all of our other tables (we have many).

You are doing MR against some of these also? They have different schemas?
No leaks here?

> However,
> its regions are getting up there in size, 40+ GB per region, compressed.
> This has not been an issue for us previously.
> The hadoop job is a simple TableMapper job with no special parameters,
> though we haven't updated our client yet to the latest (will do that once
> we finish the server side). The hadoop job runs on a separate hadoop
> cluster, remotely accessing the HBase cluster. It does not do any other
> reads or writes, outside of the TableMapper scans.
> Moving the regions off of an affected server, or killing the hadoop job,
> causes the file descriptors to gradually go back down to normal.
> Any ideas?

Is it just the FD cache running 'normally'? 10k seems like a lot though.
256 seems to be the default in hdfs but maybe it is different in CM or in
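
To tell a cache running at its ceiling from a real leak, it may help to tally the lsof output by kind and watch how the numbers move over time. Below is a minimal sketch, assuming the usual lsof column layout (COMMAND PID USER FD TYPE ...) with no extra thread-id column. The function name and the bucket labels are made up for illustration.

```python
from collections import Counter

# Substring seen in the /dev/shm paths quoted earlier in this thread.
SHM_MARKER = "HadoopShortCircuitShm_"


def count_fd_kinds(lsof_text):
    """Tally short-circuit shm segments and unix sockets in lsof output.

    Assumes the default lsof layout, where TYPE is the fifth column.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue  # skip blank or short lines and the header row
        if SHM_MARKER in line:
            counts["shortcircuit_shm"] += 1
        elif fields[4] == "unix":
            counts["unix_socket"] += 1
        else:
            counts["other"] += 1
    return counts
```

Feed it the text of `lsof -p <regionserver pid>` captured a few minutes apart. If the two counts keep climbing in step well past the configured cache size, that points at a leak and not at the cache.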

What is your dfs.client.read.shortcircuit.streams.cache.size set to?
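
If it is easier than digging through CM, here is a rough sketch for pulling that property out of an hdfs-site.xml. It assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout and falls back to the 256 default mentioned above. The function name is hypothetical, and the value the regionserver's DFS client actually sees may be overridden elsewhere.

```python
import xml.etree.ElementTree as ET

PROP = "dfs.client.read.shortcircuit.streams.cache.size"
FALLBACK = 256  # default mentioned above; verify against your distro


def streams_cache_size(hdfs_site_xml):
    """Return the configured cache size from hdfs-site.xml text, else FALLBACK."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROP:
            value = prop.findtext("value", "").strip()
            return int(value) if value else FALLBACK
    return FALLBACK
```

Call it with the file contents, for example `streams_cache_size(open(path).read())`, where `path` is wherever the regionserver's client config lives on your nodes.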

> Thanks,
> Bryan
