hadoop-common-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16059) Use SASL Factories Cache to Improve Performance
Date Fri, 05 Apr 2019 06:02:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810519#comment-16810519
] 

Vinayakumar B commented on HADOOP-16059:
----------------------------------------

Thanks [~ayushtkn] for the contribution.

The profiling screenshots above show a clear difference in the time consumed while loading the
SaslFactory.

As [~jojochuang] mentioned, it may not add much value for RPCs that interact with the
same RPC server continuously, as the same RPC connection will be maintained. The connection
needs to be recreated only if the client has been idle for 10 seconds (default).

Also, there are other cases in which this patch will help.
 # The same client interacting with multiple RPC servers at infrequent intervals.
 ** In this case, the RPC connection to the second server will be faster, as the time to load
the SASL factory will be zero.
 # Clients connecting to DataNodes to read/write data without using a cached connection.
 ** HDFS clients write data to DataNodes over TCP using a new connection
every time. There is NO connection cache for the writeBlock() op.
 ** For the readBlock() op, the connection can be cached only after a complete read of the
intended bytes. Ex: In case of a sequential read, the client must consume the entire block data.
 ** Socket cache capacity is limited (16) and entries expire quickly (4 sec) by default.
 ** If the HDFS client is non-data-local, it might get a different DataNode location
for each block; in this case, cache hits will be fewer.
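
To illustrate why the factory load time drops out on later connections, here is a minimal
sketch of the caching idea. It is not the code from the attached patches: the class name
CachedSaslClientFactory is hypothetical, and it assumes that the expensive step is enumerating
the registered providers through Sasl.getSaslClientFactories(). The sketch does that
enumeration once and answers later requests from a per-mechanism map.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslClientFactory;
import javax.security.sasl.SaslException;

// Hypothetical illustration only; not the class added by HADOOP-16059.
final class CachedSaslClientFactory implements SaslClientFactory {
  // Mechanism name -> factories that support it, built once in the constructor.
  private final Map<String, List<SaslClientFactory>> cache = new HashMap<>();

  CachedSaslClientFactory(Map<String, ?> props) {
    // Assumed to be the costly step: walks the registered security providers.
    Enumeration<SaslClientFactory> factories = Sasl.getSaslClientFactories();
    while (factories.hasMoreElements()) {
      SaslClientFactory factory = factories.nextElement();
      for (String mech : factory.getMechanismNames(props)) {
        cache.computeIfAbsent(mech, k -> new ArrayList<>()).add(factory);
      }
    }
  }

  @Override
  public SaslClient createSaslClient(String[] mechanisms, String authorizationId,
      String protocol, String serverName, Map<String, ?> props,
      CallbackHandler cbh) throws SaslException {
    for (String mech : mechanisms) {
      List<SaslClientFactory> candidates = cache.get(mech);
      if (candidates == null) {
        continue; // no cached factory supports this mechanism
      }
      for (SaslClientFactory factory : candidates) {
        SaslClient client = factory.createSaslClient(new String[] {mech},
            authorizationId, protocol, serverName, props, cbh);
        if (client != null) {
          return client;
        }
      }
    }
    return null; // same contract as SaslClientFactory: null if nothing matches
  }

  @Override
  public String[] getMechanismNames(Map<String, ?> props) {
    return cache.keySet().toArray(new String[0]);
  }
}
```

A client would create one such instance and reuse it for every new connection, so only the
first connection pays for the provider lookup.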

[~elgoiri], I believe this change will help case #2 above more, as that is more common.
It's evident in the screenshots above that *SaslParticipant.createClientSaslParticipant()* and
*SaslParticipant.createServerSaslParticipant()* take far less time for the same number of
connections.

Hope it's clear.

> Use SASL Factories Cache to Improve Performance
> -----------------------------------------------
>
>                 Key: HADOOP-16059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16059
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Critical
>         Attachments: After-Dn.png, After-Read.png, After-Server.png, After-write.png,
Before-DN.png, Before-Read.png, Before-Server.png, Before-Write.png, HADOOP-16059-01.patch,
HADOOP-16059-02.patch, HADOOP-16059-02.patch, HADOOP-16059-03.patch, HADOOP-16059-04.patch
>
>
> SASL Client factories can be cached and SASL Server Factories and SASL Client Factories
> can be together extended at SaslParticipant to improve performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

