nifi-users mailing list archives

From Koji Kawamura <ijokaruma...@gmail.com>
Subject Re: Slow FTP and SFTP nifi transfer rates
Date Mon, 11 Sep 2017 06:54:12 GMT
Thanks Gino for confirming that.

I've submitted a JIRA and PR.
https://issues.apache.org/jira/browse/NIFI-4375

I tried to find something that could improve PutSFTP, but to no avail so far.
NIFI-4375 only addresses the PutFTP processor.
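
For anyone following along, here is a rough sketch of why the buffer size matters so much. It is not NiFi or commons-net code: BufferSizeSketch, CountingSink and countWrites are hypothetical names, and the sink merely counts write calls instead of talking to a socket. It only assumes that a transfer loop copies the file through a fixed-size buffer, so a smaller buffer means proportionally more write calls for the same 10MB file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a transfer loop; not NiFi or commons-net code.
class BufferSizeSketch {

    /** Discards all data but counts how many bulk write calls it receives. */
    static final class CountingSink extends OutputStream {
        long writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            writeCalls++;
        }
    }

    /** Copies totalBytes through a buffer of bufferSize bytes; returns the number of write calls. */
    static long countWrites(int totalBytes, int bufferSize) {
        try (InputStream in = new ByteArrayInputStream(new byte[totalBytes])) {
            CountingSink sink = new CountingSink();
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = in.read(buffer)) != -1) {
                sink.write(buffer, 0, read);
            }
            return sink.writeCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int tenMb = 10 * 1024 * 1024;
        System.out.println("1 KB buffer:  " + countWrites(tenMb, 1024) + " writes");
        System.out.println("16 KB buffer: " + countWrites(tenMb, 16 * 1024) + " writes");
    }
}
```

With a real FTPClient the buffer is set through client.setBufferSize(16 * 1024), as Gino did in his build. The sketch only shows the difference in the number of writes, not the timings measured in this thread.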

On Sat, Sep 9, 2017 at 7:54 AM, Gino Lisignoli <glisignoli@gmail.com> wrote:
> Just built 1.4.0-SNAPSHOT and added in client.setBufferSize(16 * 1024);
> This fixed my problem straight away! Hope it makes it into 1.4.0.
>
> On Sat, Sep 9, 2017 at 12:07 AM, Joe Witt <joe.witt@gmail.com> wrote:
>>
>> Nope.  That would be specific to these using commons net.
>>
>> Nice work koji and Gino!
>>
>>
>> On Sep 8, 2017 6:54 AM, "Gino Lisignoli" <glisignoli@gmail.com> wrote:
>>
>> Wow, that sounds promising! Would that also apply to any other
>> get/put processors?
>>
>> On Fri, Sep 8, 2017 at 7:47 PM, Koji Kawamura <ijokarumawak@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> Just a quick update. I've tested
>>> commons-net-3.3::org.apache.commons.net.ftp.FTPClient without NiFi
>>> code.
>>> Here is the test code I used.
>>> https://gist.github.com/ijokarumawak/f5a329e53901bf2be7c19aa531094abd
>>>
>>> NiFi doesn't currently set the BufferSize, and the default is only 1KB.
>>> Time to send a 10MB file:
>>>
>>> # BufferSize = 1KB (default)
>>> about 8 sec
>>>
>>> # BufferSize = 16KB
>>> about 300 ms
>>>
>>> I'm going to create a JIRA to add a processor property to specify buffer
>>> size.
>>> Also, will test SFTP.
>>> Thanks again for highlighting the issue!
>>>
>>> Koji
>>>
>>> On Fri, Sep 8, 2017 at 8:48 AM, Koji Kawamura <ijokarumawak@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > Thanks for clarifying that the number of files is not significant.
>>> > I looked at the PutFTP and FTPTransfer source code, and found that it
>>> > makes a few calls to the FTP server in addition to sending a file:
>>> >
>>> > 1. Send the file as a temporary file
>>> > 2. Update modification time, if 'Last Modified Time' is set
>>> > 3. chmod if 'Permissions' is set
>>> > 4. Rename the temporary file
>>> >
>>> > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/FTPTransfer.java#L379
>>> >
>>> > PutSFTP and SFTPTransfer additionally do the following:
>>> > 5. chown if 'Remote Owner' is set
>>> > 6. chgrp if 'Remote Group' is set
>>> >
>>> > I wonder if those additional invocations add more latency.
>>> >
>>> > Also, it'd be helpful if you could write simple Java code using the
>>> > underlying (S)FTP client libraries without the NiFi layer to
>>> > investigate whether the NiFi implementation can be improved, or
>>> > whether the performance difference comes from the library
>>> > implementation.
>>> >
>>> > commons-net-3.3::org.apache.commons.net.ftp.FTPClient for FTP
>>> > and
>>> > jsch-0.1.54::com.jcraft.jsch.ChannelSftp for SFTP
>>> >
>>> >
>>> > I will try to do that at my end when I have time, but it'd be very
>>> > helpful if you could do it, since you already have a testing
>>> > environment and baseline metrics.
>>> >
>>> > Thanks!
>>> > Koji
>>> >
>>> >
>>> > On Thu, Sep 7, 2017 at 6:30 PM, Gino Lisignoli <glisignoli@gmail.com>
>>> > wrote:
>>> >> Hi
>>> >>
>>> >> I monitor the send rates using collectd and grafana. It doesn't seem
>>> >> to matter if I send 10,000 10MB files or 100 1GB files; the maximum
>>> >> throughput of NiFi PutFTP and PutSFTP remains the same: 300Mbps and
>>> >> 1Gbps respectively.
>>> >>
>>> >> As mentioned above, the weird thing is that when I send files through
>>> >> ftp and sftp (without NiFi) the rates are much better.
>>> >>
>>> >> It's really odd that the rates are significantly slower in NiFi.
>>> >>
>>> >> On Thu, Sep 7, 2017 at 5:45 PM, Koji Kawamura <ijokarumawak@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hello Gino,
>>> >>>
>>> >>> Thanks for sharing your findings on FTP performance.
>>> >>>
>>> >>> How did you measure send rate from NiFi to your FTP server?
>>> >>>
>>> >>> Sending multiple FlowFiles would provide less throughput compared to
>>> >>> sending one big FlowFile, as PutFTP and PutSFTP make a connection for
>>> >>> each incoming FlowFile. The overhead of establishing a connection
>>> >>> each time might be the performance difference you see with the mput
>>> >>> command.
>>> >>>
>>> >>> Those processors can decide which FTP servers to use based on
>>> >>> incoming FlowFiles' attributes when NiFi Expression Language is used.
>>> >>>
>>> >>> If that's the case, there is some room for performance improvement
>>> >>> by keeping the underlying FTP(S) client instance so that it can be
>>> >>> reused among multiple onTrigger() calls.
>>> >>>
>>> >>> A possible work-around would be to use MergeContent beforehand and
>>> >>> send the result as a single file, if your use-case allows that.
>>> >>>
>>> >>> Thanks,
>>> >>> Koji
>>> >>>
>>> >>> On Thu, Sep 7, 2017 at 12:15 PM, Gino Lisignoli
>>> >>> <glisignoli@gmail.com>
>>> >>> wrote:
>>> >>> > I have this weird issue with PutFTP and PutSFTP transfer rates.
>>> >>> >
>>> >>> > What I am seeing is that no matter what files I transfer from one
>>> >>> > server to another over a single connection, the maximum rates I
>>> >>> > can send are 300Mbps for PutFTP and 1Gbps for PutSFTP.
>>> >>> >
>>> >>> > The sending NiFi is installed on CentOS 7, running on a Dell R730
>>> >>> > with 190GB RAM, 16 cores @ 2.4GHz and 4x10Gb NICs bonded. The
>>> >>> > sending NiFi has its content repository on a ramdisk, and the
>>> >>> > receiving server is receiving to a ramdisk (for testing, to take
>>> >>> > disk IO out of the equation).
>>> >>> >
>>> >>> > When I do an ftp send manually (without NiFi) with mput I get ftp
>>> >>> > rates of ~8Gbps and sftp rates of 2.2Gbps (which seems slow anyway).
>>> >>> >
>>> >>> > I would have expected similar transfer rates with NiFi.
>>> >>> >
>>> >>> > Is there any way to work out why these rates are so much slower,
>>> >>> > but also so consistent? I'm using Nifi-1.30
>>> >>
>>> >>
>>
>>
>>
>
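
The upload sequence Koji listed above (send as a temporary file, optionally update the modification time, then rename) can be sketched against the local filesystem. This is only an analogy, not NiFi's FTPTransfer code: TempThenRenameSketch, put and demo are hypothetical names, the dot-prefixed temporary name is an assumption, and the chmod/chown/chgrp steps are left out. The point is that each optional step is one more operation per file, which over (S)FTP would be one more call to the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

// Local-filesystem analogy of the temp-file-then-rename pattern; hypothetical names throughout.
class TempThenRenameSketch {

    /**
     * Writes content under a temporary name, optionally sets the modification
     * time, then renames to the final name. Returns the number of operations.
     */
    static int put(Path dir, String name, byte[] content, Long lastModifiedMillis) {
        int ops = 0;
        try {
            Path temp = dir.resolve("." + name);   // assumed temporary naming scheme
            Files.write(temp, content);            // step 1: send as a temporary file
            ops++;
            if (lastModifiedMillis != null) {      // step 2: only if a timestamp is requested
                Files.setLastModifiedTime(temp, FileTime.fromMillis(lastModifiedMillis));
                ops++;
            }
            Files.move(temp, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING); // step 4: rename
            ops++;
            return ops;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs put() in a fresh temporary directory and summarizes the outcome. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("put-sketch");
            byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
            int ops = put(dir, "data.txt", data, 0L);
            boolean finalExists = Files.exists(dir.resolve("data.txt"));
            boolean tempExists = Files.exists(dir.resolve(".data.txt"));
            return "ops=" + ops + " finalExists=" + finalExists + " tempExists=" + tempExists;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```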
