nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: High volume data with ExecuteSQL processor
Date Mon, 15 Oct 2018 14:03:11 GMT
Dnyaneshwar,

In the upcoming NiFi 1.8.0 release, ExecuteSQL will have a Max Rows
Per Flow File property [1]. In the meantime, you might try
GenerateTableFetch: it takes incoming flow files and generates SQL
statements that each fetch a fixed number of rows per flow file (the
property is called Partition Size in that processor). The limitation
is that you can't provide your own SQL; it generates the SQL from the
columns to return, any max-value columns specified, and an optional
custom WHERE clause. If you have complex SQL this won't be a viable
workaround, but otherwise it should do the trick for now.
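
For illustration (the table and column names below are made up, and
the exact paging syntax depends on the Database Type you configure),
with Partition Size set to 10000 the flow files GenerateTableFetch
emits would contain SQL roughly like:

  SELECT id, name FROM orders ORDER BY id LIMIT 10000 OFFSET 0
  SELECT id, name FROM orders ORDER BY id LIMIT 10000 OFFSET 10000
  SELECT id, name FROM orders ORDER BY id LIMIT 10000 OFFSET 20000
  ...

Each statement can then be routed to ExecuteSQL, so no single result
set has to be held in memory at once.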

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1251

On Mon, Oct 15, 2018 at 1:37 AM Dnyaneshwar Pawar
<dnyaneshwar_pawar@persistent.com> wrote:
>
> Hi Koji,
>
> As suggested, "Max Rows Per Flow File" is not available on the
> ExecuteSQL processor; it is available on the QueryDatabaseTable
> processor. But we cannot use QueryDatabaseTable, as it does not
> accept upstream connections, and we have a requirement to accept
> upstream connections from other processors (e.g. the
> HandleHTTPRequest processor). Please suggest how we can use
> ExecuteSQL to process high-volume data.
>
> -----Original Message-----
> From: Koji Kawamura <ijokarumawak@gmail.com>
> Sent: Tuesday, September 25, 2018 5:59 AM
> To: users@nifi.apache.org
> Subject: Re: High volume data with ExecuteSQL processor
>
> Hello,
>
> Did you try setting 'Max Rows Per Flow File' on the ExecuteSQL processor?
> If the OOM happens when NiFi writes all results into a single
> FlowFile, that property can help break the result set into several
> FlowFiles and avoid it.
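>
> For example, with around 1,000,000 rows in the result set and 'Max
> Rows Per Flow File' set to 10,000, the result would be split into
> roughly 100 FlowFiles of 10,000 rows each instead of a single large
> FlowFile.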
>
> Thanks,
> Koji
> On Fri, Sep 21, 2018 at 3:56 PM Dnyaneshwar Pawar
> <dnyaneshwar_pawar@persistent.com> wrote:
> >
> > Hi,
> >
> >
> >
> > How to execute/process High volume data with ExecuteSQL processor:
> >
> >
> >
> > We tried to execute a query against a DB2 database which has
> > around 10 lakh (1,000,000) records. While executing this query we
> > get an OutOfMemory error, and that request (flowfile) is stuck in
> > the queue. When we restart NiFi it is still stuck in the queue,
> > and as soon as NiFi starts we get the same error again because the
> > flowfile is still queued. Is there any way to configure a retry
> > for the queue (the connection between the two processors)?
> >
> >
> >
> > We also tried changing the Flow File repository implementation in
> > nifi.properties (nifi.flowfile.repository.implementation) to
> > 'org.apache.nifi.controller.repository.VolatileFlowFileRepository'.
> > This removes the flowfile from the queue when NiFi restarts, but
> > it carries a risk of data loss for other processes in the event of
> > a power/machine failure.
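> >
> > That is, in nifi.properties:
> >
> >   nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.VolatileFlowFileRepository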
> >
> > So please suggest how to run a high-volume query, or whether any
> > retry mechanism is available for queued flowfiles.
> >
> >
> >
> >
> >
> > Regards,
> >
> > Dnyaneshwar Pawar
> >
> >
