nifi-users mailing list archives

From Aldrin Piri <aldrinp...@gmail.com>
Subject Re: NiFi flow provides 0 output on large files
Date Fri, 25 Sep 2015 13:55:08 GMT
Ryan,

Certainly something that needs to be addressed and has been voiced by a
number of the members in the community.  The flushing of a queue is an item
under the feature proposal for Interactive Queue Management [1].

[1]
https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management

On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ryan.ward2@gmail.com> wrote:

> This is actually very easy to overlook. Oftentimes we change the
> file expiration on a queue simply to empty the queue.
>
> Could we add a right-click "empty queue" option, with an "are you sure?"
> prompt? Is there already a JIRA for this feature?
>
> Thanks,
> Ryan
>
> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j.007ba7@gmail.com> wrote:
>
>>
>> That was a rookie mistake.
>>
>> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in
>> a log that states a flow file was expired?
>>
>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>> taking advantage of the schema registry. I do not believe the current
>> PutToKafka provides the ability to use this registry, correct? I’m curious
>> whether anyone is working on a PutToConfluentKafka processor.
>>
>> Thanks for your help.
>>
>> Jeff
>>
>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <matt.c.gilman@gmail.com> wrote:
>>
>> Jeff,
>>
>> What is the expiration setting on your connections? The little clock icon
>> indicates that they are configured to automatically expire flowfiles of a
>> certain age.
>>
>> Matt
>>
>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j.007ba7@gmail.com> wrote:
>>
>>>
>>> Hi Aldrin,
>>>
>>> After the DDA_Processor
>>>
>>> The image below shows that GetFile processed 174.6 MB and the
>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>> DDA_Processor box).
>>>
>>> <unknown.gif>
>>>
>>> The below image shows that the DDA_Processor is complete but data did
>>> not make it to ConvertJSONtoAvro.  No errors are being generated.
>>> DDA_Processor takes fixed width data and converts it to JSON.
>>>
>>> <unknown.gif>
>>>
>>> Thanks
>>>
>>>
>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> With regards to:
>>>
>>> "Anything over, the GetFile and DDA_Processor show data movement but
>>> no other downstream processor shows movement."
>>>
>>> Are you referencing downstream processors starting immediately after the
>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>> ConvertJsonToAvro processor?
>>>
>>> In the case of starting immediately after the DDA_Processor, as it is a
>>> custom processor, we would need some additional information as to how this
>>> processor is behaving.  In the case of the second condition, some
>>> additional context as to the format of the problematic data (the effective
>>> "schema" of the JSON) would be helpful in tracking down the issue.
>>>
>>> Thanks!
>>> Aldrin
>>>
>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j.007ba7@gmail.com> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>>
>>>> I have a flow that does the following:
>>>>
>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>
>>>> My source file has 182897 rows at 1001 bytes per row.  If I process any
>>>> number of rows under ~15000, an output file is created.  Anything over, the
>>>> GetFile and DDA_Processor show data movement but no other downstream
>>>> processor shows movement.
>>>>
>>>> I confirmed that it is not a data problem by processing a 10,000 row
>>>> file successfully, then concatenating 10,000 rows into one file twice.
>>>>
>>>> Thanks for your insight.
>>>>
>>>> Jeff
>>>> <Mail Attachment.gif>
>>>>
>>>>
>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> This seems to be a bit different as the processor is showing data as
>>>> having been written and there is a listing of one FlowFile of 381 MB being
>>>> transferred out from the processor.  Could you provide additional
>>>> information as to how data is not being sent out in the manner
>>>> anticipated?  If you can track the issue down more, let us know.  May be
>>>> helpful to create another message to help us track the issues separately
>>>> as we work through them.
>>>>
>>>> Thanks!
>>>>
>>>> Adam,
>>>>
>>>> Found a sizable JSON file to work against and have been doing some
>>>> initial exploration.  With the large files, it certainly is a nontrivial
>>>> process.  At cursory inspection, a good portion of processing seems to be
>>>> spent on validation.  There are some ways to tweak the strictness of this
>>>> with the supporting library, but will have to dive in a bit more.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> I’m having a very similar problem.  The process picks up the file, a
>>>>> custom processor does its thing but no data is sent out.
>>>>>
>>>>> <unknown.gif>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
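
The sizes quoted in this thread can be cross-checked with a little arithmetic. The sketch below assumes each row is exactly 1001 bytes with no additional delimiter, and that "MB" in the NiFi UI means 1024 * 1024 bytes; both are assumptions, not something stated in the thread. Under those assumptions, the full 182897-row file comes out at roughly the 174.6 MB that GetFile reported, and the ~15000-row point at which output stopped corresponds to about 14.3 MB.

```python
# Rough size check for the figures quoted in the thread.
# Assumptions (not confirmed by the thread): rows are exactly 1001 bytes
# with no extra delimiter, and "MB" means 1024 * 1024 bytes.

ROW_BYTES = 1001          # bytes per row, as stated by Jeff
TOTAL_ROWS = 182_897      # rows in the full source file
THRESHOLD_ROWS = 15_000   # approximate row count above which no output appeared
MIB = 1024 * 1024


def size_mib(rows: int, row_bytes: int = ROW_BYTES) -> float:
    """Return the size in MiB of a file with the given number of fixed-width rows."""
    return rows * row_bytes / MIB


full_file_mib = size_mib(TOTAL_ROWS)
threshold_mib = size_mib(THRESHOLD_ROWS)

print(f"full file:      {full_file_mib:.1f} MiB")
print(f"~15000 rows:    {threshold_mib:.1f} MiB")
```

A size-dependent failure of this kind is consistent with the explanation reached earlier in the thread: larger files take longer in the upstream processor, so they are more likely to exceed a short expiration setting (5 sec here) on a connection.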
