nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: NiFi flow provides 0 output on large files
Date Fri, 25 Sep 2015 16:38:51 GMT
If whatever it would mean is open-source friendly it sounds like a
fine idea.  Seems unlikely we'd need to have something vendor
specific.  Jeff, are there any docs you can direct us to for this?

On Fri, Sep 25, 2015 at 11:33 AM, Jeff <j.007ba7@gmail.com> wrote:
>
> Thanks for this info on the JIRA.
>
> Does anyone have any input on the PutToConfluentKafka idea?
>
>
> On Sep 25, 2015, at 8:55 AM, Matt Gilman <matt.c.gilman@gmail.com> wrote:
>
> Yep. JIRA is already created [1] as well as other features we'll be
> supporting regarding queue management [2].
>
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-730
> [2] https://issues.apache.org/jira/browse/NIFI-108
>
> On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ryan.ward2@gmail.com> wrote:
>>
>> This is actually very easy to overlook and miss. Oftentimes we change the
>> file expiration on a queue simply to empty the queue.
>>
>> Could we add a right-click "empty queue" option, with an "are you sure"
>> prompt? Is there already a JIRA for this feature?
>>
>> Thanks,
>> Ryan
>>
>> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j.007ba7@gmail.com> wrote:
>>>
>>>
>>> That was a rookie mistake.
>>>
>>> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in
>>> a log that states a flow file was expired?
>>>
>>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>>> taking advantage of the schema registry. I do not believe the current
>>> PutToKafka provides the ability to use this registry, correct?  I'm curious
>>> whether anyone is working on a PutToConfluentKafka processor?
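Confluent's schema-registry-aware producers frame each Kafka message by prepending a magic byte (0) and a 4-byte big-endian schema ID to the serialized Avro payload. A minimal stdlib-only sketch of that wire-format framing; the schema ID 42 and the Avro bytes are placeholders, not real registry data:

```python
import struct

MAGIC_BYTE = 0  # Confluent wire-format marker

def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend the Confluent wire-format header (magic byte + 4-byte
    big-endian schema ID) to an already-serialized Avro payload."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe_confluent(message: bytes) -> tuple:
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not a Confluent-framed message")
    return schema_id, message[5:]

framed = frame_confluent(42, b"\x02\x06foo")  # placeholder Avro bytes
print(unframe_confluent(framed))  # (42, b'\x02\x06foo')
```

A custom processor would perform this framing before handing the bytes to the Kafka producer; a real implementation would also look the schema ID up from (or register it with) the schema registry's REST API.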
>>>
>>> Thanks for your help.
>>>
>>> Jeff
>>>
>>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <matt.c.gilman@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> What is the expiration setting on your connections? The little clock icon
>>> indicates that they are configured to automatically expire flowfiles of a
>>> certain age.
>>>
>>> Matt
>>>
>>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j.007ba7@gmail.com> wrote:
>>>>
>>>>
>>>> Hi Aldrin,
>>>>
>>>> After the DDA_Processor
>>>>
>>>> The below image shows that the GetFile Processed 174.6 MB and the
>>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>>> DDA_Processor box)
>>>>
>>>> <unknown.gif>
>>>>
>>>> The below image shows that the DDA_Processor is complete but data did
>>>> not make it to ConvertJSONtoAvro.  No errors are being generated.
>>>> DDA_Processor takes fixed width data and converts it to JSON.
>>>>
>>>> <unknown.gif>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> With regards to:
>>>>
>>>> "Anything over, the GetFile and DDA_Processor show data movement but no
>>>> other downstream processor shows movement."
>>>>
>>>> Are you referencing downstream processors starting immediately after the
>>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>>> ConvertJsonToAvro processor?
>>>>
>>>> In the case of starting immediately after the DDA_Processor, as it is a
>>>> custom processor, we would need some additional information as to how this
>>>> processor is behaving.  In the second case, any additional context as to
>>>> the format of the problematic data (the effective "schema" of the JSON)
>>>> would be helpful in tracking down the issue.
>>>>
>>>> Thanks!
>>>> Aldrin
>>>>
>>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j.007ba7@gmail.com> wrote:
>>>>>
>>>>> Hi Adam,
>>>>>
>>>>>
>>>>> I have a flow that does the following;
>>>>>
>>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>>
>>>>> My source file has 182897 rows at 1001 bytes per row.  If I process any
>>>>> number of rows under ~15000, an output file is created.  Anything over,
>>>>> the GetFile and DDA_Processor show data movement but no other downstream
>>>>> processor shows movement.
>>>>>
>>>>> I confirmed that it is not a data problem by processing a 10,000-row
>>>>> file successfully, then concatenating that same file twice into one
>>>>> 20,000-row file.
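To bisect a size threshold like the one described above, it can help to generate fixed-width test files of arbitrary row counts. A sketch using only the standard library; the record layout (a zero-padded counter space-padded to 1001 bytes including the newline) is a placeholder for illustration, not the real DDA format:

```python
def write_fixed_width(path: str, rows: int, row_bytes: int = 1001) -> None:
    """Write `rows` records, each exactly `row_bytes` bytes incl. newline."""
    with open(path, "wb") as f:
        for i in range(rows):
            # 10-digit counter, space-padded to the fixed record width
            record = f"{i:010d}".encode().ljust(row_bytes - 1, b" ")
            f.write(record + b"\n")

write_fixed_width("under_threshold.txt", 10_000)  # size that passed the flow
write_fixed_width("over_threshold.txt", 20_000)   # size that stalled it
```

Feeding files of increasing size into the GetFile directory would narrow down whether the stall is tied to a byte count, a row count, or something else entirely.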
>>>>>
>>>>> Thanks for your insight.
>>>>>
>>>>> Jeff
>>>>> <Mail Attachment.gif>
>>>>>
>>>>>
>>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>>>>
>>>>> Jeff,
>>>>>
>>>>> This seems to be a bit different, as the processor is showing data as
>>>>> having been written and there is a listing of one FlowFile of 381 MB
>>>>> being transferred out from the processor.  Could you provide additional
>>>>> information as to how data is not being sent out in the manner anticipated?
>>>>> If you can track the issue down more, let us know.  It may be helpful to
>>>>> create another message so we can track the issues separately as we work
>>>>> through them.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Adam,
>>>>>
>>>>> Found a sizable JSON file to work against and have been doing some
>>>>> initial exploration.  With the large files, it certainly is a nontrivial
>>>>> process.  At cursory inspection, a good portion of processing seems to
>>>>> be spent on validation.  There are some ways to tweak the strictness of
>>>>> this with the supporting library, but I will have to dive in a bit more.
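To see how parse-plus-validation cost scales with input size before tuning library strictness, a hypothetical stdlib-only micro-benchmark can serve as a baseline (the actual ConvertJSONToAvro validation happens in its supporting Avro library on the JVM; this Python sketch only illustrates the measurement approach):

```python
import json
import time

def time_parse(n_records: int) -> float:
    """Build an n-record JSON array, then time a full parse of it."""
    doc = json.dumps([{"id": i, "name": f"row{i}"} for i in range(n_records)])
    start = time.perf_counter()
    json.loads(doc)
    return time.perf_counter() - start

for n in (10_000, 100_000):
    print(f"{n} records: {time_parse(n):.3f}s")
```

Comparing timings at a few sizes shows whether cost grows linearly with record count or degrades faster, which is the first clue as to whether validation, parsing, or something downstream dominates.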
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I’m having a very similar problem.  The process picks up the file, a
>>>>>> custom processor does its thing, but no data is sent out.
>>>>>>
>>>>>> <unknown.gif>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
