nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Am I doing this right? with regarding to records
Date Tue, 01 May 2018 20:30:21 GMT
I see, so the partition is helping if you want to route based on the
partition and is also giving you the attribute.

Right now it is the bottleneck because there is one record per flow
file, but once you can stop splitting, presumably you could partition
one big flow file into a flow file per state, which would perform much
better, and you probably wouldn't need MergeRecord anymore.
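
As a rough illustration of the idea above (plain Python, not NiFi code),
the sketch below groups all records from one input into one bucket per
state in a single pass, instead of handling one record at a time. The
field name "state", the sample records, and the helper name
partition_by_state are assumptions made for the example; a list-valued
field is reduced to its first element to mirror the "/state[0]" path
mentioned in the thread.

```python
import json
from collections import defaultdict


def partition_by_state(lines):
    """Group line-delimited JSON records by their 'state' field.

    Hypothetical helper for illustration only; it mimics the grouping
    idea, not the actual PartitionRecord implementation.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        key = record["state"]
        if isinstance(key, list):
            key = key[0]  # mirrors the "/state[0]" record path
        groups[key].append(record)
    return dict(groups)


# Made-up sample input: one JSON object per line.
sample = [
    '{"name": "a", "state": ["FL"]}',
    '{"name": "b", "state": ["NY"]}',
    '{"name": "c", "state": ["FL"]}',
]

partitions = partition_by_state(sample)
for state, records in partitions.items():
    print(state, len(records))
```

Each bucket corresponds to what would become one flow file per state,
which is why a separate merge step is probably no longer needed.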

On Tue, May 1, 2018 at 4:21 PM, Juan Sequeiros <hellojuan@gmail.com> wrote:
> Thanks Bryan.
> I am partitioning off a key "/state[0]"
> And add a route called "STATE" for "/state[0]"
> Then MergeRecord using correlation attribute: STATE
>
> PartitionRecord also adds an attribute with the value it partitioned on, so
> I'll get an attribute STATE="FL", for example, and after I merge them I can
> ingest into something like /data/$STATE/
>
> I think I have to look closer at my schema.
>
> On Tue, May 1, 2018 at 3:59 PM Bryan Bende <bbende@gmail.com> wrote:
>>
>> Unfortunately the current JSON record readers are not expecting a JSON
>> document per line because technically that is not a valid JSON
>> document itself. Your file would have to be represented as an array of
>> documents like [ doc1, doc2, doc3, ...]
>>
>> There is a PR up to support the per-line JSON document though:
>> https://github.com/apache/nifi/pull/2640
>>
>> In both of your examples, if you are splitting before partitioning,
>> then what is the partitioning accomplishing?
>>
>> If you had the changes in the PR above then the goal would be to not
>> use SplitRecord... you would just send GetFile -> PartitionRecord ->
>> to whatever else.
>>
>>
>> On Tue, May 1, 2018 at 3:34 PM, Juan Sequeiros <hellojuan@gmail.com>
>> wrote:
>> > Hello all,
>> >
>> > I have one file on local disk with thousands of lines, each representing
>> > a valid JSON object.
>> > My flow is like this:
>> >
>> > GetFile > SplitText > PartitionRecord (based on a key) > MergeRecord >
>> > PutElasticSearchRecord
>> >
>> > This works well; however, I seem to bottleneck at PartitionRecord.
>> >
>> > So I looked at using
>> > GetFile > ConvertRecord > SplitRecord > PartitionRecord
>> >
>> > But it seems to only convert the first line of the content from my
>> > GetFile.
>> >
>> > Am I missing something?
>> >
>> > I have a bottleneck that could very well be a system resource issue, but
>> > still, what is the best way to take a file with lines of JSON and convert
>> > them into records? I assume it's through the record readers and writers,
>> > and then it's implied that it converts it to an "object" based on the
>> > AvroSchema (in my case)?
>> >
>> >
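
The thread notes that the JSON record readers at the time expected an
array of documents like [ doc1, doc2, doc3, ...] rather than one
document per line. A minimal sketch of that reshaping, done outside NiFi
in plain Python, is shown below. The function name lines_to_json_array
and the sample text are assumptions for the example, and it presumes
every non-blank line is a complete JSON document.

```python
import json


def lines_to_json_array(text):
    """Convert line-delimited JSON text into a single JSON array string.

    Hypothetical helper for illustration; blank lines are ignored and
    each remaining line must parse as a complete JSON document.
    """
    docs = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(docs)


# Made-up sample input: two JSON objects, one per line.
ndjson = '{"id": 1, "state": "FL"}\n{"id": 2, "state": "NY"}\n'
array_text = lines_to_json_array(ndjson)
print(array_text)
```

This loads the whole file into memory, so for very large inputs a
streaming approach would be a better fit.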
