nifi-users mailing list archives

From Pierre Villard <>
Subject Re: Ingesting golden gate messages to Hbase using Nifi
Date Tue, 06 Nov 2018 09:43:49 GMT

"So I'm not sure how do i combine all the flow files together on a single
node? (I know how to distribute them, i.e. by using S2S-RPG)"

I wanted to mention that, as of NiFi 1.8.0, it is possible to send flow
files back to a single node, using a load-balanced connection with the
"Single node" strategy.
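
The ordering problem described in the quoted messages below (operations
that share an Op_ts but arrive out of order) can be illustrated with a
minimal Python sketch. It assumes each GoldenGate JSON message carries a
"pos" field that is a fixed-width, zero-padded string, so that plain string
comparison matches trail order. The table name and field values are made up
for illustration, and this is not the flow used in the thread.

```python
def source_order(messages):
    """Return messages sorted by trail position.

    Assumes "pos" is a fixed-width, zero-padded string, so that
    lexicographic order equals trail order (an assumption, not verified
    against a real trail file).
    """
    return sorted(messages, key=lambda m: m["pos"])


def final_state(messages):
    """Return the operation with the highest trail position for one row."""
    return max(messages, key=lambda m: m["pos"])


# Two operations on the same row with an identical op_ts, received with the
# delete first. All values below are hypothetical.
arrived = [
    {"table": "APP.ORDERS", "op_type": "D",
     "op_ts": "2018-11-05 03:07:00.000000", "pos": "00000000010000002048"},
    {"table": "APP.ORDERS", "op_type": "U",
     "op_ts": "2018-11-05 03:07:00.000000", "pos": "00000000010000001024"},
]

ordered = source_order(arrived)
print([m["op_type"] for m in ordered])
print(final_state(arrived)["op_type"])
```

Sorting like this only restores order within the batch at hand. It does not
by itself address ordering across nodes or the HBase cell timestamp.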


On Tue, Nov 6, 2018 at 07:56, Faisal Durrani <> wrote:

> Hi Boris,
> Thank you for your reply. Let me try explaining my data flow in detail. I
> am receiving the GG transactions in JSON format through Kafka, so I can
> only use the fields provided by the Kafka handler of GG (JSON pluggable
> format). I think you meant the RBA value instead of rbc. I don't think we
> can receive the RBA value in JSON, but there is a field called POS, which
> is a concatenation of the source trail file number and the RBA. So we can
> probably use that in the EnforceOrder processor. But if we don't use the
> timestamp information, we will run into the HBase versioning issue. The
> idea behind using Op_ts was to version each row of our target table and
> also help us with the DML operations. We are using the PK of each table as
> the row key of the target HBase table. Every new transaction
> (update/delete) on the table is logically inserted as a new row, but since
> it has the same PK, we can see the versions of each row. The operation
> with the highest timestamp is the valid state of the row. I tested the
> EnforceOrder processor with the Kafka offset, and it skips all the records
> that arrive later than the older offset, and I don't understand why. If I
> decide to use EnforceOrder on POS and the default timestamp in HBase, it
> will skip ordering the Kafka messages that arrive late, and that will put
> the target table out of sync. In addition, I've read that EnforceOrder
> only orders flow files on a single node, while we have a 5-node cluster.
> So I'm not sure how do I combine all the flow files together on a single
> node? (I know how to distribute them, i.e. by using S2S-RPG)
> I hope I have been able to explain my situation. Kindly let me know
> your views on this.
> Regards,
> Faisal
> On Mon, Nov 5, 2018 at 11:18 PM Boris Tyukin <>
> wrote:
>> Hi Faisal, I am not Timothy, but you raise an interesting problem we
>> might face soon as well. I did not expect the situation you described and I
>> thought transaction time would be different.
>> Our intent was to use op_ts to enforce order, but another option is to use
>> the GG rbc value or the Oracle rowscn value. Did you consider them? GG
>> RBC should identify a unique transaction, and within every transaction
>> you can also get the operation#. You can also get the trail file# and the
>> trail file position. GG is really powerful and gives you a bunch of data
>> elements that you can enable on your message.
>> The Logdump tool is an awesome way to look into your trail files and see
>> what's in there.
>> Boris
>> On Mon, Nov 5, 2018 at 3:07 AM Faisal Durrani <>
>> wrote:
>>> Hi Timothy,
>>> Hope you are doing well. We have been using your data flow(
>>> )
>>> with slight modifications to store the data in HBase. To version the
>>> rows we have been using the Op_ts of the GoldenGate JSON. But now we have
>>> found that multiple transactions can have the same Op_ts, e.g. an update
>>> and a delete can share the same Op_ts, and if they arrive out of order at
>>> the PutHBaseJSON processor, the target table can go out of sync. I am
>>> using a cluster of NiFi nodes, so I cannot use the EnforceOrder processor
>>> to order the Kafka messages, as I understand it only orders the flow
>>> files on a single node and not across the cluster. Additionally, we have
>>> a separate topic for each table and we have several consumer groups. I
>>> tried using the Current_ts column of the GoldenGate message, but if GG
>>> abends and restarts the replication, it will send the past data with a
>>> newer current_ts, which will also put the target table out of sync. I was
>>> wondering if you could give us any ideas so that we can order our
>>> transactions correctly.
>>> Regards,
>>> Faisal
