nifi-users mailing list archives

From Faisal Durrani <>
Subject Re: Ingesting golden gate messages to Hbase using Nifi
Date Tue, 06 Nov 2018 06:55:53 GMT
Hi Boris,

Thank you for your reply. Let me try explaining my data flow in detail. I am
receiving the GG transactions in JSON format through Kafka, so I can only use
the fields provided by the Kafka handler of GG (the JSON pluggable format). I
think you meant the RBA value instead of rbc. I don't think we can receive the
RBA value in the JSON, but there is a field called POS which is a concatenation
of the source trail file number and the RBA, so we could probably use that in
the EnforceOrder processor. But if we don't use the timestamp information, then
we will run into the HBase versioning issue. The idea behind using op_ts was to
version each row of our target table and also to help us with the DML
operations. We are using the PK of each table as the row_key of the target
HBase table. Every new transaction (update/delete) on the table is logically
inserted as a new row, but since it has the same primary key, we can see each
version of the row. The operation with the highest timestamp is the valid
state of the row.

I tested the EnforceOrder processor with the Kafka offset, and it skips all the
records that arrive later than an older offset, which I don't understand. If I
decide to use EnforceOrder on POS and use the default timestamp in HBase, then
it will skip ordering the Kafka messages that arrive late, and that will cause
the tables to go out of sync. In addition to this, I've read that EnforceOrder
only orders the rows on a single node, while we have a 5-node cluster, so I'm
not sure how I can combine all the flow files on a single node. (I know how to
distribute them, i.e. by using S2S-RPG.)
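If POS really is a zero-padded concatenation of the trail file number and the
RBA, the raw string already sorts in commit order, which is exactly the kind of
key EnforceOrder needs. A minimal sketch of that idea outside NiFi (the field
names follow the GG JSON Kafka handler; the pos values and their exact
width/layout are made-up assumptions):

```python
import json

# Hypothetical GG Kafka-handler JSON records. The assumption here is that
# "pos" is a zero-padded trail file sequence number concatenated with a
# zero-padded RBA, so the raw string sorts in commit order.
raw_messages = [
    '{"table":"HR.EMP","op_type":"U","op_ts":"2018-11-06 06:00:00.000000","pos":"00000000070000012001"}',
    '{"table":"HR.EMP","op_type":"D","op_ts":"2018-11-06 06:00:00.000000","pos":"00000000070000012150"}',
    '{"table":"HR.EMP","op_type":"I","op_ts":"2018-11-06 05:59:59.000000","pos":"00000000070000011900"}',
]

records = [json.loads(m) for m in raw_messages]

# op_ts alone cannot break the tie between the U and the D above,
# but the zero-padded pos string can: plain lexicographic order is enough.
records.sort(key=lambda r: r["pos"])

for r in records:
    print(r["op_type"], r["pos"])
```

Note the U and D deliberately share the same op_ts; sorting on pos still puts
them in the order they were written to the trail.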

I hope I have been able to explain my situation. Kindly let me know your views
on this.
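The versioning issue described above can be reduced to a toy model: one row key
(the table PK), many cell versions, and the highest timestamp winning on read.
This is not the actual PutHBaseJson processor, just a sketch of the resolution
rule it relies on; the row keys, values, and timestamps are made up:

```python
# Toy model of HBase cell versioning: row_key -> {timestamp: value},
# and a read returns the value with the highest timestamp.
store = {}

def put(row_key, value, ts):
    store.setdefault(row_key, {})[ts] = value

def get_latest(row_key):
    versions = store[row_key]
    return versions[max(versions)]

# Two operations share the same op_ts-derived timestamp, so the outcome
# depends purely on arrival order (the later put overwrites the earlier):
put("EMP#1001", {"op": "U"}, ts=1541484000000)
put("EMP#1001", {"op": "D"}, ts=1541484000000)  # silently replaces the U

# Deriving the timestamp from the strictly increasing POS instead makes
# the resolution deterministic regardless of the order the puts arrive in:
put("EMP#1002", {"op": "D"}, ts=12150)  # pos-derived, arrives first
put("EMP#1002", {"op": "U"}, ts=12001)  # late arrival with a lower pos

print(get_latest("EMP#1001"))
print(get_latest("EMP#1002"))  # the D wins, even though it was put first
```

With op_ts ties, EMP#1001 resolves to whatever happened to arrive last; with a
pos-derived timestamp, EMP#1002 always resolves to the operation that was
actually committed last.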


On Mon, Nov 5, 2018 at 11:18 PM Boris Tyukin <> wrote:

> Hi Faisal, I am not Timothy, but you raise an interesting problem we might
> face soon as well. I did not expect the situation you described and I
> thought transaction time would be different.
> Our intent was to use op_ts to enforce order, but another option is to use
> the GG rbc value or the Oracle ROWSCN value - did you consider them? The GG
> RBC should identify a unique transaction, and within every transaction you
> can also get the operation # within the transaction. You can also get the
> trail file # and trail file position. GG is really powerful and gives you a
> bunch of data elements that you can enable on your message.
> The Logdump tool is an awesome way to look into your trail files and see
> what's in there.
> Boris
> On Mon, Nov 5, 2018 at 3:07 AM Faisal Durrani <> wrote:
>> Hi Timothy,
>> Hope you are doing well. We have been using your data flow(
>> )
>> with slight modifications to store the data in HBase. To version the rows
>> we have been using the op_ts of the Golden Gate JSON. But now we have found
>> that multiple transactions can have the same op_ts, e.g. both an update and
>> a delete can carry the same op_ts, and if they arrive out of order at the
>> PutHBaseJson processor, they can cause the target table to go out of sync.
>> I am using a cluster of NiFi nodes, so I cannot use the EnforceOrder
>> processor to order the Kafka messages, as I understand it only orders the
>> flow files on a single node and not across the cluster. Additionally, we
>> have a separate topic for each table and we have several consumer groups.
>> I tried using the current_ts column of the Golden Gate message, but if GG
>> abends and restarts the replication, it will send the past data with a
>> newer current_ts, which will also cause the tables to go out of sync. I was
>> wondering if you could share any ideas on how we can order our transactions
>> correctly.
>> Regards,
>> Faisal
