nifi-users mailing list archives

From Pierre Villard <pierre.villard...@gmail.com>
Subject Re: Ingesting golden gate messages to Hbase using Nifi
Date Tue, 06 Nov 2018 09:43:49 GMT
Hi,

Regarding:
"So I'm not sure how I combine all the flow files on a single
node? (I know how to distribute them, i.e. by using S2S-RPG)"

I wanted to mention that, as of NiFi 1.8.0, it is possible to route
flow files back to a single node using connection load balancing:
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#load-balancing
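For reference, the same setting can be applied through the NiFi REST API (a `PUT` on `/nifi-api/connections/{id}`). The sketch below only builds the update payload; the field names are assumptions based on the NiFi 1.8 REST API, and the connection id, client id, and revision version are hypothetical placeholders, so verify against your NiFi version before use:

```python
# Sketch: build the body for PUT /nifi-api/connections/{id} that enables the
# "Single node" load-balance strategy (NiFi 1.8+). Field names are assumed
# from the NiFi REST API; the id/revision values are placeholders.

def single_node_payload(connection_id, client_id, version):
    """Return the JSON body that sets the SINGLE_NODE load-balance strategy."""
    return {
        "revision": {"clientId": client_id, "version": version},
        "component": {
            "id": connection_id,
            # Send every flow file in this connection to one node
            "loadBalanceStrategy": "SINGLE_NODE",
            "loadBalanceCompression": "DO_NOT_COMPRESS",
        },
    }

payload = single_node_payload("conn-1234", "editor-client", 3)
print(payload["component"]["loadBalanceStrategy"])  # SINGLE_NODE
```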

Pierre

On Tue, Nov 6, 2018 at 07:56, Faisal Durrani <te04.0172@gmail.com> wrote:

> Hi Boris,
>
> Thank you for your reply. Let me try to explain my data flow in detail. I
> am receiving the GG transactions in JSON format through Kafka, so I can
> only use the fields provided by the GG Kafka handler (JSON pluggable
> format). I think you meant the RBA value rather than rbc. I don't think we
> can receive the RBA value in the JSON, but there is a field called POS,
> which is a concatenation of the source trail file number and the RBA, so we
> could probably use that in the EnforceOrder processor. But if we don't use
> the timestamp information, we will run into the HBase versioning issue. The
> idea behind using op_ts was to version each row of our target table and
> also to help us with the DML operations. We are using the primary key of
> each table as the row key of the target HBase table. Every new transaction
> (update/delete) on a table is logically inserted as a new row, but since it
> has the same primary key we can see every version of the row, and the
> operation with the highest timestamp is the valid state of the row. I
> tested the EnforceOrder processor with the Kafka offset, and it skips all
> records that arrive after an older offset, which I don't understand. If I
> enforce order on POS and use the default timestamp in HBase, it will skip
> ordering the late-arriving Kafka messages, and that will cause the tables
> to go out of sync. In addition, I've read that EnforceOrder only orders
> flow files on a single node, while we have a 5-node cluster. So I'm not
> sure how I combine all the flow files on a single node? (I know how to
> distribute them, i.e. by using S2S-RPG.)
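As a sanity check on the POS idea above: because POS concatenates a zero-padded trail sequence number and the RBA, the string compares correctly lexicographically, so messages can be sorted (or handed to EnforceOrder) on that attribute, with the Kafka offset as a tiebreaker. A minimal sketch, where the message layout and field names are illustrative rather than the exact GG Kafka handler output:

```python
# Sketch: order GG JSON messages by the "pos" field (trail seq + RBA), using
# the Kafka offset as a tiebreaker. The record layout here is a hypothetical
# example, not the exact GG Kafka handler schema.

records = [
    {"pos": "00000000050000002000", "offset": 12, "op_type": "U"},
    {"pos": "00000000050000001500", "offset": 13, "op_type": "D"},
    {"pos": "00000000040000009000", "offset": 11, "op_type": "I"},
]

# Zero-padded pos strings compare correctly as plain strings, so a simple
# lexicographic sort restores source order even when Kafka delivers late.
ordered = sorted(records, key=lambda r: (r["pos"], r["offset"]))

print([r["op_type"] for r in ordered])  # ['I', 'D', 'U']
```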
>
> I hope I have been able to explain my situation. Kindly let me know your
> views on this.
>
> Regards,
> Faisal
>
>
> On Mon, Nov 5, 2018 at 11:18 PM Boris Tyukin <boris@boristyukin.com>
> wrote:
>
>> Hi Faisal, I am not Timothy, but you raise an interesting problem we
>> might face soon as well. I did not expect the situation you described; I
>> thought the transaction time would be different.
>>
>> Our intent was to use op_ts to enforce order, but another option is to
>> use the GG RBC value or the Oracle ORA_ROWSCN value - did you consider
>> them? The GG RBC should identify a unique transaction, and within every
>> transaction you can also get the operation number. You can also get the
>> trail file number and trail file position. GG is really powerful and
>> gives you a bunch of data elements that you can enable on your messages.
>>
>>
>> https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_fileformats.htm#GWUAD735
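To illustrate the point above: if a transaction identifier and the operation number within the transaction are enabled on the message, together they give a total order that does not depend on op_ts at all. A hedged sketch, where `csn` and `op_num` are made-up field names standing in for whichever transaction/operation tokens you actually enable:

```python
# Sketch: derive a total ordering key from GG transaction metadata instead of
# op_ts. "csn" and "op_num" are hypothetical field names standing in for the
# transaction id and operation-within-transaction tokens you enable in GG.

def order_key(msg):
    # (transaction id, operation index) orders ops even when op_ts collides
    return (msg["csn"], msg["op_num"])

ops = [
    {"csn": 9001, "op_num": 2, "op_ts": "2018-11-05 10:00:00", "op": "delete"},
    {"csn": 9001, "op_num": 1, "op_ts": "2018-11-05 10:00:00", "op": "update"},
    {"csn": 8990, "op_num": 1, "op_ts": "2018-11-05 10:00:00", "op": "insert"},
]

# All three share the same op_ts, yet the metadata key recovers source order.
print([m["op"] for m in sorted(ops, key=order_key)])  # ['insert', 'update', 'delete']
```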
>>
>> Logdump is an awesome tool for looking into your trail files and seeing
>> what's in there.
>>
>> Boris
>>
>>
>>
>> On Mon, Nov 5, 2018 at 3:07 AM Faisal Durrani <te04.0172@gmail.com>
>> wrote:
>>
>>> Hi Timothy,
>>>
>>> Hope you are doing well. We have been using your data flow (
>>> https://community.hortonworks.com/content/kbentry/155527/ingesting-golden-gate-records-from-apache-kafka-an.html#
>>> )
>>> with slight modifications to store the data in HBase. To version the
>>> rows we have been using the op_ts field of the GoldenGate JSON, but we
>>> have now found that multiple transactions can have the same op_ts; e.g.
>>> both an update and a delete can carry the same op_ts, and if they arrive
>>> out of order at the PutHBaseJSON processor, the target table can go out
>>> of sync. I am using a cluster of NiFi nodes, so I cannot use the
>>> EnforceOrder processor to order the Kafka messages, as I understand it
>>> only orders flow files on a single node and not across the cluster.
>>> Additionally, we have a separate topic for each table and several
>>> consumer groups. I tried using the current_ts column of the GoldenGate
>>> message, but if GG abends and restarts the replication, it will resend
>>> past data with a newer current_ts, which will also cause the tables to
>>> go out of sync. I was wondering if you could give us any ideas on how to
>>> order our transactions correctly.
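The versioning hazard described above is easy to demonstrate: when two operations are written with the same cell timestamp, the later put silently replaces the earlier one regardless of which operation is actually newer. A minimal in-memory simulation of HBase's timestamp-versioned cells (not the HBase API itself):

```python
# Sketch: simulate HBase cell versioning to show why equal op_ts values are a
# problem. Cells are keyed by (row, timestamp); a put with an existing
# timestamp overwrites that version, so "last writer wins" at that timestamp.

cells = {}  # (row_key, timestamp) -> value

def put(row_key, timestamp, value):
    cells[(row_key, timestamp)] = value

def latest(row_key):
    """Return the value with the highest timestamp for a row."""
    versions = {ts: v for (rk, ts), v in cells.items() if rk == row_key}
    return versions[max(versions)]

# An update and a delete share the same op_ts; whichever arrives last "wins",
# even if it is actually the older operation.
put("pk-1", 1541412000000, "update")
put("pk-1", 1541412000000, "delete-marker")  # silently replaces the update

print(latest("pk-1"))  # delete-marker
```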
>>>
>>> Regards,
>>> Faisal
>>>
>>
