nifi-users mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: Ingesting golden gate messages to Hbase using Nifi
Date Wed, 07 Nov 2018 12:55:01 GMT
Sorry, I meant RBA. GG has a bunch of tokens you can add to
your JSON file - you can even create your own. POS should be good, and if
op_ts does not work for you, why not generate your own timestamp using
POS (a Now() expression)? You can also add another token that identifies the
transaction sequence number, and order by op_ts and then by transaction
sequence number. Please share what you end up doing.
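
To make the ordering idea concrete, here is a minimal Python sketch, not a
tested NiFi flow. It assumes the GG JSON message carries op_ts in the form
"YYYY-MM-DD HH:MM:SS.ffffff" and a 20-digit pos whose first 10 digits are the
trail file sequence number and last 10 digits the RBA; check both against
your own messages. The helper names (parse_pos, op_ts_micros, sort_key) are
made up for illustration, and treating op_ts as UTC is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_pos(pos: str) -> Tuple[int, int]:
    """Split pos into (trail file sequence, RBA).

    Assumption: pos is 20 digits, 10 for the trail sequence and 10 for the RBA.
    """
    if len(pos) != 20 or not pos.isdigit():
        raise ValueError("unexpected pos format: %r" % pos)
    return int(pos[:10]), int(pos[10:])


def op_ts_micros(op_ts: str) -> int:
    """Convert op_ts to integer microseconds since the epoch (assumed UTC)."""
    dt = datetime.strptime(op_ts, "%Y-%m-%d %H:%M:%S.%f")
    dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding.
    return (dt - EPOCH) // timedelta(microseconds=1)


def sort_key(record: dict) -> Tuple[int, int, int]:
    """Order by op_ts first, then break ties with trail sequence and RBA."""
    trail_seq, rba = parse_pos(record["pos"])
    return op_ts_micros(record["op_ts"]), trail_seq, rba


# Hypothetical records: the update and delete share the same op_ts.
records = [
    {"op_type": "D", "op_ts": "2018-11-05 03:07:12.000000",
     "pos": "00000000020000004500"},
    {"op_type": "U", "op_ts": "2018-11-05 03:07:12.000000",
     "pos": "00000000020000004100"},
    {"op_type": "I", "op_ts": "2018-11-05 03:07:11.000000",
     "pos": "00000000010000009900"},
]
ordered = sorted(records, key=sort_key)
```

This only sorts records that are already collected in one place; it does not
by itself solve ordering across several NiFi nodes.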

On Tue, Nov 6, 2018, 01:55 Faisal Durrani <te04.0172@gmail.com> wrote:

> Hi Boris,
>
> Thank you for your reply. Let me try explaining my data flow in detail. I
> am receiving the GG transactions in JSON format through Kafka, so I can only
> use the fields provided by the Kafka handler of GG (JSON pluggable
> format). I think you meant the RBA value instead of rbc. I don't think we
> can receive the RBA value in JSON, but there is a field called POS which is
> a concatenation of the source trail file number and RBA. So we can probably
> use that in the EnforceOrder processor. But if we don't use the timestamp
> information then we will run into the HBase versioning issue. The idea
> behind using Op_ts was to version each row of our target table and also
> help us with the DML operations. We are using the PK of each table as the
> row_key of the target HBase table. Every new transaction (update/delete) on
> the table is logically inserted as a new row, but since it has the same PK
> we can see the versions of each row. The operation with the highest
> timestamp is the valid state of the row. I tested the EnforceOrder processor
> with the Kafka offset and it skips all the records which arrive later than
> the older offset, and I don't understand why. If I decide to use
> EnforceOrder on POS and use the default timestamp in HBase, then it will
> skip ordering the Kafka messages arriving late, and that will cause the
> tables to go out of sync. In addition to this, I've read that EnforceOrder
> only orders the rows on a single node, while we have a 5-node cluster. So
> I'm not sure how to combine all the flow files together on a single node.
> (I know how to distribute them, i.e. by using S2S-RPG.)
>
> I hope I have been able to explain my situation. Kindly let me know
> your views on this.
>
> Regards,
> Faisal
>
>
> On Mon, Nov 5, 2018 at 11:18 PM Boris Tyukin <boris@boristyukin.com>
> wrote:
>
>> Hi Faisal, I am not Timothy, but you raise an interesting problem we
>> might face soon as well. I did not expect the situation you described and I
>> thought transaction time would be different.
>>
>> Our intent was to use op_ts to enforce order, but another option is to use
>> the GG rbc value or the Oracle rowscn value - did you consider them? GG
>> RBC should identify a unique transaction, and within every transaction you
>> can also get the operation# within the transaction. You can also get the
>> trail file# and trail file position. GG is really powerful and gives you a
>> bunch of data elements that you can enable on your message.
>>
>>
>> https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_fileformats.htm#GWUAD735
>>
>> Logdump is an awesome tool for looking into your trail files and seeing
>> what's in there.
>>
>> Boris
>>
>>
>>
>> On Mon, Nov 5, 2018 at 3:07 AM Faisal Durrani <te04.0172@gmail.com>
>> wrote:
>>
>>> Hi Timothy ,
>>>
>>> Hope you are doing well. We have been using your data flow(
>>> https://community.hortonworks.com/content/kbentry/155527/ingesting-golden-gate-records-from-apache-kafka-an.html#
>>> )
>>> with slight modifications to store the data in HBase. To version the
>>> rows we have been using the Op_ts of the GoldenGate JSON. But now we have
>>> found that multiple transactions can have the same Op_ts, e.g. both an
>>> update and a delete can have the same Op_ts, and if they arrive out of
>>> order at the PutHBaseJSON processor then the target table can go out of
>>> sync. I am using a cluster of NiFi nodes, so I cannot use the EnforceOrder
>>> processor to order the Kafka messages; as I understand it, it only orders
>>> the flow files on a single node and not across the cluster. Additionally,
>>> we have a separate topic for each table and we have several consumer
>>> groups. I tried using the Current_ts column of the GoldenGate message, but
>>> then if GG abends and restarts the replication it will resend the past
>>> data with a newer Current_ts, which will also cause the tables to go out
>>> of sync. I was wondering if you could suggest a way for us to order our
>>> transactions correctly.
>>>
>>> Regards,
>>> Faisal
>>>
>>
