nifi-dev mailing list archives

From Shawn Weeks <swe...@weeksconsulting.us>
Subject RE: Adding HBase Support for AtomicDistributedMapCacheClient
Date Sat, 04 May 2019 16:01:02 GMT
Another item that seems to fail is the call to DiskUtils.deleteRecursively(rootFile); in TestFileSystemRepository.
Is there a reason why rootFile.delete() isn't sufficient? It deletes directories as well.
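
One possible reason, offered as a hedged sketch rather than a description of NiFi's DiskUtils: java.io.File.delete() removes a directory only when it is empty, so a populated test repository needs its contents deleted first. The class and helper names below are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class DeleteDemo {

    // Hypothetical helper (not NiFi's implementation): deletes children before parents.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits the deepest entries first, so every directory
            // is already empty by the time it is deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary directory that contains a single file.
    static Path newPopulatedTempDir() {
        try {
            Path dir = Files.createTempDirectory("delete-demo");
            Files.createFile(dir.resolve("child.txt"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newPopulatedTempDir();
        System.out.println("File.delete() succeeded: " + dir.toFile().delete());
        deleteRecursively(dir);
        System.out.println("Still exists after recursive delete: " + Files.exists(dir));
    }
}
```

If the test directory were always empty at teardown, a plain delete() would be enough; the recursive form only matters once files have been written under it.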

Thanks
Shawn Weeks

-----Original Message-----
From: Shawn Weeks <sweeks@weeksconsulting.us> 
Sent: Saturday, May 4, 2019 10:02 AM
To: dev@nifi.apache.org
Subject: RE: Adding HBase Support for AtomicDistributedMapCacheClient

I discovered what appears to be a bug while compiling on Windows.

Line 46 of NiFiGroovyTest.groovy should be

private static final String TEST_RES_PATH = Paths.get(NiFiGroovyTest.getClassLoader().getResource(".").toURI()).toString()

Instead of

private static final String TEST_RES_PATH = NiFiGroovyTest.getClassLoader().getResource(".").toURI().getPath()

See posts like https://stackoverflow.com/questions/43972777/exception-in-thread-main-java-nio-file-invalidpathexception-illegal-char
for an explanation.
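
A minimal Java sketch of the difference, assuming a file: URI such as the one returned by getResource(".").toURI(). On Windows, URI.getPath() typically yields a string like "/C:/...", which a later Paths.get(String) call rejects, whereas Paths.get(URI) produces a native path. The class and method names are illustrative only.

```java
import java.io.File;
import java.net.URI;
import java.nio.file.Paths;

class ResourcePathDemo {

    // Converts a file: URI to a platform-native path string via java.nio.
    static String nativePath(URI uri) {
        return Paths.get(uri).toString();
    }

    // Returns the raw path component of the URI, which always starts with "/"
    // for an absolute file: URI, even on Windows.
    static String uriPath(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) {
        URI uri = new File(System.getProperty("java.io.tmpdir")).toURI();
        System.out.println("URI.getPath():  " + uriPath(uri));
        System.out.println("Paths.get(uri): " + nativePath(uri));
    }
}
```

On Linux or macOS the two strings differ at most by a trailing slash, which is why the problem only surfaces on Windows builds.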

Thanks
Shawn Weeks

-----Original Message-----
From: Shawn Weeks <sweeks@weeksconsulting.us>
Sent: Saturday, May 4, 2019 9:07 AM
To: dev@nifi.apache.org
Subject: RE: Adding HBase Support for AtomicDistributedMapCacheClient

I've created Pull Request https://github.com/apache/nifi/pull/3462 for this change. I'm still
testing it and it might not work correctly yet, but I wanted other folks to be able to see
it. If anyone knows how to include a timestamp in a checkAndPut for HBase 1.x, let me know
and I'll implement it.

Thanks
Shawn Weeks

-----Original Message-----
From: Bryan Bende <bbende@gmail.com>
Sent: Thursday, April 25, 2019 7:05 PM
To: dev@nifi.apache.org
Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient

The timestamp should be available through the existing scan methods: they take a ResultHandler,
which is passed an array of ResultCells, and each one has the timestamp.

> On Apr 25, 2019, at 7:52 PM, Shawn Weeks <sweeks@weeksconsulting.us> wrote:
> 
> I haven't looked at the other side of the equation yet, which is how to get the
> timestamp on fetch. That will probably require a change or a new scan method.
> 
> Thanks
> Shawn
> 
> -----Original Message-----
> From: Bryan Bende <bbende@gmail.com>
> Sent: Thursday, April 25, 2019 4:29 PM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> 
> Also just realized that we do have two versions of the HBase DMC client service, so
> they could each do different things.
> 
> The HBase_1_1_2_ClientMapCacheService could call the original checkAndPut, and the
> HBase_2_x_ClientMapCacheService could call the new method.
> 
> In this approach the 1_1_2 client service could throw unsupported for the new method,
> since it would never be used.
> 
> On Thu, Apr 25, 2019 at 5:25 PM Bryan Bende <bbende@gmail.com> wrote:
>> 
>> Thanks, I'm following now...
>> 
>> I think adding the new method to the interface and throwing 
>> UnsupportedOperationException for 1_1_2, or using the original 
>> checkAndPut and implementing it in both services, would both be fine 
>> solutions.
>> 
>> I guess another variation might be to introduce the new method in the 
>> interface, but in the 1_1_2 implementation just delegate back to the 
>> original checkAndPut and ignore the timestamp, and document that it 
>> isn't used in that implementation. I don't love this, but it does 
>> allow both services to implement the functionality and still leverage 
>> the better solution for 2_x.
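
The two variations discussed above can be sketched side by side. This is only an illustration under stated assumptions: MapCacheClient, BaseClient, ThrowingClient, and DelegatingClient are hypothetical in-memory stand-ins, not NiFi's HBaseClientService or its implementations.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ClientServiceVariantsDemo {

    // Hypothetical, simplified interface with the original and the timestamp-aware method.
    interface MapCacheClient {
        boolean checkAndPut(String key, byte[] expected, byte[] update);

        boolean checkAndPut(String key, byte[] expected, long expectedTimestamp, byte[] update);
    }

    // Shared value-only check backed by an in-memory map (stand-in for a table).
    static class BaseClient {
        final Map<String, byte[]> store = new HashMap<>();

        public boolean checkAndPut(String key, byte[] expected, byte[] update) {
            if (!Arrays.equals(store.get(key), expected)) {
                return false;
            }
            store.put(key, update);
            return true;
        }
    }

    // Variation 1: the older service rejects the timestamp-aware method outright.
    static class ThrowingClient extends BaseClient implements MapCacheClient {
        @Override
        public boolean checkAndPut(String key, byte[] expected, long expectedTimestamp, byte[] update) {
            throw new UnsupportedOperationException("timestamp check is not supported");
        }
    }

    // Variation 2: the older service delegates to the original method and ignores the timestamp.
    static class DelegatingClient extends BaseClient implements MapCacheClient {
        @Override
        public boolean checkAndPut(String key, byte[] expected, long expectedTimestamp, byte[] update) {
            return checkAndPut(key, expected, update);
        }
    }
}
```

The delegating form keeps callers working against both services, at the cost of a parameter that is silently unused, which is the trade-off raised above.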
>> 
>> 
>> On Thu, Apr 25, 2019 at 3:54 PM Shawn Weeks <sweeks@weeksconsulting.us> wrote:
>>> 
>>> Here is what I think the new checkAndPut or checkAndMutate method would look like.
>>> This also shows what the new mutate API looks like.
>>> 
>>>    @Override
>>>    public boolean checkAndPut(String tableName, byte[] rowId, byte[] family, byte[] qualifier,
>>>            byte[] value, long timestamp, PutColumn column) throws IOException {
>>>        try (final Table table = connection.getTable(TableName.valueOf(tableName))) {
>>>            Put put = new Put(rowId);
>>>            put.addColumn(
>>>                    column.getColumnFamily(),
>>>                    column.getColumnQualifier(),
>>>                    column.getBuffer());
>>>            return table.checkAndMutate(rowId, family)
>>>                    .qualifier(qualifier)
>>>                    .ifEquals(value)
>>>                    .timeRange(TimeRange.at(timestamp))
>>>                    .thenPut(put);
>>>        }
>>>    }
>>> 
>>> If the atomic guarantee for the original checkAndPut is good enough, then there is no
>>> reason I can't implement the atomic map cache for both versions of HBase.
>>> 
>>> Thanks
>>> Shawn
>>> 
>>> -----Original Message-----
>>> From: Bryan Bende <bbende@gmail.com>
>>> Sent: Thursday, April 25, 2019 12:39 PM
>>> To: dev@nifi.apache.org
>>> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>>> 
>>> I'm not totally sure it would matter if there were changes in between: as long as the
>>> current value is what we thought it was, the changes we are sending back should be
>>> accurate as a replacement. As a simplified scenario, if the current value is 1 and
>>> thread-A retrieves that value, thread-B then changes it to 2 and back to 1 before
>>> thread-A can do anything, and then thread-A sends in 2 with a previous of 1, that is
>>> still the correct replacement.
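
The scenario can be replayed with a small in-memory model. This is a sketch under stated assumptions, not HBase client code: Cell, replaceIfValue, and replaceIfValueAndTimestamp are hypothetical names, and the timestamps 100 to 103 are made up for illustration.

```java
import java.util.Arrays;

class RevisionCheckDemo {

    // Hypothetical in-memory stand-in for one cell: a value and its write timestamp.
    static final class Cell {
        byte[] value;
        long timestamp;

        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        void write(byte[] newValue, long now) {
            this.value = newValue;
            this.timestamp = now;
        }
    }

    // Value-only check: succeeds whenever the current value equals the expected one.
    static boolean replaceIfValue(Cell cell, byte[] expected, byte[] update, long now) {
        if (!Arrays.equals(cell.value, expected)) {
            return false;
        }
        cell.write(update, now);
        return true;
    }

    // Value-plus-timestamp check: also requires that the cell was not rewritten since the read.
    static boolean replaceIfValueAndTimestamp(Cell cell, byte[] expected, long expectedTimestamp,
                                              byte[] update, long now) {
        if (cell.timestamp != expectedTimestamp) {
            return false;
        }
        return replaceIfValue(cell, expected, update, now);
    }

    // Builds the cell as thread-A would find it after thread-B wrote 2 and then 1 again.
    static Cell cellAfterInterleavedWrites() {
        Cell cell = new Cell(new byte[] {1}, 100L); // thread-A reads value 1 at timestamp 100
        cell.write(new byte[] {2}, 101L);           // thread-B changes it to 2
        cell.write(new byte[] {1}, 102L);           // thread-B changes it back to 1
        return cell;
    }

    public static void main(String[] args) {
        byte[] one = {1};
        byte[] two = {2};
        System.out.println("value only:          "
                + replaceIfValue(cellAfterInterleavedWrites(), one, two, 103L));
        System.out.println("value and timestamp: "
                + replaceIfValueAndTimestamp(cellAfterInterleavedWrites(), one, 100L, two, 103L));
    }
}
```

The value-only check accepts thread-A's write, while the timestamp-aware check rejects it. Which behavior is wanted depends on whether the intermediate writes matter to the caller.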
>>> 
>>> I can see the argument for using the timestamp though... can you show the method
>>> signature of the new checkAndMutate method that would need to be added to the client
>>> service, and also which method of the HBase client it needs to call?
>>> 
>>> Just so I can get an idea of the differences between 1.x and 2.x.
>>> 
>>> On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sweeks@weeksconsulting.us> wrote:
>>>> 
>>>> While checkAndPut is atomic as it's built now, it doesn't support also checking the
>>>> timestamp range, which is included in the new checkAndMutate API. I had planned on
>>>> using the cell's timestamp as the revision along with the value, to ensure not only
>>>> that the value hadn't been changed but also that there hadn't been changes in between
>>>> that just happened to put the value back.
>>>> 
>>>> As I was looking at everything I had another question: why is the cache currently
>>>> using a scan instead of a get to fetch values from HBase? It seems like that would be
>>>> much less performant, considering we know the row key we're looking for.
>>>> 
>>>> 
>>>> Thanks
>>>> Shawn
>>>> 
>>>> -----Original Message-----
>>>> From: Bryan Bende <bbende@gmail.com>
>>>> Sent: Thursday, April 25, 2019 11:56 AM
>>>> To: dev@nifi.apache.org
>>>> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>>>> 
>>>> Can it not be done with the existing checkAndPut method? [1]
>>>> 
>>>> I think if you use the value as the revision it should work. It would be similar to
>>>> how the Redis implementation works [2].
>>>> 
>>>> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
>>>> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
>>>> 
>>>> On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sweeks@weeksconsulting.us> wrote:
>>>>> 
>>>>> I'll need to add a check-and-mutate method to the HBaseClientService interface.
>>>>> Should I just extend it with an HBase2ClientService, or add checkAndMutate to the
>>>>> existing interface and make it raise an exception if you try to use it against HBase
>>>>> 1? While HBase 1.x supports checkAndMutate, it doesn't provide a way to filter on
>>>>> timestamp, which is part of how I was going to implement the revision requirement
>>>>> for AtomicMapCache.
>>>>> 
>>>>> Thanks
>>>>> Shawn
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Bryan Bende <bbende@gmail.com>
>>>>> Sent: Thursday, April 25, 2019 9:11 AM
>>>>> To: dev@nifi.apache.org
>>>>> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>>>>> 
>>>>> I'm not aware of a JIRA, so I'd say go for it.
>>>>> 
>>>>> On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sweeks@weeksconsulting.us> wrote:
>>>>>> 
>>>>>> Seems like this should be fairly easy for HBase 2.x with the checkAndMutate
>>>>>> functionality, and I was wondering if there is already a Jira for this. Otherwise I
>>>>>> might make an attempt at it. It would be good to be able to support Wait/Notify and
>>>>>> other things that need AtomicDistributedMapCacheClient using an Apache-developed
>>>>>> product commonly found in a Hadoop cluster.
>>>>>> 
>>>>>> Thanks
>>>>>> Shawn

