hbase-user mailing list archives

From Andy Sautins <andy.saut...@returnpath.net>
Subject RE: passing timestamp into importtsv...
Date Mon, 28 Mar 2011 17:13:11 GMT

   Discouraging setting timestamps seems to make sense.  In our situation we bulk import every
'x' minutes, and if for some reason one of the older imports fails and has to be restarted
after a later import happens, we would like to import the older records at the appropriate
timestamp, before the timestamp of the later import.  It sounds like that may be one of the
situations that could trigger some internal edge cases, correct?

   Also, just as a separate note: since the timestamp is set in the Mapper, if the import has
more than one mapper I wouldn't get a consistent timestamp for all the records for a given
load.  For our use case it is helpful to be able to identify all records associated with a
given import.
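
   A minimal sketch of one way to get a single timestamp across mappers, assuming the value
is chosen once when the job is submitted and handed to every mapper through the job
configuration.  A plain Map stands in for the Hadoop configuration here, and the class and
method names are hypothetical; this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedImportTimestamp {
    // Property name from this thread; using it as a job-configuration key is an assumption.
    public static final String TS_KEY = "importtsv.timestamp";

    /** Driver side: choose the timestamp once and record it in the stand-in job configuration. */
    public static void stampJob(Map<String, String> jobConf, long importTimestamp) {
        jobConf.put(TS_KEY, Long.toString(importTimestamp));
    }

    /** Mapper side: read the shared value, falling back to the local clock if it is absent. */
    public static long timestampForMapper(Map<String, String> jobConf) {
        String value = jobConf.get(TS_KEY);
        return value != null ? Long.parseLong(value) : System.currentTimeMillis();
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        stampJob(jobConf, 1301331000000L); // arbitrary example value in epoch milliseconds

        // Two simulated mappers read the same configuration and so agree on the timestamp.
        long mapperA = timestampForMapper(jobConf);
        long mapperB = timestampForMapper(jobConf);
        System.out.println("mappers agree: " + (mapperA == mapperB));
    }
}
```

   With the value fixed up front, every record in a load carries the same timestamp, which
is what makes it possible to pick out all records belonging to one import.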

   I went ahead and added a JIRA ( HBASE-3705 ) and uploaded the basic patch.  I'll update
the documentation as well.  



-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, March 28, 2011 10:51 AM
To: user@hbase.apache.org
Subject: Re: passing timestamp into importtsv...

I have two thoughts about it:

1- We generally discourage users setting their own timestamps since it
messes with the internals in some edge cases. Adding this
functionality goes against that.
2- Almost every interface we offer lets users set their own
timestamps, so to be more consistent we should indeed offer it for

So I think you should open a jira and post your patch.


On Mon, Mar 28, 2011 at 9:36 AM, Andy Sautins
<andy.sautins@returnpath.net> wrote:
>   We have been having a lot of success using the importtsv utility to load data into
HBase as described in the wiki (http://hbase.apache.org/bulk-loads.html).  The one issue
we have run into is that we would like to assign a specific timestamp to the records associated
with the import.  The current ImportTsv.java class sets the timestamp to the current time
( ts = System.currentTimeMillis() ).  We have a patch we have been using that, if a system
property ( importtsv.timestamp ) is set, takes the timestamp from that property.  If the
property is not set, it uses the current time.  This has been very helpful for us and allows
for more control in setting the timestamps for imported records.
>   My question is: is this useful functionality in general?  If so I'd be happy to submit
a JIRA and patch with the appropriate changes.
>   Thanks
>   Andy
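
The fallback described in the original message can be sketched roughly as follows.  This is
an illustration only, not the patch itself: the property name importtsv.timestamp comes from
the message, while the class and method names are hypothetical and treating a blank value as
unset is an assumption.

```java
public class ImportTimestampProperty {
    // Property name quoted in the original message.
    public static final String PROPERTY = "importtsv.timestamp";

    /**
     * Returns the timestamp parsed from the property value when one is given,
     * otherwise the supplied current time. A blank value is treated as unset
     * (an assumption made for this sketch).
     */
    public static long resolve(String propertyValue, long now) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return now;
        }
        return Long.parseLong(propertyValue.trim());
    }

    public static void main(String[] args) {
        long ts = resolve(System.getProperty(PROPERTY), System.currentTimeMillis());
        System.out.println("import timestamp = " + ts);
    }
}
```

Running it with the standard JVM flag, for example -Dimporttsv.timestamp=1301331000000,
selects that value; running it without the flag falls back to the current time.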
