sqoop-user mailing list archives

From anil gupta <anilg...@buffalo.edu>
Subject Re: Importing more than one column family in Hbase through Sqoop
Date Thu, 29 Mar 2012 22:50:38 GMT
Hi Kathleen,

Here is the JIRA I filed for this:
https://issues.apache.org/jira/browse/SQOOP-472
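
For illustration, here is a rough sketch of the kind of column-to-family mapping the JIRA asks for. It is written in Python for brevity. The spec format `column:family,...` and the functions `parse_family_map` and `record_to_put` are hypothetical: they are not existing Sqoop options or HBase API calls. Each source record becomes one row key plus a set of cells spread across several column families, in the `family:qualifier` notation used in the HBase shell example below.

```python
from typing import Dict, Mapping, Tuple


def parse_family_map(spec: str) -> Dict[str, str]:
    """Parse a hypothetical 'column:family,column:family' spec into {column: family}."""
    mapping: Dict[str, str] = {}
    for pair in spec.split(","):
        column, family = pair.split(":", 1)
        mapping[column.strip()] = family.strip()
    return mapping


def record_to_put(
    record: Mapping[str, object],
    row_key_col: str,
    family_map: Mapping[str, str],
) -> Tuple[str, Dict[str, str]]:
    """Turn one source record into (row_key, cells) spanning several families.

    Cell names use the HBase shell 'family:qualifier' notation, with the
    source column name as the qualifier. This is a sketch, not Sqoop code.
    """
    row_key = str(record[row_key_col])
    cells: Dict[str, str] = {}
    for column, value in record.items():
        # The row key is not stored as a cell, and NULLs are skipped here
        # because there is no value to write for them.
        if column == row_key_col or value is None:
            continue
        family = family_map[column]  # raises KeyError for an unmapped column
        cells[f"{family}:{column}"] = str(value)
    return row_key, cells
```

Applied to the merchant example, `record_to_put({"merchant_id": 1, "name": "starbucks", "id": 4545}, "merchant_id", parse_family_map("name:info,id:user_reviews"))` returns `('1', {'info:name': 'starbucks', 'user_reviews:id': '4545'})`, so both families are filled in a single pass over the source table instead of one import per family.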

Thanks,
Anil Gupta

On Mon, Mar 19, 2012 at 12:58 PM, Kathleen Ting <kathleen@apache.org> wrote:

> Anil -
>
> Understood. As it happens, the HBase release that supported atomicity came
> after the Sqoop release that included HBase integration, hence the
> limitation.
>
> Please go ahead and file a Sqoop JIRA requesting a CLI option that lets
> the user specify multiple column families.
>
> Regards, Kathleen
>
> On Fri, Mar 16, 2012 at 3:09 PM, anil gupta <anilgupt@buffalo.edu> wrote:
>
>> Hi Kathleen,
>>
>> Sorry for the delayed reply; I have been working on HBase rather than
>> Sqoop. Here is some example code from the book "HBase: The Definitive
>> Guide" which shows that it is possible to load data into more than one
>> column family through the Java API, which was exactly the point I was
>> trying to make.
>>
>> Have a look at these two classes:
>>
>> https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/util/HBaseHelper.java
>>
>> https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/filters/PrefixFilterExample.java
>>
>> Please let me know if you have further questions.
>>
>> Thanks,
>> Anil
>>
>> On Fri, Feb 24, 2012 at 9:36 PM, Kathleen Ting <kathleen@cloudera.com> wrote:
>>
>>> Hi Anil,
>>>
>>> re: Is the above scenario not possible with the HBase Java API?
>>> I would suggest asking that on user@hbase.apache.org.
>>>
>>> Thanks,
>>> Kathleen
>>>
>>> On Wed, Feb 22, 2012 at 2:26 PM, anil gupta <anilgupt@buffalo.edu> wrote:
>>>
>>>> Hi Kathleen,
>>>>
>>>> I think my previous messages were misinterpreted. In my previous message
>>>> I was talking about generating a separate put statement for each column
>>>> family. I am having a hard time understanding how this would violate the
>>>> HBase atomicity rule.
>>>>
>>>> For instance, in the HBase shell my put statements for two column
>>>> families would look like this:
>>>> hbase shell> put 'merchant_data', '1', 'info:name', 'starbucks'
>>>> hbase shell> put 'merchant_data', '1', 'user_reviews:id', '4545'
>>>>
>>>> The same can be achieved with the HBase Java API, which Sqoop uses. Is
>>>> the above scenario not possible with the HBase Java API?
>>>>
>>>> Thanks,
>>>> Anil
>>>>
>>>>
>>>>
>>>> On Wed, Feb 22, 2012 at 2:02 PM, Kathleen Ting <kathleen@cloudera.com> wrote:
>>>>
>>>>> Hi Anil -
>>>>>
>>>>> Good question, and sorry for any confusion earlier. To be clear: because
>>>>> HBase permits atomic operations across a single column family only, Sqoop
>>>>> cannot support multiple column families.
>>>>>
>>>>> Regards, Kathleen
>>>>>
>>>>> On Wed, Feb 22, 2012 at 12:43 PM, anil gupta <anilgupt@buffalo.edu> wrote:
>>>>>
>>>>>> Hi Kathleen,
>>>>>>
>>>>>> Yes, that is always an option. Thanks for the suggestion.
>>>>>>
>>>>>> I am a beginner at HBase. However, I was hoping to cut down the time it
>>>>>> takes to dump the data from the database. If I do it twice (assuming I
>>>>>> have 2 column families), it increases the time to load the entire HBase
>>>>>> table. AFAIK, Sqoop generates put statements to import data into HBase.
>>>>>> If we generate put statements for more than one column family, would
>>>>>> it violate the atomicity principle of HBase? I went through the
>>>>>> atomicity section of http://hbase.apache.org/acid-semantics.html and I
>>>>>> can't find anything that would stop Sqoop from loading more than one
>>>>>> column family. HBase bulk load also allows more than one column family,
>>>>>> although the HBase bulk-loading approach might be different from
>>>>>> Sqoop's. Could you give me more insight? Sorry if my question is dumb.
>>>>>>
>>>>>> Thanks,
>>>>>> Anil Gupta
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 22, 2012 at 11:51 AM, Kathleen Ting <kathleen@cloudera.com> wrote:
>>>>>>
>>>>>>> Hi Anil,
>>>>>>>
>>>>>>> Sqoop does not support multiple column families because HBase only
>>>>>>> permits atomic operations.
>>>>>>>
>>>>>>> One workaround is to run two imports, specifying a different column
>>>>>>> family each time.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Kathleen
>>>>>>>
>>>>>>> On Wed, Feb 22, 2012 at 11:31 AM, anil gupta <anilgupta84@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I went through the Sqoop User Guide but I could not find anything
>>>>>>>> about importing more than one column family into HBase. Am I missing
>>>>>>>> something? Is it planned for a future release?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks & Regards,
>>>>>>>> Anil Gupta
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>


