sqoop-user mailing list archives

From Jack Arenas...@ckarenas.com>
Subject Re: Submitting Sqoop jobs in parallel
Date Sun, 22 Mar 2015 23:41:36 GMT
Solved by switching the Hive metastore from Derby to PostgreSQL.
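
For anyone finding this thread later: an embedded Derby metastore accepts only one connection at a time, which would be consistent with the Hive step failing once several jobs run at once. Below is a minimal sketch of the kind of parallel submission described in the original question, assuming Python and a pool of 8 workers. The function names, the JDBC URL, and the schema/table names are hypothetical, and passing `schema.table` to `--hive-table` is an assumption about how the jobs were invoked, not a record of the actual tool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_sqoop_command(jdbc_url, schema, table):
    """Build one `sqoop import` invocation that loads a table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--hive-import",
        # Assumption: the Hive target is addressed as schema.table.
        "--hive-table", f"{schema}.{table}",
    ]


def run_command(command):
    """Run one command and return its exit code (0 means success)."""
    return subprocess.run(command).returncode


def import_tables(jdbc_url, schema, tables, workers=8, runner=run_command):
    """Import tables concurrently; return the tables whose job failed."""
    commands = [build_sqoop_command(jdbc_url, schema, t) for t in tables]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so exit codes line up with tables.
        exit_codes = list(pool.map(runner, commands))
    return [t for t, code in zip(tables, exit_codes) if code != 0]
```

A call such as `import_tables("jdbc:postgresql://dbhost/sourcedb", "lab", ["customers", "orders"])` (placeholder values) returns the list of tables to retry. The `runner` argument exists only so the pool logic can be exercised without a Sqoop installation.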

On Fri, Mar 6, 2015 at 8:16 AM, Jack Arenas <j@ckarenas.com> wrote:

> Abe et al,
>
> How do you mean? Isn't that the point of the --hive-table flag? Based on
> the schema add the table to the proper schema.db folder in <path>/Hive/Lab
> for each sqoop job? I'm not sure what you mean... I tried setting
> --target-dir as <path>/Hive/Lab/<schema>.db/<table> and yes it's able to
> ingest the data into HDFS into that folder but hive doesn't recognize that
> the tables are there. It's like the step that actually links the data to
> hive breaks when parallelized.
>
> Hope this info helps.
>
> Best,
> Jack
>
> On Mar 3, 2015, at 8:46 PM, Abraham Elmahrek <abe@cloudera.com> wrote:
>
> Jack,
>
> Just a thought... but have you tried using --target-dir?
>
> -Abe
>
> On Mon, Mar 2, 2015 at 12:24 PM, Jack Arenas <j@ckarenas.com> wrote:
>
>> Hi team,
>>
>> I'm building an ETL tool that requires me to pull in a bunch of tables
>> from a db into HDFS and I'm currently doing this sequentially using Sqoop.
>> I figured it might be faster to submit the Sqoop jobs in parallel, that
>> is with a predefined thread pool (currently trying 8) because it took about
>> two hours to ingest 150 tables of various sizes, frankly not very big
>> tables as this is POC. So sequentially this works fine, but as soon as I
>> add parallelism, roughly 75% of my Sqoop jobs fail, and I'm not saying that
>> they don't ingest any data, simply that the data gets stuck in the staging
>> area (i.e. /user/username) as opposed to the proper hive table (i.e.
>> /user/username/Hive/Lab). Has anyone experienced this before? I figure I
>> may be able to shoot a separate process that moves the hive tables from the
>> staging area into the hive table area, but I'm not sure if that process
>> would simply be to move the tables or if there is more involved.
>>
>> Thanks!
>>
>> Specs: HDP 2.1, Sqoop 1.4.4.2
>>
>> Cheers,
>> Jack
>>
>>
>


-- 
Jack Arenas
Data Engineer & Web Developer
j@ckarenas.com
+1.805.259.8059
<http://www.linkedin.com/in/jackarenas>
