sqoop-user mailing list archives

From Siddharth Karandikar <siddharth.karandi...@gmail.com>
Subject Re: Zero rows imported while doing Mysql to Hive import
Date Fri, 05 Jul 2013 19:09:53 GMT
Hi Andy,


I have Sqoop installed in '/root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0'
and Hive in '/root/siddharth/tools/hive-0.11.0-bin'.

I used to run the following command from within Sqoop's directory -
./bin/sqoop import --connect jdbc:mysql://localhost/ClassicModels -table
Customers -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin
--hive-import --verbose --mysql-delimiters

This would create a temporary HDFS file for the given MySQL table and then
start Hive from the same directory to create the table and load the data.
Hive's default behavior is to look for metastore_db in the current directory
and, if it is not there, to create a new one and use it. The above command
used to run without errors. I would then go into Hive's directory
'/root/siddharth/tools/hive-0.11.0-bin', run Hive to check the table newly
created by Sqoop, and never find it there. The issue was that when Hive is
started from its own directory, it again checks for the existence of
metastore_db, which is not there (as it was created in Sqoop's directory),
and so creates a new one that has no reference to the previously created
table.

What could also have worked in this case is to copy metastore_db from Sqoop's
directory into Hive's directory after running the Sqoop command and before
starting the Hive shell.
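
For reference, one way to pin the metastore to a fixed location is the
javax.jdo.option.ConnectionURL property in hive-site.xml; its default value
uses the relative path 'metastore_db', which is why the location depends on
the working directory. A minimal sketch (the absolute path below is just an
example based on my layout; adjust it to yours):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/root/siddharth/tools/hive-0.11.0-bin/metastore_db;create=true</value>
</property>

With that in place, Sqoop's --hive-import run and a Hive shell started from
any directory should end up using the same metastore_db.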


-
Siddharth



On Fri, Jul 5, 2013 at 7:53 PM, Han Sen Tey <axemen_99@yahoo.com> wrote:

> Good day Siddharth,
>
>    Can you kindly elaborate further on what the problem was and how you
> resolved it (configure the location of metastore_db in Hive?)? Thank you.
>
> Regards,
> Andy
>
>
>   ------------------------------
>  *From:* Siddharth Karandikar <siddharth.karandikar@gmail.com>
> *To:* user@sqoop.apache.org
> *Sent:* Friday, July 5, 2013 10:06:27 AM
> *Subject:* Re: Zero rows imported while doing Mysql to Hive import
>
> Hi Jarek,
>
> Problem solved. It was my error!
>
> I was rereading your response and looked closely into the Sqoop directory,
> and I found a metastore_db directory there. Then I realized what you were
> talking about and what the problem was. My bad!
> As you mentioned, Sqoop was doing its work perfectly fine; it was Hive
> that did not know about the new table being created, because of the
> separate metastore_db.
>
> Now I have configured the location of metastore_db in Hive and everything
> works great!  :)
>
> Thanks Jarek.
>
>
> Regards,
> Siddharth
>
>
>
>
> On Fri, Jul 5, 2013 at 7:17 PM, Siddharth Karandikar <
> siddharth.karandikar@gmail.com> wrote:
>
> Hi,
>
> When I tried doing this import-mysql-into-hive in 3 separate steps, it
> worked!
>
> Here is what I ran -
>
> 1) Import the MySQL table into HDFS with the command:
> ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop import
> --connect jdbc:mysql://localhost/ClassicModels -table Customers -m 1
> --verbose --fields-terminated-by '\t' --lines-terminated-by '\n'
>
> 2) Create a table in Hive with this query (using the same query found in
> the --verbose log of the failing Sqoop run):
> hive> CREATE TABLE IF NOT EXISTS `Customers` ( `customerNumber` INT,
> `customerName` STRING, `contactLastName` STRING, `contactFirstName` STRING,
> `phone` STRING, `addressLine1` STRING, `addressLine2` STRING, `city`
> STRING, `state` STRING, `postalCode` STRING, `country` STRING,
> `salesRepEmployeeNumber` INT, `creditLimit` DOUBLE) COMMENT 'Imported by
> sqoop on 2013/07/04 00:41:14' ROW FORMAT DELIMITED FIELDS TERMINATED BY
> '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
> OK
> Time taken: 0.214 seconds
>
> 3) Load data from HDFS with this query (again using the query found in
> the --verbose log of the failing Sqoop run):
> hive> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> TABLE `Customers`;
> Loading data to table default.customers
>
> Table default.customers stats: [num_partitions: 0, num_files: 2, num_rows:
> 0, total_size: 15556, raw_data_size: 0]
> OK
> Time taken: 0.317 seconds
>
> hive> select count(*) from Customers;
> .......
> Total MapReduce CPU Time Spent: 2 seconds 460 msec
> OK
> 122
> Time taken: 26.873 seconds, Fetched: 1 row(s)
>
>
> So the table got created and populated properly. I now wonder why it is
> not working when I use --hive-import.
>
>
> Any pointers to debug this issue?
>
>
> Thanks,
> Siddharth
>
>
>
>
> On Thu, Jul 4, 2013 at 12:21 PM, Siddharth Karandikar <
> siddharth.karandikar@gmail.com> wrote:
>
> Hi Jarek,
>
> I have not re-configured Hive. I am using the default settings/locations.
> I am using --hive-home to tell Sqoop where to find Hive.
>
> Here are the locations of my sqoop, Hive and Hadoop instances.
> Hadoop:    /root/siddharth/tools/hadoop-1.1.2
> Hive:    /root/siddharth/tools/hive-0.11.0-bin
> Sqoop:    /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
>
>
> And here are a few more details after running it with --verbose.
>
> I am using the following command to import into Hive:
> ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
> import --connect jdbc:mysql://localhost/ClassicModels -table Customers
> -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
> --verbose --mysql-delimiters
>
> Verbose output of the above command:
> http://pastebin.com/TcYG8vkr
>
> After running this command, here is what I see in Hive and HDFS:
>
> HDFS
> =====
> ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls
> hdfs://localhost:9000/user/hive/warehouse/*
> Found 2 items
> -rw-r--r--   1 root supergroup          0 2013-07-04 00:41
> /user/hive/warehouse/customers/_SUCCESS
> -rw-r--r--   1 root supergroup      15569 2013-07-04 00:41
> /user/hive/warehouse/customers/part-m-00000
>
>
> Hive (I am running Hive from its own directory so metadata should be
> accessible)
> ===========================================================
> ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive
>
> Logging initialized using configuration in
>
> jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
> Hive history
> file=/tmp/root/hive_job_log_root_23346@ssk01.vxindia.veritas.com_201307040044_383137545.txt
> hive> show databases;
> OK
> default
> Time taken: 8.035 seconds, Fetched: 1 row(s)
>
> hive> use default;
> OK
> Time taken: 0.018 seconds
>
> hive> show tables;
> OK
> Time taken: 4.175 seconds
> hive>
>
> The strange thing is that the table default.customers doesn't exist in
> Hive even though the Sqoop output mentioned it.
>
>
> Thanks,
> Siddharth
>
>
>
> On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <jarcec@apache.org>
> wrote:
> > Hi Siddharth,
> > using a directory in the LOAD DATA command is completely valid. You can
> > find more information about the command in the Hive documentation [1].
> > Since you are able to see the rows, just with incorrect values, I would
> > estimate that your issue is more with parsing the data than with
> > accessing it.
> >
> > Jarcec
> >
> > Links:
> > 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> >
> > On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
> >> Hi,
> >>
> >> While looking into the Hive history file, I found this query.
> >>
> >> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> >> TABLE `Customers`"
> >> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
> >> .....
> >>
> >> The HDFS location mentioned in this query is a directory, not a CSV
> >> file. This directory contains the part-* file(s) which hold the actual
> >> data. I don't know whether Sqoop understands this directory structure
> >> and knows how to read those multiple part-* files, or whether this is
> >> an issue.
> >>
> >> I was hit by a similar thing while creating an external table in Hive
> >> where the specified location was such an HDFS directory (generated by a
> >> Sqoop import) containing multiple part-* files. The Hive table got
> >> created, but all the rows were NULL. That's why I started looking into
> >> the --hive-import option available in Sqoop. But it looks like that is
> >> also not working for me.
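> >>
> >> If the data in that directory used Sqoop's default delimiters, those
> >> NULLs could simply be due to a delimiter mismatch: Sqoop writes
> >> comma-separated text by default, while Hive's default SerDe expects
> >> Ctrl-A as the field separator. A sketch of an external table that
> >> declares the delimiters explicitly (the table name Customers_ext and
> >> the shortened column list here are only for illustration):
> >>
> >> CREATE EXTERNAL TABLE Customers_ext (
> >>   customerNumber INT,
> >>   customerName STRING,
> >>   creditLimit DOUBLE)
> >> ROW FORMAT DELIMITED
> >>   FIELDS TERMINATED BY ','
> >>   LINES TERMINATED BY '\n'
> >> STORED AS TEXTFILE
> >> LOCATION 'hdfs://localhost:9000/user/root/Customers';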
> >>
> >> Am I missing something?
> >>
> >>
> >> Thanks,
> >> Siddharth
> >>
> >> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
> >> <siddharth.karandikar@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I am facing some problems while importing a sample database from MySQL
> >> > to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a single
> >> > node setup.
> >> >
> >> > While doing this, I always see the following message in the job logs -
> >> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> >> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> >> >
> >> > Job ends with success message -
> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Hive import complete.
> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Export directory is empty,
> removing it.
> >> >
> >> > Full command and log can be found at - http://pastebin.com/03f6Wdga
> >> >
> >> > I am using Sqoop for the first time and could be missing a few things.
> >> > Any pointers to solve this problem would really help.
> >> >
> >> >
> >> > MySQL to HDFS is working fine though.
> >> >
> >> >
> >> > Thanks,
> >> > Siddharth
>
>
>
>
>
>
