sqoop-user mailing list archives

From "arvind@cloudera.com" <arv...@cloudera.com>
Subject Re: [sqoop-user] Re: Sqoop import job failure?
Date Fri, 05 Aug 2011 01:10:37 GMT
[bcc:sqoop-user@cloudera.org, to:sqoop-user@incubator.apache.org]

Kevin,

It looks like the transfer of data was successful, but there was a
problem invoking Hive. What version of Hive are you using?

By default, Hive generates a session log file under /tmp. Search for
hive.log there and let us know what you find. From the looks of it,
this appears to be a classpath issue.
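
For example, to locate the session log and to check which jline jar
(if any) Hive has on its classpath, something like the following
should work (the lib path below assumes the --hive-home you passed):

  find /tmp -name 'hive.log*' 2>/dev/null
  ls /usr/share/brisk/hive/lib/ | grep -i jline

A missing or conflicting jline jar on the classpath would explain the
NoClassDefFoundError in your output.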

Thanks,
Arvind

Note: Please subscribe to sqoop-user@incubator.apache.org and direct
further responses there.

On Thu, Aug 4, 2011 at 5:38 PM, Kevin <kevinfifteen@gmail.com> wrote:
> I appreciate your response, Arvind. I tried the --direct route you
> suggested, and it seems to have fixed the problems I mentioned
> earlier. Unfortunately, I haven't had success with Sqoop yet. I'm
> running into this problem:
>
> After executing:
> sqoop import --direct --connect jdbc:postgresql://query-4.redfintest.com:5432/stingray_6_5_d \
>   --username redfin_oltp -P --table brokerages --hive-import \
>   --hive-home /usr/share/brisk/hive/ --target-dir /data/qa-metrics/
>
> I get:
> 11/08/04 17:26:46 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
> 11/08/04 17:26:46 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
> 11/08/04 17:26:46 INFO manager.SqlManager: Using default fetchSize of 1000
> 11/08/04 17:26:46 INFO tool.CodeGenTool: Beginning code generation
> 11/08/04 17:26:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "brokerages" AS t LIMIT 1
> 11/08/04 17:26:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "brokerages" AS t LIMIT 1
> 11/08/04 17:26:47 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
> 11/08/04 17:26:47 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2-cdh3u1-core.jar
> 11/08/04 17:26:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8d97f121b1707576d1574cb5ba4653b0/brokerages.jar
> 11/08/04 17:26:48 INFO manager.DirectPostgresqlManager: Beginning psql fast path import
> 11/08/04 17:26:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "brokerages" AS t LIMIT 1
> 11/08/04 17:26:48 INFO manager.DirectPostgresqlManager: Performing import of table brokerages from database stingray_6_5_d
> 11/08/04 17:26:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 11/08/04 17:26:48 INFO manager.DirectPostgresqlManager: Transfer loop complete.
> 11/08/04 17:26:48 INFO manager.DirectPostgresqlManager: Transferred 78.8574 KB in 0.0396 seconds (1.9445 MB/sec)
> 11/08/04 17:26:48 INFO hive.HiveImport: Loading uploaded data into Hive
> 11/08/04 17:26:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "brokerages" AS t LIMIT 1
> 11/08/04 17:26:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "brokerages" AS t LIMIT 1
> 11/08/04 17:26:48 WARN hive.TableDefWriter: Column created_date had to be cast to a less precise type in Hive
> 11/08/04 17:26:48 INFO hive.HiveImport: Exception in thread "main" java.lang.NoClassDefFoundError: jline/ArgumentCompletor$ArgumentDelimiter
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.lang.Class.forName0(Native Method)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.lang.Class.forName(Class.java:247)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
> 11/08/04 17:26:48 INFO hive.HiveImport: Caused by: java.lang.ClassNotFoundException: jline.ArgumentCompletor$ArgumentDelimiter
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.security.AccessController.doPrivileged(Native Method)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> 11/08/04 17:26:48 INFO hive.HiveImport:         at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> 11/08/04 17:26:48 INFO hive.HiveImport:         ... 3 more
> 11/08/04 17:26:48 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 1
>         at com.cloudera.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:326)
>         at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:276)
>         at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
>         at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
>         at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
>         at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
>         at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
>         at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
>
> It seems to be a Hive issue. I haven't had any luck figuring out a
> solution. Is it possible that my Hive installation is corrupt?
>
>
> On Aug 2, 6:18 pm, "arv...@cloudera.com" <arv...@cloudera.com> wrote:
>> [bcc:sqoop-u...@cloudera.org, to:sqoop-u...@incubator.apache.org.
>> Please move the conversation over to the Apache mailing list.]
>>
>> Kevin,
>>
>> The OOM error you pointed out is raised when the proportion of VM
>> time spent in GC crosses a high threshold that should normally not
>> be reached. This can happen if the heap space for the map task is
>> small enough to be comparable to the size of the records you are
>> dealing with. You could try increasing the heap by setting the
>> property mapred.child.java.opts to something like -Xmx4096m,
>> assuming your nodes have that much memory to spare. You can also add
>> the switch -XX:-UseGCOverheadLimit to this property, which disables
>> the VM policy that raises OOM errors like the one you are seeing;
>> however, that may not help by itself. See the example below.
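>>
>> For example, Sqoop accepts generic Hadoop -D options right after the
>> tool name and passes them through to the job (the connect string,
>> user, and table below are placeholders; substitute your own):
>>
>>   sqoop import -D mapred.child.java.opts="-Xmx4096m -XX:-UseGCOverheadLimit" \
>>     --connect jdbc:postgresql://host:5432/db --username user -P \
>>     --table yourtable --hive-import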
>>
>> Alternatively, you could try using the direct mode of import from
>> the PostgreSQL server by specifying the --direct option during the
>> import. This option requires the PostgreSQL client (psql) to be
>> installed on the nodes where the map tasks will execute.
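>>
>> A quick way to verify that on a node:
>>
>>   which psql && psql --version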
>>
>> Thanks,
>> Arvind
>>
>> On Tue, Aug 2, 2011 at 5:09 PM, Kevin <kevinfift...@gmail.com> wrote:
>> > Hi all,
>>
>> > I am trying to use Sqoop alongside Brisk. For those who don't know,
>> > Brisk is DataStax's Hadoop/Hive distribution powered by Cassandra:
>> > http://www.datastax.com/products/brisk
>>
>> > I'm attempting to use Sqoop to transfer data from a PostgreSQL db to
>> > Hive.
>> > I used this command:
>> >
>> > sqoop import --connect jdbc:postgresql://idb.corp.redfin.com:5432/metrics \
>> >   --username redfin_readonly -P --table metrics \
>> >   --target-dir /data/qatest --hive-import
>>
>> > The end of the output is:
>>
>> > 11/08/02 16:50:15 INFO mapred.LocalJobRunner:
>> > 11/08/02 16:50:16 INFO mapred.JobClient:  map 100% reduce 0%
>> > 11/08/02 16:50:18 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
>> > 11/08/02 16:50:18 WARN mapred.LocalJobRunner: job_local_0001
>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/6026311161398729600_27875872_1046092330/file/usr/lib/sqoop/lib/ant-contrib-1.0b3.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-2957435182115348485_1795484550_1046091330/file/usr/lib/sqoop/lib/ant-eclipse-1.0-jvm1.2.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/8769703156041621920_1132758636_1046087330/file/usr/lib/sqoop/lib/avro-1.5.1.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-9111846482501201198_-633786885_1046093330/file/usr/lib/sqoop/lib/avro-ipc-1.5.1.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-7340432634222599452_756368084_1046091330/file/usr/lib/sqoop/lib/avro-mapred-1.5.1.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-5046079240639376542_-1808425119_1046090330/file/usr/lib/sqoop/lib/commons-io-1.4.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/8537290295187062884_-810674145_1046086330/file/usr/lib/sqoop/lib/ivy-2.0.0-rc2.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-3739620623688167588_-1832479804_1046082330/file/usr/lib/sqoop/lib/jackson-core-asl-1.7.3.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/3083352659231038596_-1724007002_1046089330/file/usr/lib/sqoop/lib/jackson-mapper-asl-1.7.3.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/2334745090627744860_-1029425194_1046082330/file/usr/lib/sqoop/lib/jopt-simple-3.2.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/4321476485305182066_-92574265_1046090330/file/usr/lib/sqoop/lib/paranamer-2.3.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-5164030306491852882_252469521_1046081330/file/usr/lib/sqoop/lib/snappy-java-1.0.3-rc2.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/7943398653543290704_-1938786533_204956683/file/usr/lib/sqoop/postgresql-9.0-801.jdbc4.jar
>> > 11/08/02 16:50:18 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/3916205498081349063_799987770_1046094330/file/usr/lib/sqoop/sqoop-1.3.0-cdh3u1.jar
>> > 11/08/02 16:50:19 INFO mapred.JobClient: Job complete: job_local_0001
>> > 11/08/02 16:50:19 INFO mapred.JobClient: Counters: 6
>> > 11/08/02 16:50:19 INFO mapred.JobClient:   FileSystemCounters
>> > 11/08/02 16:50:19 INFO mapred.JobClient:     FILE_BYTES_READ=4628309
>> > 11/08/02 16:50:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=32313964098
>> > 11/08/02 16:50:19 INFO mapred.JobClient:   Map-Reduce Framework
>> > 11/08/02 16:50:19 INFO mapred.JobClient:     Map input records=128000
>> > 11/08/02 16:50:19 INFO mapred.JobClient:     Spilled Records=0
>> > 11/08/02 16:50:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
>> > 11/08/02 16:50:19 INFO mapred.JobClient:     Map output records=128000
>> > 11/08/02 16:50:19 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 865.1743 seconds (0 bytes/sec)
>> > 11/08/02 16:50:19 INFO mapreduce.ImportJobBase: Retrieved 128000 records.
>> > 11/08/02 16:50:19 ERROR tool.ImportTool: Error during import: Import job failed!
>>
>> > The output indicates that 128000 records were retrieved, but that
>> > the import job failed with 0 bytes transferred. One likely source of
>> > the problem is "java.lang.OutOfMemoryError: GC overhead limit
>> > exceeded". This has stumped me for a while. Any input would be
>> > greatly appreciated, thanks!
>>
>
> --
> NOTE: The mailing list sqoop-user@cloudera.org is deprecated in favor of the
> Apache Sqoop mailing list sqoop-user@incubator.apache.org. Please subscribe
> to it by sending an email to incubator-sqoop-user-subscribe@apache.org.
>
