sqoop-user mailing list archives

From Andre Araujo <ara...@pythian.com>
Subject Using Pig to load data imported with Sqoop
Date Mon, 04 Nov 2013 08:18:43 GMT
Hi, all,

I've loaded some data with Sqoop from Oracle onto HDFS, storing it as
SequenceFiles, and I'm having problems loading it with Pig.
I'm using Sqoop 1.4.3 and followed the steps below (a simplified example
using the DUAL table).

Any ideas why it loads incorrectly? Am I missing any steps?

Thanks,
Andre



*1. Imported data from the table onto HDFS (the DUAL table has only one row
with one field, containing the string "X")*

sqoop import -D mapred.child.java.opts="$JDBC_JAVA_OPTS" --connect $CONNSTR \
  -m 1 --query "select DUMMY from dual where \$CONDITIONS" --target-dir test \
  --as-sequencefile --class-name com.acme.Dual

The Dual.java file is attached.
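
Since attachments sometimes get stripped on the list, here is roughly the
shape of the generated class. This is a simplified, hand-written sketch only;
the real generated file extends Sqoop's SqoopRecord, handles null indicators,
and carries a lot more boilerplate:

package com.acme;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Dual implements Writable {
    private String DUMMY; // the single column selected from Oracle's DUAL

    // Deserialize the record from the SequenceFile value bytes.
    public void readFields(DataInput in) throws IOException {
        this.DUMMY = Text.readString(in);
    }

    // Serialize the record when Sqoop writes the SequenceFile.
    public void write(DataOutput out) throws IOException {
        Text.writeString(out, this.DUMMY);
    }

    @Override
    public String toString() {
        return DUMMY;
    }
}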

*2. Generated the Dual.jar file:*

javac -cp /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop/sqoop-1.4.3-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common.jar:. \
  com/acme/Dual.java
jar cf /tmp/Dual.jar com/acme/Dual.class

*3. Tried to load the data with Pig; however, the field value is read as 0
(zero) instead of the string "X":*

REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/pig/piggybank.jar;
REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop/sqoop-1.4.3-cdh4.3.0.jar;
REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.3.0.jar;
REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common.jar;
REGISTER /tmp/Dual.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD 'test' USING SequenceFileLoader AS (DUMMY:chararray);
DUMP log;


...
2013-11-04 03:21:32,325 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.0.0-cdh4.3.0  0.11.0-cdh4.3.0 araujo  2013-11-04 03:21:12     2013-11-04 03:21:32     UNKNOWN

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_201310230912_0065   1       0       6       6       6       6       0       0       0       0       log     MAP_ONLY        hdfs://n1.hadoop.cto.pythian.com:8020/tmp/temp-805635901/tmp-702886222,

Input(s):
Successfully read 1 records (479 bytes) from: "hdfs://n1.hadoop.cto.pythian.com:8020/user/araujo/test"

Output(s):
Successfully stored 1 records (8 bytes) in: "hdfs://n1.hadoop.cto.pythian.com:8020/tmp/temp-805635901/tmp-702886222"

Counters:
Total records written : 1
Total bytes written : 8
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201310230912_0065


2013-11-04 03:21:32,338 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-11-04 03:21:32,342 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2013-11-04 03:21:32,350 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-04 03:21:32,350 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
*(0)  <--- THIS SHOULD SHOW "X"*
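
In case it helps to narrow this down, a small standalone reader along these
lines (file name, classpath, and class name are assumed, not taken from the
job above) can confirm which key/value classes Sqoop actually wrote and print
the first record straight from the SequenceFile:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper: dumps the key/value class names and the first record
// of a SequenceFile. The generated Dual class must be on the classpath so the
// value can be deserialized.
public class DumpSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // e.g. test/part-m-00000 (assumed name)
        SequenceFile.Reader reader =
            new SequenceFile.Reader(FileSystem.get(conf), path, conf);
        System.out.println("key class:   " + reader.getKeyClassName());
        System.out.println("value class: " + reader.getValueClassName());
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        if (reader.next(key, value)) {
            System.out.println(key + "\t" + value);
        }
        reader.close();
    }
}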


-- 
André Araújo
Database Administrator / SDM
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.”
— Iker Pou (rock climber)
