Hi, all,

I've loaded some data from Oracle onto HDFS with Sqoop, storing it as SequenceFiles, and I'm having problems loading it with Pig.
I'm using Sqoop 1.4.3 and followed the steps below (a simplified example using the DUAL table).

Any idea why it loads incorrectly? Am I missing a step?

Thanks,
Andre


1. Imported the data from the table onto HDFS (the DUAL table has only one row, with one field containing the string "X"):

sqoop import -D mapred.child.java.opts="$JDBC_JAVA_OPTS" --connect $CONNSTR  -m 1 --query "select DUMMY from dual where \$CONDITIONS" --target-dir test --as-sequencefile --class-name com.acme.Dual

The Dual.java file is attached.

2. Generated the Dual.jar file:

javac -cp /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop/sqoop-1.4.3-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common.jar:. com/acme/Dual.java
jar cf /tmp/Dual.jar com/acme/Dual.class

3. Tried to load the data with Pig. However, the field value is read as 0 (zero) instead of the string "X":

REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/pig/piggybank.jar;
REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop/sqoop-1.4.3-cdh4.3.0.jar;
REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.3.0.jar;
REGISTER /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common.jar;
REGISTER /tmp/Dual.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD 'test' USING SequenceFileLoader AS (DUMMY:chararray);
DUMP log;


...
2013-11-04 03:21:32,325 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.0.0-cdh4.3.0  0.11.0-cdh4.3.0 araujo  2013-11-04 03:21:12     2013-11-04 03:21:32     UNKNOWN

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias    Feature Outputs
job_201310230912_0065   1       0       6       6       6       6       0       0       0       0       log     MAP_ONLY        hdfs://n1.hadoop.cto.pythian.com:8020/tmp/temp-805635901/tmp-702886222,

Input(s):
Successfully read 1 records (479 bytes) from: "hdfs://n1.hadoop.cto.pythian.com:8020/user/araujo/test"

Output(s):
Successfully stored 1 records (8 bytes) in: "hdfs://n1.hadoop.cto.pythian.com:8020/tmp/temp-805635901/tmp-702886222"

Counters:
Total records written : 1
Total bytes written : 8
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201310230912_0065


2013-11-04 03:21:32,338 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-11-04 03:21:32,342 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2013-11-04 03:21:32,350 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-04 03:21:32,350 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(0)  <--- THIS SHOULD SHOW "X"
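
One check that may help narrow this down is to look at which key and value classes are recorded in the SequenceFile header. The sketch below assumes the standard header layout (the magic bytes "SEQ", a version byte, then the key class name followed by the value class name as plain strings). The helper name seq_header_classes is made up for illustration, and the part file name in the usage comment is an assumption about what the import produced, not something confirmed above.

```shell
# Hypothetical helper: read a SequenceFile from stdin and print the first
# two dotted Java class names found in its leading bytes. In the header,
# the key class name comes first and the value class name second.
# Caveat: a length byte that happens to be a letter could end up glued
# to the front of a name, so treat the output as a hint only.
seq_header_classes() {
  head -c 512 \
    | LC_ALL=C grep -aoE '[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*)+' \
    | head -n 2
}

# Usage on the cluster (the part file name is an assumption):
#   hadoop fs -cat test/part-m-00000 | seq_header_classes
```

If the second name printed is com.acme.Dual rather than a standard Writable such as org.apache.hadoop.io.Text, that may be relevant to how the piggybank SequenceFileLoader interprets each record, although I have not confirmed this.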


--
André Araújo
Database Administrator / SDM
The Pythian Group - Australia - www.pythian.com
