sqoop-user mailing list archives

From Cheolsoo Park <cheol...@cloudera.com>
Subject Re: loading as sequencefile and running an hadoop mapreduce job
Date Wed, 11 Apr 2012 20:06:06 GMT
Hi Mark,

I was totally wrong about sequence files in my previous email. In fact, I
realized that SqoopRecord is needed by MR jobs to deserialize sequence
files. Again, I am sorry for the confusion.
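
To illustrate why the class has to be on the job's classpath: a sequence file
records its key and value class names in the file header, and the reader loads
those classes by name before it can deserialize any record. The sketch below is
a simplified, plain-Java analogy of that lookup, not Hadoop's actual reader
code; ValueClassLookup and isLoadable are hypothetical names made up for this
illustration.

```java
public class ValueClassLookup {

    // Hypothetical helper: reports whether a class name (such as the value
    // class name stored in a sequence file header) can be resolved on the
    // current classpath. Assumption: this only mimics the by-name lookup.
    static boolean isLoadable(String className) {
        try {
            // Load without running static initializers; we only care
            // whether the class (and its superclasses) can be found.
            Class.forName(className, false, ValueClassLookup.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is missing.
            // LinkageError (e.g. NoClassDefFoundError): the class was found,
            // but something it depends on, such as a superclass, was not.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "java.lang.String",                    // always on the classpath
            "com.cloudera.sqoop.lib.SqoopRecord"   // only if the Sqoop jar is present
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Run without the Sqoop jar on the classpath, the lookup for
com.cloudera.sqoop.lib.SqoopRecord fails, which is the same kind of failure the
MR task reports. With the jar on the classpath it should succeed.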

Thanks,
Cheolsoo

On Wed, Apr 11, 2012 at 11:52 AM, Cheolsoo Park <cheolsoo@cloudera.com> wrote:

> Hi Mark,
>
> It would be helpful if you could provide the complete log with the --verbose
> option on.
>
>> I believe the result in the hdfs file is a serialization of the java
>> object of a class generated automatically by sqoop, the class name is the
>> table name and extends SqoopRecord: let’s call it table_name.java .
>
>
> A serialization of 'table_name' is not the result.  The auto-generated
> Java class is only for Sqoop to interface with the DB. The result is
> sequence files that contain data.
>
>> Now, I am trying to run a MapReduce job against this file but it is
>> failing, I added the class table_name.java in my jar. But when I run the
>> mapreduce job, I get “ClassNotFoundException:
>> com.cloudera.sqoop.lib.SqoopRecord”. Even with the option -libjars
>> sqoop-1.3.0.jar.
>
> I am not clear what MR jobs you're running here.
>
> 1) If you're importing data, I wonder why you have to do this manually,
> since Sqoop should do it automatically: compile table_name into a jar,
> load the jar into hdfs, pass the path to the jar to the import mapper
> jobs, etc.
>
> 2) If you're running your own MR jobs on imported data, they don't need to
> know about 'table_name' or 'SqoopRecord' since data are already in sequence
> file format, so your MR jobs should be able to understand them.
>
> Hope this is helpful.
>
> Thanks,
> Cheolsoo
>
> On Wed, Apr 11, 2012 at 9:28 AM, Marc Sturm <mas9161@nyp.org> wrote:
>
>> Hi,
>>
>> I am new to hadoop and sqoop. So far I was able to run a single node
>> hadoop cluster on my mac and I am trying to load data from sql server using
>> sqoop 1.3 and Microsoft’s sqoop connector.
>>
>> The data is stored as a varbinary column (though it is a text blob) and I am
>> loading it into hadoop with sqoop using the --as-sequencefile option. I
>> believe the result in the hdfs file is a serialization of the java object
>> of a class generated automatically by sqoop, the class name is the table
>> name and extends SqoopRecord: let’s call it table_name.java . This was done
>> successfully.
>>
>> Now, I am trying to run a MapReduce job against this file but it is
>> failing, I added the class table_name.java in my jar. But when I run the
>> mapreduce job, I get “ClassNotFoundException:
>> com.cloudera.sqoop.lib.SqoopRecord”.
>>
>> Even with the option -libjars sqoop-1.3.0.jar.
>>
>> I hope all this makes sense to you. If you can help me understand what
>> the problem is or point me to the right documentation that would be great.
>>
>> Thanks,
>>
>> Marc
>>
>> ------------------------------
>> This electronic message is intended to be for the use only of the named
>> recipient, and may contain information that is confidential or privileged.
>> If you are not the intended recipient, you are hereby notified that any
>> disclosure, copying, distribution or use of the contents of this message is
>> strictly prohibited. If you have received this message in error or are not
>> the named recipient, please notify us immediately by contacting the sender
>> at the electronic mail address noted above, and delete and destroy all
>> copies of this message. Thank you.
>
