sqoop-user mailing list archives

From Raymond Xie <xie3208...@gmail.com>
Subject Re: Why the sequence file generated using sqoop contains only one type?
Date Thu, 22 Mar 2018 15:10:32 GMT
Thank you Greg, you are absolutely right, text and ORC are also
available options.

However,

If Sqoop generates the sequence file format, the file should indicate both
<K, V> types; otherwise it contradicts the definition of a sequence file and
its format.

If a single field is acceptable (apparently that is what Sqoop has been
generating), then the definition of the sequence file
(https://wiki.apache.org/hadoop/SequenceFile) should be revised so that it
is not misleading. I've seen too many people stuck on the same question as me.


Don't you think so?

*------------------------------------------------*
*Sincerely yours,*


*Raymond*

On Thu, Mar 22, 2018 at 10:35 AM, Greg Lindholm <greg.lindholm@gmail.com>
wrote:

> Why are you using Sequence files?
> Sequence files are binary key/value stores. I haven't used them, but from
> the docs it sounds like each 'value' is a record, so one type sounds
> correct.
> You might consider trying Textfile or ORC? You might get better results.
>
> /Greg
>
> On Thu, Mar 22, 2018 at 10:06 AM, Raymond Xie <xie3208080@gmail.com>
> wrote:
>
>> I have a sequence file generated using Sqoop. Why is only one type seen
>> in the file?
>>
>> sqoop import -m 1 \
>> --connect=jdbc:mysql://ms.itversity.com/retail_db \
>> --username=retail_user \
>> --password=itversity \
>> --table=orders \
>> --as-sequencefile \
>> --target-dir=order20180320_seq
>>
>> The head of the sequence file is shown below:
>>
>> [paslechoix@gw03 ~]$ hdfs dfs -cat order20180320_seq/part-m-00000 |head
>> SEQ!org.apache.hadoop.io.LongWritableorders7▒▒P▒
>> U3▒3▒$@▒▒-OCLOSED@▒▒PENDING_PAYMENT@▒▒/COMPLETE@▒▒"{CLOSED@▒
>> ▒,COMPLETE@▒COMPLETE@▒▒COMPLET@▒▒
>>
>> As you can see, there is only one type in the sequence file's head:
>> LongWritable.
>>
>> According to this Hadoop wiki page about the sequence file format:
>> https://wiki.apache.org/hadoop/SequenceFile
>>
>> a sequence file's header should contain:
>>
>> version - A byte array: 3 bytes of magic header 'SEQ', followed by 1 byte
>> of actual version no. (e.g. SEQ4 or SEQ6)
>> keyClassName - String
>> valueClassName - String
>>
>> However, all the sequence files I generated with Sqoop
>> (1.4.6.2.5.0.0-1245) contain only one class.
>>
>> Is there anything missing in the sqoop command? How can I generate a
>> sequence file with the right info in its header?
>>
>> Thank you very much.
>>
>>
>> *------------------------------------------------*
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>
>
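The header layout quoted above (magic `SEQ`, a version byte, then `keyClassName` and `valueClassName`) can be checked with a small parser. This is a minimal Python sketch, not Hadoop's own reader: it assumes a version 4 or later file, where each class name is stored as a Hadoop `Text` string (a variable-length integer byte count followed by UTF-8 bytes), and it stops after the two class names. The helper names `read_vlong`, `read_text` and `read_seqfile_header` are hypothetical.

```python
import io
import struct


def read_vlong(stream):
    """Decode one Hadoop-style variable-length integer (WritableUtils vint/vlong)."""
    first = struct.unpack(">b", stream.read(1))[0]  # first byte, read as signed
    if first >= -112:
        return first  # small values are stored directly in a single byte
    # Otherwise the first byte encodes sign and total size (including itself).
    size = (-119 - first) if first < -120 else (-111 - first)
    value = 0
    for b in stream.read(size - 1):  # big-endian payload bytes
        value = (value << 8) | b
    return ~value if first < -120 else value  # negatives are stored one's-complemented


def read_text(stream):
    """Read a Hadoop Text string: vint byte length, then UTF-8 bytes."""
    length = read_vlong(stream)
    return stream.read(length).decode("utf-8")


def read_seqfile_header(data):
    """Return (version, key_class, value_class) from the first bytes of a SequenceFile."""
    stream = io.BytesIO(data)
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a SequenceFile: magic is %r" % magic)
    version = stream.read(1)[0]
    if version < 4:
        # Assumption: older versions use a different string encoding; not handled here.
        raise ValueError("unsupported SequenceFile version %d" % version)
    key_class = read_text(stream)
    value_class = read_text(stream)
    return version, key_class, value_class


# Hypothetical sample bytes modelled on the dump in the thread: 'SEQ', an assumed
# version byte 6, then two length-prefixed class names (0x21 = 33, 0x06 = 6).
sample = (b"SEQ\x06"
          + b"\x21" + b"org.apache.hadoop.io.LongWritable"
          + b"\x06" + b"orders")
print(read_seqfile_header(sample))
# → (6, 'org.apache.hadoop.io.LongWritable', 'orders')
```

Read this way, the `!` right after `SEQ` in the dump may be the length byte 0x21 (33, exactly the length of `org.apache.hadoop.io.LongWritable`), with the version byte itself unprintable; `orders` would then be a second, length-prefixed class name (the value class) whose length byte 0x06 is also unprintable. The sample bytes are a reconstruction under that assumption, not the raw file. To try it on a real part file, one could copy the first few hundred bytes locally (for example `hdfs dfs -cat order20180320_seq/part-m-00000 | head -c 256 > header.bin`) and pass `open("header.bin", "rb").read()` to `read_seqfile_header`.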
