sqoop-user mailing list archives

From sumit ghosh <sumi...@yahoo.com>
Subject Re: Sqoop - utf-8 data load issue
Date Thu, 18 Jul 2013 21:13:10 GMT
Hi Varun,
 
Can you try changing the java code like this -
 
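  public void set_type(String type) {
    // Charset.forName avoids the checked UnsupportedEncodingException
    this.type = new String(type.getBytes(), java.nio.charset.Charset.forName("UTF-8"));
  }
  public QueryResult with_type(String type) {
    this.type = new String(type.getBytes(), java.nio.charset.Charset.forName("UTF-8"));
    return this;
  }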
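
A caveat on the snippet above: getBytes() with no argument uses the platform default charset, so the round trip only repairs the value when that default happens to be the charset that mis-decoded the data. The sketch below makes both charsets explicit. It assumes the data was mis-decoded as windows-1252, which is what the hexdump later in this thread suggests (for example, e2 80 9a is the UTF-8 form of U+201A, the windows-1252 character for byte 0x82). The class and method names are hypothetical and not part of Sqoop's generated code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Assumption: the value was decoded as windows-1252 instead of UTF-8.
    private static final Charset WRONG = Charset.forName("windows-1252");

    // Re-encode with the charset that mis-decoded the data to get the
    // original UTF-8 bytes back, then decode those bytes as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(WRONG), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The garbled field from the hexdump, written with Unicode escapes
        // so that the source file encoding does not matter.
        String garbled = "el\u00c3\u00a9me\u00c3\u00b1t";
        // Compare against "elémeñt", also written with escapes.
        System.out.println(repair(garbled).equals("el\u00e9me\u00f1t")); // true
    }
}
```

This treats the symptom only. Bytes that have no windows-1252 mapping cannot be recovered this way, so fixing the charset used when reading from the database is preferable to patching the generated setters.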

 
Thanks
Sumit


________________________________
From: varun kumar gullipalli <varunkumar_g@yahoo.com>
To: "user@sqoop.apache.org" <user@sqoop.apache.org> 
Sent: Wednesday, 17 July 2013 5:42 PM
Subject: Re: Sqoop - utf-8 data load issue



Thanks Jarcec.
sqoop version is 1.4.2
 
 
I was checking the QueryResult.java file that Sqoop generates; type is the name of the column
that holds multi-byte (utf-8) data.
Does declaring type as String work for multi-byte data?
 
grep type QueryResult.java
  private String type;
  public String get_type() {
    return type;
  public void set_type(String type) {
    this.type = type;
  public QueryResult with_type(String type) {
    this.type = type;
    equal = equal && (this.type == null ? that.type == null : this.type.equals(that.type));
    this.type = JdbcWritableBridge.readString(5, __dbResults);
    JdbcWritableBridge.writeString(type, 5 + __off, 12, __dbStmt);
        this.type = null;
    this.type = Text.readString(__dataIn);
    if (null == this.type) {
    Text.writeString(__dataOut, type);
    __sb.append(FieldFormatter.escapeAndEnclose(type==null?"\\N":type, delimiters));
    if (__cur_str.equals("null")) { this.type = null; } else {
      this.type = __cur_str;
    __sqoop$field_map.put("type", this.type);
    else    if ("type".equals(__fieldName)) {
      this.type = (String) __fieldVal;
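
On the question of whether String can hold multi-byte data: a Java String stores Unicode text as UTF-16 code units, so the field declaration by itself does not lose characters. Encoding only matters at the points where bytes are converted to or from a String. A minimal check using only the JDK (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class StringUnicodeCheck {
    // Encode to UTF-8 and decode again with the same charset.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "elémeñt" written with Unicode escapes.
        String value = "el\u00e9me\u00f1t";
        System.out.println(value.length());                                // 7
        System.out.println(value.getBytes(StandardCharsets.UTF_8).length); // 9
        System.out.println(roundTrip(value).equals(value));                // true
    }
}
```

The two accented characters take two bytes each in UTF-8, which accounts for 9 bytes from 7 characters, and the round trip is lossless as long as the same charset is used in both directions.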

  



________________________________
From: Jarek Jarcec Cecho <jarcec@apache.org>
To: user@sqoop.apache.org; varun kumar gullipalli <varunkumar_g@yahoo.com> 
Sent: Wednesday, July 17, 2013 8:36 AM
Subject: Re: Sqoop - utf-8 data load issue


Thank you Varun,
the sequence c3 83 c2 a9 indeed does not correspond to the correct character. I found one
Stack Overflow entry [1] that might be relevant to your issue. I've tried to reproduce this
on my cluster, but I was not able to. Could you do a mysqldump of the table in question?
If you could share it along with the Sqoop version and the exact command line, I would like
to explore this a bit.

Jarcec

Links:
1: http://stackoverflow.com/questions/8499852/xmldocument-mis-reads-utf-8-e-acute-character
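
For reference, the sequence c3 83 c2 a9 is what results when the UTF-8 bytes of "é" (c3 a9) are decoded with a single-byte charset and then encoded as UTF-8 a second time. The sketch below reproduces that, assuming windows-1252 as the wrong charset. It illustrates the byte pattern only; it does not show where in the import pipeline the conversion happens, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Render bytes as space-separated lowercase hex, as hexdump -C does.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulate the suspected fault: UTF-8 bytes decoded with the wrong
    // charset, then encoded as UTF-8 again.
    static byte[] doubleEncode(String s, Charset wrong) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, wrong);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed wrong charset
        String eAcute = "\u00e9"; // "é"
        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_8))); // c3 a9
        System.out.println(hex(doubleEncode(eAcute, cp1252)));            // c3 83 c2 a9
    }
}
```

The same transformation applied to "ñ" gives c3 83 c2 b1, which matches the second garbled character in the hexdump.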

On Tue, Jul 16, 2013 at 04:24:49PM -0700, varun kumar gullipalli wrote:
> Here is the output Jarcec...
>  
>  
> 
> 
> ________________________________
> From: Jarek Jarcec Cecho <jarcec@apache.org>
> To: user@sqoop.apache.org; varun kumar gullipalli <varunkumar_g@yahoo.com> 
> Sent: Tuesday, July 16, 2013 11:05 AM
> Subject: Re: Sqoop - utf-8 data load issue
> 
> 
> Thank you for the additional information Varun! Would you mind doing something like the following:
> 
> hadoop dfs -text THE_FILE  | hexdump -C
> 
> And sharing the output? I'm trying to see the actual content of the file rather than any interpreted value.
> 
> Jarcec
> 
> On Mon, Jul 15, 2013 at 06:52:11PM -0700, varun kumar gullipalli wrote:
> > Hi Jarcec,
> > 
> > I am validating the data by running the following command,
> > 
> > hadoop fs -text <hdfs cluster>
> > 
> > I think there is no issue with the shell (correct me if I am wrong) because I am connecting to the MySQL database from the same shell (command line) and can view the source data properly.
> > 
> > Initially we observed that the following conf files did not have the utf-8 encoding declaration:
> > <?xml version="1.0" encoding="UTF-8"?>
> > 
> > sqoop-site.xml
> > sqoop-site-template.xml
> > 
> > But no luck even after making the changes.
> > 
> > Thanks,
> > Varun
> > 
> > 
> > ________________________________
> >  From: Jarek Jarcec Cecho <jarcec@apache.org>
> > To: user@sqoop.apache.org; varun kumar gullipalli <varunkumar_g@yahoo.com>
> > Sent: Monday, July 15, 2013 6:37 PM
> > Subject: Re: Sqoop - utf-8 data load issue
> >  
> > 
> > Hi Varun,
> > we usually don't see any issues with transferring UTF text data. How are
> > you validating the imported file? I can imagine that your shell might be messing
> > up the encoding.
> > 
> > Jarcec
> > 
> > On Mon, Jul 15, 2013 at 06:27:25PM -0700, varun kumar gullipalli wrote:
> > > 
> > > 
> > > Hi,
> > > I am importing data from MySql to HDFS using free-form query import.
> > > It works fine, but I am facing an issue when the data is utf-8. The source (MySQL) db is utf-8 compatible, but it looks like Sqoop is converting the data during import.
> > > Example - the source value elémeñt is loaded as elÃ©meÃ±t to HDFS.
> > > Please provide a solution for this.
> > > Thanks in advance!
> 
> 
> 00000000  31 32 33 34 35 36 37 38  39 30 07 31 33 37 33 32  |1234567890.13732|
> 00000010  36 30 33 34 36 31 35 31  07 31 33 37 33 32 36 30  |60346151.1373260|
> 00000020  33 34 36 31 35 31 07 30  07 65 6c c3 83 c2 a9 6d  |346151.0.el....m|
> 00000030  65 c3 83 c2 b1 74 07 c3  a8 c2 b4 c2 bc c3 a2 e2  |e....t..........|
> 00000040  80 9a c2 ac c3 ac e2 80  9a c2 ac c3 ac e2 80 93  |................|
> 00000050  c2 b4 c3 a8 e2 80 b0 c2  be c3 a8 c2 a5 c2 bf 0a  |................|
> 00000060


Here is a sample command line ....
  sqoop --options-file $CONN_FILE --lines-terminated-by '\n' --verbose --query "<<QUERY>>' and \$CONDITIONS" -m 1 --target-dir $YYYY/$MM/$DD/${TBL_NAME} --null-string '\\N' --null-non-string '\\N' >> $LOGFILE 2>&1