hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
Date Tue, 04 Dec 2012 01:12:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509404#comment-13509404 ]

Todd Lipcon commented on HADOOP-9103:
-------------------------------------

bq. Hopefully we haven't overlooked any such existing bugs and nobody accidentally uses UTF8.java in the future. (At least it's marked @Deprecated.)

Right. That's my thinking.

bq. Agreed that as long as UTF8.java is the thing that reads the bytestream, we can continue to implement CESU-8 and it can remain partially backwards compatible with previous versions of UTF8.java.

Java's String can also properly decode CESU-8, so we can move forward by getting rid of the usage of UTF8.readString ASAP, and then some day later kill all the paths that use the write path (once we're sure the read path is dead).
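
To make the difference concrete, here is a minimal standalone sketch (not part of the attached patch; the class name is illustrative). DataOutputStream.writeUTF uses Java's "modified UTF-8", which, like the deprecated UTF8 class, encodes each half of a surrogate pair as its own 3-byte sequence, whereas String.getBytes produces a single 4-byte UTF-8 sequence for the same code point:

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class Cesu8Demo {
  public static void main(String[] args) throws Exception {
    // U+1D11E (MUSICAL SYMBOL G CLEF) is outside the BMP, so Java
    // represents it as a surrogate pair of two chars.
    String s = new String(Character.toChars(0x1D11E));

    // Standard UTF-8: one 4-byte sequence for the code point.
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

    // Modified UTF-8 (CESU-8 style for supplementary characters):
    // each surrogate char is encoded on its own as 3 bytes.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    new DataOutputStream(bos).writeUTF(s);
    int dataLen = bos.size() - 2;  // writeUTF prepends a 2-byte length

    System.out.println("standard UTF-8 bytes: " + utf8.length); // 4
    System.out.println("CESU-8 style bytes:   " + dataLen);     // 6
  }
}
{code}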

Could a committer please review this? Thanks.
                
> UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9103
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9103
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.1
>         Environment: SUSE LINUX
>            Reporter: yixiaohua
>            Assignee: Todd Lipcon
>         Attachments: FSImage.java, hadoop-9103.txt, hadoop-9103.txt, hadoop-9103.txt, ProblemString.txt, TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> This is the log information of the exception from the SecondaryNameNode:
> 2012-03-28 00:48:42,553 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Found lease for non-existent file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/????@?????????????????????????tor.qzone.qq.com/keypart-00174
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
>         at java.lang.Thread.run(Thread.java:619)
> This is the log information about the file from the namenode:
> 2012-03-28 00:32:26,528 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss	ip=/10.131.16.34	cmd=create	src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?            tor.qzone.qq.com/keypart-00174	dst=null	perm=boss:boss:rw-r--r--
> 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/  @?           tor.qzone.qq.com/keypart-00174. blk_2751836614265659170_184668759
> 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?            tor.qzone.qq.com/keypart-00174 is closed by DFSClient_attempt_201203271849_0016_r_000174_0
> 2012-03-28 00:37:50,315 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss	ip=/10.131.16.34	cmd=rename	src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?            tor.qzone.qq.com/keypart-00174	dst=/user/boss/pgv/fission/task16/split/  @?           tor.qzone.qq.com/keypart-00174	perm=boss:boss:rw-r--r--
> After checking the code that saves the FSImage, I found a problem that may be a bug in the HDFS code; I paste it below:
> -------------This is the saveFSImage method in FSImage.java; I have marked the problem code------------
> /**
>    * Save the contents of the FS image to the file.
>    */
>   void saveFSImage(File newFile) throws IOException {
>     FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
>     FSDirectory fsDir = fsNamesys.dir;
>     long startTime = FSNamesystem.now();
>     //
>     // Write out data
>     //
>     DataOutputStream out = new DataOutputStream(
>                                                 new BufferedOutputStream(
>                                                                          new FileOutputStream(newFile)));
>     try {
>       .........
>     
>       // save the rest of the nodes
>       saveImage(strbuf, 0, fsDir.rootDir, out);------------------problem
>       fsNamesys.saveFilesUnderConstruction(out);------------------problem, detail is below
>       strbuf = null;
>     } finally {
>       out.close();
>     }
>     LOG.info("Image file of size " + newFile.length() + " saved in " 
>         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
>   }
>  /**
>    * Save file tree image starting from the given root.
>    * This is a recursive procedure, which first saves all children of
>    * a current directory and then moves inside the sub-directories.
>    */
>   private static void saveImage(ByteBuffer parentPrefix,
>                                 int prefixLength,
>                                 INodeDirectory current,
>                                 DataOutputStream out) throws IOException {
>     int newPrefixLength = prefixLength;
>     if (current.getChildrenRaw() == null)
>       return;
>     for(INode child : current.getChildren()) {
>       // print all children first
>       parentPrefix.position(prefixLength);
>       parentPrefix.put(PATH_SEPARATOR).put(child.getLocalNameBytes());------------------problem
>       saveINode2Image(parentPrefix, child, out);
>     }
>    ..........
>   }
>  // Helper function that writes an INodeUnderConstruction
>   // into the output stream
>   //
>   static void writeINodeUnderConstruction(DataOutputStream out,
>                                            INodeFileUnderConstruction cons,
>                                            String path) 
>                                            throws IOException {
>     writeString(path, out);------------------problem
>     ..........
>   }
>   
>   static private final UTF8 U_STR = new UTF8();
>   static void writeString(String str, DataOutputStream out) throws IOException {
>     U_STR.set(str);
>     U_STR.write(out);------------------problem 
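>     // (problem: the UTF8 class encodes each Java char on its own, so a
>     // surrogate pair is written as two 3-byte sequences, i.e. CESU-8)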
>   }
>   /**
>    * Converts a string to a byte array using UTF8 encoding.
>    */
>   static byte[] string2Bytes(String str) {
>     try {
>       return str.getBytes("UTF8");------------------problem 
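>       // (problem: String.getBytes("UTF8") emits standard UTF-8, one
>       // 4-byte sequence per supplementary code point, so it disagrees
>       // with the CESU-8-style bytes written by the UTF8 class above)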
>     } catch(UnsupportedEncodingException e) {
>       assert false : "UTF8 encoding is not supported ";
>     }
>     return null;
>   }
> ------------------------------------------below is the explanation------------------------
> In the saveImage method, child.getLocalNameBytes() produces its bytes via str.getBytes("UTF8"),
> but in writeINodeUnderConstruction the bytes are produced via the UTF8 class.
> I ran a test with one of our garbled file names and found that the two byte arrays are not equal. When I used the UTF8 class in both places, the problem disappeared.
> I think this is a bug in HDFS or in UTF8.
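
For reference, the mismatch described above can be reproduced outside HDFS with a short test. The sketch below is illustrative (the class name is hypothetical) and assumes the deprecated org.apache.hadoop.io.UTF8 class from hadoop-common is on the classpath:

{code}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.hadoop.io.UTF8;

public class EncodingMismatchDemo {
  public static void main(String[] args) {
    // A path component containing a character outside the BMP
    // (U+10400, DESERET CAPITAL LETTER LONG I).
    String name = "keypart-" + new String(Character.toChars(0x10400));

    // The saveImage() path: string2Bytes() -> standard UTF-8,
    // one 4-byte sequence for the supplementary character.
    byte[] viaString = name.getBytes(StandardCharsets.UTF_8);

    // The writeINodeUnderConstruction() path: the UTF8 class encodes
    // each surrogate char on its own, two 3-byte sequences (CESU-8 style).
    byte[] viaUtf8 = UTF8.getBytes(name);

    // The two serializations of the same name disagree, which is how an
    // image written with one encoding and read back with the other ends
    // up looking for a "non-existent" file, as in the stack trace above.
    System.out.println(Arrays.equals(viaString, viaUtf8)); // prints: false
  }
}
{code}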

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
