nutch-dev mailing list archives

From Jérôme Charron
Subject Re: svn commit: r265503 - in /lucene/nutch/trunk/src: java/org/apache/nutch/clustering/ java/org/apache/nutch/fs/ java/org/apache/nutch/mapReduce/ java/org/apache/nutch/parse/ java/org/apache/nutch/protocol/ java/org/apache/nutch/searcher/ java/org/a
Date Sun, 04 Sep 2005 21:07:19 GMT
Hello Piotr,

> It looks like changes to language identifier caused language identifier
> test to fail on Windows again.
First, thanks for testing on Windows.

> If no charset is given it assumes default
> platform encoding but test files are probably "UTF-8" based. I have
> changed TestLanguageIdentifier.testIdentify() method to use
> String lang =
> idfr.identify(this.getClass().getResourceAsStream(tokens[0]),"UTF-8");
> instead of
> String lang =
> idfr.identify(this.getClass().getResourceAsStream(tokens[0])); 
> But probably better solution would be to use UTF-8 if no encoding is
> specified in main plugin code.
> What do you think?

I think the best solution is to modify only the unit test code (as you
did), because in this case we know the test files are UTF-8 encoded.
But I think it is better to use the platform's default encoding instead of
UTF-8 if no encoding is specified, so that, for instance, intranet
fetching uses the locally defined encoding.
No?
If you are OK with this, could you please commit your changes to the
language identifier unit tests?
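
To illustrate the difference, here is a minimal sketch, not the plugin code.
The class EncodingSketch and the helper readAll are hypothetical names, and
ISO-8859-1 only stands in for a platform default that is not UTF-8. It shows
why UTF-8 test files need an explicit charset, while a null charset falls
back to whatever the platform defines:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class EncodingSketch {

    // Hypothetical helper (not part of Nutch): decode a stream with the
    // given charset, or with the platform default when charsetName is null.
    public static String readAll(InputStream in, String charsetName) throws IOException {
        Charset cs = (charsetName != null)
                ? Charset.forName(charsetName)
                : Charset.defaultCharset();
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 bytes, as the test files are assumed to be.
        byte[] utf8 = "J\u00e9r\u00f4me".getBytes("UTF-8");

        // Explicit charset: the result is the same on every platform.
        System.out.println(readAll(new ByteArrayInputStream(utf8), "UTF-8"));

        // Stand-in for a non-UTF-8 default: accented characters are garbled.
        System.out.println(readAll(new ByteArrayInputStream(utf8), "ISO-8859-1"));

        // No charset: the result depends on the platform default.
        System.out.println(readAll(new ByteArrayInputStream(utf8), null));
    }
}
```

In the unit test the encoding is known, so passing "UTF-8" is safe there.
In the main code, passing no charset keeps the locally defined encoding.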



