nutch-dev mailing list archives

From "KuroSaka TeruHiko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search
Date Mon, 02 Jan 2006 19:57:01 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361546 ] 

KuroSaka TeruHiko commented on NUTCH-138:
-----------------------------------------

You are right.  With this Tomcat config, UTF-8 characters can be passed.
Setting useBodyEncodingForURI="true"
in the <Connector> tag within $TOMCAT/conf/server.xml also works.
This is documented in:
http://issues.apache.org/bugzilla/show_bug.cgi?id=29900

What I suggest is to add this note to:
http://lucene.apache.org/nutch/i18n.html
(which currently explains the GUI localization issue only, rather than internationalization
proper),
or perhaps to create a new page:
http://wiki.apache.org/nutch/GettingNutchRunningUTF8Tomcat5

I am willing to write a draft if someone tells me where to submit it.

Feel free to close this bug.


> non-Latin-1 characters cannot be submitted for search
> -----------------------------------------------------
>
>          Key: NUTCH-138
>          URL: http://issues.apache.org/jira/browse/NUTCH-138
>      Project: Nutch
>         Type: Bug
>   Components: web gui
>     Versions: 0.7.1
>  Environment: Windows XP, Tomcat 5.5.12
>     Reporter: KuroSaka TeruHiko
>     Priority: Minor
>
> The search.html currently specifies the GET method for query submission.
> Tomcat 5.x only allows the ISO-8859-1 (aka Latin-1) code set to be submitted over GET because
> of some restrictions of the HTML or HTTP spec they discovered. (If my memory is correct, non-ISO-8859-1
> characters were working OK over GET with older versions of Tomcat as long as setCharacterEncoding()
> is called properly.)
> To allow proper transmission of non-ISO-8859-1 characters, the POST method should be used.  Here's a
> proposed patch:
> *** search.html	Tue Dec 13 15:02:15 2005
> --- search-org.html	Tue Dec 13 15:02:07 2005
> ***************
> *** 59,65 ****
>   </span><span class="bodytext">
>   <center>
>   
> ! <form name="search" action="../search.jsp" method="post"> 
>   <input name="query" size="44">&nbsp;<input type="submit" value="Search">
>   <a href="help.html">help</a>
>   
> --- 59,65 ----
>   </span><span class="bodytext">
>   <center>
>   
> ! <form name="search" action="../search.jsp" method="get"> 
>   <input name="query" size="44">&nbsp;<input type="submit" value="Search">
>   <a href="help.html">help</a>
>   
> BTW, I am aware that Nutch and Lucene won't handle non-Western languages well as packaged.
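
The GET problem described in the quoted report can be reproduced outside Tomcat.
Below is a minimal sketch, assuming the browser sends the query as percent-encoded
UTF-8 and the container decodes the URI as ISO-8859-1 (the behaviour this thread
attributes to Tomcat 5.x when no Connector encoding is configured). The class and
method names are hypothetical, and the code uses only java.net.URLEncoder and
java.net.URLDecoder (the Charset overloads, so Java 10 or later is assumed); it
does not call into Tomcat or Nutch.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: illustrates a charset mismatch between the
// encoding side (browser) and the decoding side (servlet container).
public class QueryDecodingDemo {

    /** Percent-encodes a query value as UTF-8 bytes, as a browser would for a UTF-8 page. */
    public static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Decodes a percent-encoded query value, interpreting the bytes in the given charset. */
    public static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String query = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String wire = encodeUtf8(query);     // %E6%97%A5%E6%9C%AC%E8%AA%9E

        // Assumed default: the URI is decoded as ISO-8859-1, one char per byte.
        String asLatin1 = decode(wire, StandardCharsets.ISO_8859_1);
        // Decoding as UTF-8 recovers the original text.
        String asUtf8 = decode(wire, StandardCharsets.UTF_8);

        System.out.println("on the wire:        " + wire);
        System.out.println("Latin-1 round trip: " + asLatin1.equals(query));
        System.out.println("UTF-8 round trip:   " + asUtf8.equals(query));
    }
}
```

Under these assumptions the Latin-1 decode turns the three characters into nine
unrelated ones, which is why the search query arrives garbled. The Tomcat
Connector setting and the switch to POST discussed above both aim to make the
decoding side use the same charset as the encoding side.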

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

