lucene-solr-user mailing list archives

From "James liu" <liuping.ja...@gmail.com>
Subject Re: wana use CJKAnalyzer
Date Tue, 26 Sep 2006 01:05:58 GMT
2006/9/25, Walter Underwood <wunderwood@netflix.com>:
>
> This document has two problems. First, the document is not well-formed
> XML.
> Open it  in Firefox and you will see this error:
>
>    XML Parsing Error: mismatched tag. Expected: </doc>.
>    Location: file:///Users/wunderwood/Desktop/jl.xml
>    Line Number 15, Column 3:
>
> After I fix that, it still is not legal UTF-8.


I'm sorry that the file had an extra <doc>. I had been testing with more data
in Solr, and when I cut jl.xml down to send it as an attachment, I did not
check the result. That is how you found this problem.
Yes, it is not legal UTF-8.
By "UTF-8 encoding" I mean the file encoding mode.
When you create a new XML file in EditPlus and save it, a dialog appears that
lets you select the encoding mode (you can see it in the attachments).
That is how jl.xml was saved; I then indexed it with post.sh.
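
The two problems pointed out above (a mismatched tag, and bytes that are not
legal UTF-8) can both be checked before posting a file. A minimal sketch in
Python, not part of Solr or post.sh; the function name check_xml_bytes is
made up for illustration:

```python
import xml.etree.ElementTree as ET


def check_xml_bytes(data: bytes) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # 1. Are the raw bytes valid UTF-8? (A file saved as "ANSI" with
    #    non-ASCII text usually is not.)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")

    # 2. Is the document well-formed XML? (Catches a mismatched <doc> tag.)
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")

    return problems
```

To check a file on disk, read it in binary mode, for example
check_xml_bytes(open("jl.xml", "rb").read()). Bytes that are not valid UTF-8
will normally be reported by both checks, because the XML parser also assumes
UTF-8 when the file has no encoding declaration.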

If you use a scripting language, for example solrphp (my own modified version,
not the one from the Solr wiki), you must send your XML encoded as UTF-8.
For instance, when I send my.xml to http://localhost:8983/solr/update, the
request headers should include "Content-Type: text/xml;charset=utf-8".
Solr works well once that header is set.
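
The same idea outside PHP, as a rough sketch in Python using only the standard
library: encode the body as UTF-8 and declare that charset in the Content-Type
header. The function name build_update_request is hypothetical, and the
default URL is simply the one mentioned in this message:

```python
import urllib.request


def build_update_request(xml_text: str,
                         url: str = "http://localhost:8983/solr/update"):
    """Build (but do not send) a POST request carrying UTF-8 encoded XML."""
    body = xml_text.encode("utf-8")  # the bytes on the wire are UTF-8
    return urllib.request.Request(
        url,
        data=body,
        # Declare the charset so the server does not have to guess it.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would actually send
it, which needs a Solr instance listening at that URL.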


> Does Solr report parsing errors? It really should. Maybe a 400 Bad Request
> response with a text/plain body showing the error message.


After I fixed the extra <doc> problem, Solr worked well.

> wunder
>
>
> On 9/22/06 6:24 PM, "James liu" <liuping.james@gmail.com> wrote:
> >
> > 2006/9/23, Walter Underwood <wunderwood@netflix.com>:
> >> On 9/21/06 5:37 PM, "James liu" <liuping.james@gmail.com> wrote:
> >>
> >>> > Yes, it is working. The root of my problem is that the XML must be
> >>> > encoded in UTF-8.
> >>> > If you use PHP, it is not about the web browser. Just note that the
> >>> > curl header information must specify UTF-8.
> >>> > If you use post.sh, the XML must be encoded in UTF-8 (my EditPlus
> >>> > default encoding is ANSI).
> >>
> >> This might be a Solr bug. Solr should be able to accept XML in any
> >> of the required encodings (ASCII, Latin 1, UTF-8, and UTF-16).
> >> Getting XML content types exactly right is tricky, see RFC 3023.
> >>
> >> What curl command line was used?
> >
> > I use no special curl command, just post.sh from
> > solr-nightly/example/exampledocs.
> > But my jl.xml is encoded in UTF-8 (I use EditPlus; I tried to use the XML
> > encoding UTF-8, but it had no effect).
> > In solrphp I use curl: "$header=array("Content-Type:
> > text/xml;charset=utf-8");curl_setopt($ch, CURLOPT_HTTPHEADER, $header);".
> > This is PHP.
> >
> >> What encoding is the XML?
> >>
> >> Can you give a sample XML file?
> >
> > See the attachments. If you need anything else, mail me.
> >
> >> wunder
> >> --
> >> Walter Underwood
> >> Search Guru, Netflix
> >>
> >
> >
>
>
>
>


-- 
regards
jl
