lucene-solr-user mailing list archives

From Amit Nithian <anith...@gmail.com>
Subject Re: DIH, UTF8 and default DIH encoding value
Date Sun, 08 Aug 2010 20:08:21 GMT
Thanks, Otis. I went ahead and added this section. I hope others can add to
it too, but of course the list should be kept short :-)

- Amit

On Sun, Aug 1, 2010 at 12:00 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Hi Amit,
>
> Anyone can edit any Solr Wiki page - just create an account (I think the
> link to that is in the page footer) and edit.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
> > From: Amit Nithian <anithian@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Sat, July 31, 2010 4:41:44 PM
> > Subject: DIH, UTF8 and default DIH encoding value
> >
> > All,
> >
> > I am not sure if this is overly obvious or not (it wasn't to me), but in
> > trying to index some international characters from XML files using the
> > DIH, I found that setting the encoding attribute on the dataSource element
> > to "UTF-8" fixed my problem.
> >
> > <dataSource type="FileDataSource" encoding="UTF-8"/>
> >
> > My question is why the default isn't UTF-8 or, if there is a good reason,
> > can the DIH wiki be made clearer that this encoding attribute can affect
> > the indexing of international characters? If I can get access to edit this
> > wiki page, I can add a section to that effect, perhaps under a
> > troubleshooting section?
> >
> > Thanks!
> > Amit
> >
>
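As a rough illustration of the failure Amit describes, the Java sketch below is not DIH code. It decodes the same UTF-8 bytes of a small XML fragment twice through a `Reader`: once with the wrong charset and once with UTF-8. ISO-8859-1 stands in for whatever non-UTF-8 platform default a JVM might have. The idea that `FileDataSource` falls back to the JVM's platform default charset when `encoding` is omitted is an assumption here, not something stated in this thread. The class name `EncodingDemo` and method `decode` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Solr or the DIH.
class EncodingDemo {

    // Decode raw bytes through a Reader with an explicit charset, roughly the
    // way a Reader-based file data source turns file bytes into characters.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An XML fragment saved as UTF-8: 'é' (U+00E9) is stored as two bytes, 0xC3 0xA9.
        String original = "<title>caf\u00e9</title>";
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: ISO-8859-1 stands in for a non-UTF-8 platform default.
        // Each of the two bytes becomes its own character, so 'é' turns into "Ã©".
        String garbled = decode(fileBytes, StandardCharsets.ISO_8859_1);

        // Right charset: analogous to setting encoding="UTF-8" on the dataSource.
        String correct = decode(fileBytes, StandardCharsets.UTF_8);

        System.out.println("matches with ISO-8859-1? " + garbled.equals(original)); // false
        System.out.println("matches with UTF-8?      " + correct.equals(original)); // true
        System.out.println("this JVM's default charset: " + Charset.defaultCharset());
    }
}
```

The last line prints a value that depends on the machine and JVM settings. If the assumption above holds, that would explain why the same data-config can index international characters correctly on one box and garble them on another. Stating the encoding explicitly removes that dependence.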
