lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: lucene farsi problem
Date Wed, 30 Apr 2008 17:17:14 GMT
On 04/30/2008 at 12:50 PM, Steven A Rowe wrote:
> Caveat: I don't speak, read, write, or dream in Farsi - I
> just know that it mostly shares its orthography with Arabic,
> and that they are both written and read right-to-left.
> 
> How are you constructing the queries?  Using QueryParser?  If
> so, then I suspect the problem is that you intend the range
> you supply to be read entirely right-to-left, but Lucene
> instead reads it left-to-right.  Have you tried using e.g.
> "د-ژ" instead of "د-ژ"?  (That is, placing the lower valued
> term on the left instead of the right.)

Sigh - can't edit RTL text - the example should be (hoping it doesn't get reversed again):

"ژ-د" instead of "د-ژ" (reversing the order of the lower and upper terms)

> AFAICT, RangeFilter (called from ConstantScoreRangeQuery,
> which is called from QueryParser) does not test whether
> lowerTerm is in fact lower than upperTerm.  If it turns out
> that the problem is simply one of order, it might make sense
> to modify RangeFilter so that it flip them when lowerTerm > upperTerm.
> 
> Steve
> 
> On 04/30/2008 at 3:21 AM, esra wrote:
> > 
> > hi,
> > 
> > i am using lucene's "IndexSearcher" to search the given xml
> > by keyword which
> > contains farsi information.
> > while searching i use ranges like
> > 
> > آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
> > 
> > when i do search for  "د-ژ"  range the results are wrong ,
> > they are the
> > results of  " س-ظ "range.
> > 
> > for example when i do search for "د-ژ"  one of the the
> > results is "ساب ووفر"
> > , this result also shown on the " س-ظ " range's result list
> > which is the
> > corret range.
> > 
> > As IndexSearcher use "compareTo" method and this method uses
> > unicodes for
> > comparing, i found the unicodes of the characters.
> > 
> > د=U+62F
> > ژ = U+698
> > and the first letter of "ساب ووفر " is  س = U+633
> > 
> > Do you have any idea how to solve this problem, there are analyzers for
> > different languages , will this be usefull if so do you know where to
> > find a farsi analyzer?
> > 
> > I would bu glad if you help.
> > 
> > thanks ,
> > 
> > Esra


Mime
View raw message