lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Problem while modifying IndexSearcher
Date Wed, 24 Jul 2013 13:37:36 GMT
But somehow from "michael" you'll generate the N other terms to search
for?  And then it seems like you could just make a new query with
those expanded terms?

If you need to know all terms in the index, you can use a TermsEnum to
iterate through them.
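Here is a minimal sketch of that kind of expansion. It is plain Java so it runs without Lucene on the classpath. It assumes the indexed vocabulary has already been collected; in Lucene that list would come from walking a TermsEnum over the field. Plain Levenshtein distance stands in for your own LCS / error-model matcher. The class name `TermExpansion` and the `maxEdits` threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TermExpansion {

    // Dynamic-programming Levenshtein distance: insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // After the swap, prev holds the row just computed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns every indexed term within maxEdits of the clean query term.
    // Hypothetical use: each returned term becomes one SHOULD clause of the expanded query.
    static List<String> expand(String cleanTerm, Iterable<String> indexedTerms, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (levenshtein(cleanTerm, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("bob", "michael", "micheal", "mihel", "mike");
        System.out.println(expand("michael", vocabulary, 2));
    }
}
```

Running `main` prints `[michael, micheal, mihel]`. The scan is linear in the vocabulary size. That is fine for a prototype but worth revisiting on a large index.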

I'm only pushing this because doing this outside of Lucene is going to
be far, far easier than modifying Lucene's sources to do what seems to
be essentially query expansion.

Each query has its own Weight impl, and those all implement
Weight.scorer.  E.g. see TermWeight.scorer(), which returns a
TermScorer.
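The following is a deliberately simplified toy model, not Lucene's real classes or signatures. It illustrates the pattern described above: each query type supplies its own weight, and that weight's `scorer()` returns an object that walks the matching postings. The interfaces `ToyWeight` and `ToyScorer`, the in-memory `Map` standing in for the inverted index, and the constant score are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ToyWeightScorer {

    // Iterates matching documents; loosely mirrors the role of a Lucene Scorer.
    interface ToyScorer {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();
        float score();
    }

    // Each query type provides its own weight; scorer() is where term matching happens.
    interface ToyWeight {
        ToyScorer scorer(Map<String, int[]> invertedIndex);
    }

    // Analogue of TermWeight: its scorer walks the postings list of a single term.
    static final class ToyTermWeight implements ToyWeight {
        private final String term;

        ToyTermWeight(String term) {
            this.term = term;
        }

        @Override
        public ToyScorer scorer(Map<String, int[]> invertedIndex) {
            int[] postings = invertedIndex.getOrDefault(term, new int[0]);
            return new ToyScorer() {
                private int pos = -1;

                @Override
                public int nextDoc() {
                    pos++;
                    return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
                }

                @Override
                public float score() {
                    return 1.0f; // a real scorer would apply a similarity such as tf-idf here
                }
            };
        }
    }

    // Collects every doc id produced by the weight's scorer.
    static List<Integer> matchingDocs(ToyWeight weight, Map<String, int[]> index) {
        List<Integer> docs = new ArrayList<>();
        ToyScorer scorer = weight.scorer(index);
        for (int doc = scorer.nextDoc(); doc != ToyScorer.NO_MORE_DOCS; doc = scorer.nextDoc()) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = Map.of("mihel", new int[] {0, 3}, "bob", new int[] {1});
        System.out.println(matchingDocs(new ToyTermWeight("mihel"), index));
    }
}
```

Running `main` prints `[0, 3]`. Changing how a term matches in this model means changing one weight's scorer, which is why rewriting the query up front is usually the simpler route.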

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jul 24, 2013 at 9:12 AM, Abhishek Gupta <abhi.bansal21@gmail.com> wrote:
> Michael, thanks for replying so fast.
>
> No, there is no mapping. What I have is some code based on LCS (Longest Common
> Subsequence), Levenshtein distance, a suffix tree and a probabilistic error
> model, which maps an original word (e.g. michael) to an erroneous
> word (mihel). So I think I have to change the way matching is done.
>
> I am not sure whether the matching I am talking about is clear, so let me
> explain it a little. Take the Vector Space Model for scoring and an
> inverted list for indexing (I am actually not sure what Lucene uses). I am
> talking about matching each query term against the labels of the inverted
> index.
>
> Also, just out of curiosity: the problem I mentioned in the SO question
> is that I cannot find the definition of weight.scorer(). Can you
> help me understand how things work there, and which model Lucene selects
> by default?
>
> Cheers,
> Abhishek Gupta
>
> On Wed, Jul 24, 2013 at 5:33 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Is there some mapping from clean term X to dirty indexed terms A, B,
>> C?  If so, can't you just take a TermQuery(X) and replace it with a
>> BooleanQuery of SHOULD clauses A, B, C?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Jul 24, 2013 at 7:41 AM, Abhishek Gupta <abhi.bansal21@gmail.com>
>> wrote:
>> > I had some discussion on IRC with . They wanted to know my objective
>> > in doing so. I have already posted the objective there and am posting
>> > it here again:
>> > Sorry for the late response. I am making a search system in which the
>> > data I have indexed is erroneous, so I have made some schemes to match a
>> > query term (which is error-free) to the indexed term (which might be
>> > erroneous). So I have to change the Lucene code where it matches a query
>> > term to the indexed data, so that I can code my matching schemes there.
>> >
>> >
>> > On Tue, Jul 23, 2013 at 11:21 PM, Abhishek Gupta
>> > <abhi.bansal21@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >> I have a problem, which is explained completely here. Please help, or
>> >> just give me a suggestion about where to get help.
>> >>
>> >> --
>> >> Abhishek Gupta,
>> >> 897876422, 9416106204, 9624799165
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
