lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
Date Sun, 19 Dec 2010 15:48:01 GMT


Simon Willnauer commented on LUCENE-2694:

bq. I think we should remove TermsEnum.docFreq and .ord? Ie replace
with .termState().docFreq() and .ord()?

I disagree on that. At least docFreq() is an essential part of the API, and we should not
force TermState creation just to get the df. TermState is an expert API, and you should not
need to pull an expert API to get something essential like df.
I would leave those as they are, or only pull ord into TermState.
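
To make the trade-off concrete, here is a minimal sketch of the shape argued for above: docFreq() stays directly on the enum, while termState() is the separate expert call. All types here (SketchTermState, SketchTermsEnum, SingleTermEnum) are hypothetical and simplified for illustration; they are not the real Lucene classes, and the statesCreated counter exists only to show that reading the df does not create a state.

```java
// Hypothetical, simplified types for illustration; not the real Lucene classes.
interface SketchTermState {
    int docFreq();
    long ord();
}

abstract class SketchTermsEnum {
    /** Essential accessor: answers directly, no TermState is created. */
    abstract int docFreq();

    /** Expert accessor: captures the state of the current term. */
    abstract SketchTermState termState();
}

final class SingleTermEnum extends SketchTermsEnum {
    private final int docFreq;
    private final long ord;
    int statesCreated = 0; // counts snapshots, only to make the cost visible

    SingleTermEnum(int docFreq, long ord) {
        this.docFreq = docFreq;
        this.ord = ord;
    }

    @Override
    int docFreq() {
        return docFreq;
    }

    @Override
    SketchTermState termState() {
        statesCreated++;
        final int df = docFreq;
        final long o = ord;
        return new SketchTermState() {
            @Override public int docFreq() { return df; }
            @Override public long ord() { return o; }
        };
    }
}
```

A caller that only needs the df calls docFreq() and never allocates a state; only code that wants to seek back to the term later pays for termState().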

bq. Maybe rename TermStateBase -> PrefixCodedTermState? Ie this is
really the TermState impl used by any codec using
PrefixCodedTerms? EG the fact that it stores the filePointer into
a _X.tis file is particular to it..

Yeah, that sounds reasonable.

bq. Maybe rename MockTermState -> BasicTermState? At first I was
thinking the codec should return null if it cannot seek by
TermState... (I generally don't like mock returns that hide/lose
information...) but then it's convenient to always have something
to hold the docFreq for the term to avoid lots of special cased
code... so I think it's OK?

I think we can get rid of MockTermState entirely. We can use TermStateBase in its place and
let PrefixCodedTermState just add the file pointer. That way we get rid of it nicely. I would
like to keep that API as it is, since it makes usage easier, especially in the rewrite methods.
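
A rough sketch of that layering, assuming TermStateBase carries only the generic per-term data and PrefixCodedTermState adds the file pointer. The class names come from this thread; the fields and the copyFrom method are assumptions made for illustration and do not reflect the patch.

```java
// Hypothetical sketch: class names follow the thread, fields are assumptions.
class TermStateBase {
    int docFreq; // always available, even for codecs that cannot seek by state
    long ord;

    /** Copies the generic per-term data from another state. */
    void copyFrom(TermStateBase other) {
        this.docFreq = other.docFreq;
        this.ord = other.ord;
    }
}

class PrefixCodedTermState extends TermStateBase {
    long filePointer; // position in the terms file, specific to prefix-coded terms

    @Override
    void copyFrom(TermStateBase other) {
        super.copyFrom(other);
        // Only another prefix-coded state has a file pointer to copy.
        if (other instanceof PrefixCodedTermState) {
            this.filePointer = ((PrefixCodedTermState) other).filePointer;
        }
    }
}
```

With this split, a codec that cannot seek by TermState can still hand back a plain TermStateBase holding the docFreq, so callers never need a null check or a separate mock class.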

bq. We lost the "clone using new" in StandardTermState...
I don't get that really - IMO this is quite minor but I will look into it again... 

bq. Maybe revert changes to AppendingCodec? (Ie let it pass its terms
dict cache size again)

Unintentional - will fix.

bq. I wonder if we can somehow make PerReaderTermState use an array
(keyed by sub reader index) instead... seems like a new HashMap
per Term in an MTQ could be heavy. It's tricky because we don't
store enough information (ie to quickly map parent reader + sub
reader -> sub index). But I don't think this should hold up
committing... since our defaults don't typically allow for that
many terms in-flight it should be fine...

I actually had this in a very similar way. I used a custom linked list and relied on the fact
that the incoming readers are applied in the same order, skipping until the next reader with
that term appeared. I changed that back to a Map impl to make it simpler since I didn't see
speedups - well, this was caused by a very nifty coding error :D

I think I have that patch around somewhere in the history... let's see.
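
For reference, the array-keyed variant suggested in the quote could look roughly like the sketch below. It assumes the caller already knows the sub reader index, which is exactly the information the quote says is not readily available today. SubReaderState and ArrayPerReaderTermState are hypothetical names, not classes from the patch.

```java
// Hypothetical sketch of per-reader term state keyed by sub reader index.
final class SubReaderState {
    final int docFreq;

    SubReaderState(int docFreq) {
        this.docFreq = docFreq;
    }
}

final class ArrayPerReaderTermState {
    // One slot per sub reader; null means the term is absent in that sub reader.
    private final SubReaderState[] states;

    ArrayPerReaderTermState(int numSubReaders) {
        this.states = new SubReaderState[numSubReaders];
    }

    /** Assumes the caller can supply the sub reader index. */
    void register(int subIndex, SubReaderState state) {
        states[subIndex] = state;
    }

    SubReaderState get(int subIndex) {
        return states[subIndex];
    }

    /** Sums docFreq over all sub readers that contain the term. */
    int docFreq() {
        int sum = 0;
        for (SubReaderState s : states) {
            if (s != null) {
                sum += s.docFreq;
            }
        }
        return sum;
    }
}
```

Compared with a HashMap per term, this costs one array of fixed length per term and avoids hashing on lookup, at the price of needing the reader-to-index mapping up front.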

bq. I think the TQ ctor that takes both docFreq and states can drop the docFreq? Ie it can
ask the states for it?

Yeah, sure - the patch is my current state since I had to drop everything and leave on
Friday... I will clean up and upload a new patch early this week.
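
As a sketch of what dropping the docFreq argument could mean: the query keeps only the per-segment states and derives the df from them. SegmentTermStates and SketchTermQuery are hypothetical stand-ins, and returning -1 when no states were supplied is an assumption of this sketch, not behavior taken from the patch.

```java
// Hypothetical sketch: the ctor takes only the states and derives docFreq from them.
final class SegmentTermStates {
    private final int[] docFreqPerSegment;

    SegmentTermStates(int... docFreqPerSegment) {
        this.docFreqPerSegment = docFreqPerSegment.clone();
    }

    /** Total docFreq across all segments. */
    int docFreq() {
        int sum = 0;
        for (int df : docFreqPerSegment) {
            sum += df;
        }
        return sum;
    }
}

final class SketchTermQuery {
    private final String term;
    private final SegmentTermStates states; // null when nothing was pre-collected

    SketchTermQuery(String term) {
        this(term, null);
    }

    SketchTermQuery(String term, SegmentTermStates states) {
        this.term = term;
        this.states = states;
    }

    String term() {
        return term;
    }

    /** Returns the pre-collected docFreq, or -1 when no states were supplied. */
    int docFreq() {
        return states == null ? -1 : states.docFreq();
    }
}
```

Keeping a single source for the df removes the chance of the explicit argument and the states disagreeing.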

@Uwe: I will incorporate your fix - thanks

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>                 Key: LUCENE-2694
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>         Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch,
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
> Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer
> init also run in the same single pass as rewrite.
