lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count
Date Sat, 19 Jun 2010 09:10:22 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880463#action_12880463 ]

Michael McCandless commented on LUCENE-2142:
--------------------------------------------

Thanks Uwe!

So your fix avoids any exception altogether.  On 3.x, loading simply
stops once we hit a termOrd > number of docs.  On trunk, we keep
loading, simply growing the array as needed.
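
Roughly, the two patched loading loops differ like this (a sketch only,
with illustrative names -- not the actual FieldCacheImpl code -- assuming
the loader walks the field's unique terms in order and assigns
increasing ords):

    // Sketch of the two patched behaviors; names are illustrative.
    import java.util.Arrays;
    import java.util.Iterator;

    class StringIndexLoaderSketch {
      // terms: the field's unique terms in sorted order; numDocs: maxDoc
      static String[] loadLookup(Iterator<String> terms, int numDocs,
                                 boolean growLikeTrunk) {
        String[] lookup = new String[numDocs + 1];  // ord 0 = "no term"
        int termOrd = 1;
        while (terms.hasNext()) {
          String term = terms.next();
          if (termOrd > numDocs) {
            if (!growLikeTrunk) {
              break;                    // 3.x patch: silently stop loading
            }
            if (termOrd >= lookup.length) {
              // trunk patch: oversize the array and keep loading every term
              lookup = Arrays.copyOf(lookup, termOrd + (termOrd >> 1) + 1);
            }
          }
          lookup[termOrd++] = term;
        }
        return lookup;
      }
    }

Either way, a genuinely single-token field (unique terms <= numDocs)
loads exactly as before; the two only differ once the term count passes
the doc count.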

I'm torn on what the best disposition here is.  This API should only
be used on single-token (per doc) fields, so what we're really
adding/fixing here is how to handle misuse of the API.

Neither solution is great -- throwing an exception is nasty since you
could be fine for a long time and then, only after indexing enough
docs, perhaps well into production, suddenly trip the exception.  But
silently pretending nothing is wrong is also not great, because then
the app has no clue.

Really this'd be a great time to use a logging framework -- we'd log
an error, and then not throw an exception.

Net/net I think your solution (don't throw an exception) is the lesser
evil at this time, so I think we should go with that.

But: I think we should also fix trunk?  Ie, if we hit termOrd >
numDocs, silently break instead of trying to grow the array.  Because
right now (on trunk), if you try to load a DocTermsIndex on a large
tokenized text field in a large index, you'll (try to) use an insane
amount of memory...
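
(Rough, purely hypothetical numbers to illustrate the scale: with 10M
docs but a tokenized body field holding 100M unique terms, the
grow-as-needed path ends up with a 100M-entry lookup -- hundreds of MB
of object references alone, before counting the term strings
themselves -- where a truly single-token field would never need more
than ~10M entries.)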



> FieldCache.getStringIndex should not throw exception if term count exceeds doc count
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2142
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2142
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
>         Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch
>
>
> Spinoff of LUCENE-2133/LUCENE-831.
> Currently FieldCache cannot handle more than one value per field per doc.
> We may someday want to fix that... but until that day:
> FieldCache.getStringIndex currently does a simplistic check to try to
> catch when you've accidentally allowed more than one term per field,
> by testing if the number of unique terms exceeds the number of
> documents.
> The problem is, this is not a perfect check, in that it allows false
> negatives (you could have more than one term per field for some docs
> and the check won't catch you).
> Further, the exception thrown is an unchecked RuntimeException.
> So this means... you could happily think all is good until some day,
> well into production, once you've updated enough docs, the check
> suddenly catches you and throws an unhandled exception, stopping all
> searches [that need to sort by this string field] in their tracks.
> It doesn't degrade gracefully.
> I think we should simply remove the test, ie, if you have more terms
> than docs then the terms simply overwrite one another.
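
For context, the simplistic check the description refers to is roughly
the following (continuing the sketch class above; illustrative names,
not the exact FieldCache source).  It only counts unique terms against
the doc count, which is why multi-valued docs can slip through
undetected as long as total unique terms stay at or below numDocs:

      // Sketch of the pre-patch check; illustrative names only.
      static void checkSingleValued(Iterator<String> terms, int numDocs,
                                    String field) {
        int uniqueTerms = 0;
        while (terms.hasNext()) {
          terms.next();
          if (++uniqueTerms > numDocs) {
            // Unchecked, and it only trips once enough unique terms have
            // accumulated, so an index can run fine for a long time before
            // this ever fires.
            throw new RuntimeException(
                "more terms than documents in field \"" + field + "\"");
          }
          // (per doc, the loader keeps the ord of the last term seen for
          //  that doc, so for a multi-valued doc later terms overwrite
          //  earlier ones -- which is how such docs can pass the check
          //  above: a false negative)
        }
      }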

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



