lucene-java-user mailing list archives

From: Carsten Schnober <schno...@ids-mannheim.de>
Subject: Statically store sub-collections for search (faceted search?)
Date: Fri, 12 Apr 2013 14:34:46 GMT
Dear list,
I would like to create a subset of the documents in an index that can
then be used for further searches. However, the criteria that define
that subset are not predefined, so I don't think faceted search can be
applied to this use case.

For instance:
A user searches for documents that contain token 'A' in a field 'text'.
These results form a set of documents that is persistently stored (in a
database). Each document in the index has a field 'id' that identifies
it, so these "external" IDs are stored in the database.

Later on, a user loads the document IDs from the database and wants to
run another search restricted to this set of documents. However,
performing a search on the full index and subsequently filtering the
results against that list of documents takes very long if there are many
matches. That is to be expected, since I have to retrieve the external ID
of each matching document and check whether it is part of the desired
subset. Constructing a BooleanQuery of the form "id:Doc1 OR id:Doc2 ..."
is not suitable either, because the subset can contain thousands of
documents, exceeding the maximum number of Boolean clauses.

Any suggestions on how to solve this? I would have used the Lucene
document numbers and stored them as a bit set to use as a filter in
later searches, but I read that document numbers are ephemeral.
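A common workaround, sketched here under the assumption that the external IDs are indexed as untokenized terms in the 'id' field: keep persisting the stable external IDs, and resolve them into a bit set of current document numbers once per index snapshot (reader). That bit set can then be reused as a filter for as long as the same reader is open. As far as I know, Lucene 4.x offers TermsFilter (package org.apache.lucene.queries) for this kind of term-by-term resolution without going through the BooleanQuery clause limit, and later versions offer TermInSetQuery. The code below does not use Lucene at all: it is a stdlib-only model in which a map stands in for the 'id' field's term dictionary, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExternalIdFilterSketch {

    // Hypothetical model of the 'id' field's term dictionary for ONE index
    // snapshot: external ID -> document numbers in that snapshot.
    // The resulting bits are only valid for this snapshot.
    public static BitSet resolve(Map<String, int[]> idPostings, Collection<String> externalIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : externalIds) {
            int[] docs = idPostings.get(id);   // one term lookup per stored ID
            if (docs == null) {
                continue;                      // ID no longer present in the index
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keep only the hits whose document number is set in the filter.
    public static List<Integer> applyFilter(int[] hits, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (filter.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> idPostings = new TreeMap<>();
        idPostings.put("Doc1", new int[]{0});
        idPostings.put("Doc2", new int[]{1});
        idPostings.put("Doc4", new int[]{2});  // e.g. renumbered after Doc3 was deleted and merged away
        BitSet filter = resolve(idPostings, List.of("Doc2", "Doc4", "Doc9"), 3);
        System.out.println(filter);                                // prints {1, 2}
        System.out.println(applyFilter(new int[]{0, 2}, filter));  // prints [2]
    }
}
```

Because the bits refer to the document numbers of a single snapshot, the bit set has to be rebuilt whenever the reader is reopened; only the list of external IDs belongs in the database.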

One possible workaround would be to create another index from the
documents that matched the initial search, but that seems like overkill,
especially if there are many of them...

Thanks for any hint!
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform


