lucene-java-user mailing list archives

From Rebecca Watson <>
Subject Re: How to manage resource out of index?
Date Wed, 07 Jul 2010 06:15:43 GMT
hi li,

i looked at doing something similar, where we only index the text but
retrieve search results / highlighting from the original files. we ended up
giving up because of the amount of customisation required in solr, mainly
because we wanted solr's distributed search functionality, which meant
making sure each original file ended up on the same filing system (i.e. the
same machine) as the shard that indexed it!
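
A minimal sketch of the index-only approach described here and in the quoted question below. It assumes each search hit gives you only a url, and that the full text lives in an external <url, full text> store. The class name, the in-memory maps standing in for the index and the database, and the SHA-256 digest check are hypothetical illustrations, not a Lucene or Solr feature. The digest is just one simple way to notice when the two stores have drifted apart.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the index keeps only a url plus a digest of the text
// that was indexed; the full text itself lives in an external store.
class ExternalTextLookup {

    // Stand-in for the external <url, full text> database (assumption).
    private final Map<String, String> textStore = new HashMap<>();
    // Stand-in for what the index would return per hit: url -> digest (assumption).
    private final Map<String, String> indexedDigests = new HashMap<>();

    // Hex-encoded SHA-256 of the text, used to detect drift between the two stores.
    static String digest(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required in every JDK", e);
        }
    }

    // Write the text to the external store and record its digest with the index entry.
    void add(String url, String fullText) {
        textStore.put(url, fullText);
        indexedDigests.put(url, digest(fullText));
    }

    // Simulates the external store being edited without re-indexing the document.
    void updateStoreOnly(String url, String newText) {
        textStore.put(url, newText);
    }

    // For a url returned by a search hit, return the full text only if it still
    // matches what was indexed; empty means missing or out of sync.
    Optional<String> fullTextFor(String url) {
        String expected = indexedDigests.get(url);
        String text = textStore.get(url);
        if (expected == null || text == null || !expected.equals(digest(text))) {
            return Optional.empty();
        }
        return Optional.of(text);
    }

    public static void main(String[] args) {
        ExternalTextLookup lookup = new ExternalTextLookup();
        lookup.add("http://example.com/a", "full text of a");
        System.out.println(lookup.fullTextFor("http://example.com/a").isPresent()); // true
        lookup.updateStoreOnly("http://example.com/a", "edited text");
        System.out.println(lookup.fullTextFor("http://example.com/a").isPresent()); // false
    }
}
```

This only detects inconsistency at read time; keeping the two stores in sync (re-indexing whenever the text changes) is still up to the application, which is part of the customisation cost described above.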

we ended up just storing the main text field as well, even though there was
quite a lot of text. in the end solr/lucene handles the index size fine, and
disk space is cheaper than the man-hours needed to customise solr/lucene to
work this way!

that was our conclusion anyway, and it works fine. we also have
separate indexing and search server(s), so we don't care about merge time
either, and, as i said above, we use distributed search, so we don't tend
to need to merge very large indexes anyway.
when your system grows or you go into production, you'll probably split
the indexes too, to use solr's distributed search functionality for the
sake of query speed.
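
Splitting the indexes means something has to decide which shard indexes each document. Solr's distributed search of this era fans a query out across the shards listed in the `shards` request parameter, but leaves distributing documents across those shards to you. A minimal sketch, assuming a fixed list of hypothetical shard addresses: hash each document's url to pick its shard, which also tells you which machine its original file should live on if you go the index-only route described above. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch: choose the shard (machine) for a document from its url,
// so its index entry and its original file always end up together.
class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shards = List.copyOf(shards);
    }

    // CRC32 is deterministic, so the same url always maps to the same shard.
    String shardFor(String url) {
        CRC32 crc = new CRC32();
        crc.update(url.getBytes(StandardCharsets.UTF_8));
        return shards.get((int) (crc.getValue() % shards.size()));
    }

    // Comma-separated host:port/path list, the form Solr's "shards" parameter takes.
    String shardsParam() {
        return String.join(",", shards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("host1:8983/solr", "host2:8983/solr"));
        String url = "http://example.com/a";
        System.out.println(url + " -> " + router.shardFor(url));
        System.out.println("shards=" + router.shardsParam());
    }
}
```

Plain modulo routing is the simplest choice, but adding a shard changes the assignment of most urls, so documents (and, in the index-only setup, their files) would have to be moved and re-indexed.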

hope that helps,

bec :)

On 7 July 2010 14:07, Li Li <> wrote:
> I used to store the full text in the lucene index, but I found that
> merging the index is very slow, because merging 2 segments copies
> their .fdt files into a new one. So I want to only index the full
> text. But when searching I need the full text for applications such
> as highlighting and viewing the full text. I can store the full text
> as <url, full text> pairs in a database and load them into memory.
> Then when I search in lucene (or solr), I retrieve the url of the doc
> first, and then use the url to get the full text. But when they are
> stored separately, they are hard to manage, and they may not be
> consistent with each other. Does lucene or solr provide any method
> to ease this problem? Or does anyone have some experience with it?

