lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Renaud Waldura" <renaud.wald...@library.ucsf.edu>
Subject RE: Text storing design and performance question
Date Wed, 10 Jan 2007 21:45:18 GMT
It could well be. Or maybe not; one could speculate the JDBC overhead
(especially over a network) would make DB access slower than disk (local
filesystem). YMMV as they say.



-----Original Message-----
From: moraleslos [mailto:moraleslos@hotmail.com] 
Sent: Wednesday, January 10, 2007 12:14 PM
To: java-user@lucene.apache.org
Subject: RE: Text storing design and performance question


Maybe keeping the data in the DB would make it quicker?  Seems like the I/O
performance would cause most of the performance issues you're seeing.  

-los


Renaud Waldura-5 wrote:
> 
> We used to store a big text field for highlighting purposes too, and 
> it proved a big pain. The index was gigantic, it took forever to 
> build, and the search performance would sometimes suffer from it (just 
> a hunch).
> 
> Now we keep this big text field on disk (in a file), and feed it to 
> the highlighter. Unfortunately the highlighter has to read the file, 
> parse it, etc... It's slooow, sometimes over a second on a large document.
> 
> I vaguely remember reading somewhere newer versions of the highlighter 
> are able to leverage term vectors to avoid re-parsing the field. (I 
> could be
> mistaken.) Maybe just storing term vectors would keep the index lean 
> and allow for fast highlighting?
> 
> --Renaud
> 
>  
> 
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Wednesday, January 10, 2007 9:54 AM
> To: java-user@lucene.apache.org
> Subject: Re: Text storing design and performance question
> 
> Being stateless should not be much of an issue. As Erick mentioned, 
> the highlighter just expects you to pass it the query again and the 
> text to be highlighted. So when you show the pagination you just need 
> to keep around what query generated the current page...then shove each 
> piece of relevant text from the database through the highlighter (with 
> the query) before displaying it.
> 
> - Mark
> 
> On 1/10/07, moraleslos <moraleslos@hotmail.com> wrote:
>>
>>
>> Hi Mark,
>>
>> Looks like I've got to implement some sort of pagination for my clients.
>> Problem is everything is stateless so looks like there's some work I 
>> need to do on my end.  Thanks.
>>
>> -los
>>
>>
>>
>> markrmiller wrote:
>> >
>> > Usually a user cannot easily browse 50,000 on a single display, and 
>> > so you would only highlight the docs as they became visible to the
>> user.
>> > This is generally a small amount...often one at a time.
>> >
>> > - Mark
>> >
>> > moraleslos wrote:
>> >> Hi Erik,
>> >>
>> >> Would that slow performance a bit?  For example, say I receive 
>> >> 50,000 hits from a search.  From your explanation, I have to 
>> >> retrieve the DB id
>> from
>> >> each hit, perform a query to the DB using the id to retrieve the 
>> >> full contents for each hit, run highlighter on each content, and 
>> >> then
>> return?
>> >> Although I'll give this a shot, it will seem to slow performance 
>> >> on the searching side of things, wouldn't it?  Thanks for the reply.
>> >>
>> >> -los
>> >>
>> >>
>> >>
>> >> Erik Hatcher wrote:
>> >>
>> >>> You don't have to store a field to highlight text.  If you've got 
>> >>> it in your database, retrieve it from there and pass that string 
>> >>> to the highlighter instead.
>> >>>
>> >>>     Erik
>> >>>
>> >>>
>> >>> On Jan 10, 2007, at 10:45 AM, moraleslos wrote:
>> >>>
>> >>>
>> >>>> I'm running into a little dilemma with Lucene highlighting and 
>> >>>> indexing.  I currently index anything and everything that gets 
>> >>>> inserted into a database.
>> >>>> This database includes all the content that is searched.  Now 
>> >>>> I'll have lots and lots of content, thinking of the range of
>> >>>> 50GB+, all stored in the DB.
>> >>>> Using Lucene, I index all of this.  But since I'm using 
>> >>>> highlighting features, I'll also need to store the content into

>> >>>> the index.  Not sure what the performance implications are 
>> >>>> during a search but I know that indexing performance should be 
>> >>>> slower as well as the index size being
>> enormous.
>> >>>> Because I have duplicated data, one in the index and the other 
>> >>>> in the db, are there other ways of handling this situation in a

>> >>>> more efficient and performant way?  Thanks in advance.
>> >>>>
>> >>>> -los
>> >>>> --
>> >>>> View this message in context: 
>> >>>> http://www.nabble.com/Text-storing-
>> >>>> design-and-performance-question-tf2953201.html#a8259883
>> >>>> Sent from the Lucene - Java Users mailing list archive at
>> Nabble.com.
>> >>>>
>> >>>>
>> >>>>
>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>>> For additional commands, e-mail: 
>> >>>> java-user-help@lucene.apache.org
>> >>>>
>> >>> -----------------------------------------------------------------
>> >>> ---- To unsubscribe, e-mail: 
>> >>> java-user-unsubscribe@lucene.apache.org
>> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> > -------------------------------------------------------------------
>> > -- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>> >
>>
>> --
>> View this message in context:
>>
> http://www.nabble.com/Text-storing-design-and-performance-question-tf2
> 953201
> .html#a8261739
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

--
View this message in context:
http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201
.html#a8265458
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message