lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: Lucene Scalability Question
Date Wed, 10 Jan 2007 20:30:29 GMT
I think the contrib 'Oracle Full Text' does this (although in the  
reverse).

It uses Lucene for full-text queries (embedded into the db), and the
query analyzer works.

It is really a great piece of software. Too bad it can't be done in a
standard way so that it would work with all dbs.

I think it may be possible to embed Apache Derby to do
something like this, although this might be overkill. A simple b-tree
db might work best.

It would be interesting if the documents could be stored in a btree,  
and a GUID used to access them (since the lucene docid is constantly  
changing). The only stored field in a lucene Document would be the GUID.
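
A minimal sketch of that idea, not a real implementation: a TreeMap stands in for the b-tree db, and a plain term-to-GUID map stands in for the Lucene index. The class and method names are made up for illustration. The point is only that the index side holds nothing but GUIDs, and the document bodies live in the sorted store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.UUID;

// Hypothetical sketch: documents live in a sorted map keyed by GUID,
// and the "index" stores only GUIDs, never the document bodies.
class GuidStoreSketch {
    // Stand-in for a b-tree database (assumption: any ordered key-value store would do).
    final TreeMap<String, String> store = new TreeMap<>();
    // Stand-in for the full-text index: term -> GUIDs of documents containing it.
    final Map<String, Set<String>> index = new HashMap<>();

    // Stores the body under a fresh GUID and indexes its terms against that GUID.
    String add(String body) {
        String guid = UUID.randomUUID().toString();
        store.put(guid, body);
        for (String term : body.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(guid);
            }
        }
        return guid;
    }

    // Looks up GUIDs in the index, then fetches the bodies from the store.
    List<String> search(String term) {
        List<String> bodies = new ArrayList<>();
        Set<String> guids = index.getOrDefault(term.toLowerCase(), Collections.<String>emptySet());
        for (String guid : guids) {
            bodies.add(store.get(guid));
        }
        return bodies;
    }

    public static void main(String[] args) {
        GuidStoreSketch s = new GuidStoreSketch();
        s.add("Lucene handles full text");
        s.add("Derby handles SQL");
        System.out.println(s.search("handles").size());
        System.out.println(s.search("derby"));
    }
}
```

Because the GUID never changes, the store and the index can be rebuilt or merged independently without invalidating references between them.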

On Jan 10, 2007, at 2:21 PM, J. Delgado wrote:

> This is a more general question:
>
> Given the fact that most applications require querying a combination
> of full-text and structured data, has anyone looked into building data
> structures at the most fundamental level (e.g. a combination of b-tree
> and inverted lists) that would enable scalable and performant
> structured (e.g. SQL or XQuery) + full-text queries?
>
> Can Lucene be taken as a basis for this or do you recommend exploring
> other routes?
>
> -- Joaquin
>
> 2007/1/10, Chris Hostetter <hossman_lucene@fucit.org>:
>>
>> : So you mean lucene can't do better than this ?
>>
>> robert's point is that based on what you've told us, there is no reason
>> to think Lucene makes sense for you -- if *all* you are doing is finding
>> documents based on numeric ranges, then a relational database is better
>> suited to your task.  if you actually care about the textual IR features
>> of Lucene, then there are probably ways to make your searches faster, but
>> you aren't giving us enough information.
>>
>> you said the example code you gave was in a loop ... but a loop over
>> what? ... what changes with each iteration of the loop? ... if there are
>> RangeFilters that get reused more than once, CachingWrapperFilter can
>> come in handy to ensure that work isn't done more often than it needs
>> to be.
>>
>> it's also not clear whether your query on "type:0" is just a placeholder,
>> or indicative of what you actually want to do in the long run ... if all
>> of your queries are this simple, and all you care about is getting a count
>> of things that have type:0 and are in your numeric ranges, then don't use
>> the "search" method at all, just put "type:0" in your ChainedFilter and
>> call the "bits" method directly.
>>
>> you also haven't given us any information about whether or not you are
>> opening a new IndexSearcher/IndexReader every time you execute a query, or
>> reusing the same instance -- reuse makes the performance much better
>> because it can reuse underlying resources.
>>
>> In short: if you state some performance numbers from timing some code, and
>> want to know how to make that code faster, you have to actually show people
>> *all* of the code for them to be able to help you.
>>
>>
>> : >>  I still have the search problem I had before, now search takes around
>> : >> 750 msecs for a small set of documents.
>> : >>
>> : >>     [java] Total Query Processing time (msec) : 38745
>> : >>     [java] Total No. of Documents : 7,500,000
>> : >>     [java] Total No. of Executed queries : 50.0
>> : >>     [java] Execution time per query : 774.9 msec
>> : >>
>> : >>  The index is optimized and its size is 830 MB.
>> : >>  Each document has the following terms :
>> : >>     VSID(integer), data(float), type(short int), precision(byte).
>> : >>   The queries are generated in a loop similar to the one below :
>> : >> loop ...
>> : >>     RangeFilter rq1 = new RangeFilter(
>> : >>         "data", "+5.43243243440000", "+5.43243243449999", true, true);
>> : >>     RangeFilter rq2 = new RangeFilter("precision", "+0001", "+0002", true, true);
>> : >>     ChainedFilter cf = new ChainedFilter(new Filter[]{rq2, rq1}, ChainedFilter.AND);
>> : >>     Query query = qp.parse("type:0");
>> : >>     Hits hits = searcher.search(query, cf);
>> : >> end loop
>> : >>
>> : >>  I would like to know if there is any solution to improve the search
>> : >> time ?  (I need to insert more than 500 million of these data pages into
>> : >> lucene)
>>
>>
>>
>>
>> -Hoss
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
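
For what it's worth, the suggestion quoted above (reuse the filter results, then count bits instead of calling "search") can be sketched without Lucene at all. This is only an illustration under stated assumptions: the arrays are hypothetical stand-ins for the indexed fields, the cache plays the role CachingWrapperFilter would play, and the AND of the bit sets plays the role of ChainedFilter.AND. None of the names below are Lucene API.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical sketch: one bit per document, cached per filter, ANDed and counted.
class FilterCountSketch {
    // Stand-ins for indexed fields; the array position is the document id.
    final double[] data;
    final int[] precision;
    final int[] type;
    // Cache of already-computed bit sets, keyed by a description of the filter.
    final Map<String, BitSet> cache = new HashMap<>();
    // Number of full scans performed, to show that cached filters are not recomputed.
    int scans = 0;

    FilterCountSketch(double[] data, int[] precision, int[] type) {
        this.data = data;
        this.precision = precision;
        this.type = type;
    }

    // Returns the cached bit set for this key, scanning all documents only on a miss.
    BitSet bits(String key, IntPredicate matches) {
        BitSet cached = cache.get(key);
        if (cached == null) {
            scans++;
            cached = new BitSet(data.length);
            for (int doc = 0; doc < data.length; doc++) {
                if (matches.test(doc)) {
                    cached.set(doc);
                }
            }
            cache.put(key, cached);
        }
        return cached;
    }

    // Counts documents matching all three conditions; no hit list is ever built.
    int count(double lo, double hi, int pLo, int pHi, int wantedType) {
        // Clone first so the AND does not modify the cached bit set.
        BitSet result = (BitSet) bits("data:" + lo + ".." + hi,
                d -> data[d] >= lo && data[d] <= hi).clone();
        result.and(bits("precision:" + pLo + ".." + pHi,
                d -> precision[d] >= pLo && precision[d] <= pHi));
        result.and(bits("type:" + wantedType, d -> type[d] == wantedType));
        return result.cardinality();
    }

    public static void main(String[] args) {
        FilterCountSketch s = new FilterCountSketch(
                new double[]{5.4, 5.5, 5.4, 9.0},
                new int[]{1, 2, 3, 1},
                new int[]{0, 0, 0, 1});
        System.out.println(s.count(5.0, 6.0, 1, 2, 0)); // prints 2
        System.out.println(s.count(5.0, 6.0, 1, 2, 0)); // prints 2, served from the cache
        System.out.println(s.scans);                    // prints 3
    }
}
```

The second call does no scanning at all, which is the whole benefit when the same ranges recur across iterations of the loop.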

