lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: How to get document effectively. or FieldCache example
Date Fri, 21 Apr 2017 18:50:06 GMT

: then which one is the right tool for text searching in files? Please can you
: suggest one?

So far all you've done is show us your *indexing* code and tell us that 
after you do a search, calling searcher.doc(docid) on 500,000 documents is 
slow.

But you still haven't described the use case you are trying to solve -- i.e.: 
*WHY* do you want these 500,000 results from your search? Once you get 
those Documents back, *WHAT* are you going to do with them?

If you show us some code, and talk us through your goal, then we can help 
you -- otherwise all we can do is warn you that the specific 
searcher.doc(docid) API isn't designed to be efficient at that large a 
scale.  Other APIs in Lucene are designed to be efficient at large scale, 
but we don't really know what to suggest w/o knowing what you're trying to 
do...
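
For example, here's a rough (untested) sketch of the kind of thing I mean,
written against the 3.6 Collector + FieldCache APIs since that's what
you're on.  It assumes the "id" field from your getInts() call is indexed
as a single un-analyzed token that FieldCache can parse as an int, and the
IdCollector name is just made up for illustration:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Gathers one value per matching doc via the FieldCache instead of
// loading stored fields with searcher.doc(docid) for every hit.
public class IdCollector extends Collector {

  private int[] ids;  // FieldCache array for the current (sub-)reader
  private final List<Integer> matchedIds = new ArrayList<Integer>();

  @Override
  public void setScorer(Scorer scorer) {
    // scores aren't needed for this kind of aggregation
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    // cached after the first call per reader, so this is cheap
    ids = FieldCache.DEFAULT.getInts(reader, "id");
  }

  @Override
  public void collect(int doc) {
    // 'doc' is relative to the current reader, matching the ids[] array
    matchedIds.add(ids[doc]);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;  // hit order doesn't matter here
  }

  public List<Integer> getMatchedIds() {
    return matchedIds;
  }
}
{/code}

Then something like searcher.search(query, new IdCollector()) visits every
match exactly once and never touches the stored fields, which is usually a
lot cheaper than 500,000 searcher.doc(docid) calls.  But whether that's
even the right approach depends entirely on what you're actually trying to
do with those results -- and in 4.x and later you'd reach for doc values
instead of FieldCache.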

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


PS: please, Please PLEASE upgrade to Lucene 6.x.  3.6 is more than 5 years 
old, and completely unsupported -- any advice you are given on this list 
is likely to refer to APIs that are completely different from the version 
of Lucene you are working with.


: 
: 
: On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand <jpountz@gmail.com> wrote:
: 
: > Lucene is not designed for retrieving that many results. What are you doing
: > with those 5 lacs documents? I suspect this is too much to display, so you
: > probably perform some computations on them. If so, maybe you could move them
: > to Lucene using e.g. facets? If that does not work, I'm afraid that Lucene
: > is not the right tool for your problem.
: >
: > On Fri, 21 Apr 2017 at 08:56, neeraj shah <neerajshah84@gmail.com>
: > wrote:
: >
: > > Yes, I am fetching around 5 lacs results from the index searcher.
: > > Also, I am indexing each line of each file, because while searching I need
: > > all the lines of a file which contain the matched term.
: > > Please tell me if I am doing it right.
: > > {code}
: > >
: > > InputStream is = new BufferedInputStream(new FileInputStream(file));
: > > BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
: > > String inputLine = "";
: > >
: > > while ((inputLine = bufr.readLine()) != null) {
: > >     Document doc = new Document();
: > >     doc.add(new Field("contents", inputLine, Field.Store.YES,
: > >         Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
: > >     doc.add(new Field("title", section, Field.Store.YES,
: > >         Field.Index.NOT_ANALYZED));
: > >     String newRem = new String(rem);
: > >     doc.add(new Field("fieldsort", newRem, Field.Store.YES,
: > >         Field.Index.ANALYZED));
: > >     doc.add(new Field("fieldsort2",
: > >         rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
: > >         Field.Store.YES, Field.Index.ANALYZED));
: > >     doc.add(new Field("field1", Author, Field.Store.YES,
: > >         Field.Index.NOT_ANALYZED));
: > >     doc.add(new Field("field2", Book, Field.Store.YES,
: > >         Field.Index.NOT_ANALYZED));
: > >     doc.add(new Field("field3", sec, Field.Store.YES,
: > >         Field.Index.NOT_ANALYZED));
: > >
: > >     writer.addDocument(doc);
: > > }
: > > is.close();
: > >
: > > {/code}
: > >
: > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand <jpountz@gmail.com> wrote:
: > >
: > > > IndexSearcher.doc is the right way to retrieve documents. If this is
: > > > slowing things down for you, I'm wondering whether you might be fetching
: > > > too many results?
: > > >
: > > > On Thu, 20 Apr 2017 at 14:16, neeraj shah <neerajshah84@gmail.com>
: > > > wrote:
: > > >
: > > > > Hello Everyone,
: > > > >
: > > > > I am using Lucene 3.6. I have to index around 60k documents. After
: > > > > performing the search, when I try to retrieve documents from the
: > > > > searcher using searcher.doc(docid), it slows down the search.
: > > > > Please, is there any other way to get the documents?
: > > > >
: > > > > Also, could anyone give me an end-to-end example of working with
: > > > > FieldCache?
: > > > > While implementing the cache I have:
: > > > >
: > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
: > > > >
: > > > > Now I don't know how to further use the fieldIds to improve the search.
: > > > > Please give me an end-to-end example.
: > > > >
: > > > > Thanks
: > > > > Neeraj
: > > > >
: > > >
: > >
: >
: 

-Hoss
http://www.lucidworks.com/
