lucene-java-user mailing list archives

From neeraj shah <neerajsha...@gmail.com>
Subject Re: How to get document effectively. or FieldCache example
Date Mon, 24 Apr 2017 05:35:45 GMT
This is how I am indexing my files:

// file, section, rem, Author, Book, sec and writer come from the surrounding code
InputStream is = new BufferedInputStream(new FileInputStream(file));
BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
String inputLine = "";
while ((inputLine = bufr.readLine()) != null) {
    Document doc = new Document();
    doc.add(new Field("contents", inputLine, Field.Store.YES,
            Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    doc.add(new Field("title", section, Field.Store.YES, Field.Index.NOT_ANALYZED));
    String newRem = new String(rem);
    doc.add(new Field("fieldsort", newRem, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("fieldsort2",
            rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
            Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("field1", Author, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("field2", Book, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("field3", sec, Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
}
bufr.close();

Can you please explain your solution in steps?

It would be very helpful.

Thanks,

Neeraj




On Sun, Apr 23, 2017 at 1:17 AM, Jacques Uber <uberj@miradortech.com> wrote:

> Have you considered indexing chapters as documents? Using your example you
> would have three documents corresponding to your three chapters: A, B, and
> D. Once you have that structure the query "pain AND head" returns only
> chapters A and B. Using the information gained from this new chapter index
> you could then use your existing index to do "pain AND head AND (chapter:A
> OR chapter:B)".
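
The two-phase idea above can be sketched without the Lucene API. This is an illustrative plain-Java model, not real Lucene code: the chapter map, its text, and the query-string builder are invented stand-ins for a chapter-level index and a BooleanQuery.

```java
import java.util.*;
import java.util.stream.*;

public class ChapterPhase {
    // Toy stand-in for a chapter-level index: chapter name -> whole chapter text.
    static final Map<String, String> CHAPTERS = Map.of(
            "A", "a dull pain behind the head",
            "B", "head pain and other symptoms",
            "D", "head injuries only");

    // Phase 1: which chapters match the conjunction of all terms ("pain AND head")?
    static Set<String> chaptersWithAll(String... terms) {
        return CHAPTERS.entrySet().stream()
                .filter(e -> Arrays.stream(terms)
                        .allMatch(t -> e.getValue().contains(t)))
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Phase 2: restrict the line-level query to the surviving chapters,
    // i.e. pain AND head AND (chapter:A OR chapter:B).
    static String restrictedQuery(Set<String> chapters) {
        return "pain AND head AND (" + chapters.stream()
                .map(c -> "chapter:" + c)
                .collect(Collectors.joining(" OR ")) + ")";
    }

    public static void main(String[] args) {
        System.out.println(restrictedQuery(chaptersWithAll("pain", "head")));
    }
}
```

With real indexes, phase 1 would be a search over chapter documents and phase 2 a BooleanQuery over the existing line-level index.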
>
> On Fri, Apr 21, 2017 at 10:40 PM, neeraj shah <neerajshah84@gmail.com>
> wrote:
>
> > Hello,
> > Let me explain my case:
> > - Suppose I am searching for the word ("pain" (in same chapter) "head").
> > This is my query.
> > Now what I need to do is first search "pain" and then search "head"
> > separately, then I need the file names common to both search results.
> > Now the criteria is as follows. Suppose:
> >
> > FileA - Chapter A  - has word only "*pain*"
> > FileB - Chapter B  - has word both "*head*" and "*pain*"
> > FileC - Chapter A  - has word only "*head*"
> > FileD - Chapter D  - has only word "*head*"
> > FileE -  Chapter A - has only word "*pain*"
> >
> > Now the result should be:
> > FileA - Chapter A  - has word only "*pain*"
> > FileB - Chapter B  - has word both "*head*" and "*pain*"
> > FileC - Chapter A  - has word only "*head*"
> > FileE -  Chapter A - has only word "*pain*"
> >
> > FileD - Chapter D - has only the word "*head*" and will not appear in the
> > search result, because the name "Chapter D" is not shared by any chapter
> > that has both search words.
> > In short, I have to show only those chapters, from any book, that share a
> > chapter name and have both search words, or at least one search word; but
> > the chapter name must be the same.
> >
> > The above is my requirement; that is why I was parsing all hits for "pain"
> > and "head" separately, then collecting the common "title" (chapter name)
> > from both results, or any result that has at least one search word and the
> > same chapter name.
> > In my results the word "pain" alone has about 5 lakh (500,000) hits and
> > "head" has about 60K.
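
The title-intersection step described above can be sketched with plain collections. The `Hit` record and the sample data mirror the example in this mail and are invented for illustration; real code would read these values from the Lucene hits.

```java
import java.util.*;
import java.util.stream.*;

public class CommonChapters {
    record Hit(String file, String chapter, String term) {}

    // The sample data from the example in the thread.
    static List<Hit> sampleHits() {
        return List.of(
                new Hit("FileA", "A", "pain"),
                new Hit("FileB", "B", "head"), new Hit("FileB", "B", "pain"),
                new Hit("FileC", "A", "head"),
                new Hit("FileD", "D", "head"),
                new Hit("FileE", "A", "pain"));
    }

    // Keep files whose chapter name appears in the hits of every search term.
    static List<String> filterByCommonChapter(List<Hit> hits, String... terms) {
        Set<String> common = null;
        for (String term : terms) {
            Set<String> chapters = hits.stream()
                    .filter(h -> h.term().equals(term))
                    .map(Hit::chapter)
                    .collect(Collectors.toCollection(HashSet::new));
            if (common == null) common = chapters;
            else common.retainAll(chapters);  // intersect: chapters with all terms
        }
        if (common == null) return List.of();
        final Set<String> keep = common;
        return hits.stream()
                .filter(h -> keep.contains(h.chapter()))
                .map(Hit::file)
                .distinct()
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        // Chapter D drops out because no "Chapter D" file contains "pain".
        System.out.println(filterByCommonChapter(sampleHits(), "pain", "head"));
    }
}
```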
> >
> > Please suggest me if you have other approach in mind.
> >
> > Thanks,
> > Neeraj
> >
> >
> >
> >
> >
> >
> > On Sat, Apr 22, 2017 at 12:20 AM, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> >
> > >
> > > : then which one is the right tool for text searching in files? please
> > > : can you suggest me?
> > >
> > > so far all you've done is show us your *indexing* code; and said that
> > > after you do a search, calling searcher.doc(docid) on 500,000 documents
> > > is slow.
> > >
> > > But you still haven't described the use case you are trying to solve --
> > > ie: *WHY* do you want these 500,000 results from your search? Once you
> > > get those Documents back, *WHAT* are you going to do with them?
> > >
> > > If you show us some code, and talk us through your goal, then we can
> > > help you -- otherwise all we can do is warn you that the specific
> > > searcher.doc(docid) API isn't designed to be efficient at that large a
> > > scale.  Other APIs in Lucene are designed to be efficient at large
> > > scale, but we don't really know what to suggest w/o knowing what you're
> > > trying to do...
> > >
> > > https://people.apache.org/~hossman/#xyproblem
> > > XY Problem
> > >
> > > Your question appears to be an "XY Problem" ... that is: you are
> > > dealing with "X", you are assuming "Y" will help you, and you are
> > > asking about "Y" without giving more details about the "X" so that we
> > > can understand the full issue.  Perhaps the best solution doesn't
> > > involve "Y" at all?
> > > See Also: http://www.perlmonks.org/index.pl?node_id=542341
> > >
> > >
> > > PS: please, Please PLEASE upgrade to Lucene 6.x.  3.6 is more than 5
> > > years old, and completely unsupported -- any advice you are given on
> > > this list is likely to refer to APIs that are completely different from
> > > the version of Lucene you are working with.
> > >
> > >
> > > :
> > > :
> > > : On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand <jpountz@gmail.com> wrote:
> > > :
> > > : > Lucene is not designed for retrieving that many results. What are
> > > : > you doing with those 5 lacs documents? I suspect this is too much
> > > : > to display, so you probably perform some computations on them? If
> > > : > so, maybe you could move them to Lucene using e.g. facets. If that
> > > : > does not work, I'm afraid that Lucene is not the right tool for
> > > : > your problem.
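
The facet suggestion amounts to asking the engine for one count per distinct chapter instead of retrieving every matching document. A plain-Java sketch of the aggregation it replaces (the sample titles are invented for illustration; real faceting happens inside Lucene without fetching the documents at all):

```java
import java.util.*;
import java.util.stream.*;

public class FacetSketch {
    // The titles of documents matched by some query.
    static final List<String> SAMPLE =
            List.of("Chapter A", "Chapter A", "Chapter B", "Chapter D");

    // Instead of fetching 500,000 documents and grouping them yourself,
    // a facet returns one count per distinct "title" value.
    static Map<String, Long> countByTitle(List<String> matchedTitles) {
        return matchedTitles.stream()
                .collect(Collectors.groupingBy(t -> t, TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByTitle(SAMPLE));
    }
}
```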
> > > : >
> > > : > On Fri, Apr 21, 2017 at 08:56, neeraj shah <neerajshah84@gmail.com> wrote:
> > > : >
> > > : > > Yes, I am fetching around 5 lakh results from the index searcher.
> > > : > > Also I am indexing each line of each file because, while searching,
> > > : > > I need all the lines of a file that contain the matched term.
> > > : > > Please tell me if I am doing it right.
> > > : > > {code}
> > > : > > InputStream is = new BufferedInputStream(new FileInputStream(file));
> > > : > > BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
> > > : > > String inputLine = "";
> > > : > > while ((inputLine = bufr.readLine()) != null) {
> > > : > >     Document doc = new Document();
> > > : > >     doc.add(new Field("contents", inputLine, Field.Store.YES,
> > > : > >             Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> > > : > >     doc.add(new Field("title", section, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > : > >     String newRem = new String(rem);
> > > : > >     doc.add(new Field("fieldsort", newRem, Field.Store.YES, Field.Index.ANALYZED));
> > > : > >     doc.add(new Field("fieldsort2",
> > > : > >             rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
> > > : > >             Field.Store.YES, Field.Index.ANALYZED));
> > > : > >     doc.add(new Field("field1", Author, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > : > >     doc.add(new Field("field2", Book, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > : > >     doc.add(new Field("field3", sec, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > : > >     writer.addDocument(doc);
> > > : > > }
> > > : > > bufr.close();
> > > : > > {/code}
> > > : > >
> > > : > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand <jpountz@gmail.com> wrote:
> > > : > >
> > > : > > > IndexSearcher.doc is the right way to retrieve documents. If this
> > > : > > > is slowing things down for you, I'm wondering whether you might
> > > : > > > be fetching too many results?
> > > : > > >
> > > : > > > On Thu, Apr 20, 2017 at 14:16, neeraj shah <neerajshah84@gmail.com> wrote:
> > > : > > >
> > > : > > > > Hello Everyone,
> > > : > > > >
> > > : > > > > I am using Lucene 3.6. I have to index around 60k documents.
> > > : > > > > After performing the search, when I try to retrieve documents
> > > : > > > > from the searcher using searcher.doc(docid), it slows down the
> > > : > > > > search. Please, is there any other way to get the document?
> > > : > > > >
> > > : > > > > Also, can anyone give me an end-to-end example of working with
> > > : > > > > FieldCache? While implementing the cache I have:
> > > : > > > >
> > > : > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
> > > : > > > >
> > > : > > > > Now I don't know how to further use fieldIds to improve the
> > > : > > > > search. Please give me an end-to-end example.
> > > : > > > >
> > > : > > > > Thanks
> > > : > > > > Neeraj
> > > : > > >
> > > : > >
> > > : >
> > > :
> > >
> > > -Hoss
> > > http://www.lucidworks.com/
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> >
>
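On the FieldCache question at the bottom of the thread: the point of FieldCache.DEFAULT.getInts(indexMultiReader, "id") is to pay the field-loading cost once, up front, into a docid-indexed array, so that per-hit lookups become array reads instead of searcher.doc(docid) calls. A plain-Java sketch of that trade-off (loadStoredField is an invented stand-in for the slow stored-field read; none of this is the actual Lucene API):

```java
public class FieldColumnCache {
    // Stand-in for reading one stored field from disk
    // (one slow call per document in real Lucene).
    static int loadStoredField(int docId) {
        return docId * 10; // pretend this is the stored "id" field
    }

    // Analogue of FieldCache.DEFAULT.getInts(reader, "id"): one up-front pass
    // builds a docid-indexed column; later lookups are plain array reads.
    static int[] buildCache(int maxDoc) {
        int[] column = new int[maxDoc];
        for (int d = 0; d < maxDoc; d++) {
            column[d] = loadStoredField(d);
        }
        return column;
    }

    public static void main(String[] args) {
        int[] ids = buildCache(1000);
        // A query matching many docs now resolves the field per hit in O(1),
        // with no per-hit trip to stored fields.
        int[] hits = {3, 42, 999};
        for (int docId : hits) {
            System.out.println(ids[docId]);
        }
    }
}
```

This only wins when many hits need the same field; building the cache still reads every document once, so it suits a long-lived reader serving many searches.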
