lucene-java-user mailing list archives

From Peng Gao <p...@esri.com>
Subject RE: Accumulating facets over a MultiReader
Date Fri, 05 Jul 2013 15:55:45 GMT
Thanks.

Yes, that's the case. I'll try it out.

Is Option 1 more expensive than re-indexing?


> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Friday, July 05, 2013 8:25 AM
> To: java-user@lucene.apache.org
> Subject: Re: Accumulating facets over a MultiReader
> 
> Yes, there are two ways to do that. First, I assume that what you want to
> do is a.addIndexes(b), and if a document D in b is in a, you don't want to
> add it to a, right?
> 
> In that case, two options:
> 
> Option 1
> Iterate over the documents in b (by their primary key), and if a doc is
> found in a, delete it from b. Then reopen an IndexReader and add it to a;
> the existing docs won't be deleted.
> That's the more expensive way, but the easiest to code.
> 
> Option 2
> Obtain b.getLiveDocs and unset the bits of every document that exists in a.
> Then use addIndexes with an AtomicReader which overrides getLiveDocs to
> return the modified live docs.
> Same as option 1, but you don't actually do the delete operation, which is
> more costly than just unsetting a bit.
> 
> Shai
> 
> 
> On Fri, Jul 5, 2013 at 6:10 PM, Peng Gao <pgao@esri.com> wrote:
> 
> > Shai,
> > Once again, thanks for the help.
> > Yes, I am re-indexing. Using FacetFields.addFacets() on the doc works.
> >
> > Given that I need to check the uniqueness before merging an index with
> > facets into a master, is there a better way to do it without re-indexing?
> >
> > Gao Peng
> >
> >
> > > -----Original Message-----
> > > From: Shai Erera [mailto:serera@gmail.com]
> > > Sent: Wednesday, July 03, 2013 11:49 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Accumulating facets over a MultiReader
> > >
> > > What do you mean addDocument()? You re-index it?
> > > In that case, when you re-index it, just make sure to use
> > > FacetFields.addFacets() on it, so its facets are re-indexed too.
> > >
> > > Shai
> > >
> > >
> > > On Wed, Jul 3, 2013 at 8:52 PM, Peng Gao <pgao@esri.com> wrote:
> > >
> > > > Shai,
> > > > Thanks.
> > > >
> > > > I went with option #3 since the temp indexes are actually created
> > > > in separate processes in my case.
> > > > It works.
> > > >
> > > > Now one more complication.
> > > > > I have a case where I need to merge only unique docs in the temp
> > > > > indexes into the master index. I have a unique key for each doc.
> > > > > Before facets, I looped through the temp index and, for each doc,
> > > > > checked whether it was already in the master, calling
> > > > > addDocument() only if it wasn't.
> > > > > Now that I have facets, how do I selectively merge docs?
> > > >
> > > > Thanks again for your help,
> > > > Gao Peng
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Shai Erera [mailto:serera@gmail.com]
> > > > > Sent: Wednesday, July 03, 2013 9:02 AM
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Re: Accumulating facets over a MultiReader
> > > > >
> > > > > Hi
> > > > >
> > > > > There are a couple of ways you can address that:
> > > > >
> > > > > > Don't create an index per thread; instead, have all threads
> > > > > > update the global index. IndexWriter and TaxoWriter support
> > > > > > multiple threads.
> > > > >
> > > > > -- Or, if you need to build an index per-thread --
> > > > >
> > > > > > Use a single TaxonomyWriter instance and share it between all
> > > > > > the threads. TaxoWriter is thread-safe, and that way you can
> > > > > > build a single taxonomy index and later use IW.addIndexes.
> > > > >
> > > > > -- Or, if you cannot share TW instance between threads --
> > > > >
> > > > > Have each thread create its own taxonomy index, but then when
> > > > > you call addIndexes, you need to do two things:
> > > > > - Create a new TW instance and call addTaxonomy on it.
> > > > > > - Call IW.addIndexes() with an OrdinalMappingAtomicReader. See
> > > > > > its javadocs for example code.
> > > > >
> > > > > Let me know if that works for you.
> > > > >
> > > > > Shai
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 3, 2013 at 6:14 PM, Peng Gao <pgao@esri.com> wrote:
> > > > >
> > > > > > Hi Shai,
> > > > > > Thanks for the reply.
> > > > > > Yes I used a single TaxonomyReader instance.
> > > > > > I am adding facets to an existing app, which maintains two
> > > > > > indexes, one for indexing system tools, and the other indexing
> > > > > > user data in folders.
> > > > > > > The system tool index contains docs describing tool usage,
> > > > > > > etc., and needs to be its own index.
> > > > > >
> > > > > > It turned out that my problem is not MultiReader. The problem
> > > > > > is the index, i.e. the way it's created.
> > > > > > The app crawls folders in multiple threads, and each thread
> > > > > > creates a temp index.
> > > > > > The main thread merges the temp indexes into the master index,
> > > > > > > using IndexWriter.addIndexes().
> > > > > > > If the temp index has a facet index, this approach creates a
> > > > > > > bad index.
> > > > > > >
> > > > > > > Is there a way I can build a faceted index in multiple threads?
> > > > > >
> > > > > > - Gao Peng
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Shai Erera [mailto:serera@gmail.com]
> > > > > > > Sent: Monday, July 01, 2013 8:25 PM
> > > > > > > To: java-user@lucene.apache.org
> > > > > > > Subject: Re: Accumulating facets over a MultiReader
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > I assume that you use a single TaxonomyReader instance? It
> > > > > > > > must be the same for both indexes; that is, both indexes
> > > > > > > > must share the same taxonomy index. Otherwise their ordinals
> > > > > > > > will not match, and you may hit exceptions like this one,
> > > > > > > > since one index may have bigger ordinals than what the
> > > > > > > > taxonomy reader knows about.
> > > > > > > >
> > > > > > > > Can you share a little bit about your scenario and why you
> > > > > > > > need to use a MultiReader?
> > > > > > >
> > > > > > > Shai
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Tue, Jul 2, 2013 at 3:31 AM, Peng Gao <pgao@esri.com> wrote:
> > > > > > >
> > > > > > > > > How do I accumulate counts over a MultiReader (2 IndexReaders)?
> > > > > > > > > The following code causes an IOException:
> > > > > > > >
> > > > > > > > >       ArrayList<FacetRequest> facetRequests = new ArrayList<FacetRequest>();
> > > > > > > > >       for (String groupField : groupFields)
> > > > > > > > >         facetRequests.add(new CountFacetRequest(new CategoryPath(groupField, '/'), 1));
> > > > > > > > >
> > > > > > > > >       FacetSearchParams facetSearchParams = new FacetSearchParams(facetRequests);
> > > > > > > > >       StandardFacetsAccumulator accumulator =
> > > > > > > > >           new StandardFacetsAccumulator(facetSearchParams, reader, taxonomyReader);
> > > > > > > > >       FacetsCollector facetsCollector = FacetsCollector.create(accumulator);
> > > > > > > > >
> > > > > > > > >       // perform documents search and facets accumulation
> > > > > > > > >       searcher.search(query, facetsCollector);
> > > > > > > > >
> > > > > > > > >       // return facets results in a proper format
> > > > > > > > >       return getFacetResults(facetsCollector, sr);
> > > > > > > >
> > > > > > > >
> > > > > > > > > Here reader is a MultiReader of 2. I am using Lucene 4.3.1.
> > > > > > > >
> > > > > > > > The following is the callstack. It looks like it has
> > > > > > > > something to do with the MultiReader.
> > > > > > > > How do I make it work?
> > > > > > > >
> > > > > > > >
> > > > > > > > > java.io.IOException: PANIC: Got unexpected exception while trying to get/calculate total counts
> > > > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:156)
> > > > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:378)
> > > > > > > > >       at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214)
> > > > > > > > >       at com.esri.arcgis.search.SearchHandler.getFacetResults(SearchHandler.java:551)
> > > > > > > > >       at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:350)
> > > > > > > > >       at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:239)
> > > > > > > > >       at com.esri.arcgis.search.test.Searcher.invokeSearch(Searcher.java:58)
> > > > > > > > >       at com.esri.arcgis.search.test.Searcher.main(Searcher.java:32)
> > > > > > > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 34
> > > > > > > > >       at org.apache.lucene.facet.search.CountingAggregator.aggregate(CountingAggregator.java:43)
> > > > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.fillArraysForPartition(StandardFacetsAccumulator.java:309)
> > > > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:168)
> > > > > > > > >       at org.apache.lucene.facet.complements.TotalFacetCounts.compute(TotalFacetCounts.java:176)
> > > > > > > > >       at org.apache.lucene.facet.complements.TotalFacetCountsCache.computeAndCache(TotalFacetCountsCache.java:157)
> > > > > > > > >       at org.apache.lucene.facet.complements.TotalFacetCountsCache.getTotalCounts(TotalFacetCountsCache.java:104)
> > > > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:129)
> > > > > > > > >       ... 7 more
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > > --------------------------------------------------------------
> > > > > > ----
> > > > > > --- To unsubscribe, e-mail:
> > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail:
> > > > > > java-user-help@lucene.apache.org
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> >
> >
> >
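
Option 2 in the quoted reply (unset the live-docs bit of every document in b that already exists in a, then merge through a reader that reports the modified bits) can be modeled in plain Java. The sketch below is only a model and uses no Lucene classes: `LiveDocsModel`, `maskDuplicates`, and the key collections are hypothetical names, with a `java.util.BitSet` standing in for b's live docs. In real Lucene 4.x code the masked bits would be returned from an overridden getLiveDocs, and such a wrapper would presumably also have to report a matching document count.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of Option 2. None of these names are Lucene API:
// a BitSet stands in for the live-docs bits of index b, a List holds b's
// primary keys by doc ID, and a Set holds the keys already present in a.
class LiveDocsModel {

    /**
     * Returns a copy of liveDocs with the bit cleared for every document of b
     * whose primary key already exists in a. The input is left untouched,
     * mirroring the point that nothing is actually deleted from b.
     */
    static BitSet maskDuplicates(BitSet liveDocs, List<String> keysOfB, Set<String> keysInA) {
        BitSet masked = (BitSet) liveDocs.clone();
        for (int docId = masked.nextSetBit(0); docId >= 0; docId = masked.nextSetBit(docId + 1)) {
            if (keysInA.contains(keysOfB.get(docId))) {
                masked.clear(docId); // hide the duplicate from the merge
            }
        }
        return masked;
    }

    public static void main(String[] args) {
        List<String> keysOfB = Arrays.asList("k1", "k2", "k3", "k4");
        Set<String> keysInA = new HashSet<>(Arrays.asList("k2", "k4", "k9"));

        BitSet liveDocs = new BitSet(keysOfB.size());
        liveDocs.set(0, keysOfB.size()); // all four docs of b start out live

        BitSet masked = maskDuplicates(liveDocs, keysOfB, keysInA);
        System.out.println("docs of b that would be merged: " + masked);
    }
}
```

Working on a copy of the bits is what separates this from Option 1: b itself is never modified, so no delete operation and no reader reopen are involved.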

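The root cause discussed in the oldest messages (ordinals written against one taxonomy being counted against another, which "may have bigger ordinals than what the taxonomy reader knows about") and the third suggested fix (merge the taxonomies, then rewrite ordinals while adding the index) can also be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Lucene code: `OrdinalModel` and all of its methods are hypothetical, a taxonomy is reduced to a map from category label to ordinal, and the category labels are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model, not Lucene API: a "taxonomy" is a map from category
// label to ordinal, assigned in insertion order starting at 0.
class OrdinalModel {

    static Map<String, Integer> taxonomy(String... categories) {
        Map<String, Integer> taxo = new LinkedHashMap<>();
        for (String c : categories) {
            taxo.putIfAbsent(c, taxo.size());
        }
        return taxo;
    }

    /** Counts ordinals into an array sized by the taxonomy the reader knows about. */
    static int[] count(int taxonomySize, int[] docOrdinals) {
        int[] counts = new int[taxonomySize];
        for (int ord : docOrdinals) {
            counts[ord]++; // throws if ord came from a different, larger taxonomy
        }
        return counts;
    }

    /** Adds src's categories to dest and returns map[srcOrdinal] = destOrdinal. */
    static int[] addTaxonomy(Map<String, Integer> dest, Map<String, Integer> src) {
        int[] ordinalMap = new int[src.size()];
        for (Map.Entry<String, Integer> e : src.entrySet()) {
            dest.putIfAbsent(e.getKey(), dest.size());
            ordinalMap[e.getValue()] = dest.get(e.getKey());
        }
        return ordinalMap;
    }

    /** Rewrites ordinals from the source taxonomy into the merged taxonomy. */
    static int[] remap(int[] docOrdinals, int[] ordinalMap) {
        int[] out = new int[docOrdinals.length];
        for (int i = 0; i < docOrdinals.length; i++) {
            out[i] = ordinalMap[docOrdinals[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> master = taxonomy("type/tool", "type/data");
        Map<String, Integer> temp = taxonomy("type/data", "owner/gao", "type/tool");
        int[] tempDocOrdinals = {0, 1, 2}; // ordinals as assigned by the temp taxonomy

        try {
            count(master.size(), tempDocOrdinals); // master only knows ordinals 0 and 1
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("mismatched taxonomies: " + e);
        }

        int[] ordinalMap = addTaxonomy(master, temp);
        int[] counts = count(master.size(), remap(tempDocOrdinals, ordinalMap));
        System.out.println("counts after remapping: " + Arrays.toString(counts));
    }
}
```

Note that even when no exception is thrown, unmapped ordinals would silently count the wrong categories (ordinal 0 means "type/data" in the temp taxonomy but "type/tool" in the master), which is why the remapping step matters and not just the array size.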

