lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Accumulating facets over a MultiReader
Date Fri, 05 Jul 2013 15:24:55 GMT
Yes, there are two ways to do that. First, I assume that what you want to
do is a.addIndexes(b), except that if a document D in b already exists in a,
you don't want to add it to a, right?

In that case, two options:

Option 1
Iterate over the documents in b (by their primary key) and, if a doc is
found in a, delete it from b. Then reopen an IndexReader on b and add it to
a; the docs that already exist in a are left untouched.
That's the more expensive way, but the easiest to code.
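
To make the two steps concrete, here is a rough sketch that models option 1
with plain Java maps instead of the Lucene API. The DeleteThenAdd class and
the key-to-document maps are hypothetical stand-ins; in Lucene, step 1 would
be a real delete on b and step 2 a call to addIndexes on a.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of option 1: an "index" is just a map from primary key to document body. */
final class DeleteThenAdd {

    /**
     * Deletes from b every document whose key already exists in a, then copies
     * what is left of b into a. Returns the keys that were deleted from b.
     */
    static List<String> mergeUnique(Map<String, String> a, Map<String, String> b) {
        List<String> deleted = new ArrayList<>();
        // Step 1: walk b's primary keys and delete the ones a already has.
        // Iterate over a copy of the key set so we can remove from b safely.
        for (String key : new ArrayList<>(b.keySet())) {
            if (a.containsKey(key)) {
                b.remove(key);
                deleted.add(key);
            }
        }
        // Step 2: stand-in for addIndexes. Everything still in b is new to a,
        // so no document that a already holds is overwritten.
        for (Map.Entry<String, String> e : b.entrySet()) {
            a.put(e.getKey(), e.getValue());
        }
        return deleted;
    }
}
```

The cost in the real index comes from step 1, since every duplicate is an
actual delete against b before anything is merged.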

Option 2
Obtain b.getLiveDocs() and unset the bit of every document that exists in a.
Then call addIndexes with an AtomicReader that overrides getLiveDocs() to
return the modified live docs.
The result is the same as option 1, but you skip the actual delete
operation, which is more costly than just unsetting a bit.
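
The bit-masking idea can be sketched in plain Java as follows. This is only
a model under stated assumptions: java.util.BitSet stands in for the live
docs, and the LiveDocsFilter class, keysOfB and keysInA are hypothetical
names, not Lucene classes or fields.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

/** Toy model of option 2: hide duplicates behind a modified live-docs bit set. */
final class LiveDocsFilter {

    /**
     * Returns a copy of liveDocs in which the bit of every document of b whose
     * primary key already exists in a is cleared. keysOfB.get(docId) is assumed
     * to hold the primary key of document docId.
     */
    static BitSet maskDuplicates(BitSet liveDocs, List<String> keysOfB, Set<String> keysInA) {
        BitSet masked = (BitSet) liveDocs.clone(); // leave the original bits untouched
        for (int docId = masked.nextSetBit(0); docId >= 0; docId = masked.nextSetBit(docId + 1)) {
            if (keysInA.contains(keysOfB.get(docId))) {
                masked.clear(docId); // "unset the bit": no delete is issued against b
            }
        }
        return masked;
    }

    /** Stand-in for addIndexes: only documents whose bit is still set are copied. */
    static List<String> liveKeys(List<String> keysOfB, BitSet liveDocs) {
        List<String> out = new ArrayList<>();
        for (int docId = liveDocs.nextSetBit(0); docId >= 0; docId = liveDocs.nextSetBit(docId + 1)) {
            out.add(keysOfB.get(docId));
        }
        return out;
    }
}
```

Note that in Lucene itself getLiveDocs() may return null when there are no
deletions, so real code would need to start from an all-set bit set in that
case.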

Shai


On Fri, Jul 5, 2013 at 6:10 PM, Peng Gao <pgao@esri.com> wrote:

> Shai,
> Once again, thanks for the help.
> Yes, I am re-indexing. Using FacetFields.addFacets() on the doc works.
>
> Given that I need to check the uniqueness before merging an index with
> facets into a master, is there a better way to do it without re-indexing?
>
> Gao Peng
>
>
> > -----Original Message-----
> > From: Shai Erera [mailto:serera@gmail.com]
> > Sent: Wednesday, July 03, 2013 11:49 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Accumulating facets over a MultiReader
> >
> > What do you mean addDocument()? You re-index it?
> > In that case, when you re-index it, just make sure to use
> > FacetFields.addFacets() on it, so its facets are re-indexed too.
> >
> > Shai
> >
> >
> > On Wed, Jul 3, 2013 at 8:52 PM, Peng Gao <pgao@esri.com> wrote:
> >
> > > Shai,
> > > Thanks.
> > >
> > > I went with option #3 since the temp indexes are actually created in
> > > separate processes in my case.
> > > It works.
> > >
> > > Now one more complication.
> > > I have a case where I need to merge only unique docs in the temp
> > > indexes into the master index. I have a unique key for each doc.
> > > Before facets, I loop through the temp index, and for each doc, check
> > > if it's already in the master,
> > > addDocument() only if it doesn't exist.
> > > Now I have facets, how do I selectively merge docs?
> > >
> > > Thanks again for your help,
> > > Gao Peng
> > >
> > >
> > > > -----Original Message-----
> > > > From: Shai Erera [mailto:serera@gmail.com]
> > > > Sent: Wednesday, July 03, 2013 9:02 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: Accumulating facets over a MultiReader
> > > >
> > > > Hi
> > > >
> > > > There are a couple of ways you can address that:
> > > >
> > > > Don't create an index per thread, but rather have all threads
> > > > update the global index. IndexWriter and TaxoWriter support
> > > > multiple threads.
> > > >
> > > > -- Or, if you need to build an index per-thread --
> > > >
> > > > Use a single TaxonomyWriter instance and share between all the
> > > > threads.
> > > > TaxoWriter is thread-safe, and that way you can build a single
> > > > taxonomy index and later use IW.addIndexes.
> > > >
> > > > -- Or, if you cannot share TW instance between threads --
> > > >
> > > > Have each thread create its own taxonomy index, but then when you
> > > > call addIndexes, you need to do two things:
> > > > - Create a new TW instance and call addTaxonomy on it.
> > > > - Call IW.addIndexes() with an OrdinalMappingAtomicReader. Look at
> > > > its jdocs for an example code.
> > > >
> > > > Let me know if that works for you.
> > > >
> > > > Shai
> > > >
> > > >
> > > >
> > > > On Wed, Jul 3, 2013 at 6:14 PM, Peng Gao <pgao@esri.com> wrote:
> > > >
> > > > > Hi Shai,
> > > > > Thanks for the reply.
> > > > > Yes I used a single TaxonomyReader instance.
> > > > > I am adding facets to an existing app, which maintains two
> > > > > indexes, one for indexing system tools, and the other indexing
> > > > > user data in folders.
> > > > > The system tool index contains docs for describing the tool usage,
> > > > > and etc, which needs to be its own index.
> > > > >
> > > > > It turned out that my problem is not MultiReader. The problem is
> > > > > the index, i.e. the way it's created.
> > > > > The app crawls folders in multiple threads, and each thread
> > > > > creates a temp index.
> > > > > The main thread merges the temp indexes into the master index,
> > > > > using IndexWriter.AddIndexes().
> > > > > If the temp index has facet index, this approach creates a bad
> > > > > index.
> > > > >
> > > > > Is there a way I can build faceted index in multiple threads?
> > > > >
> > > > > - Gao Peng
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Shai Erera [mailto:serera@gmail.com]
> > > > > > Sent: Monday, July 01, 2013 8:25 PM
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: Re: Accumulating facets over a MultiReader
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I assume that you use a single TaxonomyReader instance? It must
> > > > > > be the same for both indexes, that is, both indexes must share
> > > > > > the same taxonomy index. Otherwise their ordinals would not
> > > > > > match, and you may hit such exceptions, since one index may have
> > > > > > bigger ordinals than what the taxonomy reader knows about.
> > > > > >
> > > > > > Can you share a little bit about your scenario and why you need
> > > > > > to use a MultiReader?
> > > > > >
> > > > > > Shai
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 2, 2013 at 3:31 AM, Peng Gao <pgao@esri.com> wrote:
> > > > > >
> > > > > > > How do I accumulate counts over a MultiReader (2 IndexReader)?
> > > > > > > The following code causes an IOException:
> > > > > > >
> > > > > > >       ArrayList<FacetRequest> facetRequests = new ArrayList<FacetRequest>();
> > > > > > >       for (String groupField : groupFields)
> > > > > > >         facetRequests.add(new CountFacetRequest(new CategoryPath(groupField, '/'), 1));
> > > > > > >
> > > > > > >       FacetSearchParams facetSearchParams = new FacetSearchParams(facetRequests);
> > > > > > >       StandardFacetsAccumulator accumulator =
> > > > > > >         new StandardFacetsAccumulator(facetSearchParams, reader, taxonomyReader);
> > > > > > >       FacetsCollector facetsCollector = FacetsCollector.create(accumulator);
> > > > > > >
> > > > > > >       // perform documents search and facets accumulation
> > > > > > >       searcher.search(query, facetsCollector);
> > > > > > >
> > > > > > >       // return facets results in a proper format
> > > > > > >       return getFacetResults(facetsCollector, sr);
> > > > > > >
> > > > > > >
> > > > > > > Here reader is a MultiReader of 2. I am using Lucene 4.3.1.
> > > > > > >
> > > > > > > The following is the callstack. It looks like it has something
> > > > > > > to do with the MultiReader.
> > > > > > > How do I make it work?
> > > > > > >
> > > > > > >
> > > > > > > java.io.IOException: PANIC: Got unexpected exception while trying to get/calculate total counts
> > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:156)
> > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:378)
> > > > > > >       at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214)
> > > > > > >       at com.esri.arcgis.search.SearchHandler.getFacetResults(SearchHandler.java:551)
> > > > > > >       at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:350)
> > > > > > >       at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:239)
> > > > > > >       at com.esri.arcgis.search.test.Searcher.invokeSearch(Searcher.java:58)
> > > > > > >       at com.esri.arcgis.search.test.Searcher.main(Searcher.java:32)
> > > > > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 34
> > > > > > >       at org.apache.lucene.facet.search.CountingAggregator.aggregate(CountingAggregator.java:43)
> > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.fillArraysForPartition(StandardFacetsAccumulator.java:309)
> > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:168)
> > > > > > >       at org.apache.lucene.facet.complements.TotalFacetCounts.compute(TotalFacetCounts.java:176)
> > > > > > >       at org.apache.lucene.facet.complements.TotalFacetCountsCache.computeAndCache(TotalFacetCountsCache.java:157)
> > > > > > >       at org.apache.lucene.facet.complements.TotalFacetCountsCache.getTotalCounts(TotalFacetCountsCache.java:104)
> > > > > > >       at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:129)
> > > > > > >       ... 7 more
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > > >
> > >
> > >
> > >
>
>
>
