lucene-solr-user mailing list archives

From David Smith <dsmiths...@yahoo.com.INVALID>
Subject Re: Slow faceting performance on a docValues field
Date Tue, 13 Jan 2015 19:24:32 GMT
What is stumping me is that the search result has 3 hits, yet faceting those 3 hits takes 24
seconds.  The documentation for facet.method=fc is quite explicit about how Solr does faceting:


"fc (stands for Field Cache) The facet counts are calculated by iterating over documents
that match the query and summing the terms that appear in each document. This was the default
method for single valued fields prior to Solr 1.4."

If a search yielded millions of hits, I could understand 24 seconds to calculate the facets.
 But not for a search with only 3 hits.  


What am I missing?  

Regards,
David
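
One way to reconcile 3 hits with 24 seconds, going by Tomás's explanation quoted further down (range faceting turns each gap into its own filter), is to count the gaps rather than the hits. The sketch below is only back-of-the-envelope arithmetic on the start, end, and facet time from the original request. The even split of the facet time across gaps is an assumption made for illustration, not something measured.

```python
from datetime import datetime, timezone

# Values copied from the request parameters and debug timing quoted in this thread.
start = datetime(2005, 3, 13, 16, 37, 18, tzinfo=timezone.utc)
end = datetime(2015, 5, 13, 16, 37, 18, tzinfo=timezone.utc)
facet_time_ms = 24574

# With facet.range.gap=+1DAY there is one bucket per day between start and end.
gaps = (end - start).days

# Assumption (illustrative only): the facet time is spread evenly over the gaps,
# each of which is evaluated as a separate filter regardless of the hit count.
per_gap_ms = facet_time_ms / gaps

print(gaps, "gaps")
print(round(per_gap_ms, 1), "ms per gap on average")
```

With these inputs the script reports 3713 gaps and about 6.6 ms per gap, so the cost would scale with the number of gaps requested, not with the 3 matching documents.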



 

On Tuesday, January 13, 2015 1:12 PM, Tomás Fernández Löbbe <tomasflobbe@gmail.com> wrote:
   

No, you are not misreading. Right now there is no automatic way of
generating the intervals on the server side, as range faceting does. I
guess it won't work in your case. Maybe you should create a Jira issue to
add this feature to interval faceting.

Tomás
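
Since the intervals have to be supplied by the client, they could be generated in a loop instead of being written out by hand. The following is a minimal sketch, not a tested Solr client: the helper name `daily_interval_params` is hypothetical, the field name is taken from the thread, and the half-open `[start,end)` bracket form is an assumption about the interval syntax (the closed `[...T00:00:00.000Z,...T23:59:59.999Z]` form used elsewhere in this thread could be substituted).

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_interval_params(field, first_day, last_day):
    """Build request parameters with one interval per day.

    first_day is inclusive, last_day is exclusive. The helper name and the
    half-open "[start,end)" interval form are illustrative assumptions.
    """
    params = [("facet", "true"), ("facet.interval", field)]
    day = first_day
    while day < last_day:
        next_day = day + timedelta(days=1)
        interval = f"[{day.isoformat()}T00:00:00Z,{next_day.isoformat()}T00:00:00Z)"
        params.append((f"f.{field}.facet.interval.set", interval))
        day = next_day
    return params


params = daily_interval_params("eventDate", date(2005, 1, 1), date(2015, 1, 1))
body = urlencode(params)  # suitable as a form-encoded POST body
interval_count = len(params) - 2  # exclude the two non-interval parameters

print(interval_count, "intervals,", len(body), "bytes when URL-encoded")
```

For a ten-year span this produces 3,652 intervals and a parameter string on the order of a few hundred kilobytes, which is probably too long for a GET URL and would need to go in a POST body.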

On Tue, Jan 13, 2015 at 10:44 AM, David Smith <dsmithsolr@yahoo.com.invalid>
wrote:

> Tomás,
>
>
> Thanks for the response -- the performance of my query makes perfect sense
> in light of your information.
> I looked at Interval faceting.  My required interval is 1 day.  I cannot
> change that requirement.  Unless I am misreading the doc, that means that to
> facet a 10-year range, the query needs to specify over 3,600 intervals?
>
>
> f.eventDate.facet.interval.set=[2005-01-01T00:00:00.000Z,2005-01-01T23:59:59.999Z]&f.eventDate.facet.interval.set=[2005-01-02T00:00:00.000Z,2005-01-02T23:59:59.999Z]&etc,etc
>
>
> Each query would be 185MB in size if I structure it this way.
>
> I assume I must be misunderstanding how to use Interval faceting with
> dates.  Are there any concrete examples you know of?  A Google search did
> not come up with much.
>
> Kind regards,
> Dave
>
>      On Tuesday, January 13, 2015 12:16 PM, Tomás Fernández Löbbe <tomasflobbe@gmail.com> wrote:
>
>
>  Range Faceting won't use the DocValues even if they are set; it
> translates each gap to a filter. This means that it will end up using the
> FilterCache, which should make follow-up queries faster if you repeat the
> same gaps (and don't commit).
> You may also want to try interval faceting, which uses DocValues instead
> of filters. The API is different: you'll have to provide the intervals
> yourself.
>
> Tomás
>
> On Tue, Jan 13, 2015 at 10:01 AM, Shawn Heisey <apache@elyograg.org>
> wrote:
>
> > On 1/13/2015 10:35 AM, David Smith wrote:
> > > I have a query against a single 50M doc index (175GB) using Solr 4.10.2,
> > > that exhibits the following response times (via the debugQuery option in
> > > Solr Admin):
> > > "process": {
> > >  "time": 24709,
> > >  "query": { "time": 54 }, "facet": { "time": 24574 },
> > >
> > >
> > > The query time of 54ms is great and exactly as expected -- this example
> > > was a single-term search that returned 3 hits.
> > > I am trying to get the facet time (24.5 seconds) to be sub-second, and
> > > am having no luck.  The facet part of the query is as follows:
> > >
> > > > "params": { "facet.range": "eventDate",
> > > >  "f.eventDate.facet.range.end": "2015-05-13T16:37:18.000Z",
> > > >  "f.eventDate.facet.range.gap": "+1DAY",
> > > >  "start": "0",
> > > >  "rows": "10",
> > > >  "f.eventDate.facet.range.start": "2005-03-13T16:37:18.000Z",
> > > >  "f.eventDate.facet.mincount": "1",
> > > >  "facet": "true",
> > > >  "debugQuery": "true",
> > > >  "_": "1421169383802"
> > > >  }
> > >
> > > And, the relevant schema definition is as follows:
> > >
> > > >    <field name="eventDate" type="tdate" indexed="true" stored="true" multiValued="false" docValues="true"/>
> > > >
> > > >    <!-- A Trie based date field for faster date range queries and date faceting. -->
> > > >    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
> > >
> > >
> > > During the 25-second query, the Solr JVM pegs one CPU, with little or
> > > no I/O activity detected on the drive that holds the 175GB index.  I have
> > > 48GB of RAM, half of that dedicated to the OS and the other half to the
> > > Solr JVM.
> > >
> > > I do NOT have any fieldValue caches configured as yet, because my
> > > (perhaps too simplistic?) reading of the documentation was that DocValues
> > > eliminates the need for a field-level cache on this facet field.
> >
> > 24GB of RAM to cache 175GB is probably not enough in the general case,
> > but if you're seeing very little disk I/O activity for this query, then
> > we'll leave that alone and you can worry about it later.
> >
> > What I would try immediately is setting the facet.method parameter to
> > enum and seeing what that does to the facet time.  I've had good luck
> > generally with that, even in situations where the docs indicated that
> > the default (fc) was supposed to work better.  I have never explored the
> > relationship between facet.method and docValues, though.
> >
> > I'm out of ideas after this.  I don't have enough experience with
> > faceting to help much.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
>

   