lucene-solr-user mailing list archives

From Alessandro Benedetti <abenede...@apache.org>
Subject Re: [Solr 6] Migration from Solr 4.10.2
Date Tue, 31 May 2016 17:02:34 GMT
I think we have found our performance killer:

https://issues.apache.org/jira/browse/SOLR-9176

Basically, we thought we were using Term Enum, but under the hood Solr
forces FCS for single-valued numeric fields.
Solr 4 did not behave like that.
I checked the related commit: it is not functionally equivalent, and its
message says nothing about this change.
Let's continue the discussion in Jira.

On Wed, May 25, 2016 at 9:45 AM, Alessandro Benedetti <abenedetti@apache.org
> wrote:

> I took another look at the code:
> org/apache/solr/search/facet/FacetField.java:115 (branch 6.0)
>
> if (!multiToken) {
>   if (ntype != null) {
>     // single valued numeric (docvalues or fieldcache)
>     return new FacetFieldProcessorNumeric(fcontext, this, sf);
>   } else {
>     // single valued string...
>     return new FacetFieldProcessorDV(fcontext, this, sf);
>   }
> }
> // multi-valued after this point
> if (sf.hasDocValues() || method == FacetMethod.DV) {
>   // single and multi-valued string docValues
>   return new FacetFieldProcessorDV(fcontext, this, sf);
> }
> // Top-level multi-valued field cache (UIF)
> return new FacetFieldProcessorUIF(fcontext, this, sf);
>
>
> This part belongs to the new JSON Facet code (but when you pass the uif
> method to legacy faceting, the request is routed to this code by mocking
> a JSON facet request).
> According to this code, if the field has docValues, whether single-valued
> or multi-valued, you end up in FacetFieldProcessorDV (single-valued
> numeric fields go to FacetFieldProcessorNumeric instead).
> This seems to be the reason I don't see my fieldValueCache populated: I
> have both single- and multi-valued fields now, but all of them have docValues!
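
To make the branching easier to follow, here is a minimal sketch that models the quoted selection logic as a pure function. It is not Solr code: the class `FacetProcessorChoice`, the `Processor` enum and the boolean parameters are hypothetical stand-ins for the `multiToken`, `ntype`, `sf.hasDocValues()` and `method == FacetMethod.DV` checks in the snippet above.

```java
// Hypothetical, simplified model of the selection logic quoted above.
// The class, enum and parameter names are illustrative, not Solr API.
class FacetProcessorChoice {

    enum Processor { NUMERIC, DV, UIF }

    static Processor choose(boolean multiToken, boolean numeric,
                            boolean hasDocValues, boolean dvMethodRequested) {
        if (!multiToken) {
            // single-valued: numeric fields get their own processor,
            // single-valued strings go to the DV processor
            return numeric ? Processor.NUMERIC : Processor.DV;
        }
        // multi-valued after this point
        if (hasDocValues || dvMethodRequested) {
            return Processor.DV;
        }
        // only multi-valued fields without docValues reach UIF
        return Processor.UIF;
    }

    public static void main(String[] args) {
        // multi-valued string field with docValues
        System.out.println(choose(true, false, true, false));
        // multi-valued string field without docValues
        System.out.println(choose(true, false, false, false));
        // single-valued numeric field
        System.out.println(choose(false, true, true, false));
    }
}
```

Under this model, the only path to UIF (and hence to the fieldValueCache) is a multi-valued field without docValues, which would be consistent with the empty cache described above.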
>
> On Tue, May 24, 2016 at 9:38 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Alessandro,
>>
>> I checked with the Solr 6.0 distro on techproducts.
>> Faceting on cat with uif hits the fieldValueCache:
>>
>> http://localhost:8983/solr/techproducts/select?facet.field=cat&facet.method=uif&facet=on&indent=on&q=*:*&wt=json
>>
>> fieldValueCache
>> - class:org.apache.solr.search.FastLRUCache
>> - description:Concurrent LRU Cache(maxSize=10000, initialSize=10,
>>   minSize=9000, acceptableSize=9500, cleanupThread=false)
>> - src:
>> - version:1.0
>> - stats:
>>    - cumulative_evictions:0
>>    - cumulative_hitratio:0.5
>>    - cumulative_hits:1
>>    - cumulative_inserts:2
>>    - cumulative_lookups:2
>>    - evictions:0
>>    - hitratio:0.5
>>    - hits:1
>>    - inserts:2
>>    - item_cat:{field=cat,memSize=4665,tindexSize=46,time=28,phase1=27,nTerms=16,bigTerms=2,termInstances=21,uses=0}
>>    - lookups:2
>>    - size:1
>>
>> Beware: the field manu_exact, for example, doesn't hit the
>> fieldValueCache, because it is single-valued and goes to
>> FacetFieldProcessorDV instead of FacetFieldProcessorUIF, whereas cat is
>> multi-valued and hits UIF. See
>> org.apache.solr.search.facet.FacetField.createFacetProcessor(FacetContext);
>> you may just need to debug there.
>>
>> In summary, uif works and you have a chance to hit it. Good luck!
>>
>> On Tue, May 24, 2016 at 7:43 PM, Alessandro Benedetti <
>> benedetti.alex85@gmail.com> wrote:
>>
>> > Update: it seems clear that I ran into the infamous
>> > https://issues.apache.org/jira/browse/SOLR-8096 :
>> >
>> > Just adding some additional information, as I just hit the issue
>> > with Solr 6.0:
>> > Static index, around 50 * 10^6 docs, 20 fields to facet on, 1 of them
>> > with high cardinality, on top of grouping.
>> > Grouping had no effect at all.
>> >
>> > All the symptoms are there: Solr 4.10.2 takes around 150 ms and
>> > Solr 6.0 around 550 ms.
>> > The 'fieldValueCache' seems to be unused (no inserts or lookups) in
>> > Solr 6.0.
>> > In Solr 4.10 the 'fieldValueCache' is in heavy use, with a
>> > cumulative_hitratio of 0.96.
>> > Switching from enum to fc to fcs to uif did not change much.
>> >
>> > Moving to docValues did not improve the situation much (but I was on
>> > an optimized index, so I need to try a multi-segment one, according
>> > to Mikhail Khludnev's
>> > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mkhludnev>
>> > contribution in Solr 5.4.0).
>> >
>> > Moving to field collapsing brought the query time down to 110-120 ms
>> > (but this is expected, we were faceting on 260 / 1 million original docs).
>> > Adding facet.threads=NCores brought the query time down to 100 ms; in
>> > combination with field collapsing we reached 80-90 ms when warmed.
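
For reference, a minimal sketch of how a legacy facet request with `facet.method` and `facet.threads` could be assembled. The class `FacetQuery` and its `build` helper are hypothetical; only the request parameter names (`q`, `facet`, `facet.field`, `facet.method`, `facet.threads`) come from Solr. The thread count is passed in by the caller, on the assumption that it mirrors the number of cores on the Solr host rather than on the client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the query string of a legacy facet request.
class FacetQuery {

    static String build(String facetField, String facetMethod, int facetThreads) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "on");
        params.put("facet.field", facetField);
        params.put("facet.method", facetMethod);
        params.put("facet.threads", Integer.toString(facetThreads));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a compliant JVM
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // 8 is a placeholder for the core count of the Solr host
        System.out.println(build("cat", "fc", 8));
    }
}
```

The resulting string can be appended to a `/select` URL such as the techproducts one quoted earlier in the thread. Since facet.threads parallelises across facet fields, it should matter mostly for requests that facet on many fields, as in the 20-field case described above.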
>> >
>> > What are the plans for the future regarding this?
>> > Do we want to deprecate the legacy facets implementation and move
>> > everything to JSON facets (as happened with UIF)?
>> > So backward compatible, but with a different implementation?
>> >
>> > I think migrations should be a transparent process.
>> >
>> >
>> > Cheers
>> >
>> > On Mon, May 23, 2016 at 6:49 PM, Alessandro Benedetti <
>> > benedetti.alex85@gmail.com> wrote:
>> >
>> > > Furthermore, I was checking the internals of the old facet
>> > > implementation (the one used with the classic request parameters,
>> > > instead of JSON facets). It seems that if you enable docValues,
>> > > even with the enum method passed as a parameter, fc with docValues
>> > > is actually used.
>> > > I will report on the performance we get with docValues.
>> > >
>> > > Cheers
>> > > On 23 May 2016 16:29, "Joel Bernstein" <joelsolr@gmail.com> wrote:
>> > >
>> > >> If you can make min/max work for you instead of sort then it should
>> > >> be faster, but I haven't spent time comparing the performance.
>> > >>
>> > >> But if you're using the top_fc with the min/max param the performance
>> > >> between Solr 4 & Solr 6 should be very close as the data structures
>> > >> behind them are the same.
>> > >>
>> > >> Joel Bernstein
>> > >> http://joelsolr.blogspot.com/
>> > >>
>> > >> On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti <
>> > >> abenedetti@apache.org
>> > >> > wrote:
>> > >>
>> > >> > Hi Joel,
>> > >> > thanks for the reply. Actually, we were not using field collapsing
>> > >> > before; we basically want to replace grouping with it.
>> > >> > Grouping performance in Solr 4 and Solr 6 is basically comparable.
>> > >> > It's surprising that I got such a big degradation with field
>> > >> > collapsing.
>> > >> >
>> > >> > So the comparison we did was based on the Solr 4 queries, extracted
>> > >> > from the logs and modified slightly to include the field collapsing
>> > >> > parameter.
>> > >> >
>> > >> > To build the tests comparing Solr 4.10.2 to Solr 6, we proceeded
>> > >> > in this way:
>> > >> >
>> > >> > 1) install Solr 4.10.2 and Solr 6.0.0
>> > >> > 2) migrate the index with the related Lucene tool ( 4.10.2 -> 5.5.0
>> > >> > -> Solr 6.0 )
>> > >> > 3) switch the 2 instances on/off, repeating the tests with both
>> > >> > cold and warm instances.
>> > >> >
>> > >> > This means that the queries look the same.
>> > >> > I have not double-checked the results, only the timings.
>> > >> > I will provide additional feedback on whether the queries produce
>> > >> > comparable results as well.
>> > >> >
>> > >> > Regarding your suggestion about top_fc: thanks, I will try that.
>> > >> > I actually discovered it shortly after I posted to the mailing
>> > >> > list (I think precisely from another post of yours :) ).
>> > >> >
>> > >> > Not sure whether setting up docValues for the field we collapse on
>> > >> > could give some benefit as well.
>> > >> >
>> > >> > I will keep you updated,
>> > >> >
>> > >> > Cheers
>> > >> >
>> > >> > On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein <joelsolr@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > Were you using the sort param or min/max param in Solr 4 to
>> > >> > > select the group head? The sort work came later and I'm not sure
>> > >> > > how it compares in performance to the min/max param.
>> > >> > >
>> > >> > > Since you are collapsing on a string field you can use the top_fc
>> > >> > > hint, which will use a top-level field cache for the collapse.
>> > >> > > This is faster at query time than the default, which uses a
>> > >> > > MultiDocValues ordinal map.
>> > >> > >
>> > >> > > The docs cover the top_fc hint.
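
As an illustration of the two suggestions above (min/max instead of sort, plus the top_fc hint), here is a minimal sketch that builds the `fq` value for the collapse query parser. The class `CollapseFilter` and its `build` helper are hypothetical, and `double_field` is a placeholder field name, not a field from the schema discussed in this thread.

```java
// Minimal sketch: builds the fq value for Solr's collapse query parser.
// The field names passed in are placeholders, not fields from a real schema.
class CollapseFilter {

    /**
     * @param field    field to collapse on
     * @param minField numeric field whose minimum selects the group head, or null
     * @param topFc    whether to add the top_fc hint
     */
    static String build(String field, String minField, boolean topFc) {
        StringBuilder fq = new StringBuilder("{!collapse field=").append(field);
        if (minField != null) {
            fq.append(" min=").append(minField);
        }
        if (topFc) {
            fq.append(" hint=top_fc");
        }
        return fq.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(build("string_field", "double_field", true));
    }
}
```

Whether min/max actually beats sort for a given index is something to measure rather than assume, as noted above.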
>> > >> > >
>> > >> > > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>> > >> > >
>> > >> > >
>> > >> > > Joel Bernstein
>> > >> > > http://joelsolr.blogspot.com/
>> > >> > >
>> > >> > > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
>> > >> > > abenedetti@apache.org> wrote:
>> > >> > >
>> > >> > > > Let's add some additional details, guys:
>> > >> > > >
>> > >> > > > 1) *Faceting*
>> > >> > > > Currently the facet method used is "enum" and it runs over 20
>> > >> > > > fields, more or less.
>> > >> > > > We mainly use it on low-cardinality fields, except one which
>> > >> > > > has a cardinality of 1000 terms.
>> > >> > > > I am aware of the famous Jira issue about the faceting
>> > >> > > > regression:
>> > >> > > > https://issues.apache.org/jira/browse/SOLR-8096 .
>> > >> > > >
>> > >> > > > Our index is indeed quite static (we index once per day) and
>> > >> > > > the fields we facet on are multi-valued (by schema definition,
>> > >> > > > but not in practice).
>> > >> > > > But we use Term Enum as the method, so I was not expecting to
>> > >> > > > hit the regression.
>> > >> > > > We currently see query times which are 30% worse than Solr
>> > >> > > > 4.10.2.
>> > >> > > > Our next experiment will be to enable docValues for all the
>> > >> > > > fields and verify whether we get any benefit (switching the
>> > >> > > > facet method to fc).
>> > >> > > > At the moment, switching to JSON faceting is not an option, as
>> > >> > > > we would first like to proceed with a transparent migration and
>> > >> > > > then possibly add improvements and refactor in the future.
>> > >> > > > The following step will be to fix the schema so that only what
>> > >> > > > is really multi-valued is declared multi-valued (do you know if
>> > >> > > > this can have an effect? Is the wrong schema definition enough
>> > >> > > > to mess up facet performance, even if the fields actually hold
>> > >> > > > single values?)
>> > >> > > >
>> > >> > > >
>> > >> > > > 2) *Field Collapsing*
>> > >> > > > Field collapsing performance seems much, much worse, something
>> > >> > > > like 200 ms (Solr 4) vs 1800 ms (Solr 6).
>> > >> > > > This is surprising, as I never heard about any regression in
>> > >> > > > field collapsing.
>> > >> > > > I will investigate the internals of field collapsing in a bit
>> > >> > > > more detail, and why the performance could be so degraded.
>> > >> > > > I will also check whether I can find any info in the mailing
>> > >> > > > list or Jira.
>> > >> > > >
>> > >> > > > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
>> > >> > > >
>> > >> > > > Let me know if you faced something similar.
>> > >> > > >
>> > >> > > > Cheers
>> > >> > > >
>> > >> > > > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
>> > >> > > > abenedetti@apache.org> wrote:
>> > >> > > >
>> > >> > > > > I'm planning a migration from 4.10.2 to 6.0.
>> > >> > > > > Because we generate the index from scratch on a daily basis,
>> > >> > > > > we don't need to migrate the index, only the server
>> > >> > > > > instances.
>> > >> > > > > With my team we were doing some experiments on some dev
>> > >> > > > > machines, basically comparing Solr 4.10.2 and Solr 6.0 to
>> > >> > > > > check for any functional and performance regressions in our
>> > >> > > > > use cases.
>> > >> > > > >
>> > >> > > > > After setting up two installations on the same machine
>> > >> > > > > (switching each version on and off for comparisons and
>> > >> > > > > experiments), we are observing a degradation in performance
>> > >> > > > > with Solr 6.
>> > >> > > > >
>> > >> > > > > Basically, from a queryTime and throughput perspective Solr 6
>> > >> > > > > is not performing as well as Solr 4.10.2.
>> > >> > > > > We still need to start a proper investigation, but this
>> > >> > > > > appears weird to me.
>> > >> > > > > We will proceed with a full analysis of the case and a deep
>> > >> > > > > study of our queries (which anyway are mainly fq, faceting
>> > >> > > > > and grouping).
>> > >> > > > >
>> > >> > > > > Any suggestion in particular on where to start? Has anyone
>> > >> > > > > gone through a similar migration with a similar experience?
>> > >> > > > > I will also search the mailing list for similar cases.
>> > >> > > > >
>> > >> > > > > Cheers
>> > >> > > > >
>> > >> > > > > --
>> > >> > > > > --------------------------
>> > >> > > > >
>> > >> > > > > Benedetti Alessandro
>> > >> > > > > Visiting card : http://about.me/alessandro_benedetti
>> > >> > > > >
>> > >> > > > > "Tyger, tyger burning bright
>> > >> > > > > In the forests of the night,
>> > >> > > > > What immortal hand or eye
>> > >> > > > > Could frame thy fearful symmetry?"
>> > >> > > > >
>> > >> > > > > William Blake - Songs of Experience -1794 England
>> > >> > > > >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>> <mkhludnev@griddynamics.com>
>>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
