lucene-solr-user mailing list archives

From Alessandro Benedetti <abenede...@apache.org>
Subject Re: [Solr 6] Migration from Solr 4.10.2
Date Tue, 24 May 2016 21:25:39 GMT
Mikhail, you have been really helpful!

On Tue, May 24, 2016 at 9:38 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Alessandro,
>
> I checked with Solr 6.0 distro on techproducts.
> Faceting on cat with uif hits fieldValueCache
>
> http://localhost:8983/solr/techproducts/select?facet.field=cat&facet.method=uif&facet=on&indent=on&q=*:*&wt=json
>
> fieldValueCache
> - class:org.apache.solr.search.FastLRUCache
> - description:Concurrent LRU Cache(maxSize=10000, initialSize=10,
> minSize=9000, acceptableSize=9500, cleanupThread=false)
> - src:
> - version:1.0 stats:
>
>    - cumulative_evictions:0
>    - cumulative_hitratio:0.5
>    - cumulative_hits:1
>    - cumulative_inserts:2
>    - cumulative_lookups:2
>    - evictions:0
>    - hitratio:0.5
>    - hits:1
>    - inserts:2
>    - item_cat:{field=cat,memSize=4665,tindexSize=46,time=28,phase1=27,nTerms=16,bigTerms=2,termInstances=21,uses=0}
>    - lookups:2
>    - size:1
>
> Beware: the field manu_exact, for example, doesn't hit the fieldValueCache
> because it is single valued and goes to FacetFieldProcessorDV instead of
> FacetFieldProcessorUIF. cat is multivalued and hits UIF.
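
To make the distinction above concrete, here is a simplified Python model of the dispatch as it is described in this thread. It is not Solr's actual code: FieldInfo, choose_processor and uses_field_value_cache are hypothetical names, and the model looks only at the schema's multiValued flag, which is an assumption (the real FacetField.createFacetProcessor may consider more than that).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a schema field; not a Solr class."""
    name: str
    multi_valued: bool


def choose_processor(field: FieldInfo) -> str:
    """Name the processor this thread reports for facet.method=uif.

    Simplified model: only the schema's multiValued flag is considered.
    """
    if field.multi_valued:
        # Multi-valued fields are un-inverted; per the thread, the resulting
        # entry shows up in the fieldValueCache.
        return "FacetFieldProcessorUIF"
    # Single-valued fields go to the DV processor and, per the thread,
    # do not touch the fieldValueCache.
    return "FacetFieldProcessorDV"


def uses_field_value_cache(field: FieldInfo) -> bool:
    """True when the modelled dispatch would populate the fieldValueCache."""
    return choose_processor(field) == "FacetFieldProcessorUIF"


if __name__ == "__main__":
    for f in (FieldInfo("cat", True), FieldInfo("manu_exact", False)):
        print(f.name, "->", choose_processor(f))
```

Under this model, a field that is declared multi-valued in the schema but holds a single value per document would still be routed to UIF, because the decision is driven by the schema flag and not by the data.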


That makes complete sense!
I think the query I was debugging today contained only single-valued
fields.
The Solr 4.10.2 instance I was testing, on the other hand, used a schema
with the same fields but declared as multi-valued.

Proceeding with UIF seems the most reasonable approach
in my case, as it automatically redirects to the proper method
depending on whether the field is multi-valued or single-valued.
Today I was mainly testing with FCS (but I had optimised the index in my
experiments, so with a single segment FCS is effectively the same as FC).
Tomorrow I will try a fresh, non-optimised index.
I have 3 additional questions:

1) Let's assume we enable docValues for the fields involved.
If a field is misconfigured, declared multi-valued in the schema but actually
single valued, then according to the code we are going to hit UIF. Will this
cause unnecessary usage of the fieldValueCache and slowness compared
with the DV approach, which would have been the correct algorithm to apply?

2) Thanks to facet.threads I got a huge benefit on a single query with
FC. Should I expect even more benefit with a segmented index? (Today
I was playing with an optimised one.)

3) In my experiments today, in Solr 4.10.2 I was getting better results
with the enum approach (the overall cardinality of the fields involved was
pretty low). Using the enum approach in Solr 6 without docValues was worse
than in Solr 4 (we know that with the legacy facet approach, if
you set docValues and the field is multi-valued, we always redirect to DV).
This seems unrelated to the well-known bug since, as far as I know,
the enum approach should make heavy use of the
filterCache, while the fieldValueCache should not be involved.
Do you know why the termEnum approach is affected by the regression
in recent Solr versions?
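
For anyone who wants to repeat these comparisons, the parameters mentioned in the questions above (facet.method and facet.threads) can be combined in one classic facet request. What follows is a minimal sketch that only builds the URL and does not send it. It assumes the techproducts URL from Mikhail's example; build_facet_url is a hypothetical helper, not part of any Solr client, and the field names are simply the two used earlier in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the techproducts example quoted in this thread.
SELECT_URL = "http://localhost:8983/solr/techproducts/select"


def build_facet_url(fields, method="enum", threads=None, base=SELECT_URL):
    """Build a classic (request-parameter) facet query URL.

    build_facet_url is a hypothetical helper, not part of any Solr client.
    """
    params = [("q", "*:*"), ("wt", "json"), ("facet", "on"),
              ("facet.method", method)]
    # facet.field is repeated once per field to facet on.
    params.extend(("facet.field", name) for name in fields)
    if threads is not None:
        params.append(("facet.threads", str(threads)))
    # Keep "*" and ":" unescaped so the query stays readable.
    return base + "?" + urlencode(params, safe="*:")


if __name__ == "__main__":
    # Compare the same fields under different facet methods.
    for method in ("enum", "fc", "fcs", "uif"):
        url = build_facet_url(["cat", "manu_exact"], method=method, threads=4)
        print(method, parse_qs(urlsplit(url).query)["facet.method"], url)
```

Running the same field list through each method on both Solr versions, with cold and warm caches, keeps the only variable in each run the facet method itself.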

Thank you very much again!

> see org.apache.solr.search.facet.FacetField.createFacetProcessor(FacetContext);
> you might just need to debug there.
>
> In summary, uif works and you have a chance to hit it. Good luck!
>
> On Tue, May 24, 2016 at 7:43 PM, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
>
> > Update: it seems clear that I ran into the bad
> > https://issues.apache.org/jira/browse/SOLR-8096 :
> >
> > Just adding some additional information, as I just ran into the issue
> > with Solr 6.0:
> > Static index, around 50 * 10^6 docs, 20 fields to facet, 1 of them with
> > high cardinality, on top of grouping.
> > Grouping had no effect at all.
> >
> > All the symptoms are there, Solr 4.10.2 around 150 ms and Solr 6.0 around
> > 550 ms .
> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> > 6.0.
> > In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> > cumulative_hitratio of 0.96 .
> > Switching from enum to fc to fcs to uif did not change that much.
> >
> > Moving to DocValues didn't improve the situation that much (but I was on
> > an optimized index, so I need to try the multi-segment one, according
> > to Mikhail Khludnev's
> > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mkhludnev>
> > contribution in Solr 5.4.0).
> >
> > Moving to field collapsing brought the query down to 110-120 ms (but this
> > is normal, we were faceting on 260 / 1 million original docs).
> > Adding facet.threads=NCores brought the query time down to 100 ms; in
> > combination with field collapsing we reached 80-90 ms when warmed.
> >
> > What is the plan for the future regarding this?
> > Do we want to deprecate the legacy facets implementation and move
> > everything to JSON facets (as happened with UIF)?
> > So backward compatible, but with a different implementation?
> >
> > I think migrations should be a transparent process.
> >
> >
> > Cheers
> >
> > On Mon, May 23, 2016 at 6:49 PM, Alessandro Benedetti <
> > benedetti.alex85@gmail.com> wrote:
> >
> > > Furthermore, I was checking the internals of the old facet implementation
> > > (the one used with the classic request parameters instead of the
> > > JSON facet API). It seems that if you enable docValues, even with the enum
> > > method passed as a parameter, fc with docValues will actually be used.
> > > I will report on the performance we get with docValues.
> > >
> > > Cheers
> > > On 23 May 2016 16:29, "Joel Bernstein" <joelsolr@gmail.com> wrote:
> > >
> > >> If you can make min/max work for you instead of sort then it should be
> > >> faster, but I haven't spent time comparing the performance.
> > >>
> > >> But if you're using the top_fc with the min/max param the performance
> > >> between Solr 4 & Solr 6 should be very close as the data structures
> > >> behind them are the same.
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti <
> > >> abenedetti@apache.org> wrote:
> > >>
> > >> > Hi Joel,
> > >> > thanks for the reply. Actually, we were not using field collapsing
> > >> > before; we basically want to replace grouping with it.
> > >> > The grouping performance of Solr 4 and Solr 6 is basically comparable.
> > >> > It's surprising that I got such a big degradation with field collapsing.
> > >> >
> > >> > The comparisons we did were based on the Solr 4 queries,
> > >> > extracted from logs and modified slightly to include the field
> > >> > collapsing parameter.
> > >> >
> > >> > To build the tests comparing Solr 4.10.2 to Solr 6 we basically
> > >> > proceeded in this way:
> > >> >
> > >> > 1) install Solr 4.10.2 and Solr 6.0.0
> > >> > 2) migrate the index with the related Lucene tool (4.10.2 -> 5.5.0
> > >> > -> Solr 6.0)
> > >> > 3) switch the 2 instances on and off, repeating the tests with both
> > >> > cold and warm instances.
> > >> >
> > >> > This means that the queries look the same.
> > >> > I have not double-checked the results, only the timings.
> > >> > I will provide additional feedback on whether the queries are producing
> > >> > comparable results as well.
> > >> >
> > >> > Regarding your suggestion about top_fc: thanks, I will try that.
> > >> > I actually discovered it a little after I posted to the mailing
> > >> > list (I think from another post of yours :) ).
> > >> >
> > >> > I am not sure whether setting up docValues for the field we collapse on
> > >> > could give some benefit as well.
> > >> >
> > >> > I will keep you updated,
> > >> >
> > >> > Cheers
> > >> >
> > >> > On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein <joelsolr@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Were you using the sort param or min/max param in Solr 4 to select
> > >> > > the group head? The sort work came later and I'm not sure how it
> > >> > > compares in performance to the min/max param.
> > >> > >
> > >> > > Since you are collapsing on a string field you can use the top_fc
> > >> > > hint, which will use a top-level field cache for the collapse. This
> > >> > > is faster at query time than the default, which uses a MultiDocValue
> > >> > > ordinal map.
> > >> > >
> > >> > > The docs cover the top_fc hint.
> > >> > >
> > >> > > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> > >> > >
> > >> > >
> > >> > > Joel Bernstein
> > >> > > http://joelsolr.blogspot.com/
> > >> > >
> > >> > > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
> > >> > > abenedetti@apache.org> wrote:
> > >> > >
> > >> > > > Let's add some additional details, guys:
> > >> > > >
> > >> > > > 1) *Faceting*
> > >> > > > Currently the facet method used is "enum" and it runs over
> > >> > > > roughly 20 fields.
> > >> > > > We mainly use it on low-cardinality fields, except one which
> > >> > > > has a cardinality of 1000 terms.
> > >> > > > I am aware of the well-known Jira issue about the faceting
> > >> > > > regression: https://issues.apache.org/jira/browse/SOLR-8096 .
> > >> > > >
> > >> > > > Our index is indeed quite static (we index once per day) and
> > >> > > > the fields we facet on are multi-valued (by schema definition
> > >> > > > but not in practice).
> > >> > > > But we use Term Enum as the method, so I was not expecting to
> > >> > > > hit the regression.
> > >> > > > We currently see query times which are 30% worse than Solr 4.10.2.
> > >> > > > Our next experiment will be to enable docValues for all the
> > >> > > > fields and verify whether we get any benefit (switching the facet
> > >> > > > method to fc).
> > >> > > > At the moment, switching to JSON faceting is not an option, as
> > >> > > > we would like first to proceed with a transparent migration and
> > >> > > > then possibly add improvements and refactor in the future.
> > >> > > > The following step will be to fix the schema so that only what is
> > >> > > > really multi-valued is declared multi-valued (do you know if
> > >> > > > this can have an effect? Is a wrong schema definition enough to
> > >> > > > mess up facet performance, even if the fields are actually
> > >> > > > single valued?)
> > >> > > >
> > >> > > >
> > >> > > > 2) *Field Collapsing*
> > >> > > > Field collapsing performance seems much, much worse, something
> > >> > > > like 200 ms (Solr 4) vs 1800 ms (Solr 6).
> > >> > > > This is surprising, as I never heard about any regression in
> > >> > > > field collapsing.
> > >> > > > I will investigate the internals of field collapsing in more
> > >> > > > detail, and why the performance could be so degraded.
> > >> > > > I will also check whether I can find any info in the mailing
> > >> > > > list or Jira.
> > >> > > >
> > >> > > > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
> > >> > > >
> > >> > > > Let me know if you have faced something similar.
> > >> > > >
> > >> > > > Cheers
> > >> > > >
> > >> > > > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
> > >> > > > abenedetti@apache.org> wrote:
> > >> > > >
> > >> > > > > I'm planning a migration from 4.10.2 to 6.0.
> > >> > > > > Because we generate the index from scratch on a daily basis,
> > >> > > > > we don't need to migrate the index, only the server instances.
> > >> > > > > With my team we were doing some experiments on some dev
> > >> > > > > machines, basically comparing Solr 4.10.2 and Solr 6.0 to check
> > >> > > > > for any functional and performance regressions in our use cases.
> > >> > > > >
> > >> > > > > After setting up two installations on the same machine
> > >> > > > > (switching each version on and off for comparisons and
> > >> > > > > experiments) we are observing a degradation in performance
> > >> > > > > with Solr 6.
> > >> > > > >
> > >> > > > > Basically, from a queryTime and throughput perspective, Solr 6
> > >> > > > > is not performing as well as Solr 4.10.2.
> > >> > > > > We still need to start the proper investigation, but this
> > >> > > > > appears weird to me.
> > >> > > > > We will proceed with a full analysis of the case and a deep
> > >> > > > > study of our queries (which are mainly fq, faceting and
> > >> > > > > grouping anyway).
> > >> > > > >
> > >> > > > > Any suggestion in particular on where to start? Has anyone
> > >> > > > > gone through a similar migration with a similar experience?
> > >> > > > > I will also explore the mailing list in search of similar
> > >> > > > > cases.
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > --
> > >> > > > > --------------------------
> > >> > > > >
> > >> > > > > Benedetti Alessandro
> > >> > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > >
> > >> > > > > "Tyger, tyger burning bright
> > >> > > > > In the forests of the night,
> > >> > > > > What immortal hand or eye
> > >> > > > > Could frame thy fearful symmetry?"
> > >> > > > >
> > >> > > > > William Blake - Songs of Experience -1794 England
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhludnev@griddynamics.com>
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
