lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6349) LocalParams for enabling/disabling individual stats
Date Wed, 18 Feb 2015 00:40:12 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-6349:
---------------------------
    Attachment: SOLR-6349.patch


Starting to get back into this; here's a quick checkpoint of some small progress.


Step #1: This new patch brings Xu's latest patch up to date with trunk using the minimal changes
that seemed to work -- in particular: I haven't started really digging into the code changes
other than getting things to compile & tests to pass.

Step #2...

My main focus for now is making sure the tests are rock solid & all inclusive so we can
then iterate on the code changes (see earlier comments about my concerns with spreading the logic
around).  Only 2 noticeable changes in this patch...

* Fixed FacetPivotSmallTest.testPivotFacetStatsUnsortedTagged
** was prematurely specifying 'mean=true' but then trying to assert that all stats were returned
** beefed this up to also assert that it got an expected number of stats - if we add more
stats in the future, this will be a canary that the test needs to be updated to assert the correct
values for these new stats.

* StatsComponentTest
** added more asserts to the 3 testFieldStatisticsResults_TYPE_FieldAlwaysMissing tests to ensure
expected values for all stats (when there is nothing to compute stats on)...{noformat}
// numerics & strings & dates
min=null
max=null
// just numerics
sum=0.0
sumOfSquares=0.0
stddev=0.0
mean=NaN
{noformat}
*** these are based on the current behavior of the code ... my initial gut reaction was that
they should all be null, but a quick bit of research says that in maths the "empty sum" is
defined as "0" -- if you start with that premise, then the values for the rest seem correct
to me, but I'm definitely interested in knowing if there are contrary opinions (is NaN better?)
** included "expected number of stats" asserts in these tests as well - more canaries if/when
future stats are added.
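
For anyone wanting to sanity check the "empty sum" reasoning above, here is a minimal standalone sketch. It is not the StatsComponent code: the class name, the helper, and the convention of returning 0.0 for stddev when there are fewer than two values are all assumptions made for illustration. It only shows that starting the accumulators at 0 over an empty input yields the values listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the Solr code base.
public class EmptyStatsSketch {

    /** Computes a handful of numeric stats; min/max stay null when there is no input. */
    public static Map<String, Double> compute(double[] values) {
        Double min = null;
        Double max = null;
        double sum = 0.0;          // the "empty sum" is 0
        double sumOfSquares = 0.0; // same premise for the sum of squares

        for (double v : values) {
            min = (min == null) ? v : Math.min(min, v);
            max = (max == null) ? v : Math.max(max, v);
            sum += v;
            sumOfSquares += v * v;
        }

        long count = values.length;
        // 0.0 / 0 is NaN in IEEE 754 floating point, hence mean=NaN for no input.
        double mean = sum / count;
        // Assumption for this sketch: report 0.0 when there are fewer than 2 values.
        double stddev = (count <= 1)
            ? 0.0
            : Math.sqrt((count * sumOfSquares - sum * sum) / (count * (count - 1.0)));

        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("min", min);
        stats.put("max", max);
        stats.put("sum", sum);
        stats.put("sumOfSquares", sumOfSquares);
        stats.put("stddev", stddev);
        stats.put("mean", mean);
        return stats;
    }

    public static void main(String[] args) {
        Map<String, Double> stats = compute(new double[0]);
        System.out.println(stats);
        // The "canary" idea: fail loudly if the number of stats ever changes.
        if (stats.size() != 6) {
            throw new AssertionError("unexpected number of stats: " + stats.size());
        }
    }
}
```

The size check in main mirrors the "expected number of stats" asserts: it carries no knowledge of the individual values, it only forces whoever adds a stat to revisit the test.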


> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
>                 Key: SOLR-6349
>                 URL: https://issues.apache.org/jira/browse/SOLR-6349
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>         Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch,
SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch,
SOLR-6349___bad_idea_broken.patch
>
>
> Stats component currently computes all stats (except for one) every time because they
are relatively cheap, and in some cases dependent on each other for distrib computation --
but if we start layering stats on other things it becomes unnecessarily expensive to compute
all the stats when they just want the "sum" (and it will definitely become excessively verbose
in the responses).
> The plan here is to use local params to make this configurable.  All of the existing
stat options could be modeled as a simple boolean param, but future params (like percentiles)
might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}
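
As a rough illustration of the syntax proposed in the description, the sketch below assembles a stats.field value from a map of local params using plain string handling. The class and method names are hypothetical, it uses no Solr or SolrJ API, and the rule of single-quoting any value that is not purely alphanumeric is a simplification chosen for this example; it does not handle values that themselves contain quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
public class StatsLocalParamsSketch {

    /** Builds a value such as {!min=true max=true}price from a field name and local params. */
    public static String statsField(String field, Map<String, String> localParams) {
        if (localParams.isEmpty()) {
            return field; // no local params: plain field name
        }
        StringBuilder sb = new StringBuilder("{!");
        boolean first = true;
        for (Map.Entry<String, String> e : localParams.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            first = false;
            String value = e.getValue();
            sb.append(e.getKey()).append('=');
            // Simplification: quote anything that is not purely alphanumeric.
            if (value.matches("[A-Za-z0-9]+")) {
                sb.append(value);
            } else {
                sb.append('\'').append(value).append('\'');
            }
        }
        return sb.append('}').append(field).toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("min", "true");
        params.put("max", "true");
        params.put("percentiles", "99,99.999"); // proposed param from the example above
        System.out.println("stats.field=" + statsField("price", params));
    }
}
```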



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

