lucene-solr-user mailing list archives

From Chantal Ackermann <c.ackerm...@it-agenten.com>
Subject Re: Nested JSON Facets (Subfacets)
Date Thu, 15 Dec 2016 15:00:28 GMT
Hi Yonik,

are you certain that nesting a function works as documented on http://yonik.com/solr-subfacets/?

top_authors:{ 
        type: terms,
        field: author,
        limit: 7,
        sort: "revenue desc",
        facet:{
          revenue: "sum(sales)"
        }
      }
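
For anyone who wants to reproduce this from a script: a minimal sketch of building the same nested facet as strict JSON and URL-encoding it. The helper name `build_subfacet_params` and the `q`/`rows` values are my own assumptions for illustration; only the facet structure comes from the example above.

```python
import json
from urllib.parse import urlencode


def build_subfacet_params(bucket_field, agg_name, agg_expr, limit=7):
    """Return query parameters for a terms facet with one nested aggregation.

    Hypothetical helper: mirrors the top_authors example, nothing more.
    """
    facet = {
        "top_authors": {
            "type": "terms",
            "field": bucket_field,
            "limit": limit,
            # Sort buckets by the nested aggregation, largest first.
            "sort": f"{agg_name} desc",
            "facet": {agg_name: agg_expr},
        }
    }
    # q and rows are assumed values; adjust them to the handler in use.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


params = build_subfacet_params("author", "revenue", "sum(sales)")
query_string = urlencode(params)  # append to /solr/<core>/query?
```

Serializing with `json.dumps` avoids quoting mistakes in the hand-written relaxed syntax.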


I’m getting the feeling that the function is never really executed because, for my index,
calling sum() on a non-numeric field (e.g. a multi-valued string field) throws an error when
*not nested* but does *not throw an error* when nested:

    json.facet={all_pop: "sum(gtin)"}

    "error":{
        "trace":"java.lang.UnsupportedOperationException
	    at org.apache.lucene.queries.function.FunctionValues.doubleVal(FunctionValues.java:47)

And the following does not throw an error but definitely should if the function were actually executed:

    json.facet={all_pop:"sum(popularity)",shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(gtin)"}}}

returns:

"facets":{
    "count":2815500,
    "all_pop":1.4065865823321116E8,
    "shop_cat":{
      "buckets":[{
          "val":"Kontaktlinsen > Torische Linsen",
          "count":75168,
          "cat_pop":0.0},
        {
          "val":"Damen-Mode/Inspirationen",
          "count":47053,
          "cat_pop":0.0},

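To spot this symptom automatically, here is a small sketch that flags non-empty buckets whose nested aggregation came back as exactly 0.0. The function name `suspicious_buckets` is hypothetical, and the sample dictionary only repeats the two buckets shown above. A zero sum can of course be legitimate for other data, so this is a heuristic, not a proof.

```python
def suspicious_buckets(facets, facet_name, agg_name):
    """Return the values of non-empty buckets whose nested aggregation is 0.0."""
    buckets = facets.get(facet_name, {}).get("buckets", [])
    return [
        b["val"]
        for b in buckets
        if b.get("count", 0) > 0 and b.get(agg_name) == 0.0
    ]


# The "facets" part of the response, reduced to the two buckets quoted above.
facets = {
    "count": 2815500,
    "all_pop": 1.4065865823321116E8,
    "shop_cat": {
        "buckets": [
            {"val": "Kontaktlinsen > Torische Linsen", "count": 75168, "cat_pop": 0.0},
            {"val": "Damen-Mode/Inspirationen", "count": 47053, "cat_pop": 0.0},
        ]
    },
}

flagged = suspicious_buckets(facets, "shop_cat", "cat_pop")
```
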
For completeness: here is the field directive for "gtin" with "text_noleadzero" based
on "solr.TextField":

    <field name="gtin" type="text_noleadzero" indexed="true" stored="true" required="false" multiValued="true"/>


Is this a bug, or is the documentation a glimpse of the future? I will try version 6.3.0 now.
I was still on 6.1.0 for the above tests.
(I have also tried the "avg" function, just to make sure, with the same result.)

Cheers,
Chantal


> On 15.12.2016 at 15:17, Chantal Ackermann <c.ackermann@it-agenten.com> wrote:
> 
> Hi Yonik,
> 
> 
> here is an update on what I’ve tried so far, unfortunately without any more luck.
> 
> The field directive is (should have included this when asking the question):
> 
>   <field name="popularity" type="float" indexed="true" stored="false" required="false" multiValued="false" docValues="true"/>
> 
> I have also re-indexed (removed data/ and indexed from scratch). The popularity field
> is populated with random values (as I don’t have the real values from production), meaning
> that all documents have values > 0.
> 
> Here is the statistics output:
> 
> "stats":{
>    "stats_fields":{
>      "popularity":{
>        "min":7.952374289743602E-5,
>        "max":99.99993896484375,
>        "count":1687500,
>        "missing":0,
>        "sum":8.436878611434968E7,
>        "sumOfSquares":5.626142812197906E9,
>        "mean":49.9963176973924,
>        "stddev":28.885623872869992},
> 
> And this is the relevant facet output from calling
> 
> /solr/<core>/query?
> json.facet={
> num_pop:{query: "popularity[* TO  *]"},
> all_pop: "sum(popularity)",
> shop_cat: {type:terms, field:shop_cat, facet: {cat_pop: "sum(popularity)"}}}&q=*:*&rows=1&stats.field=popularity&wt=json
> 
> "facets":{
>    "count":1687500,
>    "all_pop":1.5893775613258794E8,
>    "num_pop":{
>      "count":1687500},
>    "shop_cat":{
>      "buckets":[{
>          "val":"Kontaktlinsen > Torische Linsen",
>          "count":75168,
>          "cat_pop":0.0},
>        {
>          "val":"Neu",
>          "count":31772,
>          "cat_pop":0.0},
>        {
>          "val":"Gesundheit & Schönheit > Gesundheitspflege",
>          "count":20281,
>          "cat_pop":0.0},
> [… more facets omitted]
> 
> 
> The /query handler is an edismax configuration, though I don’t think this matters as
> long as the results include documents with popularity > 0, which is the case as seen in
> the facet output (sum() works in general across all documents, as seen in "all_pop", just
> not for the buckets).
> 
> I will try explicitly turning off docValues and adding stored="true", just to try
> out more. If someone has any other suggestions that I should try, I would be glad to hear
> them. If it is not possible to retrieve the sum in this way, I would have to fetch each sum
> separately, which would have a huge performance impact.
> 
> Thanks!
> Chantal
> 
> 
> 
> 
> 
>> On 15.12.2016 at 10:16, CA <ca@it-agenten.com> wrote:
>> 
>>> num_pop:{query:"popularity:[* TO *]"}
> 

