lucene-solr-user mailing list archives

From Jean-Sebastien Vachon <jean-sebastien.vac...@wantedanalytics.com>
Subject Question regarding the latest version of HeliosSearch
Date Thu, 15 May 2014 19:44:30 GMT
Hi All,

I spent some time today playing around with the subfacets and facet functions now available in
Heliosearch 0.05. They look very promising, but I have some concerns.

I indexed 10,000 documents and built some queries to look at each feature, and I found some weird
behaviour that I could not explain.

The first query I made was to find all documents having the word "java" in their title and
then compute a facet on the field position_id with stats about the field job_id. Basically,
I want the number of unique job_id values for each position_id across all matching documents.

http://localhost:8983/solr/current/select?q=title:java&facet=on&facet.field=position_id&facet.stat=unique(job_id)&rows=1&facet.limit=10&facet.mincount=1&wt=json&indent=on&fl=job_id,position_id,super_alias_id
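
For anyone who wants to reproduce the two variants of this request, here is a minimal Python sketch that assembles the URL from the parameters above. The names PARAMS and build_url are illustrative only, and the with_stat flag is just a convenience for toggling facet.stat on and off. Note that urlencode percent-encodes characters such as ":" and "(", which decodes back to the same parameter values.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Endpoint and parameters copied from the request above.
BASE_URL = "http://localhost:8983/solr/current/select"

PARAMS = [
    ("q", "title:java"),
    ("facet", "on"),
    ("facet.field", "position_id"),
    ("facet.stat", "unique(job_id)"),
    ("rows", "1"),
    ("facet.limit", "10"),
    ("facet.mincount", "1"),
    ("wt", "json"),
    ("indent", "on"),
    ("fl", "job_id,position_id,super_alias_id"),
]


def build_url(params, with_stat=True):
    """Return the select URL, optionally dropping facet.stat (hypothetical helper)."""
    kept = [(k, v) for k, v in params if with_stat or k != "facet.stat"]
    return BASE_URL + "?" + urlencode(kept)


if __name__ == "__main__":
    for flag in (True, False):
        url = build_url(PARAMS, with_stat=flag)
        # Decode the query string again to show the values survive encoding.
        decoded = parse_qs(urlsplit(url).query)
        print(flag, sorted(decoded))
```

Calling build_url(PARAMS, with_stat=False) gives the request that produces the classic facet_counts output, and with_stat=True gives the one that produces the "facets" output.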

The response looks good except for one thing: facet.mincount is not respected whenever
I specify the facet.stat parameter. Removing facet.stat causes the mincount to be respected, but
I need that parameter.

Without the parameter the facet looks like this:
"facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "position_id":[
        "265151",5,
        "927284",1,
        "1662380",1,
        "2625553",1,
        "2862455",1,
        "4128904",1,
        "4253203",1]},  <=== accounted for all 11 documents
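
The "11 documents" figure can be checked by pairing up the flat value/count list shown above. This is a small Python sketch with the values copied from that response; pair_up is an illustrative helper name, not part of any Solr client.

```python
# Flat [value, count, value, count, ...] list copied from the response above.
position_id = [
    "265151", 5,
    "927284", 1,
    "1662380", 1,
    "2625553", 1,
    "2862455", 1,
    "4128904", 1,
    "4253203", 1,
]


def pair_up(flat):
    """Turn a flat value/count list into (value, count) tuples."""
    return list(zip(flat[0::2], flat[1::2]))


pairs = pair_up(position_id)
total = sum(count for _, count in pairs)
print(len(pairs), "values,", total, "documents")
```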

And now when adding the parameter:


"facets":{
    "position_id":{
      "stats":{
        "unique(job_id)":11, <== again, 11 documents, which is good
        "count":11},
      "buckets":[{
          "val":265151,
          "unique(job_id)":5,
          "count":5},
        {
          "val":927284,
          "unique(job_id)":1,
          "count":1},
        {
          "val":1662380,
          "unique(job_id)":1,
          "count":1},
        {
          "val":2625553,
          "unique(job_id)":1,
          "count":1},
        {
          "val":2862455,
          "unique(job_id)":1,
          "count":1},
        {
          "val":4128904,
          "unique(job_id)":1,
          "count":1},
        {
          "val":4253203,
          "unique(job_id)":1,
          "count":1},
        {
          "val":1133,
          "unique(job_id)":0, <== what is this?
          "count":0},
                .... Many zero entries following...

I was wondering where the extra entries were coming from. The position_id = 1133 above does
not even match my query (its title is "Audit Consultant").
I've also noticed similar behaviour when using subfacets. It looks like the number of buckets
returned always matches the "facet.limit" parameter.
If there are not enough matching values, the list is padded with values from documents
that do not match the original query.

Am I doing something wrong?
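
Until the behaviour is explained, one possible stopgap is to drop the zero-count buckets on the client side. The Python sketch below assumes the response has already been parsed into a dict shaped like the "facets" output above. The function name apply_mincount is hypothetical, and this only mimics what facet.mincount would be expected to do; it is not a fix in the server.

```python
def apply_mincount(buckets, mincount=1):
    """Keep only buckets whose count reaches mincount (client-side filter)."""
    return [b for b in buckets if b.get("count", 0) >= mincount]


# Buckets copied from the response above, including the unexpected 1133 entry.
buckets = [
    {"val": 265151, "unique(job_id)": 5, "count": 5},
    {"val": 927284, "unique(job_id)": 1, "count": 1},
    {"val": 1662380, "unique(job_id)": 1, "count": 1},
    {"val": 2625553, "unique(job_id)": 1, "count": 1},
    {"val": 2862455, "unique(job_id)": 1, "count": 1},
    {"val": 4128904, "unique(job_id)": 1, "count": 1},
    {"val": 4253203, "unique(job_id)": 1, "count": 1},
    {"val": 1133, "unique(job_id)": 0, "count": 0},
]

filtered = apply_mincount(buckets, mincount=1)
print([b["val"] for b in filtered])
```

Because the padding happens before the response is returned, filtering afterwards can leave fewer than facet.limit buckets, which matches what the classic facet output does in this case.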
