lucene-solr-user mailing list archives

From "G, Rajesh" ...@cebglobal.com>
Subject RE: Facet ignoring repeated word
Date Tue, 10 May 2016 12:51:43 GMT
Thanks Toke. The issue I have is that I cannot look for one specific word, e.g. ddr in termfreq('name', 'ddr').
I have to find the count of every word, and the sum of those counts.




-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
Sent: Tuesday, May 10, 2016 1:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet ignoring repeated word

On Fri, 2016-04-29 at 08:55 +0000, G, Rajesh wrote:
> I am trying to implement a word cloud <https://www.whitehouse.gov/sites/default/files/other/sotu_wordle.png>
> using Solr. The problem I have is that the Solr facet query ignores repeated words in a document,
> e.g.

Use a combination of faceting and stats:

1) Resolve candidate words with faceting, just as you have already done.

2) Create a stats-request with the same q as you used for faceting, with a termfreq-function
for each term in your facet result.


Working example from the techproducts-demo that comes with Solr:

http://localhost:8983/solr/techproducts/select
?q=name%3Addr%0A
&fl=name&wt=json&indent=true
&stats=true
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%27ddr%27)
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%271GB%27)

where 'name' is the field ('comments' in your setup) and 'ddr' and '1GB'
are two terms ('absorbed', 'am', 'believe' etc. in your setup).
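
When the facet result holds many terms, the stats request does not have to be written by hand. The following is a minimal sketch, not a definitive implementation, of how the parameters above might be generated. The helper name build_termfreq_stats_params is hypothetical, the term list is hard-coded here where a real client would take it from the facet response, and the terms are assumed to contain no single quotes.

```python
from urllib.parse import urlencode


def build_termfreq_stats_params(query, field, terms):
    """Build Solr request parameters with one termfreq stats.field per term.

    Hypothetical helper: assumes `field` and every entry of `terms`
    contain no single quotes, since they are placed inside quoted
    function arguments without escaping.
    """
    params = [
        ("q", query),
        ("fl", field),
        ("wt", "json"),
        ("indent", "true"),
        ("stats", "true"),
    ]
    for term in terms:
        # Same shape as the hand-written example: sum termfreq over all matches.
        func = "{!sum=true func}termfreq('%s', '%s')" % (field, term)
        params.append(("stats.field", func))
    return params


# Assumed input: in practice these would be the terms from the facet response.
terms = ["ddr", "1GB"]
params = build_termfreq_stats_params("name:ddr", "name", terms)

# urlencode percent-encodes the braces, quotes and spaces for us.
url = "http://localhost:8983/solr/techproducts/select?" + urlencode(params)
print(url)
```

A facet result with many terms yields a correspondingly long URL, so sending the same parameters as a POST body may be more practical than a GET.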


The result will be something like

"response": {
    "numFound": 3,
...
"stats": {
    "stats_fields": {
      "termfreq('name', 'ddr')": {
        "sum": 6
      },
      "termfreq('name', '1GB')": {
        "sum": 3
      }
    }
  }
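
To get from this response to a count per word plus a grand total, the stats_fields keys can be mapped back to their terms. This is a rough sketch under assumptions: the sample dictionary is a trimmed reconstruction of the response shown above rather than real server output, the helper name term_counts is hypothetical, and the key format is assumed to match the termfreq('field', 'term') form seen in the example.

```python
import re

# Trimmed reconstruction of the response above (assumed shape, not real output).
sample_response = {
    "response": {"numFound": 3},
    "stats": {
        "stats_fields": {
            "termfreq('name', 'ddr')": {"sum": 6},
            "termfreq('name', '1GB')": {"sum": 3},
        }
    },
}

# Matches keys such as termfreq('name', 'ddr') and captures the term.
TERMFREQ_KEY = re.compile(r"termfreq\('[^']*',\s*'([^']*)'\)")


def term_counts(solr_response):
    """Return {term: summed term frequency} from a stats response."""
    counts = {}
    for key, stats in solr_response["stats"]["stats_fields"].items():
        match = TERMFREQ_KEY.fullmatch(key)
        if match:
            # int() also accepts a float sum, should the server return one.
            counts[match.group(1)] = int(stats["sum"])
    return counts


counts = term_counts(sample_response)
total = sum(counts.values())
print(counts)  # per-word counts, usable as word cloud weights
print(total)   # sum over all words
```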


- Toke Eskildsen, State and University Library, Denmark

