lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Do not match on high frequency terms
Date Sat, 01 Aug 2015 10:34:33 GMT
It seems like you need to develop a custom query or query parser. Regarding
SolrJ: you can try calling the TermsComponent
http://wiki.apache.org/solr/TermsComponent
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
I'm not sure exactly how to call TermsComponent from SolrJ; I just found
https://lucene.apache.org/solr/5_2_1/solr-solrj/org/apache/solr/client/solrj/response/TermsResponse.html
for reading its response.
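A minimal sketch of the pre-filtering idea from the question follows. It assumes the per-field document frequencies have already been read from the TermsComponent response (for example through the TermsResponse class linked above; that request is not shown). The class and method names are hypothetical. The threshold convention is also an assumption: a value below 1 is read as a fraction of the total document count, and a value of 1 or more as an absolute count, similar in spirit to the maxTermFrequency parameter of Lucene's CommonTermsQuery. Document frequency is per field, so the filter would run once per field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: drops query terms whose document frequency exceeds a cutoff. */
public class HighFrequencyTermFilter {

    /**
     * Resolves the cutoff (assumed convention): values below 1 are a fraction of
     * numDocs, values of 1 or more are an absolute document count.
     */
    static long resolveCutoff(double maxDocFreq, long numDocs) {
        return maxDocFreq >= 1.0 ? (long) maxDocFreq : (long) Math.ceil(maxDocFreq * numDocs);
    }

    /**
     * Keeps terms whose docFreq is at or below the cutoff. Terms missing from
     * the map are treated as docFreq 0 and therefore kept.
     */
    static List<String> keepLowFrequency(List<String> terms, Map<String, Long> docFreq,
                                         double maxDocFreq, long numDocs) {
        long cutoff = resolveCutoff(maxDocFreq, numDocs);
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (docFreq.getOrDefault(term, 0L) <= cutoff) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical doc frequencies for one field, as they might be read from TermsComponent.
        Map<String, Long> df = Map.of("the", 9500L, "solr", 120L, "swedish", 3L);
        // Cutoff: 5% of 10,000 documents.
        List<String> kept = keepLowFrequency(List.of("the", "solr", "swedish"), df, 0.05, 10_000);
        System.out.println(kept); // [solr, swedish]
    }
}
```

Treating unknown terms as rare errs on the side of matching too much rather than silently dropping words from the user's query.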

On Fri, Jul 31, 2015 at 11:31 PM, Swedish, Steve <Steve.Swedish@noblis.org>
wrote:

> Hello,
>
> I'm hoping someone might be able to help me out with this as I do not have
> very much Solr experience. Basically, I am wondering if it is possible to
> not match on terms that have a document frequency above a certain
> threshold. For my situation, a stop word list will be unrealistic to
> maintain, so I was wondering if there may be an alternative solution using
> term document frequency to identify common terms.
>
> What would actually be ideal is if I could somehow use the
> CommonTermsQuery. The problem I ran across when looking at this option was
> that the CommonTermsQuery seems to only work for queries on one field at a
> time (unless I'm mistaken). However, I have a query of the structure
> q=(field1:(blah) AND (field2:(blah) OR field3:(blah))) OR field1:(blah) OR
> (field2:(blah) AND field3:(blah)). If there are any ideas on how to use the
> CommonTermsQuery with this query structure, that would be great.
>
> If it's possible to extract the document frequency for terms in my query
> before the query is run, allowing me to remove the high frequency terms
> from the query first, that could also be a valid solution. I'm using SolrJ
> as well, so a solution that works with SolrJ would be appreciated.
>
> Thanks,
> Steve
>
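One way to keep the multi-field structure from the question is to filter each field's terms separately, since document frequency is per field, and only then assemble the query string. The sketch below is hypothetical. It assumes every "blah" placeholder for a given field holds the same, already filtered term list. It builds the string by hand rather than through any SolrJ API, and it skips clauses whose terms were all removed, which is an assumed policy rather than anything Solr does on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: reassembles the question's query after per-field filtering. */
public class FilteredQueryBuilder {

    /** Renders field:(t1 t2 ...), or nothing if every term for that field was filtered out. */
    static Optional<String> clause(String field, List<String> terms) {
        if (terms.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(field + ":(" + String.join(" ", terms) + ")");
    }

    /** Joins the present operands with op; absent operands are skipped (assumed policy). */
    static Optional<String> join(String op, List<Optional<String>> operands) {
        List<String> present = new ArrayList<>();
        for (Optional<String> operand : operands) {
            operand.ifPresent(present::add);
        }
        if (present.isEmpty()) {
            return Optional.empty();
        }
        if (present.size() == 1) {
            return Optional.of(present.get(0));
        }
        return Optional.of("(" + String.join(" " + op + " ", present) + ")");
    }

    /** (field1 AND (field2 OR field3)) OR field1 OR (field2 AND field3), over the kept terms. */
    static String build(Map<String, List<String>> keptTermsByField) {
        Optional<String> f1 = clause("field1", keptTermsByField.getOrDefault("field1", List.of()));
        Optional<String> f2 = clause("field2", keptTermsByField.getOrDefault("field2", List.of()));
        Optional<String> f3 = clause("field3", keptTermsByField.getOrDefault("field3", List.of()));
        return join("OR", List.of(
                join("AND", List.of(f1, join("OR", List.of(f2, f3)))),
                f1,
                join("AND", List.of(f2, f3))))
            .orElse(""); // every term was dropped: the caller must decide what to send
    }

    public static void main(String[] args) {
        Map<String, List<String>> kept = Map.of(
            "field1", List.of("solr"),
            "field2", List.of("swedish"),
            "field3", List.of("noblis"));
        // ((field1:(solr) AND (field2:(swedish) OR field3:(noblis))) OR field1:(solr) OR (field2:(swedish) AND field3:(noblis)))
        System.out.println(build(kept));
    }
}
```

Skipping an emptied operand widens the match: if all field2 terms are dropped, (field2 AND field3) collapses to field3 alone. If that is too permissive, an emptied AND operand could instead empty the whole AND branch.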



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
