lucene-solr-user mailing list archives

From "Olivier H. Beauchesne" <oliv...@olihb.com>
Subject Re: filtering facets
Date Mon, 31 Aug 2009 21:21:27 GMT
Yeah, but then I would have to retrieve *a lot* of facets. I think for 
now I'll retrieve all the subdomains with facet.prefix and then merge 
the results of those queries. Not ideal, but when I have more motivation, 
I will submit a patch to Solr :-)
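
A rough sketch of that merge step, assuming each facet.prefix query has already been parsed into a term-to-count mapping. The function name and the sample URLs are made up for illustration; nothing here calls Solr itself.

```python
from collections import Counter

def merge_facet_counts(per_prefix_results):
    """Combine the facet counts returned by several facet.prefix queries.

    per_prefix_results: iterable of dicts mapping facet term -> count,
    one dict per facet.prefix value (hypothetical input shape).
    Returns (term, count) pairs sorted by count, highest first.
    """
    merged = Counter()
    for counts in per_prefix_results:
        # Counter.update adds counts, so a term seen twice is summed.
        merged.update(counts)
    return merged.most_common()

# One result per subdomain prefix (illustrative data only).
en = {"http://en.wikipedia.org/wiki/A": 3, "http://en.wikipedia.org/wiki/B": 1}
fr = {"http://fr.wikipedia.org/wiki/A": 2}
merged = merge_facet_counts([en, fr])
```

The drawback is the one mentioned above: it takes one query per subdomain, so every subdomain has to be known in advance.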

Michael wrote:
> You could post-process the response and remove urls that don't match your
> domain pattern.
>
> On Mon, Aug 31, 2009 at 9:45 AM, Olivier H. Beauchesne <olivier@olihb.com> wrote:
>
>   
>> Hi Mike,
>>
>> No, my problem is that the field article_outlinks is multivalued, so it
>> contains several URLs not related to my search. I would like to facet only
>> on URLs matching my query.
>>
>> For example (only one document shown, but my search targets over 1M docs):
>>
>> Doc1:
>> article_url:
>> url1.com/1
>> url2.com/2
>> url1.com/1
>> url1.com/3
>>
>> My query is article_url:url1.com*, I facet by article_url, and I
>> want it to give me:
>> url1.com/1 (2)
>> url1.com/3 (1)
>>
>> But right now, because url2.com/2 is contained in a multivalued field with
>> the matching urls, I get this:
>> url1.com/1 (2)
>> url1.com/3 (1)
>> url2.com/2 (1)
>>
>> I can use facet.prefix to filter, but it's not very flexible if my URL
>> contains a subdomain, as facet.prefix doesn't support wildcards.
>>
>> Thank you,
>>
>> Olivier
>>
>> Mike Topper wrote:
>>
>>> Hi Olivier,
>>>
>>> Are the facet counts on the URLs you don't want 0?
>>>
>>> If so, you can use facet.mincount to return only facets with a count greater than 0.
>>>
>>> -Mike
>>>
>>> Olivier H. Beauchesne wrote:
>>>
>>>
>>>       
>>>> Hi,
>>>>
>>>> Long time lurker, first time poster.
>>>>
>>>> I have a multi-valued field, let's call it article_outlinks, containing
>>>> all outgoing URLs from a document. I want to get all matching URLs
>>>> sorted by count.
>>>>
>>>> For example, I want to get all outgoing Wikipedia URLs in my documents
>>>> sorted by count.
>>>>
>>>> So I execute a query like this:
>>>> q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks
>>>>
>>>> But I get facets containing the other urls in the documents. I can get
>>>> something close by using facet.prefix=http://en.wikipedia.org but I
>>>> want to include other subdomains on wikipedia (ex: fr.wikipedia.org).
>>>>
>>>> Is there a way to do a search and get only the facets matching my query?
>>>>
>>>> I know facet.prefix isn't a query, but is there a way to get that
>>>> behavior?
>>>>
>>>> Is it easy to extend Solr to do something like that?
>>>>
>>>> Thank you,
>>>>
>>>> Olivier
>>>>
>>>> Sorry for my English.
>>>>
>>>>
>>>>
>>>>         
>>>
>>>
>>>       
>
>   
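
For reference, the post-processing approach Michael suggests in the quoted text could look roughly like the sketch below. It assumes the facet values arrive as a flat list alternating term and count, which is how Solr's JSON response writer lays out `facet_fields` by default. The function name, the regular expression, and the sample data are illustrative assumptions, not something from this thread.

```python
import re

def filter_facets(flat_facets, pattern):
    """Keep only the facet terms that match a regular expression.

    flat_facets: [term1, count1, term2, count2, ...] (assumed layout).
    pattern: regex applied to each term with re.search.
    Returns (term, count) pairs in their original order.
    """
    regex = re.compile(pattern)
    # Pair every even-indexed term with the count that follows it.
    pairs = zip(flat_facets[0::2], flat_facets[1::2])
    return [(term, count) for term, count in pairs if regex.search(term)]

# Matches wikipedia.org and any of its subdomains (illustrative pattern).
WIKIPEDIA = r"^https?://([a-z0-9-]+\.)*wikipedia\.org(/|$)"

facets = [
    "http://en.wikipedia.org/wiki/Solr", 5,
    "http://example.com/", 4,
    "http://fr.wikipedia.org/wiki/Solr", 2,
]
kept = filter_facets(facets, WIKIPEDIA)
```

As noted at the top of this message, the cost is that all facet values have to be retrieved before filtering, which can be a lot of data over 1M documents.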
