lucene-solr-user mailing list archives

From Ian Holsman <li...@holsman.net>
Subject Re: Facets with an IDF concept
Date Tue, 23 Jun 2009 12:57:42 GMT
Asif Rahman wrote:
> Hi Grant,
>
> I'll give a real life example of the problem that we are trying to solve.
>
> We index a large number of current news articles on a continuing basis.  We
> tag these articles with news topics (e.g. Barack Obama, Iran, etc.).  We
> then use these tags to facet our queries.  For example, we might issue a
> query for all articles in the last 24 hours.  The facets would then tell us
> which news topics have been written about the most in that period.  The
> problem is that "Barack Obama", for example, is always written about in high
> frequency, as opposed to "Iran" which is currently very hot in the news, but
> which has not always been the case.  In this case, we'd like to see "Iran"
> show up higher than "Barack Obama" in the facet results.
>
>   

You're not looking for an IDF-based function.
You need to figure out what a 'normal' amount of news flow is for a given
topic, and then detect when an abnormal amount is happening.
Note that an abnormal amount can be either positive or negative (a spike
or a drop).
We use a similar method on http://love.com, so we know, for example, that
something is going on with Ed McMahon as I type.
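
A minimal sketch of the baseline idea above, not the method used on any
particular site. It assumes you keep a list of past per-period article
counts for each topic and scores the current period as a z-score against
that history. The names flow_anomaly and rank_topics, and all the numbers,
are hypothetical.

```python
import math
from statistics import mean, stdev


def flow_anomaly(history, current):
    """Z-score of the current count against past counts for one topic.

    Positive means more coverage than normal, negative means less.
    """
    if len(history) < 2:
        raise ValueError("need at least two past periods to estimate a baseline")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Perfectly flat history: any change at all is infinitely unusual.
        return 0.0 if current == mu else math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


def rank_topics(baselines, today):
    """Order topics by how far today's count departs from normal, either way."""
    scores = {
        topic: flow_anomaly(history, today.get(topic, 0))
        for topic, history in baselines.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical daily counts: one topic is always busy, the other is usually quiet.
baselines = {
    "Barack Obama": [100, 110, 90, 105, 95],
    "Iran": [5, 8, 6, 7, 4],
}
today = {"Barack Obama": 104, "Iran": 60}
ranked = rank_topics(baselines, today)
```

With these made-up numbers, "Iran" ranks first even though "Barack Obama"
has the larger raw count, because 60 is far outside Iran's usual range.
Ranking by absolute value keeps unusual drops visible as well as spikes.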

I wouldn't use Solr for this kind of thing, btw. Try something like
Esper; I think it holds some promise for this kind of problem (Esper is
an open source stream database).

Regards

> To me, this seems identical to the tf-idf scoring expression that is used in
> normal search.  The facet count is analogous to the tf and I can access the
> facet term idf's through the Similarity API.
>
> Is my reasoning sound?  Can you provide any guidance as to the best way to
> implement this?
>
> Thanks for your help,
>
> Asif
>
>
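
For comparison, the tf-idf style rerank that Asif describes above can be
sketched as follows. This is an illustration only, not Solr's faceting code
and not an existing Solr feature: it assumes you already have the facet
counts for the query and the index-wide document frequency of each topic,
and it uses the classic Lucene idf form, 1 + ln(numDocs / (docFreq + 1)).
The function names and all the numbers are hypothetical.

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene-style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def rerank_facets(facet_counts, global_df, num_docs):
    """Weight each facet count by the idf of its term, then sort descending."""
    scored = {
        term: count * idf(num_docs, global_df[term])
        for term, count in facet_counts.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers: facet counts for the last 24 hours, plus index-wide DFs.
facet_counts = {"Barack Obama": 120, "Iran": 90}
global_df = {"Barack Obama": 50000, "Iran": 2000}
reranked = rerank_facets(facet_counts, global_df, num_docs=1_000_000)
```

With these made-up numbers "Iran" moves ahead of "Barack Obama". Note that
the weighting only penalizes topics that are common across the whole index;
it does not model what is normal for a topic over time.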
> On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
>   
>> On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:
>>
>>  Hi again,
>>     
>>> I guess nobody has used facets in the way I described below before.  Do
>>> any of the experts have any ideas as to how to do this efficiently and
>>> correctly?  Any thoughts would be greatly appreciated.
>>>
>>> Thanks,
>>>
>>> Asif
>>>
>>> On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman <asif@newscred.com> wrote:
>>>
>>>  Hi all,
>>>       
>>>> We have an index of news articles that are tagged with news topics.
>>>> Currently, we use solr facets to see which topics are popular for a given
>>>> query or time period.  I'd like to apply the concept of IDF to the facet
>>>> counts so as to penalize the topics that occur broadly through our index.
>>>> I've begun to write a custom facet component that applies the IDF to the
>>>> facet counts, but I also wanted to check if anyone has experience using
>>>> facets in this way.
>>>>
>>>>         
>> I'm not sure I'm following.  Would you be faceting on one field, but using
>> the DF from some other field?  Faceting is already a count of all the
>> documents that contain the term on a given field for that search.  If I'm
>> understanding, you would still do the typical faceting, but then rerank by
>> the global DF values, right?
>>
>> Backing up, what is the problem you are seeing that you are trying to
>> solve?
>>
>> I think you could do this, but you'd have to hook it in yourself.  By
>> penalize, do you mean remove, or just have them in the sort?  Generally
>> speaking, looking up the DF value can be expensive, especially if you do a
>> lot of skipping around.  I don't know how pluggable the sort capabilities
>> are for faceting, but that might be the place to start if you are just
>> looking at the sorting options.
>>
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>     
>
>
>   

