lucene-java-user mailing list archives

From "N. Hira"
Subject Re: Link map over results? or term freq
Date Thu, 16 Oct 2008 21:17:28 GMT
I think I understand what you're describing as a "link map" to be a  
"tag cloud" where each tag is a "frequent" or "strong" term.

We did something like this as an experiment (without Lucene):

If you're talking about something similar, then I think you can use
Lucene's term frequency vectors (TFVs) only to get at the frequency data
in the context of the Documents (not the results).  I'm no expert, but I
say this because I've only ever seen TermFreqVectors discussed in the
context of an IndexReader, not in the context of Hits or TopDocs.

The other thing, though, is that TF may not be sufficient to  
determine what to use for each tag/link.  For example, given a set of  
Results, R, would you like to use:
1.  the top N most frequent terms for each Document in R?
2.  the top M most frequent terms that are common to all/many  
Documents in R?
3.  the top O most frequent terms that are common in results built  
using the highlighter?
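
To make options 1 and 2 concrete, here is a minimal sketch in plain Java. It assumes the per-Document term frequencies have already been pulled out of the index into Maps (term -> count); the class name, method names, and the minDocs threshold are hypothetical and are not part of Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene class. Each Document in R is assumed
// to be represented by a Map of term -> frequency.
class TagCloudSketch {

    // Option 1: the n most frequent terms of a single Document.
    static List<String> topTermsForDocument(Map<String, Integer> termFreqs, int n) {
        return rank(termFreqs, n);
    }

    // Option 2: the m most frequent terms (frequencies summed over R)
    // that occur in at least minDocs of the Documents in R.
    static List<String> topSharedTerms(List<Map<String, Integer>> results, int minDocs, int m) {
        Map<String, Integer> totalFreq = new HashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map<String, Integer> doc : results) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totalFreq.merge(e.getKey(), e.getValue(), Integer::sum);
                docFreq.merge(e.getKey(), 1, Integer::sum);
            }
        }
        // Drop terms that appear in too few Documents to count as "common".
        totalFreq.keySet().removeIf(term -> docFreq.get(term) < minDocs);
        return rank(totalFreq, m);
    }

    // Sort by descending frequency; ties are broken alphabetically so the
    // output is deterministic.
    static List<String> rank(Map<String, Integer> freqs, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freqs.entrySet());
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(limit, entries.size()))) {
            out.add(e.getKey());
        }
        return out;
    }
}
```

Option 3 would feed the same aggregation, just with counts taken from the highlighted fragments instead of the whole Documents.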

To a certain extent, this is a clustering problem: given some set
of Documents, R, which just happen to be the results of some search,
represent R using a tag cloud/link map of terms that best represent R.
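
Once each term has a weight, drawing the cloud is mostly a scaling step. Here is a rough sketch in plain Java that maps weights linearly onto a font-size range; the class name, method name, and pixel bounds are assumptions chosen for illustration, and a real cloud might well prefer a logarithmic scale.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns term weights into font sizes for a tag cloud.
class TagSizeSketch {

    // Linearly interpolate each weight between minPx (lowest weight)
    // and maxPx (highest weight).
    static Map<String, Integer> fontSizes(Map<String, Integer> weights, int minPx, int maxPx) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int w : weights.values()) {
            lo = Math.min(lo, w);
            hi = Math.max(hi, w);
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // If every term has the same weight, give them all the largest size.
            double t = (hi == lo) ? 1.0 : (e.getValue() - lo) / (double) (hi - lo);
            sizes.put(e.getKey(), (int) Math.round(minPx + t * (maxPx - minPx)));
        }
        return sizes;
    }
}
```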

Have you looked at Carrot2?  I haven't seen the tag cloud
visualization there, but you may find some ideas for
clustering/document-set representation there:

Good luck!


On 16-Oct-2008, at 3:21 PM, Darren Govoni wrote:

> I guess a link map (as I understand it) is a collection of hyperlinks
> of words/phrases where the dominant ones are in a bolder color and a
> larger font. It's a relatively new scheme that some sites are using.
> For example, someone searches for a person and a link map would show
> them all the most frequent terms in the results they got back. Sort of
> like latent relationships.
> Does that help?
> I thought this could be done using term frequency vectors in Lucene,
> but I've never used TFVs before. And it could then be limited to just
> a set of results.
> HTH,
> Darren
> On Thu, 2008-10-16 at 14:09 -0400, Glen Newton wrote:
>> Sorry, could you explain what you mean by a "link map over lucene results"?
>> thanks,
>> -glen
>> 2008/10/16 Darren Govoni <>:
>>> Hi,
>>> Has anyone created a link map over Lucene results, or does anyone know
>>> of a link describing the process? If not, I would like to build one to
>>> contribute. Also, I read about term frequencies in the book, but wanted
>>> to know if I can extract the strongest occurring terms from a given
>>> result set or result?
>>> Thank you for any help. I will keep reading/looking.
>>> Darren
