nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1539) Implement the Hypertext Induced Topic Search (HITS) algorithm in Nutch
Date Wed, 06 Mar 2013 09:58:13 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594561#comment-13594561 ]

Markus Jelsma commented on NUTCH-1539:
--------------------------------------

Ah yes, I got it: HITS is an online ranking algorithm. According to Bing Liu's Web Data Mining,
the root set is a result set from the search engine, which is then fed to HITS to be reranked,
meaning it is likely impossible to integrate this algorithm directly in the search engine.
Is it possible to use this algorithm to calculate hub and authority scores of the entire graph
without the root set?
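
To make the question concrete, here is a minimal sketch of the hub/authority
iteration from Kleinberg's paper applied to a whole link graph instead of a
query-specific root set. It is not the code from the attached patch and uses no
Nutch API: the function name `hits`, the `links` dictionary and the `iterations`
parameter are all assumptions made for illustration, and the graph is assumed
to fit in memory.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for every page in a link graph.

    links: dict mapping a page to the pages it links to (hypothetical
    in-memory stand-in for a link database).
    Returns (hub, auth), two dicts keyed by page.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}

        # Hub update: sum the authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}

    return hub, auth


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": []}
    hub, auth = hits(graph)
    print(sorted(auth.items()))
    print(sorted(hub.items()))
```

Nothing in the iteration itself requires a root set, so it runs on any graph.
Whether the resulting query-independent scores are useful for ranking is a
separate question that this sketch does not answer.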
                
> Implement the Hypertext Induced Topic Search (HITS) algorithm in Nutch
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-1539
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1539
>             Project: Nutch
>          Issue Type: Bug
>          Components: linkdb
>         Environment: CSCI 572: Search Engines and Information Retrieval @ USC, http://sunset.usc.edu/classes/cs572_2010/
> Nutch 1.1
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.7
>
>         Attachments: CS572CourseProjectReport_Yongqiang.pdf, csci572CourseProject_Yongqiang.rar, NUTCH-1538.yongqiang.Mattmann.030413.patch.txt
>
>
> In my Summer 2010 CSCI 572: Search Engines and Information Retrieval class, my student Yongqiang Li and I implemented the HITS algorithm in Nutch based on Jon Kleinberg's paper:
> Authoritative Sources in a Hyperlinked Environment
> http://dl.acm.org/citation.cfm?id=324140
> I'll put up the code we had shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
