nutch-dev mailing list archives

From Mladen Adamovic
Subject books (and articles) about search engine algorithms
Date Tue, 29 Aug 2006 15:26:43 GMT

I want to get more insight into the algorithms used by search engines. I
have broad knowledge of standard data structures and algorithms (hashing,
trees, graphs, etc.). I thought that Lucene would be a good place to
start looking, and indeed I've found some decent information on the
Nutch website. However, I decided to post my own findings here in the
hope that someone can point me to even more information.

As far as I understand, I should read books about Information Retrieval
(e.g. Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto). Is
there anything more up to date?

Starting from one article about link spam and searching CiteSeer, I also
found several articles about link spam techniques, namely:
1. Undue Influence: Eliminating the Impact of Link Plagiarism on Web 
Search Rankings
2. Using Rank Propagation and Probabilistic Counting for Link-Based Spam
3. SpamRank: Fully Automatic Link Spam Detection
4. Identifying Link Farm Spam Pages
5. Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam

If you can recommend other valuable literature on search engine
algorithms (primarily books, but good articles are welcome too), please
let me know.

Thanks, and keep up the good work.

Mladen Adamovic
