nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Updated: (NUTCH-732) Subcollection plugin not working on Nutch-1.0
Date Tue, 27 Apr 2010 18:04:35 GMT


Andrzej Bialecki updated NUTCH-732:

    Attachment: sub.patch

It turns out this was caused by the way the list of applicable collections is built, and by
how that field is added to the indexing backend. First, the code prepends a space to each
collection name, producing values like ' nutch' instead of 'nutch'. Then, instead of
tokenizing the field, it passes the value as is, so the leading space is kept and a query
for the plain collection name never matches.

I changed the logic that appends collection names, and made the field tokenized.

I'll commit the patch shortly.
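
The two problems described above can be illustrated with a small, self-contained sketch.
This is not the code from sub.patch and it does not use the Nutch or Lucene APIs; the class
and method names (CollectionFieldSketch, joinWithLeadingSpace, joinNames, whitespaceTokens,
matchesUntokenized) are hypothetical. It assumes that an untokenized field behaves like an
exact match on the whole stored value, and that a tokenized field behaves like a match on
whitespace-separated terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual subcollection plugin code.
class CollectionFieldSketch {

    // Problematic style: a space is added before every name,
    // so the joined value always starts with a space.
    static String joinWithLeadingSpace(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(' ').append(name);
        }
        return sb.toString();
    }

    // Corrected style: the separator appears only between names.
    static String joinNames(List<String> names) {
        return String.join(" ", names);
    }

    // Stand-in for an untokenized field (assumption): the whole stored
    // value is a single term, so only an exact match succeeds.
    static boolean matchesUntokenized(String storedValue, String queryTerm) {
        return storedValue.equals(queryTerm);
    }

    // Stand-in for a tokenized field (assumption): the value is split on
    // whitespace, so stray leading or trailing spaces do not matter.
    static List<String> whitespaceTokens(String storedValue) {
        List<String> tokens = new ArrayList<>();
        for (String token : storedValue.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("nutch");

        String withSpace = joinWithLeadingSpace(names);
        // The stored value has a leading space, so an exact match on "nutch" fails.
        System.out.println(matchesUntokenized(withSpace, "nutch"));

        // Splitting into terms makes the name findable regardless of the space.
        System.out.println(whitespaceTokens(withSpace).contains("nutch"));

        // Joining without a leading separator yields the plain name.
        System.out.println(matchesUntokenized(joinNames(names), "nutch"));
    }
}
```

Either change alone would make a single collection name searchable in this sketch;
tokenizing additionally keeps each name searchable when a document belongs to several
collections.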

> Subcollection plugin not working on Nutch-1.0
> ---------------------------------------------
>                 Key: NUTCH-732
>                 URL:
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.0.0
>         Environment: Mac OS X 10.5 intel
>            Reporter: Filipe Antunes
>            Priority: Critical
>         Attachments: sub.patch
> I am trying to get subcollections working, using Nutch-1.0.
> I configured subcollections.xml, then added the plugin in nutch-site.xml.
> When indexing finished, I opened Lucene Luke to check whether the database was working properly.
> The subcollection field is populated as it should be, but searching for any subcollection
> on the search tab of Luke returns no results.
> If I search on the url field, I can see that every record has a subcollection associated,
> yet I can't search using the subcollection field.
> Search examples in Luke:
> subcollection:sub1 -> no results
> url:sub1 -> results with field subcollection populated -> sub1
> Same results using:
> ./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"
> If I use "explain", the subcollection field is there with the correct word.
> It makes no sense, so I believe it's a bug.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
