nutch-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Commented: (NUTCH-442) Integrate Solr/Nutch
Date Thu, 01 Nov 2007 13:45:50 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539365 ]

Doğacan Güney commented on NUTCH-442:
-------------------------------------

> make NutchDocument implement VersionedWritable instead of Writable, and delegate version checking to superclass

I never quite understood how VersionedWritable is supposed to work. It only checks the version
and throws an exception on a mismatch. If you want your class to behave differently (for example,
to keep reading data written by an older version), you have to write custom code anyway.
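To illustrate the point above, here is a minimal, self-contained sketch. It does not use Hadoop: VersionedRecordSketch is a stand-in that mimics the check VersionedWritable performs (write one version byte, reject any other version on read), and DocSketch, its fields, and its version numbers are hypothetical. It shows that a class that wants to accept older data cannot delegate to the superclass check and has to read the version byte itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for the VersionedWritable behaviour (not the Hadoop class itself):
// one version byte is written, and any other version is rejected on read.
abstract class VersionedRecordSketch {
    abstract byte getVersion();

    void write(DataOutput out) throws IOException {
        out.writeByte(getVersion());
    }

    void readFields(DataInput in) throws IOException {
        byte found = in.readByte();
        if (found != getVersion()) {
            throw new IOException("version mismatch: expected "
                    + getVersion() + ", found " + found);
        }
    }
}

// Hypothetical document class that still wants to read version 1 data.
class DocSketch extends VersionedRecordSketch {
    static final byte VERSION = 2;
    String title = "";
    String lang = "unknown"; // assumed to have been added in version 2

    @Override
    byte getVersion() { return VERSION; }

    @Override
    void write(DataOutput out) throws IOException {
        super.write(out); // writes the version byte
        out.writeUTF(title);
        out.writeUTF(lang);
    }

    @Override
    void readFields(DataInput in) throws IOException {
        // Custom code: super.readFields() would throw on version 1,
        // so the version byte is read and interpreted here instead.
        byte found = in.readByte();
        if (found < 1 || found > VERSION) {
            throw new IOException("unsupported version " + found);
        }
        title = in.readUTF();
        lang = (found >= 2) ? in.readUTF() : "unknown";
    }

    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    static DocSketch fromBytes(byte[] data) throws IOException {
        DocSketch doc = new DocSketch();
        doc.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return doc;
    }
}
```

With the strict superclass check, the only available reaction to old data is the exception; anything more lenient ends up as hand-written code like the readFields() override here.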

> refactor getDetails() methods in HitDetailer to Searcher (it is not likely that a class would implement Searcher but not HitDetailer)
> use Searcher, delete HitDetailer and SearchBean
> Rename XXXBean classes so that they do not include "bean". (I think it is confusing to have bean objects that have non-trivial functionality)

I agree with these three, but I would like to get feedback from others before making these changes.

> refactor LuceneSearchBean.VERSION to RPCSearchBean

We can't do this because there may be other RPCSearchBean implementations besides LuceneSearchBean.
So VERSION has to be redefined in every class that implements RPCSearchBean.
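A rough sketch of that constraint, using hypothetical stand-in types rather than the classes in the patch: the interface only declares how the version is queried, and each implementation carries its own constant. The names, the simplified getProtocolVersion() signature, and the version numbers are all assumptions made for illustration.

```java
// Hypothetical stand-in for RPCSearchBean. A constant declared here would be
// shared by every implementation, so the interface only declares the accessor.
interface RpcSearchSketch {
    long getProtocolVersion();
}

// Hypothetical stand-in for LuceneSearchBean with its own wire version.
class LuceneSearchSketch implements RpcSearchSketch {
    static final long VERSION = 1L; // illustrative value
    @Override
    public long getProtocolVersion() { return VERSION; }
}

// Another hypothetical backend whose protocol evolves independently.
class OtherSearchSketch implements RpcSearchSketch {
    static final long VERSION = 3L; // illustrative value
    @Override
    public long getProtocolVersion() { return VERSION; }
}
```

The cost is one repeated constant per implementation; the benefit is that changing one backend's protocol does not force a version bump on the others.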

> remove unrelated changes from the patch. (the changes in NGramProfile, HTMLLanguageParser, LanguageIdentifier,... correct me if I'm wrong)

Correct. I will send an updated patch (that also includes the Java 5 ExecutorService fixes).

> As far as I can see, we do not need any metadata for the Solr backend, and only need Store, Index and Vector options for the Lucene backend, so I think we can simplify NutchDocument#metadata. We may implement : [...]

I really don't want to make NutchDocument depend on Lucene, so I would prefer that FieldMeta
not depend on Lucene data structures. It should stay possible to implement a non-Lucene
backend (say, you might want to index to a database).
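One way to picture this, as a sketch only: field options are kept as plain backend-neutral flags, and each backend translates them into its own terms. Every name below is hypothetical and is not taken from the patch. The Lucene-style backend returns plain strings that merely resemble Lucene's store/index options, so the sketch has no Lucene dependency; the database backend returns a toy column description.

```java
// Hypothetical backend-neutral field options: plain flags, no Lucene types.
final class FieldMetaSketch {
    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldMetaSketch(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }
}

// Each backend decides what the neutral flags mean for it.
interface IndexBackendSketch {
    String describeField(String name, FieldMetaSketch meta);
}

// Translates the flags into Lucene-style option names (as strings only).
class LuceneLikeBackendSketch implements IndexBackendSketch {
    @Override
    public String describeField(String name, FieldMetaSketch meta) {
        String store = meta.stored ? "YES" : "NO";
        String index = !meta.indexed ? "NO"
                : (meta.tokenized ? "TOKENIZED" : "UN_TOKENIZED");
        return name + ": Store." + store + ", Index." + index;
    }
}

// Translates the same flags into a toy database column description.
class DatabaseBackendSketch implements IndexBackendSketch {
    @Override
    public String describeField(String name, FieldMetaSketch meta) {
        return name + " TEXT" + (meta.indexed ? " INDEXED" : "");
    }
}
```

Because the document only carries the neutral flags, adding a new backend means adding a new translation, not changing the document class.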

> Integrate Solr/Nutch
> --------------------
>
>                 Key: NUTCH-442
>                 URL: https://issues.apache.org/jira/browse/NUTCH-442
>             Project: Nutch
>          Issue Type: New Feature
>         Environment: Ubuntu linux
>            Reporter: rubdabadub
>         Attachments: NUTCH_442_v3.patch, RFC_multiple_search_backends.patch, schema.xml
>
>
> Hi:
> After trying out Sami's patch for Solr/Nutch integration (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html), I can confirm that it worked :-) That led me to request the following:
> I would be very grateful if this could be included in Nutch 0.9, as I am trying to eliminate my Python-based crawler, which posts documents to Solr. As I am in a corporate environment, I can't install a trunk version in production, so I am asking for this to be included in the 0.9 release. I hope my wish will be granted.
> I look forward to getting some feedback.
> Thank you.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

