nutch-dev mailing list archives

From "Daniel Ciborowski (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NUTCH-1517) CloudSearch indexer
Date Wed, 11 Sep 2013 16:21:52 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763015#comment-13763015 ]

Daniel Ciborowski edited comment on NUTCH-1517 at 9/11/13 4:20 PM:
-------------------------------------------------------------------

git clone https://github.com/apache/nutch
wget -P ~/ https://issues.apache.org/jira/secure/attachment/12601469/0023883254_1377197869_indexer-cloudsearch.patch
cd nutch/
git checkout -t origin/branch-1.7
patch -p0 -i ~/0023883254_1377197869_indexer-cloudsearch.patch
vi conf/nutch-site.xml
ant
cd runtime/local/
mkdir -p urls
echo "http://www.princeton.edu/" > ./urls/seeds.txt
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
bin/nutch index crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

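The vi step is where I set my crawler name (http.agent.name), change the indexer plugin from solr to cloudsearch (indexer-solr to indexer-cloudsearch in plugin.includes), and add my endpoint URL. I tried to do this with sed to replace the lines but couldn't figure it out.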
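Because the vi step is the only manual one, here is a minimal sketch of doing it non-interactively with a shell here-document instead of line-by-line sed edits. It assumes the stock Nutch 1.7 plugin.includes value (compare it with conf/nutch-default.xml before relying on it) and it overwrites conf/nutch-site.xml completely. SITE_XML, AGENT_NAME, and the MyTestCrawler default are hypothetical placeholders, and the CloudSearch endpoint property is left as a comment because its name is defined by the patch, not assumed here.

```shell
#!/bin/sh
# Sketch: write conf/nutch-site.xml without opening an editor.
# Run from the nutch/ checkout before "ant". SITE_XML and AGENT_NAME are
# hypothetical knobs for this sketch; override them via the environment.
SITE_XML="${SITE_XML:-conf/nutch-site.xml}"
AGENT_NAME="${AGENT_NAME:-MyTestCrawler}"   # placeholder crawler name

mkdir -p "$(dirname "$SITE_XML")"

# Unquoted EOF so that $AGENT_NAME is expanded inside the here-document.
cat > "$SITE_XML" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>$AGENT_NAME</value>
  </property>
  <property>
    <!-- Assumed stock 1.7 plugin list, with the Solr indexer plugin
         swapped for the CloudSearch one from the patch. -->
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-cloudsearch|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <!-- Add the CloudSearch endpoint property read by the patched plugin here;
       its name comes from the patch and is not assumed in this sketch. -->
</configuration>
EOF
```

For example, `AGENT_NAME=MyCrawler sh write-nutch-site.sh` (the script file name is hypothetical). Writing the whole file sidesteps brittle regex edits of XML, at the cost of discarding anything already in nutch-site.xml.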

Edits based on feedback.


                
was (Author: djc391):
git clone https://github.com/apache/nutch
wget https://issues.apache.org/jira/secure/attachment/12601469/0023883254_1377197869_indexer-cloudsearch.patch
cd nutch/
git checkout -t origin/branch-1.7
patch -p0 -i ~/0023883254_1377197869_indexer-cloudsearch.patch 
vi conf/nutch-site.xml
ant
cd runtime/local/
mkdir -p urls
echo "http://www.princeton.edu/" > ./urls/seeds.txt 
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

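The vi step is where I set my crawler name (http.agent.name), change the indexer plugin from solr to cloudsearch (indexer-solr to indexer-cloudsearch in plugin.includes), and add my endpoint URL. I tried to do this with sed to replace the lines but couldn't figure it out.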
                  
> CloudSearch indexer
> -------------------
>
>                 Key: NUTCH-1517
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1517
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Julien Nioche
>             Fix For: 1.9
>
>         Attachments: 0023883254_1377197869_indexer-cloudsearch.patch
>
>
> Once we have made the indexers pluggable, we should add a plugin for Amazon CloudSearch.
> See http://aws.amazon.com/cloudsearch/. Apparently it uses a JSON-based representation,
> Search Data Format (SDF), which we could reuse for a file-based indexer.
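To make the SDF idea concrete, here is a minimal sketch of what a file-based indexer could emit: a single SDF batch holding one add and one delete operation. It assumes the 2011-02-01 CloudSearch SDF layout (type, id, version, lang, fields); the required keys differ between CloudSearch API versions, so check the current documentation. The ids, field names, values, and the batch.sdf.json file name are all hypothetical.

```shell
#!/bin/sh
# Sketch of file-based indexer output: one SDF batch written to disk.
# Assumes the 2011-02-01 SDF layout; OUT, the ids, and the field names
# below are hypothetical examples, not values defined by the patch.
OUT="${OUT:-batch.sdf.json}"

# Quoted 'EOF' so the JSON is written verbatim, with no shell expansion.
cat > "$OUT" <<'EOF'
[
  {
    "type": "add",
    "id": "doc_0001",
    "version": 1,
    "lang": "en",
    "fields": {
      "url": "http://www.princeton.edu/",
      "title": "example page title"
    }
  },
  {
    "type": "delete",
    "id": "doc_0002",
    "version": 2
  }
]
EOF
```

A real plugin would also need to map each Nutch document to a valid CloudSearch document id and to the index fields configured for the search domain.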

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
