lucene-solr-user mailing list archives

From Adam Estrada <estrada.a...@gmail.com>
Subject Re: [Nutch] and Solr integration
Date Mon, 03 Jan 2011 17:27:47 GMT
BLEH! <facepalm> This is entirely possible to do in a single step AS LONG AS
YOU GET THE SYNTAX CORRECT ;-)

http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/

bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solr http://localhost:8983/solr

The correct param is -solr NOT -solrindex.
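For anyone wondering *why* the old -solrindex invocation blows up with "No FileSystem for scheme: http": the crawl command doesn't recognize that flag, so the Solr URL falls through the argument loop and gets treated as the seed-URL dir, which Hadoop then tries to open as a filesystem path (hence `rootUrlDir = http://localhost:8983/solr` in the quoted output below). A rough sketch of that behavior; `parse_crawl_args` is a hypothetical shell stand-in, not Nutch's actual Java source:

```shell
#!/bin/sh
# Hypothetical sketch -- NOT Nutch's actual source. parse_crawl_args is a
# made-up stand-in for the argument loop in org.apache.nutch.crawl.Crawl:
# recognized flags consume their values; any other token is taken as the
# seed-URL directory (rootUrlDir).
parse_crawl_args() {
  rootUrlDir=""
  solrUrl=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -dir|-threads|-depth|-topN) shift 2 ;;   # recognized flag + value
      -solr) solrUrl="$2"; shift 2 ;;          # recognized: Solr endpoint
      *) rootUrlDir="$1"; shift ;;             # unknown token -> seed dir
    esac
  done
  echo "rootUrlDir=$rootUrlDir solrUrl=$solrUrl"
}

# With the bad -solrindex flag, the URL falls through to rootUrlDir:
parse_crawl_args urls -dir crawl -threads 10 -depth 100 -topN 50 \
  -solrindex http://localhost:8983/solr
# prints: rootUrlDir=http://localhost:8983/solr solrUrl=
```

Run it with the correct -solr flag instead and the URL lands in solrUrl, leaving rootUrlDir pointing at the real `urls` seed directory.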

Cheers,
Adam

On Mon, Jan 3, 2011 at 11:45 AM, Adam Estrada <estrada.adam@gmail.com> wrote:

> All,
>
> I realize that the documentation says that you crawl first then add to Solr
> but I spent several hours running the same command through Cygwin with
> -solrindex http://localhost:8983/solr on the command line (eg. bin/nutch
> crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex
> http://localhost:8983/solr) and it worked. Does anyone know why it's not
> working for me anymore? I am using the Lucid build of Solr, which is what I
> was using before. I neglected to write down the command-line syntax, which
> is biting me in the arse. Any tips on this one would be great!
>
> Thanks,
> Adam
>
> On Mon, Dec 20, 2010 at 4:21 PM, Anurag <anurag.it.jolly@gmail.com> wrote:
>
>>
>> Why are you using solrindex in the argument? It is used when we need to
>> index the crawled data in Solr.
>> For more, read http://wiki.apache.org/nutch/NutchTutorial .
>>
>> Also, for Nutch-Solr integration, this blog is very useful:
>> http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
>> I integrated Nutch and Solr and it works well.
>>
>> Thanks
>>
>> On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] <
>> ml-node+2122347-622655030-146354@n3.nabble.com> wrote:
>>
>> > All,
>> >
>> > I have a couple of websites that I need to crawl, and the following
>> > command line used to work, I think. Solr is up and running and everything
>> > is fine there, and I can go through and index the site, but I really need
>> > the results added to Solr after the crawl. Does anyone have any idea how
>> > to make that happen, or what I'm doing wrong? These errors are being
>> > thrown from Hadoop, which I am not using at all.
>> >
>> > $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
>> > crawl started in: crawl
>> > rootUrlDir = http://localhost:8983/solr
>> > threads = 10
>> > depth = 100
>> > indexer=lucene
>> > topN = 50
>> > Injector: starting at 2010-12-20 15:23:25
>> > Injector: crawlDb: crawl/crawldb
>> > Injector: urlDir: http://localhost:8983/solr
>> > Injector: Converting injected urls to crawl db entries.
>> > Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
>> >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>> >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>> >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>> >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>> >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
>> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
>> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
>> >
>> >
>> > ------------------------------
>> >  View message @
>> >
>> http://lucene.472066.n3.nabble.com/Nutch-and-Solr-integration-tp2122347p2122347.html
>> > To start a new topic under Solr - User, email
>> > ml-node+472068-1941297125-146354@n3.nabble.com<ml-node%2B472068-1941297125-146354@n3.nabble.com>
>> <ml-node%2B472068-1941297125-146354@n3.nabble.com<ml-node%252B472068-1941297125-146354@n3.nabble.com>
>> >
>> > To unsubscribe from Solr - User, click here<
>> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=472068&code=YW51cmFnLml0LmpvbGx5QGdtYWlsLmNvbXw0NzIwNjh8LTIwOTgzNDQxOTY=
>> >.
>> >
>> >
>>
>>
>>
>> --
>> Kumar Anurag
>>
>>
>> -----
>> Kumar Anurag
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Nutch-and-Solr-integration-tp2122347p2122623.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
