lucene-solr-user mailing list archives

From <Markus.Rietz...@rzf.fin-nrw.de>
Subject Re: Solr Cell on web-based files?
Date Tue, 27 Oct 2009 14:42:12 GMT
curl reads from a local file or from stdin, so if it is only a single file from a web server, you could do something like:


curl http://someserver/file.html | curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F name=@-


But this way there is no crawling, no link following, etc.
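
If there are a handful of known URLs rather than one, the same stdin trick can be wrapped in a small loop. This is only a rough sketch: the function name extract_remote, the SOLR_URL variable and the DRY_RUN switch are made up for illustration, and it still does no crawling or link following.

```shell
#!/bin/sh
# Sketch only: SOLR_URL, DRY_RUN and extract_remote are illustrative names,
# not part of Solr or curl. The default endpoint is the one used in this thread.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update/extract?extractOnly=true}"

# Fetch one remote file and stream it to Solr Cell through stdin (-F name=@-).
extract_remote() {
    url="$1"
    if [ -n "$DRY_RUN" ]; then
        # Print the pipeline instead of running it, to check the URLs first.
        echo "curl -sf $url | curl -s $SOLR_URL -F name=@-"
        return 0
    fi
    # -s: no progress meter, -f: no body is passed on for HTTP errors (e.g. 404).
    curl -sf "$url" | curl -s "$SOLR_URL" -F "name=@-"
}

# Process every URL given on the command line, one after another.
for u in "$@"; do
    extract_remote "$u"
done
```

Saved as, say, extract.sh, it could be called as `sh extract.sh http://someserver/a.html http://someserver/b.html`; setting DRY_RUN=1 in the environment only prints the commands.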


--
Kind regards

Markus Rietzler - <rietzler_software/>
Rechenzentrum der Finanzverwaltung NRW
0211/4572-2130
 

> -----Original Message-----
> From: Insight 49, LLC [mailto:insight49@gmail.com]
> Sent: Tuesday, 27 October 2009 16:14
> To: solr-user@lucene.apache.org
> Subject: Solr Cell on web-based files?
> 
> Hi,
> 
> If I use the ExtractingRequestHandler
> <http://wiki.apache.org/solr/ExtractingRequestHandler> on a local file
> (as shown in http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput),
> all works well, but how do I do this for files located on a server?
> 
> e.g. (works)
> curl http://localhost:8983/solr/update/extract?extractOnly=true 
> --data-binary @mylocalfile.htm -H "Content-type:text/html"
> 
> e.g. (doesn't work)
> curl http://localhost:8983/solr/update/extract?extractOnly=true
> --data-binary @http://myweb.com/mylocalfile.htm -H "Content-type:text/html"
> 
> Thanks,
> 
> Dan
> 
> 
