lucene-solr-user mailing list archives

From "Teague James" <teag...@insystechinc.com>
Subject Indexing URLs for Binaries
Date Fri, 03 Jan 2014 18:29:34 GMT
I am using Nutch 1.7 with Solr 4.6.0 to index websites that link to
binary files such as Word and PDF documents. The crawler crawls the site,
but I am not getting the URLs of the links to the binary files, no matter
how deep I set the crawl settings for the site. I see the link labels in
the content, but not the URLs. Any ideas on how I could get those URLs
back in my crawl?

