lucene-solr-user mailing list archives

From "Peter Manis" <ma...@digital39.com>
Subject Re: Indexing HTML and other doc types
Date Wed, 04 Jul 2007 08:13:46 GMT
A coworker of mine posted the code that we used for adding pdf, doc, xls,
etc. documents into Solr.  You can find the files at the following location:

https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Just apply the patch, put the lib files in the lib directory, run `ant
compile`, yada yada, and you should be good to go.  If the build fails, update
to revision 552853; that is the latest revision I have compiled with the
patch, so I know it works.  Usually if the build fails it is something
unrelated to Eric's code and will be fixed within a few new revisions.
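For HTML specifically, the basic idea of any such client is to pull the visible
text out of the file and wrap it in Solr's XML update message.  The sketch below
is NOT the SOLR-284 code; it is a minimal, hypothetical client-side example that
uses Python's standard html.parser to skip <script>/<style> content and emits an
<add><doc> message.  The function name html_to_solr_add and the field names
"id", "title" and "text" are assumptions and must match your own schema.xml.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TextExtractor(HTMLParser):
    """Collects visible text and the <title> from HTML, ignoring script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []
        self.title = ""
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>: not searchable text
        if self._in_title:
            self.title += data
        elif data.strip():
            self.parts.append(data.strip())


def html_to_solr_add(doc_id, html):
    """Build a Solr XML <add> message (hypothetical field names id/title/text)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    fields = {
        "id": doc_id,
        "title": parser.title.strip(),
        "text": " ".join(parser.parts),
    }
    # Escape &, < and > so the extracted text is valid XML element content.
    body = "".join(
        f'<field name="{name}">{escape(value)}</field>'
        for name, value in fields.items()
    )
    return f"<add><doc>{body}</doc></add>"
```

The resulting string would then be POSTed to Solr's XML update handler
(typically /solr/update), followed by a <commit/> message.  Binary formats such
as pdf, doc or xls need a real text extractor first, which is the part the
SOLR-284 patch takes care of on the server side.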


Peter Manis

On 7/3/07, Teruhiko Kurosaka <Kuro@basistech.com> wrote:
>
> Solr looks very good for indexing and searching structured data.
> But I noticed there is no tool in the Solr distribution with which
> documents of other doc types can be indexed.  Are there other side
> projects that develop Solr clients for indexing documents of other
> doc types?
>
> Or is generic full-text search really the wrong area to apply Solr, and
> should I be using something like Nutch?
> -kuro
>
