lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Using EmbeddedSolrServer with static documents
Date Sun, 03 Apr 2011 23:10:02 GMT
OK, you're still not quite on the right track. You can't just
index XML documents without transforming them into
valid Solr XML documents. Ditto for HTML.

Take a look at the ExtractingRequestHandler documentation at:
http://wiki.apache.org/solr/ExtractingRequestHandler

Here's some more documentation that might help.
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

But at root, you have to extract the relevant info from the file in question
and form your own valid Solr document, and send *that* to Solr if you want
to do it by hand.
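For the by-hand route, here's a minimal sketch in plain Java (JDK only, no
SolrJ) of what "extract the info and form your own Solr document" could look
like for a well-formed XML source. The source layout (an <article> element
with a ref attribute and <headline>/<body> children), the XPath expressions,
the field names id/title/text and the class name are all hypothetical; swap in
your own format and whatever fields your schema.xml actually defines. HTML
usually isn't well-formed XML, so a DOM parse like this won't cope with it;
that's where an HTML parser, or Tika via the ExtractingRequestHandler, comes in.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SolrXmlMapper {

    // Pull the values we care about out of a (hypothetical) source document
    // shaped like <article ref="..."><headline>...</headline><body>...</body></article>.
    static Map<String, String> extractFields(String sourceXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sourceXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Keys are hypothetical Solr field names; they must exist in schema.xml.
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("id", xpath.evaluate("/article/@ref", doc));
            fields.put("title", xpath.evaluate("/article/headline", doc));
            fields.put("text", xpath.evaluate("/article/body", doc));
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read source XML", e);
        }
    }

    // Wrap the values in Solr's XML update message:
    // <add><doc><field name="...">...</field>...</doc></add>
    static String toSolrXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Minimal escaping so text and attribute values stay well-formed XML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The string that toSolrXml returns is the kind of message you'd POST to Solr's
/update handler, followed by a commit.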

Or you can use the ExtractingRequestHandler to do it for you. In that case,
be aware that it'll do the best it can at putting metadata into the
appropriate fields in your schema, but you don't have total control over
that.
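If you go the ExtractingRequestHandler route over HTTP, the request can be as
simple as posting the raw file with its content type. This is a JDK-only
sketch (Java 11+ java.net.http API), assuming the handler is mapped at
/update/extract as in the example solrconfig.xml and a base URL such as
http://localhost:8983/solr. literal.id lets you supply the unique key yourself
and commit=true commits right away. The class and method names are made up for
illustration.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class ExtractRequests {

    // Assumes the ExtractingRequestHandler is mapped at /update/extract.
    // literal.id supplies the unique key; commit=true commits after indexing.
    static URI extractUri(String solrBaseUrl, String docId) {
        String query = "literal.id=" + URLEncoder.encode(docId, StandardCharsets.UTF_8)
                + "&commit=true";
        return URI.create(solrBaseUrl + "/update/extract?" + query);
    }

    // POST the raw file as the request body, labelled with its MIME type.
    static HttpRequest extractRequest(String solrBaseUrl, String docId,
                                      Path file, String contentType)
            throws FileNotFoundException {
        return HttpRequest.newBuilder(extractUri(solrBaseUrl, docId))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }
}
```

Sending it is then just HttpClient.newHttpClient().send(request,
HttpResponse.BodyHandlers.ofString()).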

Oh, and why are you using embedded Solr? The normal HTTP request process
is recommended, and you can connect to it easily with SolrJ.
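What SolrJ hides for you is an HTTP exchange with the running server. For
illustration only, here's a bare-JDK sketch (not SolrJ's API) of posting a Solr
XML update message straight to the /update handler; the class name and the
localhost base URL are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SolrUpdateRequests {

    // Build a POST of a Solr XML update message (<add><doc>...</doc></add>)
    // to the server's /update handler, committing immediately.
    static HttpRequest updateRequest(String solrBaseUrl, String solrXml) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/update?commit=true"))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(solrXml))
                .build();
    }
}
```

Because the server runs as its own process, your application only needs an
HTTP client, which is a big part of why the HTTP setup is the usual
recommendation over embedding.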

FWIW
Erick

On Sun, Apr 3, 2011 at 6:48 PM, michael.i <michael.isvy@gmail.com> wrote:

> Hi Erick,
> thanks for getting back to me.
>
> "Well, what is "a document on the filesystem"? Solr deals
> with well-formed XML documents of a specific format."
>
> I would like to index all kinds of documents. For a start, I'll be happy to
> be able to work with XML and HTML documents.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-EmbeddedSolrServer-with-static-documents-tp2767614p2773012.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
