lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Indexing HTML and other doc types
Date Wed, 04 Jul 2007 00:03:23 GMT
Kuro,

The usual flow is: doc of some type -> parse its content into various fields -> post those fields to Solr

Even Nutch does the same - there is a title field, a content field, and so on (the exact names
may be different).

Of course, you can always just combine everything into a single content field.
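The flow above can be sketched roughly as follows for the HTML case. This is only an illustration, not something that ships with Solr: the field names (id, title, content), the helper names, and the update URL http://localhost:8983/solr/update are all assumptions and need to match your own schema and installation. It uses only the Python standard library and Solr's XML update message format.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape
import urllib.request


class TitleAndTextParser(HTMLParser):
    """Collects the <title> text and the remaining visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.content_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.content_parts.append(text)


def html_to_fields(doc_id, html):
    """Step 1: parse one HTML document into a dict of fields.

    The field names are hypothetical; use whatever your schema defines.
    """
    parser = TitleAndTextParser()
    parser.feed(html)
    parser.close()
    return {
        "id": doc_id,
        "title": " ".join(parser.title_parts),
        "content": " ".join(parser.content_parts),
    }


def to_solr_add_xml(fields):
    """Step 2: build an <add> message in Solr's XML update format."""
    body = "".join(
        '<field name="%s">%s</field>' % (name, escape(value))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % body


def post_to_solr(xml, update_url="http://localhost:8983/solr/update"):
    """Step 3: POST the message to the update handler (URL is an assumption)."""
    request = urllib.request.Request(
        update_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Typical use would be post_to_solr(to_solr_add_xml(html_to_fields("doc1", html))), followed by posting "<commit/>" to the same URL so the document becomes searchable. For the single-field variant, return only id and content from html_to_fields and put the title text into content as well.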

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Teruhiko Kurosaka <Kuro@basistech.com>
To: solr-user@lucene.apache.org
Sent: Tuesday, July 3, 2007 8:56:23 PM
Subject: Indexing HTML and other doc types

Solr looks very good for indexing and searching structured data.
But I noticed there is no tool in the Solr distribution with which documents
of other doc types can be indexed.  Are there other side projects that 
develop Solr clients for indexing documents of other doc types?

Or is generic full-text search really the wrong area in which to apply Solr, and
should I be using something like Nutch?
-kuro 



