lucene-solr-user mailing list archives

From Eugene Dzhurinsky <b...@redwerk.com>
Subject Re: adding and updating a lot of document to Solr, metadata extraction etc
Date Tue, 03 Nov 2009 16:08:01 GMT
On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
> About large XML files and http overhead: you can tell solr to load the
> file directly from a file system. This will stream thousands of
> documents in one XML file without loading everything in memory at
> once.
> 
> This is a new book on Solr. It will help you through this early learning phase.
> 
> http://www.packtpub.com/solr-1-4-enterprise-search-server

Thank you, but we have to prepare a proof of concept with a stable
version, and I don't see any 1.4.0 artifacts released to repo1.maven.org yet.

Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
and it looks like this approach is the better fit for my case.

I have a lot of HTML pages on disk, and some metadata for them stored in SQL
tables. It seems that I need to provide a custom EntityProcessor and
DataSource to DataImportHandler. I will also need to pass some properties
(table names etc.) that tell the data source how to retrieve the data.
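
To make the idea concrete, here is a rough sketch of the row-by-row shape I
have in mind, in plain Java with no Solr dependencies. The class name
HtmlPageRows, the "path" and "content" field names, and the in-memory list
standing in for the SQL result set are all placeholders of mine and not part
of DataImportHandler; a real implementation would presumably plug into the
EntityProcessor and DataSource classes described on the wiki page instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: merges one metadata record with the HTML file it
 * points to and hands back one field map per page.
 */
class HtmlPageRows {
    // Stand-in for rows read from the SQL metadata tables (assumption).
    private final Iterator<Map<String, Object>> metadata;
    // Directory that holds the HTML pages on disk.
    private final Path htmlRoot;

    HtmlPageRows(List<Map<String, Object>> metadataRows, Path htmlRoot) {
        this.metadata = metadataRows.iterator();
        this.htmlRoot = htmlRoot;
    }

    /** Returns the next merged row, or null when no metadata rows are left. */
    Map<String, Object> nextRow() throws IOException {
        if (!metadata.hasNext()) {
            return null;
        }
        // Copy the metadata so the caller's map is not modified.
        Map<String, Object> row = new HashMap<>(metadata.next());
        // "path" is an assumed column holding the file name relative to htmlRoot.
        Path page = htmlRoot.resolve(String.valueOf(row.get("path")));
        // Read the whole page; UTF-8 is an assumption about the stored files.
        row.put("content", new String(Files.readAllBytes(page), StandardCharsets.UTF_8));
        return row;
    }
}
```

Each returned map would then correspond to one document to index, with the
metadata columns and the page body as its fields.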

So is there perhaps a tutorial or how-to that describes how to create custom
classes for importing data into Solr 1.3.0?

Thank you in advance!

-- 
Eugene N Dzhurinsky
