lucene-solr-user mailing list archives

From Gabriel Cooper <inanutshel...@gmail.com>
Subject DIH when using XML Files questions
Date Tue, 27 Sep 2011 20:51:21 GMT
I'm looking into using DataImportHandler to import my data files with
FileDataSource and FileListEntityProcessor, and I have a couple of questions
before I get started that I'm hoping you guys can help with.

1) I would like to put a file on the local filesystem in the configured
location and have Solr see and process the file without additional effort on
my part.
1a) Is this doable in any way? From what I've seen, this is not supported
and I must manually call a URL (e.g.
http://foo/solr/dataimport?command=full-import).
1b) The manual, URL-based invocation method seems perfectly logical in a
database-oriented world, where one might schedule an update to run regularly.
In my case, though, I have a couple of identical indexes that I load balance
between, and I don't want to run the same hefty query multiple times in
parallel. So I'm doing one query, writing the results to an XML file, and
pushing that file to each box, and I then want that file processed. I'd like
the process to be as automated as possible.
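
To show the kind of automation I mean, here is a minimal sketch of an
external poller, not anything DIH provides. The host and handler path are
placeholders taken from the example URL in 1a, the watch directory is
whatever the data-config points at, and the helper names (build_command_url,
new_xml_files, poll_once) are made up for illustration.

```python
import os
import urllib.parse
import urllib.request

# Assumption: placeholder host and handler path, copied from the example URL in 1a.
SOLR_DATAIMPORT_URL = "http://foo/solr/dataimport"


def build_command_url(base_url, command):
    """Return the DIH URL for a command such as full-import."""
    return base_url + "?" + urllib.parse.urlencode({"command": command})


def new_xml_files(watch_dir, already_seen):
    """Return sorted paths of .xml files in watch_dir that are not in already_seen."""
    found = []
    for name in sorted(os.listdir(watch_dir)):
        path = os.path.join(watch_dir, name)
        if name.endswith(".xml") and os.path.isfile(path) and path not in already_seen:
            found.append(path)
    return found


def http_get(url):
    """Issue a plain GET and discard the response body."""
    with urllib.request.urlopen(url) as response:
        response.read()


def poll_once(watch_dir, already_seen, trigger=http_get):
    """Request one full-import if new files appeared; return the new files."""
    fresh = new_xml_files(watch_dir, already_seen)
    if fresh:
        trigger(build_command_url(SOLR_DATAIMPORT_URL, "full-import"))
        already_seen.update(fresh)
    return fresh
```

Something like this could run from cron or in a loop with a sleep between
calls. One caveat: a file that is still being copied could be picked up half
written, so the push would probably need to write under a temporary name and
rename the file into place once it is complete.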

2) I would like any files processed by Solr to be deleted after they've been
imported. I haven't seen any way to do this currently. I thought I might be
able to subclass something, but FileListEntityProcessor, for example,
doesn't seem to give any handles at the right time in the workflow to delete
a file.
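
The fallback I can imagine is doing the cleanup outside Solr. The sketch
below assumes the handler's status command can be fetched as JSON and that
its "status" field reads "idle" once an import has finished; both the helper
names and the exact response shape are assumptions on my part, not something
I have confirmed against DIH.

```python
import json
import os


def import_is_idle(status_body):
    """Return True if a status response (JSON text) reports 'idle'."""
    # Assumption: the response carries a top-level "status" field.
    return json.loads(status_body).get("status") == "idle"


def delete_imported(paths, status_body):
    """Delete the given files only when no import is running; return those removed."""
    if not import_is_idle(status_body):
        return []
    removed = []
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Even if that works, "idle" alone would not prove the import succeeded, so a
real script would also have to check the reported outcome before deleting
anything.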

3) When reading the DIH documentation, I ran across this statement: "When
delta-import command is executed, it reads the start time stored in
*conf/dataimport.properties*. It uses that timestamp to run delta queries and
after completion, updates the timestamp in *conf/dataimport.properties*." If
it really does update the date to the completion date, what happens to any
files added between the start and end dates? Are they lost?
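
To make the worry concrete, here is a toy model of the timeline. It is not
DIH code: selected_by_delta is a made-up helper that just picks files
modified after a stored timestamp, and the times are arbitrary seconds.

```python
def selected_by_delta(file_mtimes, last_index_time):
    """Return names of files newer than the stored timestamp, sorted by name."""
    return sorted(name for name, mtime in file_mtimes.items()
                  if mtime > last_index_time)


# Toy timeline: a run starts at 100 and finishes at 110.
RUN_START, RUN_END = 100, 110

# late.xml lands at 105, after the run listed its files but before it finished.
files = {"early.xml": 90, "late.xml": 105}

# If the completion time were stored, the next delta run would skip late.xml.
skipped_case = selected_by_delta(files, RUN_END)

# If the start time were stored instead, the next delta run would still see it.
kept_case = selected_by_delta(files, RUN_START)
```

In this model late.xml is only picked up when the stored timestamp is the
start of the run, which is why the wording in the documentation concerns me.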

4) For delta imports, I don't see any mention of how processed files are
ordered, other than that it tries not to re-import files older than the
timestamp in the conf/dataimport.properties file. In cases where order
matters, does it order the files by name, by creation date, or by something
else?

Thanks for any help,

Gabriel.
