lucene-solr-user mailing list archives

From Marc Sturlese <marc.sturl...@gmail.com>
Subject DataImportHandler: Deleting from index and db; lastIndexed id feature
Date Tue, 02 Dec 2008 09:31:12 GMT

Hey there,

I have my DataImportHandler almost completely configured, but three goals
remain. I don't think I can reach them just via the XML configuration, a
transformer, or a SqlEntityProcessor plugin, but I need to be sure of that.
If there's no other way, I will hack some Solr source classes, and I would
like to know the best way to do that. Once I have it solved, I can upload or
post the source in the forum in case someone thinks it can be helpful.

1.- Every time I execute the DataImportHandler (to index data from a DB), I
need to delete some expired documents, either at the start or at the end of
the run. I have to delete them from both the database and the index. I know
which documents must be deleted because a field in the DB row marks them. I
would not like to delete everything from the DB first and then everything
from the index (or vice versa), but rather delete each document from the
index and from the DB one at a time.
The "delete mark" is set as an update on the DB row, so I think I could use
deltaImport. I don't know if deletedPkQuery is the way to do that, and I
cannot find much information about how to make it work. Since deltaQuery
modifies docs (deletes the old one and inserts the new one), I suppose there
must be an easy way to do just the delete without the new insert.
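Whether deletedPkQuery can also remove the DB rows is exactly the open
question here, so the following is only a sketch of the interleaved order
asked for above, written outside DIH. It assumes the delete marks have
already been read into a map from ID to flag. IndexDeleter, RowDeleter and
ExpiredPurger are hypothetical names, not DataImportHandler or SolrJ API; in
a real setup the two interfaces would wrap a Solr client and a JDBC
connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index side (in practice, e.g., a SolrJ client).
interface IndexDeleter {
    void deleteFromIndex(String id);
}

// Hypothetical stand-in for the database side (in practice, e.g., a JDBC DELETE).
interface RowDeleter {
    void deleteFromDb(String id);
}

class ExpiredPurger {
    /**
     * Walks the rows and, for each one whose delete mark is set, removes the
     * document from the index and then the row from the DB before moving on.
     * Returns the purged IDs in the order they were processed.
     */
    static List<String> purgeExpired(Map<String, Boolean> deleteMarks,
                                     IndexDeleter index, RowDeleter db) {
        List<String> purged = new ArrayList<>();
        for (Map.Entry<String, Boolean> row : deleteMarks.entrySet()) {
            if (!Boolean.TRUE.equals(row.getValue())) {
                continue; // not marked as expired
            }
            String id = row.getKey();
            index.deleteFromIndex(id);
            // Only reached if the index delete did not throw, so a failed
            // index delete leaves the marked row in place for the next run.
            db.deleteFromDb(id);
            purged.add(id);
        }
        return purged;
    }
}
```

Deleting from the index first means a crash between the two calls leaves the
row still marked, and the next run simply repeats the index delete. Deleting
an ID that is already gone should be harmless, but that is worth confirming
against the Solr version in use.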

2.- This is probably my most difficult goal.
Delta import reads a timestamp from dataimport.properties and modifies/adds
all documents from the DB that were inserted after that date. What I want is
to be able to save the ID of the last indexed doc, so that the next time I
execute the indexer it starts indexing from that last indexed ID.
The point of doing this is that if I do a full import from a DB with lots of
rows, the app could encounter a problem in the middle of the execution and
abort the process. The way delta import works, I would have to restart the
execution from the beginning. With this new functionality I could optimize
the index and start from the last indexed doc.
I think I should begin by modifying SolrWriter.java and DocBuilder.java,
creating functions like getStartTime, persistStartTime... for ID control.
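A minimal sketch of the checkpointing idea in goal 2, assuming rows have a
monotonically increasing numeric ID and can be fetched in ID order. It uses
only java.util.Properties to persist the last indexed ID, in the same spirit
as the timestamp kept in dataimport.properties. The key last_indexed_id, the
class IdCheckpoint and its indexFrom loop are hypothetical and are not part
of SolrWriter or DocBuilder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;
import java.util.function.LongConsumer;

class IdCheckpoint {
    // Hypothetical key; DIH itself only records the last import timestamp.
    static final String KEY = "last_indexed_id";
    private final Path file;

    IdCheckpoint(Path file) {
        this.file = file;
    }

    private Properties read() throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }

    /** Last persisted ID, or -1 if nothing has been indexed yet. */
    long load() throws IOException {
        return Long.parseLong(read().getProperty(KEY, "-1"));
    }

    /** Stores the ID, keeping any other keys already in the file. */
    void persist(long id) throws IOException {
        Properties props = read();
        props.setProperty(KEY, Long.toString(id));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "last indexed document id");
        }
    }

    /**
     * Indexes every ID above the checkpoint and persists the checkpoint after
     * each batch, so an aborted run resumes near where it stopped.
     * sortedIds stands in for a query ordered by ID.
     */
    void indexFrom(List<Long> sortedIds, LongConsumer indexer, int batchSize)
            throws IOException {
        long last = load();
        int sinceCheckpoint = 0;
        for (long id : sortedIds) {
            if (id <= last) {
                continue; // already indexed by an earlier run
            }
            indexer.accept(id);
            last = id;
            if (++sinceCheckpoint == batchSize) {
                persist(last);
                sinceCheckpoint = 0;
            }
        }
        if (sinceCheckpoint > 0) {
            persist(last); // record the final, partial batch
        }
    }
}
```

If a run aborts mid-batch, the documents after the last checkpoint are
indexed again on the next run, which is safe as long as each document has a
unique key that Solr overwrites. In a real import the checkpoint should only
be persisted after the corresponding documents have been committed;
otherwise a crash could record IDs that never reached the index.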

3.- I asked about this last point in an earlier message. I want to boost
document fields at indexing time.
>>Adding fieldboost is a planned item.

>>It must work as follows.
>>Add a special value $fieldBoost.<fieldname> to the row map

>>And DocBuilder should respect that. You can raise a bug and we can
>>commit it soon.
How can I raise a bug?
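The reply quoted above describes the planned mechanism: a special
$fieldBoost.<fieldname> entry in the row map that DocBuilder would honor. A
minimal sketch of both sides follows, assuming the row is a
Map<String, Object> and the boost is a float. FieldBoosts, requestBoost,
extractBoosts and boostFor are hypothetical helpers, not DataImportHandler
API, and the eventual DocBuilder change may look quite different.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FieldBoosts {
    // Prefix of the special row-map key proposed in the reply quoted above.
    static final String PREFIX = "$fieldBoost.";

    /** Transformer side: ask for a boost on one field of the current row. */
    static void requestBoost(Map<String, Object> row, String field, float boost) {
        row.put(PREFIX + field, boost);
    }

    /**
     * DocBuilder side: remove the $fieldBoost.* entries from the row (so they
     * are not indexed as ordinary fields) and return field name -> boost.
     * Accepts numeric values or strings such as "0.5".
     */
    static Map<String, Float> extractBoosts(Map<String, Object> row) {
        Map<String, Float> boosts = new HashMap<>();
        Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> e = it.next();
            if (!e.getKey().startsWith(PREFIX)) {
                continue;
            }
            Object v = e.getValue();
            float boost = (v instanceof Number)
                    ? ((Number) v).floatValue()
                    : Float.parseFloat(v.toString());
            boosts.put(e.getKey().substring(PREFIX.length()), boost);
            it.remove();
        }
        return boosts;
    }

    /** Boost to apply to a field; 1.0 (Lucene's default) when none was requested. */
    static float boostFor(Map<String, Float> boosts, String field) {
        return boosts.getOrDefault(field, 1.0f);
    }
}
```

Stripping the control entries before the document is built matters because
otherwise they would reach the index as ordinary, and probably undeclared,
fields.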

Thanks in advance




-- 
View this message in context: http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
Sent from the Solr - User mailing list archive at Nabble.com.

