lucene-solr-user mailing list archives

From "Noble Paul നോബിള്‍ नोब्ळ्" <noble.p...@gmail.com>
Subject Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature
Date Wed, 03 Dec 2008 09:14:49 GMT
Good.
We need use cases like these and contributions from users.

This is a win-win: you will not have to manage the code yourself once it is
checked in, and the DIH code will improve as more eyes look at it.

Thanks a lot,
Noble

On Wed, Dec 3, 2008 at 1:49 PM, Marc Sturlese <marc.sturlese@gmail.com> wrote:
>
> That's what I am trying to do. Thanks for the advice. Once I have it done I
> will raise the issue and upload the patch.
>
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> OK. I guess I see it. I am thinking of exposing writes to the
>> properties file via an API,
>>
>> say Context#persist(key,value);
>>
>>
>> This can write the data to dataimport.properties.
>>
>> You would then be able to retrieve that value with ${dataimport.persist.<key>}
>>
>> or through an API, Context.getPersistValue(key).
>>
>> You can raise an issue and give a patch, and we can get it committed.
>>
>> I guess this is what you wish to achieve.
>>
>> --Noble
>>
>>
>>
>> On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese <marc.sturlese@gmail.com>
>> wrote:
>>>
>>> Do you mean the file used by DataImportHandler called
>>> dataimport.properties?
>>> If you mean that one, it is written at the end of the indexing process. The
>>> written date is used by the delta query in the next import to identify
>>> the new or modified rows in the database.
>>>
>>> What I am trying to do is save the last indexed id instead of a
>>> timestamp. That way, the next run would start indexing from the last
>>> doc that was indexed in the previous run. But I am still a bit
>>> confused about how to do that...
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>
>>>> delta-import file?
>>>>
>>>>
>>>> On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <goksron@gmail.com>
>>>> wrote:
>>>>> Does the DIH delta feature rewrite the delta-import file for each set of
>>>>> rows? If it does not, that sounds like a bug/enhancement.
>>>>> Lance
>>>>>
>>>>> -----Original Message-----
>>>>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com]
>>>>> Sent: Tuesday, December 02, 2008 8:51 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature
>>>>>
>>>>> You can write the details to a file using a Transformer itself.
>>>>>
>>>>> It is wise to stick to the public API as far as possible. We will
>>>>> maintain back compat and your code will be usable w/ newer versions.
>>>>>
>>>>>
>>>>> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <marc.sturlese@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Thanks, I really appreciate your help.
>>>>>>
>>>>>> I didn't explain myself very well here:
>>>>>>
>>>>>>> 2.- This is probably my most difficult goal.
>>>>>>> Delta-import reads a timestamp from dataimport.properties and
>>>>>>> modifies/adds all documents from the db which were inserted after that date.
>>>>>>> What I want is to be able to save in the field the id of the last
>>>>>>> indexed doc, so that the next time I execute the indexer it starts
>>>>>>> indexing from that last indexed doc id.
>>>>>> You can use a Transformer to write something to the DB.
>>>>>> Context#getDataSource(String) for each row
>>>>>>
>>>>>> When I said:
>>>>>>
>>>>>>> be able to save in the field the id of the last indexed doc
>>>>>> I made a mistake. I meant:
>>>>>>
>>>>>> be able to save in the file (dataimport.properties) the id of the last
>>>>>> indexed doc.
>>>>>> The point would be to run my own delta query, indexing from the last
>>>>>> indexed doc id instead of the timestamp.
>>>>>> So I think this would not work in that case (my mistake, because of
>>>>>> the bad explanation):
>>>>>>
>>>>>>>You can use a Transformer to write something to the DB.
>>>>>>>Context#getDataSource(String) for each row
>>>>>>
>>>>>> That is why I was saying:
>>>>>>> I think I should begin modifying the SolrWriter.java and
>>>>>>> DocBuilder.java.
>>>>>>> Creating functions like getStartTime, persistStartTime... for ID
>>>>>>> control
>>>>>>
>>>>>> Am I heading in the right direction?
>>>>>> Sorry for my English, and thanks in advance.
>>>>>>
>>>>>>
>>>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>
>>>>>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>>>>>>> <marc.sturlese@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hey there,
>>>>>>>>
>>>>>>>> I have my DataImportHandler almost completely configured. I am
>>>>>>>> missing three goals. I don't think I can reach them just via xml
>>>>>>>> conf or a transformer and SqlEntityProcessor plugin, but I need to be
>>>>>>>> sure of that.
>>>>>>>> If there's no other way I will hack some Solr source classes, and I would
>>>>>>>> like to know the best way to do that. Once I have it solved, I can
>>>>>>>> upload or post the source in the forum in case someone thinks it can
>>>>>>>> be helpful.
>>>>>>>>
>>>>>>>> 1.- Every time I execute DataImportHandler (to index data from a
>>>>>>>> db), at the start time or end time I need to delete some expired
>>>>>>>> documents. I have to delete them from the database and from the
>>>>>>>> index. I know which documents must be deleted because a field in
>>>>>>>> the db says so. I would rather not delete everything from the DB first or
>>>>>>>> everything from the index first, but one from the index and one from the DB
>>>>>>>> each time.
>>>>>>>
>>>>>>> You can override the init() and destroy() methods of the SqlEntityProcessor and
>>>>>>> use it as the processor for the root entity. At this point you can
>>>>>>> run the necessary db queries and Solr delete queries. Look at
>>>>>>> Context#getSolrCore() and Context#getDataSource(String)
>>>>>>>
>>>>>>>
>>>>>>>> The "delete mark" is set as an update in the db row, so I think I
>>>>>>>> could use delta-import. I don't know if deletedPkQuery is the way to do
>>>>>>>> that. I cannot find much information about how to make it work. As
>>>>>>>> deltaQuery modifies docs (deletes the old and inserts the new), I suppose there
>>>>>>>> must be an easy way to do just the delete and not the new
>>>>>>>> insert.
>>>>>>> deletedPkQuery does everything first. It runs the query and uses that
>>>>>>> to identify the deleted rows.
>>>>>>>>
>>>>>>>> 2.- This is probably my most difficult goal.
>>>>>>>> Delta-import reads a timestamp from dataimport.properties and
>>>>>>>> modifies/adds all documents from the db which were inserted after that date.
>>>>>>>> What I want is to be able to save in the field the id of the last
>>>>>>>> indexed doc, so that the next time I execute the indexer it starts
>>>>>>>> indexing from that last indexed doc id.
>>>>>>> You can use a Transformer to write something to the DB.
>>>>>>> Context#getDataSource(String) for each row
>>>>>>>
>>>>>>>> The point of doing this is that if I do a full import from a db with
>>>>>>>> lots of rows, the app could hit a problem in the middle of the
>>>>>>>> execution and abort the process. The way delta query works, I would have to
>>>>>>>> restart the execution from the beginning. With this new
>>>>>>>> functionality I could optimize the index and start from the last
>>>>>>>> indexed doc.
>>>>>>>> I think I should begin by modifying SolrWriter.java and
>>>>>>>> DocBuilder.java,
>>>>>>>> creating functions like getStartTime, persistStartTime... for ID
>>>>>>>> control.
>>>>>>>>
>>>>>>>> 3.- I commented on this last point before. I want to boost
>>>>>>>> doc fields at indexing time.
>>>>>>>>>>Adding fieldboost is a planned item.
>>>>>>>>
>>>>>>>>>>It must work as follows.
>>>>>>>>>>Add a special value $fieldBoost.<fieldname> to the row map
>>>>>>>>
>>>>>>>>>>And DocBuilder should respect that. You can raise a bug and we can
>>>>>>>>>>commit it soon.
>>>>>>>> How do I raise a bug?
>>>>>>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>>>>>>
>>>>>>>> Thanks in advance
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --Noble Paul
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --Noble Paul
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20801932.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>>
>
> --
> View this message in context: http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20808620.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul
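
A minimal sketch of the persist idea discussed in this thread, assuming nothing about the eventual DIH implementation. The class PersistedImportState, the "persist." key prefix, the "lastIndexedId" key, and the resumeQuery helper are all hypothetical names chosen for illustration; only java.util.Properties and java.nio.file are real APIs here. It shows how a value such as the last indexed id could be written to a properties file and read back on the next run to build a query that resumes after that id.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper, not part of DIH: keeps user-defined values in a properties file. */
final class PersistedImportState {
    private final Path file;
    private final Properties props = new Properties();

    /** Loads existing values if the file is already present. */
    PersistedImportState(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
    }

    /** Stores the value under an assumed "persist." prefix and rewrites the file. */
    void persist(String key, String value) throws IOException {
        props.setProperty("persist." + key, value);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, null);
        }
    }

    /** Returns the persisted value, or null if the key was never written. */
    String getPersistValue(String key) {
        return props.getProperty("persist." + key);
    }

    /** Builds a query that resumes after the last indexed id (0 when none is stored). */
    String resumeQuery(String table, String idColumn) {
        String last = getPersistValue("lastIndexedId");
        long lastId = (last == null) ? 0L : Long.parseLong(last);
        return "SELECT * FROM " + table + " WHERE " + idColumn + " > " + lastId
                + " ORDER BY " + idColumn;
    }
}
```

In this sketch the caller would invoke persist("lastIndexedId", ...) after each successfully indexed batch, so an aborted import can restart from the stored id rather than from the beginning. Table and column names are passed in by trusted configuration, not user input.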