lucene-solr-user mailing list archives

From "Noble Paul നോബിള്‍ नोब्ळ्" <noble.p...@gmail.com>
Subject Re: DataImportHandler current_index_time & post-completion actio
Date Thu, 17 Jul 2008 04:30:55 GMT
comments inline

On Thu, Jul 17, 2008 at 5:00 AM, wojtekpia <wojtek_p@hotmail.com> wrote:
>
> I have two questions:
>
> 1. I am pulling data from 2 data sources using the DIH. I am using the
> deltaQuery functionality. Since the data sources pull data sequentially, I
> find that some data is getting unnecessarily re-indexed from my second data
> source. Hopefully this helps illustrate my problem:
>
> Assume last_index_time is 0.
> At time = 1, pull data from data source 1 with a query that includes
> "last_modified> '${dataimporter.last_index_time}'". Note that this pulls
> data for the time interval [0,1]. This step takes 1 time interval.
> At time = 2, data source 2 is polled with the same query. This step takes 1
> time interval. Note that this pulls data for the time interval [0,2].
> At t=3, last_index_time is set to 1
>
> Next time I run the DIH, I will be unnecessarily re-indexing data that
> appeared in data source 2 in the interval [1,2].
>
> Ideally, I'd like to have access to something like
> ${dataimporter.current_index_time}, so I could restrict my delta query to:
> "last_modified> '${dataimporter.last_index_time}' AND last_modified <
> '${dataimporter.current_index_time}'"
>
> Is this available?
It is not available, but it can be added easily. I will include it in
the next patch. If you need it sooner, it should be a small
modification in DocBuilder.java.
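
To make the effect of an upper bound concrete, here is a minimal, self-contained sketch. It does not use any DataImportHandler code: the class DeltaWindowDemo and the method select are hypothetical names, and times are plain numbers (tenths of a time unit) rather than real timestamps. The lower bound stands in for last_index_time and the upper bound for the proposed current_index_time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in the DataImportHandler.
class DeltaWindowDemo {

    // Returns the last_modified values a delta query would select.
    // lowerExclusive stands in for last_index_time, upperExclusive for the
    // proposed current_index_time (pass Long.MAX_VALUE for "no upper bound").
    static List<Long> select(List<Long> lastModified, long lowerExclusive, long upperExclusive) {
        List<Long> hits = new ArrayList<>();
        for (long t : lastModified) {
            if (t > lowerExclusive && t < upperExclusive) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Rows visible in data source 2 when it is polled during run 1 and run 2.
        List<Long> visibleInRun1 = List.of(5L, 15L);
        List<Long> visibleInRun2 = List.of(5L, 15L, 25L);

        // Run 1 starts at time 10, run 2 starts at time 30.
        System.out.println("unbounded run 1: " + select(visibleInRun1, 0L, Long.MAX_VALUE));
        System.out.println("unbounded run 2: " + select(visibleInRun2, 10L, Long.MAX_VALUE));
        System.out.println("bounded run 1:   " + select(visibleInRun1, 0L, 10L));
        System.out.println("bounded run 2:   " + select(visibleInRun2, 10L, 30L));
    }
}
```

Without the upper bound, the row modified at time 15 is selected in both runs; with it, each row is selected exactly once. Note that with strict comparisons on both sides, as in the query quoted above, a row stamped exactly at the boundary would be skipped by both runs, so one side should probably be inclusive.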
>
>
> 2. I have a transient table that I query with the DIH to load my index.
> After loading values into the index, I want to delete them from the
> transient table. Is there a way to do this from the DIH? I tried stuffing a
> delete statement into the deltaQuery attribute, but that didn't work:
>
> <dataConfig>
>    <dataSource driver="org.hsqldb.jdbcDriver"
> url="jdbc:hsqldb:/temp/example/ex" user="sa" />
>    <document name="products">
>            <entity name="item" pk="ID" query="select * from item"
>                deltaQuery="select id from item where last_modified >
> '${dataimporter.last_index_time}'; delete from item where last_modified <
> '${dataimporter.last_index_time}'">
>            </entity>
>    </document>
> </dataConfig>

There is no straightforward way to achieve this. However, the last
component to get a callback when indexing finishes is
DataSource#close(). If you are adventurous enough, you can extend
JdbcDataSource, override close(), run a delete query from there, and
use that class as your DataSource.
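
The shape of that override can be sketched as follows. This is only an illustration of the pattern, not working DIH code: StubJdbcSource is a stand-in I made up so the example runs without Solr or a database, and it is not the real JdbcDataSource. The execute helper, the PurgingSource name, and the SQL string are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a JDBC-backed data source. NOT the real Solr JdbcDataSource.
class StubJdbcSource {
    final List<String> log = new ArrayList<>();

    // Hypothetical helper: records the SQL instead of sending it to a database.
    void execute(String sql) {
        log.add("SQL: " + sql);
    }

    // Called once when the import is finished.
    void close() {
        log.add("connection closed");
    }
}

// Runs a cleanup statement just before the connection is released.
class PurgingSource extends StubJdbcSource {
    private final String cleanupSql;

    PurgingSource(String cleanupSql) {
        this.cleanupSql = cleanupSql;
    }

    @Override
    void close() {
        try {
            execute(cleanupSql); // delete the rows that were just imported
        } finally {
            super.close();       // always release the connection
        }
    }
}

class PurgingSourceDemo {
    public static void main(String[] args) {
        // Hypothetical cleanup statement for the transient table.
        PurgingSource source =
            new PurgingSource("delete from item where last_modified < '2008-07-17 00:00:00'");
        source.close();
        source.log.forEach(System.out::println);
    }
}
```

A real subclass would also have to obtain a live connection and decide what to do when the import fails, since this sketch deletes unconditionally.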
>
>
>
> --
> View this message in context: http://www.nabble.com/DataImportHandler-current_index_time---post-completion-action-tp18498832p18498832.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul
