lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Check if Term present in Existing Index before Merging indexes from Directory.
Date Wed, 11 Sep 2013 13:26:14 GMT
If you want to stick with the approach of multiple indexes, you'll have
to add some logic of your own to work around the duplicates, because
addIndexes does not check for documents that already exist.

Option 1.

Post merge, loop through all docs identifying duplicates and deleting
the one(s) you don't want.


Option 2.

Pre merge, read all indexes in parallel, identifying and deleting as above.


Option 3.

When creating a new index, check the existing index first and either
delete the matching documents from it or don't index the file,
whichever makes sense.
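
A minimal sketch of Option 3, under stated assumptions: a recent Lucene
(8.x or later, for ByteBuffersDirectory and IndexWriterConfig(Analyzer)),
a fileName field indexed as an un-analyzed StringField, and in-memory
directories standing in for the real ones. The class and method names
are hypothetical. It takes the "delete matches" branch: the existing
index is checked for the file name, the old document is deleted, and
only then is the new index merged in.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class CheckBeforeMerge {

    static IndexWriter openWriter(Directory dir) throws IOException {
        return new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
    }

    // Adds one document whose only field is the exact, un-analyzed file name.
    static void addFile(IndexWriter writer, String fileName) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("fileName", fileName, Field.Store.YES));
        writer.addDocument(doc);
    }

    // Counts live (non-deleted) documents with this file name in the last commit.
    static int countByFileName(Directory dir, String fileName) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.count(new TermQuery(new Term("fileName", fileName)));
        }
    }

    // Builds an existing index and a new one that both contain "File1",
    // removes the old copy, merges, and returns how many "File1" docs remain.
    static int demo() throws IOException {
        try (Directory mainDir = new ByteBuffersDirectory();
             Directory newDir = new ByteBuffersDirectory()) {
            try (IndexWriter w = openWriter(mainDir)) {   // existing index
                addFile(w, "File1");
                addFile(w, "File2");
            }
            try (IndexWriter w = openWriter(newDir)) {    // newly generated index
                addFile(w, "File1");
            }
            try (IndexWriter mainWriter = openWriter(mainDir)) {
                // Check the existing index first; delete the match if there is one.
                if (countByFileName(mainDir, "File1") > 0) {
                    mainWriter.deleteDocuments(new Term("fileName", "File1"));
                    // Commit the deletion before bringing in the new segments.
                    mainWriter.commit();
                }
                mainWriter.addIndexes(newDir);
            }
            return countByFileName(mainDir, "File1");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("File1 documents after merge: " + demo());
    }
}
```

The count uses a TermQuery rather than docFreq because docFreq does not
take deletions into account. If the old document should win instead,
the same check can be used to skip indexing the file.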


I'm sure there are other options as well, but no instant solutions.
One obvious option is to skip the merging altogether: if you want one
big index, why not just work directly with that, calling
IndexWriter.updateDocument with a Term on the fileName field? That
deletes any existing document matching the term and then adds the new
one.
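
A minimal sketch of the single-index approach, assuming a recent Lucene
(8.x or later), a fileName field indexed as a StringField so that the
Term matches the whole name exactly, and a hypothetical contents field.
The class and method names are illustrative only.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UpdateByFileName {

    // Adds the file, or replaces any existing document with the same fileName.
    static void indexFile(IndexWriter writer, String fileName, String contents)
            throws IOException {
        Document doc = new Document();
        // StringField is indexed as one un-analyzed token, so the Term matches exactly.
        doc.add(new StringField("fileName", fileName, Field.Store.YES));
        doc.add(new TextField("contents", contents, Field.Store.NO));
        // Deletes documents matching the term, then adds the new document.
        writer.updateDocument(new Term("fileName", fileName), doc);
    }

    // Indexes File1 twice and File2 once, then returns the number of live documents.
    static int indexTwiceAndCount(Directory dir) throws IOException {
        try (IndexWriter writer =
                 new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            indexFile(writer, "File1", "first version");
            indexFile(writer, "File1", "second version");
            indexFile(writer, "File2", "another file");
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs(); // excludes deleted documents
        }
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            System.out.println("live documents: " + indexTwiceAndCount(dir));
        }
    }
}
```

Re-indexing File1 replaces the earlier document rather than adding a
second one, so the index should end up with two live documents. If the
fileName field were analyzed (for example a TextField), the Term would
have to match the analyzed tokens and the exact-match delete would not
behave this way.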



--
Ian.


On Wed, Sep 11, 2013 at 1:40 PM, Ankit Murarka
<ankit.murarka@rancoretech.com> wrote:
> Hello
>
> Have a peculiar problem to deal with and I am sure there must be some way to
> handle it.
>
> 1. Indexes exist on the server for existing files.
> 2. Index generation is automated, so each newly generated file also gets
> its own index.
> 3. I am merging the newly generated indexes and existing index.
>
> /*Field of prime importance is fileName.*/
>
> Now since merging is being done with /* writer.addIndexes(Directory name)*/
>
> If the same file is indexed again, it is added to the index twice, so a
> search returns more than one hit for the same file. The search itself
> works fine.
>
> The problem is that the same file is indexed twice during merging.
>
> I need to ensure that when I merge indexes, if a term, say /*"File1"*/, is
> already present, the existing document is updated instead of a new one
> being added. This is supposed to happen during the indexing process.
>
> Kindly guide as to how it can be achieved.. Javadoc does not seem to help
> me.
>
> TIA.
>
> --
> Regards
>
> Ankit Murarka
>
> "What lies behind us and what lies before us are tiny matters compared with
> what lies within us"
>


