lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: DIH FileListEntityProcessor recursion and fileName clash
Date Sun, 01 Feb 2009 21:34:37 GMT
On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie <fergus@twig.me.uk> wrote:

> Hello
>
> I have been trying to find out why DIH in FileListEntityProcessor
> mode did not appear to be recursing into subdirectories. Going through
> FileListEntityProcessor.java I eventually tumbled to the fact that my
> filename filter setting from data-config.xml also applied to directory
> names.


Hmm, not good.


>
>
>    <entity name="jc"
>       processor="FileListEntityProcessor"
>       fileName=".*\.xml"
>       newerThan="'NOW-1000DAYS'"
>       recursive="true"
>       rootEntity="false"
>       dataSource="null"
>       baseDir="/Volumes/spare/ts/stuff/ford">
>
> Now, I feel that the fileName filter should be applied to files fed
> into the parser; it should not be applied to the directory names we are
> recursing through. I bodged the code as follows to adjust the behavior
> so that the "fileName" and "excludes" attributes of "entity" only
> apply to filenames and not directory names.


I agree with you.

Perhaps we can have separate filters for directories and files but let's
hold on till the need comes up.
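
A minimal sketch of the idea under discussion, for illustration only. It is
not the patch mentioned above and not the actual FileListEntityProcessor
code: the class and method names (FileNameFilterSketch, matchesFile, collect)
are hypothetical, and matching the whole name with Matcher.matches() is an
assumption. The point is that the fileName and excludes patterns are checked
only for regular files, while directories are always descended into when
recursion is on.

```java
import java.io.File;
import java.util.List;
import java.util.regex.Pattern;

public class FileNameFilterSketch {

    // Returns true if a regular file with this name should be emitted.
    // Either pattern may be null, meaning "no filter of that kind".
    public static boolean matchesFile(String name, Pattern fileName, Pattern excludes) {
        if (fileName != null && !fileName.matcher(name).matches()) {
            return false; // does not match the include pattern
        }
        if (excludes != null && excludes.matcher(name).matches()) {
            return false; // explicitly excluded
        }
        return true;
    }

    // Walks dir, adding matching files to out. Directory names are never
    // tested against the patterns, so a filter such as ".*\.xml" no longer
    // stops the walk from entering a directory named "ford".
    public static void collect(File dir, boolean recursive,
                               Pattern fileName, Pattern excludes, List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (recursive) {
                    collect(entry, true, fileName, excludes, out);
                }
            } else if (matchesFile(entry.getName(), fileName, excludes)) {
                out.add(entry);
            }
        }
    }
}
```

With the configuration quoted above, this would correspond to something like
collect(new File(baseDir), true, Pattern.compile(".*\\.xml"), null, files).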

>
>
> It now recurses through my directory tree, indexing only the appropriate
> files! I think the new behavior is more standard.
>
> Is this a valid change?


Absolutely. Can you please create an issue and attach the patch? Thanks!

-- 
Regards,
Shalin Shekhar Mangar.
