lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <>
Subject Re: DIH FileListEntityProcessor recursion and fileName clash
Date Sun, 01 Feb 2009 21:34:37 GMT
On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie <> wrote:

> Hello
> I have been trying to find out why DIH in FileListEntityProcessor
> mode did not appear to be recursing into subdirectories. Going through
> it, I eventually tumbled to the fact that my filename filter setting
> from data-config.xml also applied to directory names.

Hmm, not good.

>    <entity name="jc"
>       processor="FileListEntityProcessor"
>       fileName=".*\.xml"
>       newerThan="'NOW-1000DAYS'"
>       recursive="true"
>       rootEntity="false"
>       dataSource="null"
>       baseDir="/Volumes/spare/ts/stuff/ford">
> Now, I feel that the fileName filter should be applied to the files fed
> into the parser; it should not be applied to the directory names we are
> recursing through. I bodged the code as follows to adjust the behavior
> so that the "fileName" and "excludes" attributes of "entity" only
> apply to filenames and not to directory names.
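A minimal Java sketch of the two behaviors, for illustration only. It is not the DIH source or the patch mentioned above. The class `FileListSketch`, its `list` method and the `filterDirs` flag are hypothetical names. `filterDirs=true` mimics the reported behavior, where a directory name must also pass the `fileName`/`excludes` regexes before the walker descends into it. `filterDirs=false` applies those regexes to regular files only, as proposed. The sketch also assumes full-string matching via `Matcher.matches()`, which may differ from how DIH actually matches names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual FileListEntityProcessor code.
class FileListSketch {

    /**
     * Collects files under baseDir whose names match fileName and do not match excludes.
     *
     * @param filterDirs true mimics the reported behavior (directory names must pass
     *                   the same filters before we recurse); false applies the filters
     *                   to regular files only, as proposed in the thread.
     */
    static List<File> list(File baseDir, Pattern fileName, Pattern excludes,
                           boolean recursive, boolean filterDirs) {
        List<File> out = new ArrayList<>();
        walk(baseDir, fileName, excludes, recursive, filterDirs, out);
        return out;
    }

    private static boolean accepted(String name, Pattern fileName, Pattern excludes) {
        // Assumption: full-string matching; DIH may match differently.
        if (fileName != null && !fileName.matcher(name).matches()) return false;
        return excludes == null || !excludes.matcher(name).matches();
    }

    private static void walk(File dir, Pattern fileName, Pattern excludes,
                             boolean recursive, boolean filterDirs, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) return; // not a directory, or an I/O error occurred
        for (File child : children) {
            String name = child.getName();
            if (child.isDirectory()) {
                if (!recursive) continue;
                // Reported behavior: "ford" fails ".*\.xml", so its subtree is never visited.
                if (filterDirs && !accepted(name, fileName, excludes)) continue;
                walk(child, fileName, excludes, recursive, filterDirs, out);
            } else if (accepted(name, fileName, excludes)) {
                out.add(child);
            }
        }
    }
}
```

With a `fileName` of `.*\.xml`, a tree containing `a.xml` and `ford/b.xml` yields both files when `filterDirs` is false. When it is true, the walker returns only `a.xml`, because the directory name `ford` does not match the pattern.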

I agree with you.

Perhaps we can have separate filters for directories and files but let's
hold on till the need comes up.

> It now recurses through my directory tree, indexing only the appropriate
> files! I think the new behavior is more standard.
> Is this change valid?

Absolutely. Can you please create an issue and attach the patch? Thanks!

Shalin Shekhar Mangar.
