lucene-solr-user mailing list archives

From Fergus McMenemie <>
Subject Re: DIH FileListEntityProcessor recursion and fileName clash
Date Mon, 02 Feb 2009 16:38:48 GMT


I got myself a JIRA account, opened SOLR-1000 and followed the wiki
instructions on creating a patch, which I have now uploaded! The only
problem is that while the fix seems fine, the test case I added fails.
I need somebody who knows what they are doing to point out what I am
doing wrong and/or how to debug test failures.

It would also be nice to know how to run or debug a single JUnit
test rather than all of them, which takes almost 8 minutes.

  public void testRECURSION() throws IOException {
    long time = System.currentTimeMillis();
    File childdir = new File("." + time + "/child" );
    childdir.mkdirs();  // createFile() writes into this directory, so it must exist first
    createFile(childdir, "a.xml", "a.xml".getBytes(), true);
    createFile(childdir, "b.xml", "b.xml".getBytes(), true);
    createFile(childdir, "c.props", "c.props".getBytes(), true);
    Map attrs = AbstractDataImportHandlerTest.createMap(
            FileListEntityProcessor.FILE_NAME, "^.*\\.xml$",
            FileListEntityProcessor.BASE_DIR, childdir.getAbsolutePath(),
            // pass the flag as a String, like the other attribute values
            FileListEntityProcessor.RECURSIVE, "true");
    Context c = AbstractDataImportHandlerTest.getContext(null,
            new VariableResolverImpl(), null, 0, Collections.EMPTY_LIST, attrs);
    FileListEntityProcessor fileListEntityProcessor = new FileListEntityProcessor();
    fileListEntityProcessor.init(c);  // initialise with the context before calling nextRow()
    List<String> fList = new ArrayList<String>();
    while (true) {
      // collect the absolute path of every file the processor returns
      Map<String, Object> f = fileListEntityProcessor.nextRow();
      if (f == null)
        break;  // no more files
      fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
    }
    System.out.println("List of files indexed -- " + fList);
    // only a.xml and b.xml match ^.*\.xml$; c.props is filtered out
    Assert.assertEquals(2, fList.size());
  }
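The expected count in the assertion follows from the test's fileName expression. The stdlib-only sketch below applies ^.*\.xml$ to the three names the test creates, assuming (as in the change discussed in this thread) that the expression is tested against the bare file name. The class FileNameFilterCheck is hypothetical and only illustrates the regex; it does not call FileListEntityProcessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FileNameFilterCheck {

    // The expression the test passes as FileListEntityProcessor.FILE_NAME.
    static final Pattern XML_ONLY = Pattern.compile("^.*\\.xml$");

    // Returns the names accepted by the pattern. Because the expression is
    // anchored with ^ and $, find() and matches() give the same answer here.
    static List<String> accepted(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (XML_ONLY.matcher(name).find()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The three files the test creates in childdir.
        List<String> created = Arrays.asList("a.xml", "b.xml", "c.props");
        List<String> ok = accepted(created);
        System.out.println(ok + " -> " + ok.size()); // prints "[a.xml, b.xml] -> 2"
    }
}
```

Only two of the three files pass the filter, so any assertion on the processor's output should expect two rows from this directory.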

Regards Fergus.

>On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie <> wrote:
>> Hello
>> I have been trying to find out why DIH in FileListEntityProcessor
>> mode did not appear to be recursing into subdirectories. Going through
>> the code, I eventually tumbled to the fact that my
>> fileName filter setting from data-config.xml also applied to directory
>> names.
>Hmm, not good.
>>    <entity name="jc"
>>       processor="FileListEntityProcessor"
>>       fileName=".*\.xml"
>>       newerThan="'NOW-1000DAYS'"
>>       recursive="true"
>>       rootEntity="false"
>>       dataSource="null"
>>       baseDir="/Volumes/spare/ts/stuff/ford">
>> Now, I feel that the fileName filter should be applied to the files fed
>> into the parser; it should not be applied to the directory names we are
>> recursing through. I bodged the code as follows to adjust the behavior
>> so that the "fileName" and "excludes" attributes of "entity" only
>> apply to file names and not directory names.
>I agree with you.
>Perhaps we can have separate filters for directories and files but let's
>hold on till the need comes up.
>> It now recurses through my directory tree, only indexing the appropriate
>> files! I think the new behavior is more standard.
>> Is this a valid change?
>Absolutely. Can you please create an issue and attach the patch? Thanks!
>Shalin Shekhar Mangar.
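The behavior change discussed above can be modeled with a short, stdlib-only Java sketch. It is not the FileListEntityProcessor source; the class RecursiveFileLister, its helpers and the filterDirectories flag are hypothetical. With filterDirectories set to true it mimics the reported problem (the fileName pattern also gates recursion, so a directory named "child" is never entered); with false it applies the include and exclude patterns to files only, as the patch proposes. The sketch tests patterns with Matcher.find().

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RecursiveFileLister {

    // True if name passes the include pattern and is not caught by exclude.
    // A null pattern means "no constraint".
    static boolean accepts(String name, Pattern include, Pattern exclude) {
        boolean included = include == null || include.matcher(name).find();
        boolean excluded = exclude != null && exclude.matcher(name).find();
        return included && !excluded;
    }

    // Walks dir recursively, adding the absolute path of every accepted file
    // to result. If filterDirectories is true, a directory name must also pass
    // the patterns before the walk descends into it (the reported behavior);
    // if false, every subdirectory is entered (the proposed behavior).
    static void collect(File dir, Pattern include, Pattern exclude,
                        boolean filterDirectories, List<String> result) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // dir is not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                if (!filterDirectories || accepts(child.getName(), include, exclude)) {
                    collect(child, include, exclude, filterDirectories, result);
                }
            } else if (accepts(child.getName(), include, exclude)) {
                result.add(child.getAbsolutePath());
            }
        }
    }

    // Builds base/top.xml plus base/child/{a.xml,b.xml,c.props} in a fresh
    // temporary directory and returns how many files match ".*\.xml".
    static int demo(boolean filterDirectories) {
        try {
            File base = Files.createTempDirectory("dih").toFile();
            base.deleteOnExit();
            File child = new File(base, "child");
            if (!child.mkdir()) {
                throw new IOException("could not create " + child);
            }
            child.deleteOnExit();
            for (String n : new String[] {"a.xml", "b.xml", "c.props"}) {
                File f = new File(child, n);
                Files.write(f.toPath(), n.getBytes());
                f.deleteOnExit();
            }
            File top = new File(base, "top.xml");
            Files.write(top.toPath(), "top".getBytes());
            top.deleteOnExit();

            List<String> found = new ArrayList<>();
            collect(base, Pattern.compile(".*\\.xml"), null, filterDirectories, found);
            return found.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("filter applied to directories too: " + demo(true));
        System.out.println("filter applied to files only:      " + demo(false));
    }
}
```

Running main reports 1 for the first case (only top.xml, because "child" does not match ".*\.xml") and 3 for the second (top.xml, a.xml and b.xml). Keeping one pattern for files and always descending into directories matches the reply's suggestion to hold off on separate directory filters until they are needed.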


Fergus McMenemie     
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
