manifoldcf-user mailing list archives

From Ronny Heylen <>
Subject How to index files, not folders
Date Sat, 09 Nov 2013 12:50:01 GMT
Indexing all indexable files on our Windows drive fails with different
problems.
Several of these problems were solved by the list, thanks for that. We
still have (at least) the missing class in commons-compress problem. Using
the jar from commons-compress 1.6 did not help.
Anyway, this introduction is just to explain our approach for getting most
of the interesting files indexed and for "easily" identifying where the
problems are: we have one job for all *.doc*, one for all *.xls*, ...
We observe that on the drive we have:
84000 *.doc* files
172000 *.xls* files
161000 folders
If we just index *.doc*, it gives nothing; we have to specify indexable
files *.doc* and folders *.
Then the job indexes 245000 documents (= number of *.doc* + number of
folders).
The same for *.xls* => indexing 333000 documents
If we define a single job with *.doc* + *.xls* + folders, we get a
417000-document job.
So we suppose that with two jobs (one doc and one xls) the folders are
indexed twice.
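
The arithmetic behind that supposition can be checked with a small sketch.
It only uses the counts quoted above; the names (job_size and the
constants) are made up for illustration and have nothing to do with
ManifoldCF itself. It assumes every job that includes folder * processes
every folder once.

```python
# Counts reported in this message.
DOC_FILES = 84_000
XLS_FILES = 172_000
FOLDERS = 161_000


def job_size(file_counts, folders=FOLDERS):
    """Documents processed by one job: its matching files plus every folder.

    Assumption: a job whose rules include 'folder *' touches each folder once.
    """
    return sum(file_counts) + folders


doc_job = job_size([DOC_FILES])                   # *.doc* + folders
xls_job = job_size([XLS_FILES])                   # *.xls* + folders
combined_job = job_size([DOC_FILES, XLS_FILES])   # *.doc* + *.xls* + folders

# Running two separate jobs costs one extra pass over all folders.
extra_folder_visits = (doc_job + xls_job) - combined_job

print(doc_job, xls_job, combined_job, extra_folder_visits)
```

The difference between the two separate jobs and the combined job is
exactly the folder count, which matches the idea that the folders are
processed once per job.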
The question is: how can we avoid indexing folders?
Perhaps there is another way to define the paths in the rule set, to avoid
indexing folders? But how?
