lucene-general mailing list archives

From ny1984 <>
Subject DuplicateFilter Problem
Date Wed, 07 Jan 2009 12:19:54 GMT

Hi everyone,

I have a problem with the Lucene DuplicateFilter. I have some PDF files, and
the index has 3 fields (id, title and content). I am indexing the PDF files
page by page, so different pages of the same PDF store the same id and title;
only the content differs. I want to search for a string and eliminate results
with the same id. On some documents DuplicateFilter works perfectly, but on
others it returns 0 results. If I search for the string in the title, it
returns the correct results, but if I search in the content, 0 results are
returned. I have added my code below. I could not find the problem. Please
help me with this issue. Thanks.

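import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DuplicateFilter; // contrib "queries" jar
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

        String directory = "C:/indexes/";
        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        Analyzer sanalyzer = new StopAnalyzer();
        QueryParser parser = new QueryParser("content", sanalyzer);

        Query queryd = parser.parse("point");
        // keepMode 1 = KM_USE_FIRST_OCCURRENCE, processingMode 1 = PM_FULL_VALIDATION
        DuplicateFilter df = new DuplicateFilter("id", 1, 1);
        Hits ehits = searcher.search(queryd, df);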
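The symptom fits how a Lucene `Filter` works: it is computed from the `IndexReader` alone, without seeing the query. So DuplicateFilter presumably keeps one document per `id` across the whole index (the first page, with keepMode 1), and the query is then matched only against those survivors. If the kept page of a PDF does not contain "point", that PDF drops out entirely. A title query still matches, because every page carries the same title. The plain-Java sketch below models this with hypothetical in-memory pages. The `Page` record, the sample data and all method names are illustrative assumptions, not Lucene API. It contrasts filtering first with running the query first and de-duplicating the hits by `id` afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFilterSketch {

    // Hypothetical model: one indexed document per PDF page.
    record Page(int docId, String id, String title, String content) {}

    // Hypothetical sample data: only page 1 of "pdf-1" mentions "point".
    static final List<Page> PAGES = List.of(
        new Page(0, "pdf-1", "Geometry notes", "lines and angles"),
        new Page(1, "pdf-1", "Geometry notes", "a point on a line"),
        new Page(2, "pdf-2", "Point clouds", "scanning surfaces"));

    // Models a first-occurrence duplicate filter: keep the first document
    // seen for each id, chosen without looking at any query.
    static List<Page> firstOccurrencePerId(List<Page> docs) {
        Map<String, Page> kept = new LinkedHashMap<>();
        for (Page p : docs) {
            kept.putIfAbsent(p.id(), p);
        }
        return new ArrayList<>(kept.values());
    }

    // Filter first, then match: only the surviving pages can be hits.
    static List<Page> filterThenQuery(List<Page> docs, String term) {
        List<Page> hits = new ArrayList<>();
        for (Page p : firstOccurrencePerId(docs)) {
            if (p.content().contains(term)) {
                hits.add(p);
            }
        }
        return hits;
    }

    // Alternative: match against every page, then keep one hit per id.
    static List<Page> queryThenDedup(List<Page> docs, String term) {
        List<Page> hits = new ArrayList<>();
        for (Page p : docs) {
            if (p.content().contains(term)) {
                hits.add(p);
            }
        }
        return firstOccurrencePerId(hits);
    }

    public static void main(String[] args) {
        System.out.println(filterThenQuery(PAGES, "point").size()); // 0
        System.out.println(queryThenDedup(PAGES, "point").size());  // 1
    }
}
```

In Lucene terms, the second approach would mean searching without the filter and skipping any hit whose stored `id` has already been seen. That assumes `id` is indexed as a stored field.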

Sent from the Lucene - General mailing list archive at
