lucene-java-user mailing list archives

From Georger Araujo
Subject Iterating over all documents in an index
Date Sat, 12 Feb 2011 14:07:32 GMT
I want to iterate over all documents in a given index. I've found the
following piece of code [1]:

IndexReader reader = // create IndexReader
for (int i=0; i<reader.maxDoc(); i++) {
    if (reader.isDeleted(i))
        continue; // skip deleted documents

    Document doc = reader.document(i);
    String docId = doc.get("docId");

    // do something with docId here...
}

I implemented it in my code and it worked fine. After that, I found out
about MatchAllDocsQuery.
I am not concerned with scoring or sorting - all I want to do is iterate
over all documents in the index and collect their terms. My ultimate goal is
to build a bag-of-words of all documents and their terms so that I can run a
clustering algorithm on it. I've also found out about Mahout's built-in
vector creation utility [2], but I need to do this task from my own code.
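
To make the goal concrete, here is a minimal sketch of the bag-of-words
structure, written in plain Java with no Lucene calls. It assumes the terms
of each document have already been collected into a list of strings; the
class name BagOfWords and the methods termCounts and build are hypothetical
names chosen for illustration, not part of Lucene or Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: turns per-document term lists
// into term-frequency maps, one map per document.
class BagOfWords {

    // Counts how often each term occurs in a single document.
    static Map<String, Integer> termCounts(List<String> terms) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String term : terms) {
            Integer old = counts.get(term);
            counts.put(term, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Builds one term-count map per document, preserving document order.
    static List<Map<String, Integer>> build(List<List<String>> docs) {
        List<Map<String, Integer>> bags = new ArrayList<Map<String, Integer>>();
        for (List<String> terms : docs) {
            bags.add(termCounts(terms));
        }
        return bags;
    }

    public static void main(String[] args) {
        // Made-up sample terms standing in for what the index loop collects.
        List<List<String>> docs = new ArrayList<List<String>>();
        docs.add(Arrays.asList("lucene", "index", "lucene"));
        docs.add(Arrays.asList("cluster", "index"));
        // prints [{index=1, lucene=2}, {cluster=1, index=1}]
        System.out.println(build(docs));
    }
}
```

Each map in the result corresponds to one document, so the list can be
turned into sparse vectors for a clustering algorithm once a shared
vocabulary is fixed.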

Which approach is recommended: the maxDoc() loop above, or MatchAllDocsQuery?



