lucene-java-user mailing list archives

From Manjula Wijewickrema <manjul...@gmail.com>
Subject Why hit is 0 for bigrams?
Date Tue, 08 Jul 2014 04:30:57 GMT
Hi,

I tried to index bigrams from a document, and the system gave me
the following output with the frequencies of the bigrams (output 1):

array size:15
array terms are:{contents: /1, assist librarian/1, assist manjula/2, assist
sabaragamuwa/1, fine manjula/1, librari manjula/1, librarian
sabaragamuwa/1, main librari/2, manjula assist/4, manjula fine/1, manjula
name/1, name manjula/1, sabaragamuwa univers/3, univers main/2, univers
sabaragamuwa/1}

For this I used the following code in the createIndex() class:


ShingleAnalyzerWrapper sw = new ShingleAnalyzerWrapper(analyzer, 2);
sw.setOutputUnigrams(false);
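
To make clear what I mean by bigrams here, this is a minimal plain-Java sketch (not Lucene code) of what I understand the setup above to produce: each pair of adjacent tokens is joined into one term with a single space, and no single-word terms are kept. The class name BigramSketch, its two methods, and the sample tokens are made up for illustration only; the real analyzer may differ in details such as stemming and stop-word handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BigramSketch {

    // Joins every pair of adjacent tokens into one two-word term.
    // No unigrams are emitted, mirroring setOutputUnigrams(false).
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    // Counts how often each term occurs; TreeMap keeps the terms sorted.
    static Map<String, Integer> frequencies(List<String> terms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical, already-analyzed tokens (not my real document).
        List<String> tokens = Arrays.asList(
                "manjula", "assist", "librarian", "manjula", "assist");
        System.out.println(frequencies(bigrams(tokens)));
    }
}
```

Note that in this sketch each bigram is a single term containing a space, so a term such as "manjula assist" is not the same thing as the two separate terms "manjula" and "assist".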



Then I tried to search the indexed bigrams of the same document using the
following code in the searchIndex() class:


IndexReader indexReader = IndexReader.open(directory);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);

Analyzer analyzer = new WhitespaceAnalyzer();
QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);

Query query = queryParser.parse(terms[pos[freqs.length-q1]]);
System.out.println("Query: " + query);

Hits hits = indexSearcher.search(query);
System.out.println("Number of hits: " + hits.length());




For this, the system gave me the following output (output 2):


Query: contents:manjula contents:assist
Number of hits: 0

Query: contents:sabaragamuwa contents:univers
Number of hits: 0

Query: contents:univers contents:main
Number of hits: 0

Query: contents:main contents:librari
Number of hits: 0


Could someone please explain:

(1) Why is 'contents: /1' included in the array as an element? (output 1)

(2) Why does the system return the query as 'contents:manjula
contents:assist' instead of 'manjula assist'? (output 2)

(3) Why is the number of hits given as 0 instead of the bigram frequencies?
(output 2)


I would highly appreciate your kind reply.


Manjula.
