lucene-java-user mailing list archives

From Evert Wagenaar <evert.wagen...@gmail.com>
Subject Re: How to get the terms matching a WildCardQuery in Lucene 6.2?
Date Tue, 25 Oct 2016 17:42:09 GMT
Hi Allison,

Unfortunately I can't compile the code (see below). Can you tell me what's
wrong?
I tried both MultiTermQuery.SCORING_BOOLEAN_REWRITE and
CONSTANT_SCORE_BOOLEAN_REWRITE.

What I actually don't understand is how my Query (which is a wildcard
query, not a MultiTermQuery) relates to MultiTermQuery.

Can you explain?

Thanks,

Evert Wagenaar


[image: Inline image 1]

*Full code of Searcher:*

package tk.evertwagenaar.lucene;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.FSDirectory;

/** Simple command-line based search demo. */
public class SearchFiles {

  private static IndexReader reader;
  private static Query q;

  private SearchFiles() {
  }

  /** Simple command-line based search demo. */
  public static void main(String[] args) throws Exception {
    String usage = "Usage:\tjava org.apache.lucene.demo.SearchFiles [-index dir]"
        + " [-field f] [-repeat n] [-queries file] [-query string] [-raw]"
        + " [-paging hitsPerPage]\n\n"
        + "See http://lucene.apache.org/core/4_1_0/demo/ for details.";

    if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {
      System.out.println(usage);
      System.exit(0);
    }

    String index = "index";
    String field = "contents";
    String queries = null;
    int repeat = 0;
    boolean raw = false;
    String queryString = "aard????";
    int hitsPerPage = 10;

    reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer();

    BufferedReader in = null;

    QueryParser parser = new QueryParser(field, analyzer);
    while (true) {
      if (queries == null && queryString == null) { // prompt the user
        System.out.println("Enter query: ");
      }

      Query q = parser.parse(queryString);
      System.out.println("Searching for: " + q.toString(field));

      if (repeat > 0) { // repeat & time as benchmark
        Date start = new Date();
        for (int i = 0; i < repeat; i++) {
          searcher.search(q, 100);
        }
        Date end = new Date();
        System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
        doPagingSearch(in, searcher, q, hitsPerPage, raw,
            queries == null && queryString == null);
        MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE
        q = q.rewrite(reader);
        Set<Term> terms = new HashSet<>();
        Weight weight = q.createWeight(searcher, false);
        terms = weight.extractTerms(terms);

        System.out.println("Match: " + terms);
        reader.close();

      }
    }
  }

  /**
   * Search the Query against the Index
   */
  public static void doPagingSearch(BufferedReader in, IndexSearcher searcher,
      Query query, int hitsPerPage, boolean raw, boolean interactive)
      throws IOException {

    // Collect enough docs to show 5 pages
    TopDocs results = searcher.search(query, 5 * hitsPerPage);
    ScoreDoc[] hits = results.scoreDocs;

    int numTotalHits = results.totalHits;
    System.out.println(numTotalHits + " total matching documents");

    int start = 0;
    int end = Math.min(numTotalHits, hitsPerPage);

    hits = searcher.search(query, numTotalHits).scoreDocs;
    end = Math.min(hits.length, start + hitsPerPage);

    for (int i = start; i < end; i++) {
      Document doc = searcher.doc(hits[i].doc);
      String path = doc.get("path");
      System.out.println((i + 1) + ". " + path);
      query.rewrite(reader);
    }
  }
}
Evert  Wagenaar

On Tue, Oct 25, 2016 at 1:58 AM, Evert Wagenaar <evert.wagenaar@gmail.com>
wrote:

> Thanks Allison. I will try it.
>
>
> On Monday, October 24, 2016, Allison, Timothy B. <tallison@mitre.org>
> wrote:
>
>> Make sure to setRewriteMethod on the MultiTermQuery to:
>>  MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE
>>
>> Then something like this should work:
>>
>>         q = q.rewrite(reader);
>>
>>         Set<Term> terms = new HashSet<>();
>>         Weight weight = q.createWeight(searcher, false);
>>
>>         weight.extractTerms(terms);
>>
>>
>>
>> -----Original Message-----
>> From: Evert Wagenaar [mailto:evert.wagenaar@gmail.com]
>> Sent: Monday, October 24, 2016 12:41 PM
>> To: java-user@lucene.apache.org
>> Subject: How to get the terms matching a WildCardQuery in Lucene 6.2?
>>
>> I already asked this on StackOverflow. Unfortunately without any answer
>> for over a week now.
>>
>> Therefore again to the real experts:
>>
>>
>> I downloaded a list of 350,000 English words in a .txt file and indexed
>> it using the latest Lucene (6.2). I want to apply wildcard queries like
>> aard???? and then retrieve a list of matches.
>>
>> I've done this before in an older version of Lucene. There it was pretty
>> simple: I just had to call Query.rewrite() and it returned what I needed.
>> Unfortunately, in 6.2 this doesn't work anymore. There is a
>> Query.rewrite(IndexReader reader), which should return a HashMap of Terms.
>> In my case there's only one matching Term (aardvark). The Searcher
>> returns one hit, containing the Document path to the wordlist. The HashMap
>> is, however, empty.
>>
>> When I change the Query to find more than one match (like aa*), the
>> HashMap remains empty.
>>
>> I tried the MatchExtractor too. Unfortunately without result.
>>
>> The objective of this is to demonstrate the power of Lucene to easily
>> find words of a particular length, given one or more characters. I'm pretty
>> sure I can do this using regular expressions in Java, but that would be
>> outside my objective.
>>
>> Can anyone tell me why this isn't working? I use the StandardAnalyzer.
>> Should I use a different Application?
>>
>> Any help is greatly appreciated.
>>
>> Thanks.
>>
>>
>>
>> --
>> Sent from Gmail IPad
>>
>
>
> --
> Sent from Gmail IPad
>
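
Note on the question in this thread: WildcardQuery is itself a subclass of
MultiTermQuery, which is why the rewrite-method advice applies to a wildcard
query. Conceptually, a boolean rewrite enumerates the indexed terms that
match the pattern and turns each one into its own term clause; those are the
terms that can later be extracted. The following is a minimal sketch of that
enumeration in plain Java, with no Lucene dependency. It is not Lucene's
implementation: the class name WildcardExpansionSketch, its methods, and the
sample word list are hypothetical stand-ins, and a TreeSet plays the role of
the index's sorted term dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical illustration of wildcard expansion; not Lucene code. */
public class WildcardExpansionSketch {

    /** Translates a wildcard ('?' = exactly one char, '*' = any run) into a regex. */
    public static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                // Quote literal characters so regex metacharacters stay literal.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    /** Returns the literal text before the first wildcard character. */
    public static String literalPrefix(String wildcard) {
        int i = 0;
        while (i < wildcard.length()
                && wildcard.charAt(i) != '?' && wildcard.charAt(i) != '*') {
            i++;
        }
        return wildcard.substring(0, i);
    }

    /** Enumerates the terms of a sorted dictionary that match the wildcard. */
    public static List<String> expand(NavigableSet<String> dictionary, String wildcard) {
        Pattern regex = toRegex(wildcard);
        String prefix = literalPrefix(wildcard);
        List<String> matches = new ArrayList<>();
        // Seek to the first term >= prefix, then scan while the prefix still holds.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (regex.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the indexed word list.
        NavigableSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "aa", "aah", "aardvark", "aardvarks", "aardwolf", "abacus"));
        System.out.println("aard???? -> " + expand(dictionary, "aard????"));
        System.out.println("aa*      -> " + expand(dictionary, "aa*"));
    }
}
```

With the sample words, aard???? keeps only the eight-letter terms that start
with aard, while aa* keeps every term that starts with aa. Seeking by the
literal prefix first means the scan touches only the relevant slice of the
sorted dictionary instead of every term.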
