lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mhzm...@interia.eu
Subject Re: [RegexQuery] how to check what words were founded in particulary Documents ?
Date Sat, 21 Jul 2007 17:39:06 GMT
To begin with, thanks for Yours quick response.

I am still trying write this code and I've already written this one:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.regex.*;

public class TmpMain {
	public static void main(String[] args) throws Exception {	
		RAMDirectory directory = new RAMDirectory(); 
		IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), 
                                                      true); 
		Document doc = new Document(); 
		doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES, 
                                    Field.Index.TOKENIZED)); 
		writer.addDocument(doc); 
		writer.optimize();
		writer.close(); 
		IndexReader reader=IndexReader.open(directory);
		RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field", 
				"C.t"), new JavaUtilRegexCapabilities());
	/*
	//Second Version:
		WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new 
                                          Term("field", "C?t"));
	*/
		while(regexTermEnum.next()) {
			System.out.println("Next:");
			System.out.println(regexTermEnum.term().text());
		}
		System.out.println("End.");		
	}
}


But the output of this code is:
End.

As you can see above I have tried change line:
	RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field", 
		"C.t"), new JavaUtilRegexCapabilities());
on
	WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new Term("field", 
		"C?t"));

but the result is the same :/
What am I doing wrong ?




> Erick - you're not missing anything, except that the original poster  
> is after RegexQuery, not WildcardQuery.  Both work basically the same  
> way, except in the pattern matching capabilities.
> 
> 	Erik
> 
> On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
> > Erik:
> >
> > Well, you wrote the book <G>. But I thought something like this
> > would work
> >
> > TermDocs td = reader.termDocs();
> > WildcardTermEnum we = new WildcardTermEnum(reader, new term("field",
> > "c*t"));
> > while (we.next()) {
> >  td.seek(we);
> >  while (td.next()) {
> >     report document contains term;
> >  }
> > }
> >
> > Although I admit I haven't tried it, so I could be totally off  
> > base. What
> > am I missing?
> >
> > Erick
> >
> > On 7/20/07, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> >>
> >> Erick - I think you're mixing things up with WildcardQuery.
> >> RegexQuery does support all regex capabilities (depending on the
> >> underlying regex matcher used).
> >>
> >> A couple of techniques you could use to achieve the goal:
> >>
> >>         * Use RegexTermEnum, though that'll give you the terms  
> >> across the
> >> entire index, so maybe in your use case you could index a single
> >> document into a RAMDirectory and RegexTermEnum on it.
> >>
> >>         * Try out SpanRegexQuery and use getSpans() to get the exact
> >> matches.
> >>
> >> Erik
> >>
> >>
> >>
> >> On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
> >>
> >> > First, the period (.) isn't part of the syntax, so make sure you  
> >> look
> >> > more carefully at the Lucene syntax...
> >> >
> >> > Then, you might be able to use WildcardTermEnum to find
> >> > the terms that match and TermDocs to find the documents
> >> > that contain those terms.
> >> >
> >> > There's nothing built into Lucene to do this out of the box, you
> >> > have to roll your own.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On 20 Jul 2007 21:27:40 +0200, mhzmark@interia.eu  
> >> <mhzmark@interia.eu>
> >> > wrote:
> >> >>
> >> >> Hello.
> >> >>
> >> >> Let assume that I have this code in my application:
> >> >>
> >> >>    (...)
> >> >>    Query query = new RegexQuery(new Term("field", "C.T"));;
> >> >>    // searching...
> >> >>    (...)
> >> >>
> >> >> And now, I would like to know if my application founded "cat" or
> >> >> "cot" or
> >> >> something else. How can I check what was founded by my  
> >> application ?
> >> >>
> >> >> I would like to write application like this:
> >> >>    INPUT -> regular expression
> >> >>    OUTPUT -> file  ---> word
> >> >>
> >> >> example: INPUT = "C.T"
> >> >>          OUTPUT =
> >> >>                   a.txt --> CAT
> >> >>                   a.txt --> COT
> >> >>                   b.txt --> CAT
> >> >>                   b.txt --> CAT
> >> >>                   b.txt --> COT
> >> >>                   (...)
> >> >>
> >> >> So, how to check what words were founded in particulary Documents
> >> >> after
> >> >> searching? I see that Hits class contains only founded  
> >> documents and
> >> >> nothing more (I am new in this technology so I can be wrong...)
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>  
> >> ---------------------------------------------------------------------
> >> >> -
> >> >> Dowiedz sie, co naprawde podnieca kobiety. Wiecej wiesz,  
> >> latwiej je
> >> >> oczarujesz
> >> >>
> >> >> >>>http://link.interia.pl/f1b17
> >> >>
> >> >>
> >> >>  
> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 


----------------------------------------------------------------------
Najbogatsza Polka 
Zobacz >>> http://link.interia.pl/f1ae2


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message