lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mhzm...@interia.eu
Subject Re: [RegexQuery] how to check what words were founded in particulary Documents ?
Date Sat, 21 Jul 2007 18:16:35 GMT
You are right. After I changed "Cat Cot Cit Cet" to "cat cot cit cet", it works very good :)I
see that there is quite a lot of knowledge to acquire in order to be familiar with this technology...
Thank You !

> StandardAnalyzer lowercases terms. The term you pass to the term enum 
> has an upper case character. You really need to be careful if you are 
> going to use an analyzer to store and then a term enum to cycle 
> through...the term you pass the term enum must match the output of the 
> analyzer.
> 
> - Mark
> 
> mhzmark@interia.eu wrote:
> > To begin with, thanks for Yours quick response.
> >
> > I am still trying write this code and I've already written this one:
> >
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > import org.apache.lucene.store.*;
> > import org.apache.lucene.index.*;
> > import org.apache.lucene.document.*;
> > import org.apache.lucene.search.*;
> > import org.apache.lucene.search.regex.*;
> >
> > public class TmpMain {
> > 	public static void main(String[] args) throws Exception {	
> > 		RAMDirectory directory = new RAMDirectory(); 
> > 		IndexWriter writer = new IndexWriter(directory, new
> StandardAnalyzer(), 
> >                                                       true); 
> > 		Document doc = new Document(); 
> > 		doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES, 
> >                                     Field.Index.TOKENIZED)); 
> > 		writer.addDocument(doc); 
> > 		writer.optimize();
> > 		writer.close(); 
> > 		IndexReader reader=IndexReader.open(directory);
> > 		RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new
> Term("field", 
> > 				"C.t"), new JavaUtilRegexCapabilities());
> > 	/*
> > 	//Second Version:
> > 		WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new 
> >                                           Term("field", "C?t"));
> > 	*/
> > 		while(regexTermEnum.next()) {
> > 			System.out.println("Next:");
> > 			System.out.println(regexTermEnum.term().text());
> > 		}
> > 		System.out.println("End.");		
> > 	}
> > }
> >
> >
> > But the output of this code is:
> > End.
> >
> > As you can see above I have tried change line:
> > 	RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field",
> 
> > 		"C.t"), new JavaUtilRegexCapabilities());
> > on
> > 	WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new
> Term("field", 
> > 		"C?t"));
> >
> > but the result is the same :/
> > What am I doing wrong ?
> >
> >
> >
> >
> >   
> >> Erick - you're not missing anything, except that the original poster  
> >> is after RegexQuery, not WildcardQuery.  Both work basically the same 
> 
> >> way, except in the pattern matching capabilities.
> >>
> >> 	Erik
> >>
> >> On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
> >>     
> >>> Erik:
> >>>
> >>> Well, you wrote the book <G>. But I thought something like this
> >>> would work
> >>>
> >>> TermDocs td = reader.termDocs();
> >>> WildcardTermEnum we = new WildcardTermEnum(reader, new term("field",
> >>> "c*t"));
> >>> while (we.next()) {
> >>>  td.seek(we);
> >>>  while (td.next()) {
> >>>     report document contains term;
> >>>  }
> >>> }
> >>>
> >>> Although I admit I haven't tried it, so I could be totally off  
> >>> base. What
> >>> am I missing?
> >>>
> >>> Erick
> >>>
> >>> On 7/20/07, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> >>>       
> >>>> Erick - I think you're mixing things up with WildcardQuery.
> >>>> RegexQuery does support all regex capabilities (depending on the
> >>>> underlying regex matcher used).
> >>>>
> >>>> A couple of techniques you could use to achieve the goal:
> >>>>
> >>>>         * Use RegexTermEnum, though that'll give you the terms  
> >>>> across the
> >>>> entire index, so maybe in your use case you could index a single
> >>>> document into a RAMDirectory and RegexTermEnum on it.
> >>>>
> >>>>         * Try out SpanRegexQuery and use getSpans() to get the exact
> >>>> matches.
> >>>>
> >>>> Erik
> >>>>
> >>>>
> >>>>
> >>>> On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
> >>>>
> >>>>         
> >>>>> First, the period (.) isn't part of the syntax, so make sure you
 
> >>>>>           
> >>>> look
> >>>>         
> >>>>> more carefully at the Lucene syntax...
> >>>>>
> >>>>> Then, you might be able to use WildcardTermEnum to find
> >>>>> the terms that match and TermDocs to find the documents
> >>>>> that contain those terms.
> >>>>>
> >>>>> There's nothing built into Lucene to do this out of the box, you
> >>>>> have to roll your own.
> >>>>>
> >>>>> Best
> >>>>> Erick
> >>>>>
> >>>>> On 20 Jul 2007 21:27:40 +0200, mhzmark@interia.eu  
> >>>>>           
> >>>> <mhzmark@interia.eu>
> >>>>         
> >>>>> wrote:
> >>>>>           
> >>>>>> Hello.
> >>>>>>
> >>>>>> Let assume that I have this code in my application:
> >>>>>>
> >>>>>>    (...)
> >>>>>>    Query query = new RegexQuery(new Term("field", "C.T"));;
> >>>>>>    // searching...
> >>>>>>    (...)
> >>>>>>
> >>>>>> And now, I would like to know if my application founded "cat"
or
> >>>>>> "cot" or
> >>>>>> something else. How can I check what was founded by my  
> >>>>>>             
> >>>> application ?
> >>>>         
> >>>>>> I would like to write application like this:
> >>>>>>    INPUT -> regular expression
> >>>>>>    OUTPUT -> file  ---> word
> >>>>>>
> >>>>>> example: INPUT = "C.T"
> >>>>>>          OUTPUT =
> >>>>>>                   a.txt --> CAT
> >>>>>>                   a.txt --> COT
> >>>>>>                   b.txt --> CAT
> >>>>>>                   b.txt --> CAT
> >>>>>>                   b.txt --> COT
> >>>>>>                   (...)
> >>>>>>
> >>>>>> So, how to check what words were founded in particulary Documents
> >>>>>> after
> >>>>>> searching? I see that Hits class contains only founded  
> >>>>>>             
> >>>> documents and
> >>>>         
> >>>>>> nothing more (I am new in this technology so I can be wrong...)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>  
> >>>>>>             
> >>>>
> ---------------------------------------------------------------------
> >>>>         
> >>>>>> -
> >>>>>> Dowiedz sie, co naprawde podnieca kobiety. Wiecej wiesz,  
> >>>>>>             
> >>>> latwiej je
> >>>>         
> >>>>>> oczarujesz
> >>>>>>
> >>>>>>             
> >>>>>>>>> http://link.interia.pl/f1b17
> >>>>>>>>>                   
> >>>>>>  
> >>>>>>             
> >>>>
> ---------------------------------------------------------------------
> >>>>         
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>>>>             
> >>>>
> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>>         
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >>
> >>     
> >
> >
> > ----------------------------------------------------------------------
> > Najbogatsza Polka 
> > Zobacz >>> http://link.interia.pl/f1ae2
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >   
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 


----------------------------------------------------------------------
Wszystko czego potrzebujesz latem: kremy do opalania, 
stroje kapielowe, maly romans 

>>>http://link.interia.pl/f1b15


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message