lucene-java-user mailing list archives

From Erik Hatcher <li...@ehatchersolutions.com>
Subject Re: Range queries
Date Thu, 23 Jan 2003 01:56:25 GMT
I wanted to see this first-hand, so I wrote some test code to 
understand how dates are represented and how QueryParser deals with 
them.  I've indexed 500 documents with random dates between 1/1/2002 
and 12/31/2002.

Here's what works:

         QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
         String begin = DateField.dateToString(new Date(102, 0, 1));   // 20020101
         String end = DateField.dateToString(new Date(102, 11, 31));   // 20021231
         String q = "date:[" + begin + " TO " + end + "]";
         System.out.println("q = " + q);
         Query query = parser.parse(q);
         System.out.println("query = " + query.toString("date"));
         Hits hits = searcher.search(query);
         System.out.println("# found = " + hits.length());

Here's the output:

q = date:[0cvx9a8w0 TO 0daddkbk0]
query = [0cvx9a8w0-0daddkbk0]
# found = 500

If I change begin and end to "20020101" and "20021231" respectively I 
get zero hits.
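
For anyone wondering why the readable form misses, here is a rough sketch of what I believe the encoding looks like. It is an assumption based on the nine-character terms in the output above: milliseconds since 1970, written in base 36 and left-padded with zeroes. DateTermSketch, encode and startOfDayUtc are made-up names for illustration, not Lucene classes, and I use UTC here, so the exact terms will differ from my output.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical stand-in for DateField.dateToString; NOT the Lucene class.
class DateTermSketch {
    // Assumption: terms are nine characters wide, as in the output above.
    static final int TERM_LENGTH = 9;

    // Encode milliseconds since the epoch as a zero-padded base-36 string,
    // so that string order matches chronological order.
    static String encode(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("time before 1970");
        }
        String s = Long.toString(millis, Character.MAX_RADIX);
        if (s.length() > TERM_LENGTH) {
            throw new IllegalArgumentException("time too late");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = s.length(); i < TERM_LENGTH; i++) {
            padded.append('0');
        }
        return padded.append(s).toString();
    }

    // Midnight UTC of the given day, in milliseconds since the epoch.
    static long startOfDayUtc(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        String begin = encode(startOfDayUtc(2002, 1, 1));
        String end = encode(startOfDayUtc(2002, 12, 31));
        System.out.println("encoded range = [" + begin + " TO " + end + "]");
        // Both encoded terms start with '0', so they sort before "20020101".
        System.out.println("begin sorts before 20020101: " + (begin.compareTo("20020101") < 0));
        System.out.println("end sorts before 20020101:   " + (end.compareTo("20020101") < 0));
    }
}
```

If that guess is right, every indexed date term for 2002 sorts before the string "20020101", so a range of [20020101 TO 20021231] cannot contain any of them.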

I'm running the latest Lucene version from CVS, in case that makes a 
difference.

So, while I would love it if QueryParser behaved with the YYYYMMDD 
syntax, it does not.  Or am I missing something here?

Any JavaCC whizzes out there who could modify it to take readable date 
formats and construct the query using dateToString?  That would be 
sweet!  Has anyone created any JavaScript that mimics the 
dateToString functionality that you'd share?
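
Short of changing the grammar, one workaround might be to rewrite the query string before QueryParser sees it. This is only a sketch under assumptions: ReadableDateRanges, rewrite, toTerm and encode are hypothetical names, encode is my guess at the dateToString encoding (base-36 milliseconds padded to nine characters), and only inclusive ranges of the form [YYYYMMDD TO YYYYMMDD] are handled.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processor; not part of Lucene.
class ReadableDateRanges {
    // Matches an inclusive range whose bounds are both eight digits.
    private static final Pattern RANGE =
            Pattern.compile("\\[(\\d{8}) TO (\\d{8})\\]");

    // Assumed encoding: base-36 milliseconds, zero-padded to nine characters.
    static String encode(long millis) {
        String s = Long.toString(millis, Character.MAX_RADIX);
        StringBuilder padded = new StringBuilder();
        for (int i = s.length(); i < 9; i++) {
            padded.append('0');
        }
        return padded.append(s).toString();
    }

    // Convert YYYYMMDD to the encoded term for midnight in the given zone.
    static String toTerm(String yyyymmdd, ZoneId zone) {
        LocalDate day = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return encode(day.atStartOfDay(zone).toInstant().toEpochMilli());
    }

    // Replace every readable date range in the query with encoded terms.
    static String rewrite(String query, ZoneId zone) {
        Matcher m = RANGE.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String range = "[" + toTerm(m.group(1), zone)
                    + " TO " + toTerm(m.group(2), zone) + "]";
            m.appendReplacement(out, Matcher.quoteReplacement(range));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String readable = "date:[20020101 TO 20021231]";
        System.out.println(rewrite(readable, ZoneId.systemDefault()));
    }
}
```

The rewritten string could then be passed to parser.parse() as in the test code above. Note that the upper bound is midnight at the start of the end date, so documents stamped later on that day would fall outside the range.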

	Erik



On Wednesday, January 22, 2003, at 10:20  AM, Terry Steichen wrote:
> Erik,
>
> I believe the question was on range queries in general, which of course
> work with the QueryParser.
>
> You can use range queries for dates, provided, as I believe you imply, the
> dates are in lexicographic order (i.e., 20030122).  (As to whether dates
> expressed as such are too challenging for the average human being, I don't
> know.)
>
> Regards,
>
> Terry
>
> PS: Just to clarify, I believe that dates represented this way are
> internally treated as strings by Lucene.
>
> ----- Original Message -----
> From: "Erik Hatcher" <lists@ehatchersolutions.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, January 22, 2003 9:49 AM
> Subject: Re: Range queries
>
>
>> Unfortunately I don't believe date field range queries work with
>> QueryParser, or at least not human-readable dates.
>>
>> Is that correct?
>>
>> I think it supports date ranges if they are turned into a numeric
>> format, but no human would type that kind of query in.  I'm sure
>> supporting true date range queries gets tricky with locale issues and
>> such too.
>>
>> Erik
>>
>>
>> On Wednesday, January 22, 2003, at 09:19  AM, Terry Steichen wrote:
>>> Tatu,
>>>
>>> I believe the range query syntax for the latest Lucene version is
>>> "field:[lower TO upper]", or "field:[null TO upper]", or
>>> "field:[lower TO null]".  In earlier versions replace "TO" with a dash
>>> ("-").
>>>
>>> I also believe that multiple wildcards ("?" and/or "*") work just fine
>>> (as long as they aren't the first character of the term).
>>>
>>> HTH,
>>>
>>> Terry
>>>
>>> ----- Original Message -----
>>> From: "Tatu Saloranta" <tatu@hypermall.net>
>>> To: <lucene-user@jakarta.apache.org>
>>> Sent: Wednesday, January 22, 2003 11:48 PM
>>> Subject: Range queries
>>>
>>>
>>>> My apologies if this is a FAQ (which is possible as I am new to Lucene;
>>>> however, I tried checking the web page for the answer).
>>>>
>>>> I read through the "Query syntax" web page first, and then checked the
>>>> matching query classes. It seems like the query syntax page is missing
>>>> some details; the one I was wondering about was the range query. Since
>>>> the query parser seems to construct these queries, I guess they have been
>>>> implemented, even though the syntax page didn't explain them. Is that
>>>> correct?
>>>>
>>>> Looking at QueryParser, it seems that inclusive range query uses [ and ],
>>>> and exclusive query { and }? Is this right? And does it expect exactly two
>>>> arguments?
>>>> Also, am I right in assuming that range uses lexicographic ordering, so
>>>> that it basically includes all possible words (terms) between specified
>>>> terms (which will work ok with numbers/dates as long as they have been
>>>> padded with zeroes or such)?
>>>>
>>>> Another question I have is regarding wildcard search. The page mentions
>>>> that there is a restriction that a search term cannot start with a
>>>> wildcard (as that would render the index useless I guess... would need a
>>>> full scan?). However, it doesn't mention if multiple wildcards are
>>>> allowed? All the example cases just have a single wildcard?
>>>>
>>>> Sorry for the newbie questions,
>>>>
>>>> -+ Tatu +-
>>>>
>>>> ps. Thanks to the developers for the neat indexing engine. I am currently
>>>> evaluating it for use in a large-scale enterprise content management
>>>> system.
>>>>
>>>>
>>>> --
>>>> To unsubscribe, e-mail:
>>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>> For additional commands, e-mail:
>>> <mailto:lucene-user-help@jakarta.apache.org>
>>>>
>>>>



