lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Indexing and searching a DateTime range
Date Mon, 09 Feb 2015 08:10:55 GMT
Hi,

> I am in the beginning of implementing a Lucene application which would
> supposedly search through some log files.
> 
> One of the requirements is to return results between a time range. Let's say
> these are two lines in a series of log files:
> 2015-02-08 00:02:06.852Z INFO...
> ...
> 2015-02-08 18:02:04.012Z INFO...
> 
> Now I need to search for these lines and return all the text in-between. I was
> using this demo application to build an index:
> http://lucene.apache.org/core/4_10_3/demo/src-html/org/apache/lucene/demo/IndexFiles.html
> 
> After that my first thought was using a term range query like this:
>         TermRangeQuery query = TermRangeQuery.newStringRange("contents",
>                 "2015-02-08 00:02:06.852Z", "2015-02-08 18:02:04.012Z", true, true);
> 
> But for some reason this didn't return any results.

Lucene tokenizes the text, so you can search for terms ("words"). Those dates are split
into several terms, so the complete timestamp is not available as one term for the range to
match. In general, this is not the way to search on a numeric / date range: it is also
horribly slow, because there are many terms in that "contents" field.

> Then I was Googling for a while how to solve this problem, but all the
> datetime examples I found are searching based on a much simpler field.
> Those examples usually use a field like this:
> doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));

That is the way to do it. Log files are "structured", so you need to do preprocessing. You
have to put the different information into different fields (like the "modified" field, as
mentioned in your example). You can still fill the "contents" field as you did above with
all information to do plain fulltext search (like finding a log line based on some message
contents), but in addition, you use other fields for more specific searches like ranges. In
Lucene you generally fill several fields with redundant information (like dates in the
fulltext field and again in an extra timestamp field).
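
As a rough sketch of that preprocessing step (not part of the original message): the code
below pulls the leading timestamp out of a log line and converts it to epoch milliseconds,
which is the kind of long value a numeric field expects. The class name LogLineParser and
the method parseTimestampMillis are made up for illustration. It assumes Java 8 (java.time)
and that every relevant line starts with a 24-character UTC timestamp in the format shown
in the two sample lines above.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.OptionalLong;

// Hypothetical helper, for illustration only.
final class LogLineParser {

    // Length of a timestamp such as "2015-02-08 00:02:06.852Z".
    private static final int TIMESTAMP_LENGTH = 24;

    // 'X' accepts the trailing "Z" as a zero UTC offset.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSX");

    private LogLineParser() {
    }

    /**
     * Returns the leading timestamp of a log line as milliseconds since the
     * epoch, or an empty result if the line does not start with a timestamp
     * in the assumed format.
     */
    static OptionalLong parseTimestampMillis(String line) {
        if (line == null || line.length() < TIMESTAMP_LENGTH) {
            return OptionalLong.empty();
        }
        try {
            OffsetDateTime ts =
                    OffsetDateTime.parse(line.substring(0, TIMESTAMP_LENGTH), FORMAT);
            return OptionalLong.of(ts.toInstant().toEpochMilli());
        } catch (DateTimeParseException e) {
            // Continuation lines, stack traces etc. carry no timestamp of their own.
            return OptionalLong.empty();
        }
    }
}
```

The resulting long could then go into a separate field next to "contents", for example
new LongField("timestamp", millis, Field.Store.NO) in the same way as the "modified" field
in your example (the field name "timestamp" is only a suggestion). The two boundary
timestamps from your question, converted the same way, would become the lower and upper
bounds of a numeric range query such as NumericRangeQuery.newLongRange(...) in Lucene 4.x.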

The information you return to the user can be put into a stored-only field. Such a field
is returned with the search results.

> So I was wondering, how can I index these log files to make a range query
> work on them? Any ideas? Maybe my approach is completely wrong. I am
> still new to Lucene so any help is appreciated.

The first approach is wrong; the second approach is right. You just have to get your field
definitions right.

An alternative would be to use Logstash in combination with Elasticsearch, which is based
on Lucene. This has everything you want to do already implemented for log files.

Uwe

