lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Davis" <>
Subject Terms Matching Query
Date Wed, 15 Oct 2008 07:00:58 GMT
Hopefully I'm not double posting this, but I haven't seen it show up
in the archive and I got a weird rejection email from so I'll try again.

I'm working on indexing JSON documents via Lucene and I've run into a
bit of a snag. Currently, I'm indexing JSON documents by adding fields
that are path/value pairs. For example, given a JSON document like:

     "first": "Paul",
     "last": "Davis"
 "jobs": ["hotdog vendor", "signboard guy"]

I build the doc with something like:

doc.add(new Field("/name/first", "Paul", ...))
doc.add(new Field("/name/last", "Davis", ...))
doc.add(new Field("/jobs/$0", "hotdog vendor", ...))
doc.add(new Field("/jobs/$1", "signboard guy", ...))

doc.add(new Field("__JSON_PATH__", "/name/first", ...))
doc.add(new Field("__JSON_PATH__", "/name/last", ...))
doc.add(new Field("__JSON_PATH__", "/jobs/$0", ...))
doc.add(new Field("__JSON_PATH__", "/jobs/$1", ...))

Now, at query time I'm attempting to support expanding searches as such:

(OOB) alias =  "name.*" => __JSON_PATH__:name.*

Which would get expanded to the obvious "name.first:Paul
name.last:Paul". I was planning on just doing a search for
__JSON_PATH__:name.* collecting the terms and doing an OR like what
the old Prefix/Range queries. (I think I read that RangeQueries are
different now, but right now I'm going for Make It Work)

So anyway, long story short, is there a way to get just the list of
terms that match an arbitrary lucene query?

I tried poking at Query.extractTerms(Set terms) but it appears to fall
down for RangeQueries. I contemplated telling the query parser to use
old range queries, but wasn't sure if a different type of query would
cause the same problems or not. I've started just creating unique
documents for each possible path, but that seems like a Bad Idea (TM).

Any thoughts on ways to completely rework the whole system would also
be welcome. I'm relative new to Lucene so I may be completely screwing
the pooch on something I'm not even aware of.

Paul Davis

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message