lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Keegan" <>
Subject Re: Hierarchical document
Date Tue, 21 Oct 2003 15:34:20 GMT
One way to implement hierarchical documents is through the use of
predefined phrases. Consider the 2 hierarchies:

1. Kids_and_Teens/Computers/Software/Games
2. Computers/Software/Freeware

When indexing a document belonging to (1), add these terms in consecutive
order (autoincrement=1): "dir:Top dir:Kids_and_Teens dir:Computers
dir:Software dir:Games dir:Bottom"

For documents belonging to (2), add: "dir:Top dir:Computers dir:Software

The terms "dir:Top" and "dir:Bottom" can be used to anchor a query
to a specific portion of the hierachy.

Thus, a query containing the phrase: "dir:Computers dir:Software" would
match documents in both (1) and (2) (and perhaps others), but a query for:
"dir:Top dir:Kids_and_Teens dir:Computers dir:Software" would target only
'Computer/Software' documents from the 'Kids_and_Teens' top level directory.
(The QueryPhrase 'slop factor' would be set to 0).


----- Original Message ----- 
From: "Tatu Saloranta" <>
To: "Lucene Users List" <>
Sent: Monday, October 20, 2003 8:24 PM
Subject: Re: Hierarchical document

> On Monday 20 October 2003 10:31, Erik Hatcher wrote:
> > On Monday, October 20, 2003, at 11:06  AM, Tom Howe wrote:
> > There is not a more "lucene" way to do this - its really up to you to
> > be creative with this.  I'm sure there are folks that have implemented
> > something along these lines on top of Lucene.  In fact, I have a
> > particular interest in doing so at some point myself.  This is very
> > similar to the object-relational issues surrounding relational
> > databases - turning a pretty flat structure into an object graph.
> > There are several ideas that could be explored by playing tricks with
> > fields, such as giving them a hierarchical naming structure and
> > querying at the level you like (think Field.Keyword and PrefixQuery,
> > for example), and using a field to indicate type and narrowing queries
> > to documents of the desired type.
> >
> > I'm interested to see what others have done in this area, or what ideas
> > emerge about how to accomplish this.
> I'm planning to do something similar. In my case problem is bit simpler;
> documents have associated products, and products form a hierarchy.
> Searches should be able to match not only direct matches (searching
> product, article associated with product), but also indirect ones via
> membership (product member of a product group, matching group).
> Product hierarchy also has variable depth.
> To do searches using non-leaf hierarchy items (groups), all actual product
> items/groups associated with docs are expanded to full ids when
> indexing (ie. they contain path from root, up to and including node,
> each node component having its own unique id).
> Thus, when searching for an intermediate node (product grouping),
> match occurs since that node id is part of path to products that are in
> the group (either directly or as members of sub-groups).
> Since no such path is stored (directly) in database, this also allows me
to do
> queries that would be impossible to do in database (I could add similar
> path/full id fields for search purposes of course). Thus, Lucene index is
> optimized for searching purposes, and database structure for editing
> and retrieval of data.
> Another thing to keep in mind is that at least for metadata it may make
> to use specialized analyzer, one that allows tokenizing using specific ids
> to store ids as separate tokens; instead of using some standard plain text
> analyzer. This way it is possible to separate ids from textual words (by
> using prefixes, for example, "@1253" or "#13945"); this allows for
> matching based on identity of associated metadata selections.
> -+ Tatu +-
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message