lucene-java-user mailing list archives

From KEGan
Subject Re: Modelling relational data in Lucene Index?
Date Tue, 07 Nov 2006 07:36:35 GMT

I am actually doing something like what the original poster mentioned.
Previously, I used Hibernate and Lucene together. But I found that for my
particular project my data is quite flat, so in the next version I took
Hibernate out entirely (and the complexity with it :)) and now use Lucene as
the "main storage".

In this new version, my data is persisted to the filesystem simply using
XMLEncoder. Lucene is used both as a text search index and as a reference
to the encoded data (.xml) in the filesystem. Every time data is added, an
entry is made in Lucene. I am using RAMDirectory (super fast), so if the
server ever shuts down or restarts, the Lucene index is built again on
startup.
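
A minimal sketch of that layout, using only the JDK. It is not the poster's
code: the class name FlatXmlStore, its methods, and the choice to store each
record as a Map<String, String> are assumptions made for illustration. A
plain in-memory map stands in for the Lucene RAMDirectory index, so it only
does exact term lookup, but it shows the same shape: the XMLEncoder files are
the durable store, and the index holds references to them and is rebuilt from
scratch on startup.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: XML files are the store, an in-memory map is the rebuildable index. */
final class FlatXmlStore {
    private final Path dir;
    // Stand-in for the Lucene index: lower-cased term -> names of the .xml files containing it.
    private final Map<String, Set<String>> index = new HashMap<>();

    FlatXmlStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
        rebuild(); // the "startup" step: nothing about the index itself is persisted
    }

    /** Persist one flat record with XMLEncoder, then add an index entry pointing at its file. */
    void add(String id, Map<String, String> record) throws IOException {
        Path file = dir.resolve(id + ".xml");
        try (XMLEncoder enc = new XMLEncoder(new BufferedOutputStream(Files.newOutputStream(file)))) {
            enc.writeObject(new HashMap<>(record));
        }
        // Note: overwriting an existing id would leave stale terms behind; removal is omitted here.
        indexRecord(file.getFileName().toString(), record);
    }

    /** Re-read every .xml file and rebuild the in-memory index. */
    void rebuild() throws IOException {
        index.clear();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path file : files) {
                indexRecord(file.getFileName().toString(), load(file));
            }
        }
    }

    /** Return the file names of records containing the term; callers decode them on demand. */
    Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Decode the record stored under the given file name. */
    Map<String, String> read(String fileName) throws IOException {
        return load(dir.resolve(fileName));
    }

    @SuppressWarnings("unchecked")
    private static Map<String, String> load(Path file) throws IOException {
        try (XMLDecoder dec = new XMLDecoder(new BufferedInputStream(Files.newInputStream(file)))) {
            return (Map<String, String>) dec.readObject();
        }
    }

    private void indexRecord(String fileName, Map<String, String> record) {
        for (String value : record.values()) {
            for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(fileName);
                }
            }
        }
    }
}
```

Constructing a second FlatXmlStore over the same directory plays the role of
a server restart: its index is rebuilt purely from the .xml files, which is
also why startup time grows with the number of records.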

This works for my case because my data set is small enough (hey, I read
that 1.1 million documents average only about 300MB of Lucene index, and I
have plenty of RAM); my data probably won't reach anywhere near 0.5 million
documents. The con is that startup will get slower as the data grows, but
the server shouldn't be down that often.

Is anyone using a similar model? Any pitfalls that I should be aware of?


On 11/6/06, Emmanuel Bernard wrote:
> I had a quick look at SOLR and DBSight. They seem to achieve a different
> goal than Hibernate Lucene.
> The former belong to the project box category: you set up a server that
> will handle the search for you. The application then delegates the
> work to those servers.
> The latter belongs to the framework category: you use it inside your
> Hibernate/EJB 3.0 application to enable an index-based search feature.
> To a certain extent, it is the same difference as between a Google box
> and Lucene.
> You can write some code based on the latter to cover the former's
> features, especially the platform abstraction (PHP, .NET), but it is
> probably a lot of work and that is not really the point.
> You can write some code based on the former to enable indexing and
> search of your persistent domain model (persisted through Hibernate),
> but that is probably more work.
> Really it is a matter of easing the pain from one side of the problem or
> the other. I don't see much competition between the two approaches;
> they cover different goals.
> To specifically answer some of your remarks:
> - yes, you need to write some code to recreate an index. Literally, 6
> lines of code.
> - no, I do not currently cache the searcher, because Hibernate is
> transactional by nature and protects you as much as possible from
> uncommitted reads and other data inconsistencies. I guess I could
> implement some caching capabilities using reader.isCurrent() or
> something equivalent.
> - the ability to split searcher servers from indexer servers is on my
> todo list.
> Cheers
> Emmanuel
> Chris Lu wrote:
> > I personally like your effort, but technically I would disagree.
> >
> > The SOLR project, and the project I am working on, DBSight, have a
> > detached approach which is implementation agnostic, whether it's
> > Java, Ruby, PHP, or .NET. The returned results can be rendered HTML,
> > JSON, or XML. I don't think you can be more flexible than that. If
> > creating a new index takes 5 minutes without any coding, you can
> > create something more creative.
> >
> > From the business side, you don't need to worry about indexing when
> > designing a system. New requirements may come. It's very hard to
> > anticipate all the needs.
> >
> > Technically, the detached approach gives more flexibility on resources
> > like CPU, memory, and hard drive. For example, if your index grows
> > large, say 1G, indexing can take hours with merging. I am not sure how
> > Compass or Hibernate/Lucene handles it. Would you need to rewrite code
> > at that point? I actually feel it's a dangerous trap.
> >
> >> I've introduced a session.index() which forces the (re)indexing of the
> >> document
> > So does it mean you need to write some code to fix the index if it
> > has crashed?
> >
> >> What do you mean by multithread safe? The indexing?
> >> the indexing is multithread safe in the Hibernate Lucene integration
> > The indexing can be threadsafe. But will it affect the searching? With
> > many files changing and merging, if you cache the searcher, the
> > searching will hit "read past EOF" exceptions. If you don't cache
> > the searcher, you will lose the built-in caching, FieldCacheImpl, in
> > Lucene.
> >
> >>
> >> The query process?
> >> the query doesn't have to, since you query on a given session (aka
> >> user conversation), so no multithread threat here.
> > So you are not caching searcher.
> >