lucene-dev mailing list archives

From "Simon Willnauer" <>
Subject Re: GData Server - Lucene storage
Date Fri, 02 Jun 2006 19:57:28 GMT
On 6/2/06, Otis Gospodnetic <> wrote:
> Simon,
> I took a quick look at the UML PDF.  It seems to me that various *Services
> are overly complicated.  Since you can have only 1 thread modifying the
> Lucene index, perhaps you should go the same route as IndexModifier (I never
> used it, but it looks like people are using it to manage write/delete/search
> concurrency).  So perhaps all you need are IndexStorageService and
> SearchService for the searchable Lucene index(es), and a DataStorageService
> for storing and reading data from the BDB store or whatever you end up
> using.

The UML covers only the storage; it has nothing to do with the search index.
The search index will be a separate index.
Thank you for the hint about IndexModifier. I changed the UML and
uploaded it again if you want to have a look at it.
I guess the performance drawback won't be too big given the size of the
entries I will store. A feed server also mainly serves GET requests. I
will implement 2 storage types anyway, but I'm not sure yet which one will
come first ;) I guess I'll go for Lucene.
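
A minimal sketch of the single-writer idea, in plain Java with no Lucene dependency. SingleWriterStorage, its method names, and the in-memory map standing in for the index are all hypothetical; this is not how IndexModifier is implemented, only an illustration of routing every modification through one lock while reads take none.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Lucene-backed storage; the map replaces the index.
class SingleWriterStorage {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final Object writeLock = new Object();

    // All modifications share one lock, so only one thread writes at a time.
    void insertOrUpdate(String feedId, String entryId, String content) {
        synchronized (writeLock) {
            entries.put(key(feedId, entryId), content);
        }
    }

    // Returns true if an entry was actually removed.
    boolean delete(String feedId, String entryId) {
        synchronized (writeLock) {
            return entries.remove(key(feedId, entryId)) != null;
        }
    }

    // Reads take no lock; returns null if the entry is unknown.
    String get(String feedId, String entryId) {
        return entries.get(key(feedId, entryId));
    }

    // Entries are addressed by feed ID plus entry ID (assumed key layout).
    private static String key(String feedId, String entryId) {
        return feedId + "/" + entryId;
    }
}
```

Since a feed server mostly serves GET requests, serializing the comparatively rare writes this way should cost little, which matches the guess above about the performance drawback.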

> Regarding the naming of StorageCache - this confused me at first.  Seeing
> "cache" makes me think "previously retrieved/found data stored in a cache
> for faster subsequent requests/searches".  But from what I can tell, that is
> not what StorageCache is about.  It looks like StorageCache is really a
> buffer of entries that are scheduled to be written to or deleted from the
> index+storage.  If that's so, I would consider renaming this "StorageBuffer"
> or some such.

This is true. That should be changed. :)
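
To make the distinction concrete, here is a rough sketch of what a "StorageBuffer" in the sense described above could look like: a queue of pending write/delete operations that is later flushed to the storage. The class layout, the method names, and the Map used as the target storage are assumptions for illustration, not the design from the UML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical buffer of operations scheduled against the storage.
class StorageBuffer {
    enum Action { WRITE, DELETE }

    // One scheduled operation; content is null for deletes.
    static final class Pending {
        final Action action;
        final String entryId;
        final String content;

        Pending(Action action, String entryId, String content) {
            this.action = action;
            this.entryId = entryId;
            this.content = content;
        }
    }

    private final List<Pending> pending = new ArrayList<>();

    synchronized void scheduleWrite(String entryId, String content) {
        pending.add(new Pending(Action.WRITE, entryId, content));
    }

    synchronized void scheduleDelete(String entryId) {
        pending.add(new Pending(Action.DELETE, entryId, null));
    }

    // Number of operations still waiting to be applied.
    synchronized int size() {
        return pending.size();
    }

    // Applies all pending operations in arrival order, empties the buffer,
    // and returns how many operations were applied.
    synchronized int flushTo(Map<String, String> storage) {
        int applied = 0;
        for (Pending p : pending) {
            if (p.action == Action.WRITE) {
                storage.put(p.entryId, p.content);
            } else {
                storage.remove(p.entryId);
            }
            applied++;
        }
        pending.clear();
        return applied;
    }
}
```

Unlike a cache, nothing here is kept around to speed up later reads; the buffer only holds work that has not reached the storage yet.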

> ----- Original Message ----
> From: Simon Willnauer <>
> To:
> Sent: Thursday, June 1, 2006 7:37:44 PM
> Subject: GData Server - Lucene storage
> Hello folks,
> as I'm the only developer on the project due to the SummerOfCode
> program, it is quite a tough task to discuss all the architecture with
> you on the mailing list. For this reason I decided to create UML
> diagrams to discuss the main components. I will not attach the UML to
> the mails but rather upload it to a server so you can download and study
> it.
> Well, the next thing I have to implement is a storage to store the
> entries in. I will provide 2 kinds of storage (Lucene and BerkeleyDB
> based). The first will be a Lucene index that stores the entries
> identified by the entry ID and feed ID, stored in the index as a
> Keyword (used to be Field.Keyword). The underlying Lucene storage
> will only be used to store the compressed entries. Which feed entries
> to retrieve from the Lucene storage will be based on the results of the
> indexing/search component, as every client request to a GData server is
> a query to the index. So the results of the search are entry IDs and a
> corresponding feed. These entries will be retrieved from the storage
> and sent back to the client. The storage component also provides
> delete / update and insert functionality (it wouldn't be a storage
> without these).
> The biggest problem with the Lucene storage is achieving a
> transactional state. Imagine the following scenario:
> An update request comes in -> the entry to update is handed to
> the Lucene writer, which writes the update. But another delete request
> has locked the index, so an IOException is thrown. The update
> request then queues the entry and retries to obtain the lock. No
> problem so far. But if the index writer cannot open the index due to
> some other error (e.g. the index could not be found), the exception will
> also be an IOException. Is there any way to figure out whether the
> IOException is caused by a lock, which would be alright, or by
> some other serious reason?
> I added some comments on the UML to describe the arch. to you in more
> detail. So please download the file and have a look at it.
> I will appreciate all your comments!!
> regards Simon
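
On the lock question in the original message: one way to separate the two cases is to give the lock failure its own IOException subclass and retry only on that type. The sketch below is plain Java with hypothetical names (LockRetry, LockHeldException, IndexAction); it does not call any Lucene API. As far as I know, later Lucene releases added LockObtainFailedException, a subclass of IOException, for exactly this case, but treat that as a pointer to check rather than something this code relies on.

```java
import java.io.IOException;

class LockRetry {
    // Hypothetical marker exception: thrown only when the index lock is held.
    static class LockHeldException extends IOException {
        LockHeldException(String message) {
            super(message);
        }
    }

    // Hypothetical unit of work against the index (update, delete, insert).
    interface IndexAction {
        void run() throws IOException;
    }

    // Runs the action, retrying only while the failure is a held lock.
    // Any other IOException propagates immediately.
    // Returns the number of attempts that were needed.
    static int runWithRetry(IndexAction action, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (LockHeldException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the lock never became free
                }
                Thread.sleep(waitMillis); // wait before trying the lock again
            }
        }
    }
}
```

With this split, a queued update is retried while another request holds the lock, whereas a missing index or similar serious error surfaces on the first attempt instead of being retried.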
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: