hbase-user mailing list archives

From Renaud Delbru <renaud.del...@deri.org>
Subject Re: Lucene instead of HFiles?
Date Fri, 05 Oct 2012 08:48:58 GMT

With respect to point 3, I know there is a new codec in Lucene 4.0 for 
append-only filesystems such as HDFS (LUCENE-2373).

It would also depend on the use case. At the moment, for storing data, 
I would expect HFile to be much more efficient in terms of compression 
than Lucene's file format (in fact, Lucene does no real compression, 
unless you compress the field byte stream yourself before storing it). 
There is some work under way to make Lucene more efficient for small and 
medium-sized fields (LUCENE-4226, block-style compression and storing), 
but I think HFile is far better optimised for this task.
In fact, another interesting idea would be to investigate the use of 
HFile as a StoredFieldsFormat in Lucene. Efficient storage of data is 
imho still a missing feature in Lucene.
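
To make "compressing the field byte stream yourself" concrete, here is a 
minimal sketch using only java.util.zip from the JDK. The class and 
method names (FieldCompression, compress, decompress) are made up for 
illustration and are not part of Lucene or HBase; the idea is simply 
that the compressed bytes would be stored as a binary field and 
decompressed again when the document is read back.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not a Lucene or HBase class.
final class FieldCompression {

    // Deflate the raw field bytes before handing them to the index.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inflate bytes previously produced by compress().
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress possible: input is truncated or not a plain zlib stream.
                    throw new IllegalArgumentException("truncated or corrupt input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed field", e);
        } finally {
            inflater.end();
        }
    }
}
```

Note that this compresses each field value on its own, so small fields 
gain little or nothing; that is the gap the block-style approach in 
LUCENE-4226 is meant to address.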

Renaud Delbru

On 05/10/12 07:36, Adrien Mogenet wrote:
> "Don't bother trying this in production" ;-)
> 1. Are you sure lookups by key are faster?
> 2. Updating Lucene files in a lock-free manner and ensuring good
> concurrency can be a bit tricky
> 3. AFAIK, Lucene files don't fit in HDFS and thus another distributed
> storage is required. Katta does not look as powerful as Hadoop.
> On Fri, Oct 5, 2012 at 5:34 AM, Otis Gospodnetic
> <otis.gospodnetic@gmail.com> wrote:
>> Hi,
>> Has anyone attempted using Lucene instead of HFiles (see
>> https://twitter.com/otisg/status/254047978174701568 )?
>> Is that a completely crazy, bad, would-never-work,
>> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
>> not?
>> Thanks,
>> Otis
>> --
>> Search Analytics - http://sematext.com/search-analytics/index.html
>> Performance Monitoring - http://sematext.com/spm/index.html
