lucene-java-user mailing list archives

From Ishan Chattopadhyaya <ichattopadhy...@gmail.com>
Subject Re: Possible to cause documents to be contiguous after forceMerge?
Date Wed, 16 Nov 2016 08:45:09 GMT
http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html
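
To illustrate why an index sort might help with the compression concern raised further down the thread, here is a minimal sketch in Python rather than Lucene itself. It assumes stored content is compressed in independent fixed-size blocks, with zlib standing in for whatever the codec really uses. The names `make_docs` and `compressed_size`, the 16 KB block size, and the synthetic per-source vocabularies are all hypothetical choices made for illustration, not anything taken from Lucene or from this thread.

```python
import random
import zlib


def make_docs(num_sources=50, docs_per_source=40, seed=0):
    """Build synthetic (source_id, body) documents.

    Documents from the same source draw words from the same small
    vocabulary, so they are "similar" to each other. The list is
    shuffled to mimic arbitrary arrival order at index time.
    """
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    docs = []
    for source_id in range(num_sources):
        vocab = ["".join(rng.choice(letters) for _ in range(8)) for _ in range(30)]
        for _ in range(docs_per_source):
            body = " ".join(rng.choice(vocab) for _ in range(60))
            docs.append((source_id, body))
    rng.shuffle(docs)
    return docs


def compressed_size(docs, block_bytes=16 * 1024):
    """Compress documents in independent blocks and return the total size.

    Each block is compressed on its own, so redundancy is only
    exploited between documents that land in the same block.
    """
    total = 0
    block = bytearray()
    for _, body in docs:
        block += body.encode("utf-8") + b"\n"
        if len(block) >= block_bytes:
            total += len(zlib.compress(bytes(block)))
            block.clear()
    if block:  # flush the final partial block
        total += len(zlib.compress(bytes(block)))
    return total


docs = make_docs()
arrival_order = compressed_size(docs)
# Sorting by source places similar documents next to each other.
sorted_order = compressed_size(sorted(docs, key=lambda d: d[0]))
print("arrival order:", arrival_order, "bytes")
print("sorted order: ", sorted_order, "bytes")
```

With this synthetic data the sorted layout should compress noticeably better, because each block then holds documents that share a vocabulary. How much a real index would gain depends on the codec, the block size, and how similar the documents actually are, so it is worth measuring on real data before relying on it.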

On Wed, Nov 16, 2016 at 11:15 AM, Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

> Can IndexSort help here?
> ------------------------------
> From: Erick Erickson <erickerickson@gmail.com>
> Sent: 11/16/2016 9:29
> To: java-user <java-user@lucene.apache.org>
> Subject: Re: Possible to cause documents to be contiguous after
> forceMerge?
>
> Well, codecs are pluggable so if you can show that you'd get
> an improvement (however you measure it) and that whatever
> you have in mind wouldn't penalize the general case you could
> submit it as a proposal/patch.
>
> Best,
> Erick
>
> On Tue, Nov 15, 2016 at 6:21 PM, Kevin Burton <burton@spinn3r.com> wrote:
> > On Tue, Nov 15, 2016 at 6:16 PM, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> >> You can make no assumptions about locality in terms of where separate
> >> documents land on disk. I suppose if you have the whole corpus at index
> >> time you could index these "similar" documents contiguously. T
> >>
> >
> > Wow.. that's shockingly frightening. There are a ton of optimizations if
> > you can trick the underlying content store into performing locality.
> >
> > Not trying to be overly negative so another way to phrase it is that at
> > least there's room for improvement !
> >
> >
> >> My base question is why you'd care about compressing 500G. Disk space
> >> is so cheap that the expense of trying to control this dwarfs any
> >> imaginable $avings, unless you're talking about a lot of 500G indexes.
> >> In other words this seems like an XY problem, you're asking about
> >> compressing when you are really concerned with something else.
> >>
> >
> > 500GB per day... additionally, disk is cheap, but IOPS are not. The more
> > we can keep in RAM and on SSD the better.
> >
> > And we're trying to get as much in RAM then SSD as possible... plus we
> > have about 2 years of content.  It adds up ;)
> >
> > Kevin
> >
> > --
> >
> > We’re hiring if you know of any awesome Java Devops or Linux Operations
> > Engineers!
> >
> > Founder/CEO Spinn3r.com
> > Location: *San Francisco, CA*
> > blog: http://burtonator.wordpress.com
> > … or check out my Google+ profile
> > <https://plus.google.com/102718274791889610666/posts>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
