lucene-java-user mailing list archives

From: Ishan Chattopadhyaya
Subject: RE: Possible to cause documents to be contiguous after forceMerge?
Date: Wed, 16 Nov 2016 05:45:16 GMT

Can IndexSort help here?
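
To illustrate why sort order might matter for index size, here is a rough sketch. It does not use Lucene at all. It only assumes that stored data is compressed in independent blocks, and compares the total compressed size when similar documents arrive interleaved versus grouped by a sort key. The class name `LocalityDemo`, the synthetic documents, and the block size of 8 are made up for the example. In Lucene itself, the setting to experiment with would be `IndexWriterConfig.setIndexSort` (available since 6.2); whether it helps a given corpus would need to be measured.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.zip.Deflater;

public class LocalityDemo {

    /** Builds groups * docsPerGroup synthetic docs; docs in a group share a random 200-char body. */
    public static List<String> makeDocs(int groups, int docsPerGroup, long seed) {
        Random rnd = new Random(seed);
        List<String> docs = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < 200; i++) {
                body.append((char) ('a' + rnd.nextInt(26)));
            }
            for (int d = 0; d < docsPerGroup; d++) {
                // The group key comes first, so a plain string sort clusters each group.
                docs.add(String.format("g%04d|%s|doc%d", g, body, d));
            }
        }
        return docs;
    }

    /** Compresses each run of docsPerBlock docs independently and returns the total byte count. */
    public static long blockCompressedSize(List<String> docs, int docsPerBlock) {
        long total = 0;
        byte[] out = new byte[64 * 1024];
        for (int start = 0; start < docs.size(); start += docsPerBlock) {
            int end = Math.min(start + docsPerBlock, docs.size());
            byte[] block = String.join("\n", docs.subList(start, end))
                    .getBytes(StandardCharsets.UTF_8);
            Deflater deflater = new Deflater();
            deflater.setInput(block);
            deflater.finish();
            while (!deflater.finished()) {
                total += deflater.deflate(out);
            }
            deflater.end();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = makeDocs(64, 8, 42L);

        // "Arrival order": similar documents are scattered across blocks.
        List<String> shuffled = new ArrayList<>(docs);
        Collections.shuffle(shuffled, new Random(7L));

        // "Sorted order": similar documents end up in the same block.
        List<String> sorted = new ArrayList<>(shuffled);
        Collections.sort(sorted);

        System.out.println("arrival order: " + blockCompressedSize(shuffled, 8) + " bytes");
        System.out.println("sorted order:  " + blockCompressedSize(sorted, 8) + " bytes");
    }
}
```

The sorted run should come out smaller because each block then contains near-duplicates that the compressor can back-reference. Real documents are far less regular than these synthetic ones, so the gain on an actual index could be much smaller.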

-----Original Message-----
From: "Erick Erickson"
Sent: 11/16/2016 9:29
To: "java-user"
Subject: Re: Possible to cause documents to be contiguous after forceMerge?

Well, codecs are pluggable, so if you can show that you'd get
an improvement (however you measure it) and that whatever
you have in mind wouldn't penalize the general case, you could
submit it as a proposal/patch.


On Tue, Nov 15, 2016 at 6:21 PM, Kevin Burton wrote:
> On Tue, Nov 15, 2016 at 6:16 PM, Erick Erickson wrote:
>> You can make no assumptions about locality in terms of where separate
>> documents land on disk. I suppose if you have the whole corpus at index
>> time you could index these "similar" documents contiguously. T
> Wow... that's shockingly frightening. There are a ton of optimizations
> available if you can trick the underlying content store into preserving
> locality.
> Not trying to be overly negative, so another way to phrase it is that at
> least there's room for improvement!
>> My base question is why you'd care about compressing 500G. Disk space
>> is so cheap that the expense of trying to control this dwarfs any
>> imaginable $avings, unless you're talking about a lot of 500G indexes.
>> In other words, this seems like an XY problem: you're asking about
>> compressing when you are really concerned with something else.
> 500GB per day... Additionally, disk is cheap, but IOPS are not. The more we
> can keep in RAM and on SSD, the better.
> And we're trying to get as much as possible into RAM, then SSD... plus we
> have about 2 years of content. It adds up ;)
> Kevin
