lucene-java-user mailing list archives

From Kevin Burton <bur...@spinn3r.com>
Subject Possible to cause documents to be contiguous after forceMerge?
Date Wed, 16 Nov 2016 01:32:07 GMT
I have a large index (say 500GB) with a large percentage of near-duplicate
documents.

I have to keep the documents there (can't delete them) as the metadata is
important.

Is it possible to get the documents to be contiguous somehow?

Once they are contiguous then they will compress very well - which I've
already confirmed by writing the exact same document N times.
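
To make the effect concrete, here is a minimal sketch that uses only the JDK
and no Lucene APIs. It assumes documents are compressed in independent
fixed-size chunks, which is only a rough stand-in for how stored fields are
compressed. The class name, the Doc type, the 16KB chunk size and the
synthetic group layout are all assumptions made for illustration. It compares
the compressed size of the same documents when near duplicates are interleaved
versus when they are contiguous.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.zip.Deflater;

public class ContiguousCompressionDemo {

    // Assumed chunk size: each chunk is compressed on its own, so duplicates
    // only help when they land inside the same chunk.
    static final int CHUNK_BYTES = 16 * 1024;

    /** Hypothetical document: members of one group are near duplicates. */
    static final class Doc {
        final int groupId;
        final byte[] body;

        Doc(int groupId, byte[] body) {
            this.groupId = groupId;
            this.body = body;
        }
    }

    /** Builds docs in interleaved order (g0, g1, ..., g0, g1, ...). */
    static List<Doc> makeDocs(int groups, int docsPerGroup, long seed) {
        Random rnd = new Random(seed);
        String[] bases = new String[groups];
        for (int g = 0; g < groups; g++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 2000; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            bases[g] = sb.toString();
        }
        List<Doc> docs = new ArrayList<>();
        for (int d = 0; d < docsPerGroup; d++) {
            for (int g = 0; g < groups; g++) {
                // Same base text per group, plus a small per-document difference.
                String text = bases[g] + " copy=" + d;
                docs.add(new Doc(g, text.getBytes(StandardCharsets.UTF_8)));
            }
        }
        return docs;
    }

    /** Returns a copy ordered by group id, so near duplicates are contiguous. */
    static List<Doc> sortByGroup(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.groupId));
        return sorted;
    }

    /** Total compressed bytes when docs are packed into independent chunks. */
    static long compressedSize(List<Doc> docs) {
        long total = 0;
        ByteArrayOutputStream chunk = new ByteArrayOutputStream();
        for (Doc doc : docs) {
            chunk.write(doc.body, 0, doc.body.length);
            if (chunk.size() >= CHUNK_BYTES) {
                total += deflate(chunk.toByteArray());
                chunk.reset();
            }
        }
        if (chunk.size() > 0) {
            total += deflate(chunk.toByteArray());
        }
        return total;
    }

    /** Deflates one chunk and returns the number of compressed bytes. */
    static int deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[8192];
        int size = 0;
        while (!deflater.finished()) {
            size += deflater.deflate(buf);
        }
        deflater.end();
        return size;
    }

    public static void main(String[] args) {
        List<Doc> interleaved = makeDocs(50, 16, 42L);
        List<Doc> grouped = sortByGroup(interleaved);
        System.out.println("interleaved bytes: " + compressedSize(interleaved));
        System.out.println("grouped bytes:     " + compressedSize(grouped));
    }
}
```

In this toy setup the grouped ordering should come out far smaller, because
each chunk then holds several copies of the same base text. How closely that
carries over to a real index depends on the codec and its block size.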

Ideally I could use two fields: a unique document ID plus a group_id, so
that documents with the same group_id are stored next to each other on
disk. But I don't think this is possible.

Can I just create a synthetic "id" field for this and assume that documents
are ordered by "id" on disk in the Lucene index?


