lucene-solr-user mailing list archives

From David Hastings <>
Subject commongrams
Date Fri, 10 Feb 2017 21:55:29 GMT
Hey All,
I followed an old blog post about implementing CommonGrams and used the
400 most popular words file on a subset of my data. The original index was
33 GB with 2.2 million documents; with the 400-word list it grew to 96 GB.
I scaled it down to the 100 most common words and got to about 76 GB, but
a cold phrase search went from 4 seconds with 400 words to 6 seconds with 100.
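
For anyone unfamiliar with the technique, the sketch below approximates what a CommonGrams filter does at index time: it keeps every single-word token and also emits a joined bigram for each adjacent pair in which at least one word is on the common-words list. This is a simplified Python illustration, not Lucene's CommonGramsFilter; the function name `common_grams`, the `_` separator, and the three-word list are assumptions made for the example. It shows why the index grows: every pair touching a common word adds an extra term.

```python
def common_grams(tokens, common_words):
    """Simplified index-time common grams: keep every unigram and add a
    'left_right' bigram for each adjacent pair where either word is common."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the original single-word token is always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")  # extra bigram term for the pair
    return out


common = {"the", "of", "to"}  # hypothetical tiny common-words list
grams = common_grams("the cost of the index".split(), common)
print(grams)
print(len(grams))
```

Here five input words become nine tokens. A longer common-words list matches more pairs and adds more terms, which is consistent with the larger index at 400 words than at 100; it also lets more phrase pairs be looked up as a single bigram term, which may explain the faster phrase searches with the larger list.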

This will not scale well: the base index that this is a subset of currently
has 22 million documents and sits at around 360 GB. At this rate, it would
grow to around 1 TB. Is there a common hardware/software configuration for
handling TB-sized indexes?
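
The ~1 TB estimate comes from scaling the full index by the growth ratio measured on the subset. A minimal sketch of that arithmetic, assuming the subset's growth ratio carries over unchanged to the full 22-million-document index (the function name `project_size` is hypothetical):

```python
def project_size(subset_before_gb, subset_after_gb, full_before_gb):
    """Scale the full index by the growth ratio measured on the subset.
    Assumes the ratio carries over unchanged, which is not guaranteed."""
    ratio = subset_after_gb / subset_before_gb
    return full_before_gb * ratio


for words, after_gb in [(400, 96), (100, 76)]:
    est = project_size(33, after_gb, 360)
    print(f"{words} common words: ~{est:.0f} GB")
```

With the numbers above this gives roughly 1047 GB for the 400-word list and roughly 829 GB for the 100-word list.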
