lucene-dev mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject Re: How would you architect solr/lucene if you were starting from scratch for them to be 10X+ faster/efficient ?
Date Mon, 23 Jan 2017 17:04:20 GMT

I’ve had some curiosity about this question too.

For a while, I watched for a seastar-like library for the JVM, but https://github.com/bestwpw/windmill
was the only one I came across, and it doesn’t seem to be going anywhere. Since one of the
points of the JVM is to abstract away the platform, I certainly wonder whether the JVM will ever
get the kind of machine affinity these other projects see. Your one-shard-per-core idea could
probably be faked with multiple JVMs and numactl, which could be an interesting experiment.
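
As a rough sketch of what that experiment might look like, the snippet below builds one numactl launch command per core, pinning each JVM to a single CPU and to that CPU's NUMA node. The numactl options --physcpubind and --membind are real; everything else is an assumption for illustration: shard-server.jar and its --port flag are hypothetical placeholders for whatever actually starts a shard, and the core-to-node layout is made up. It only builds the commands and does not run them.

```python
from typing import List


def numactl_commands(cores_per_node: List[List[int]],
                     base_port: int = 8983) -> List[List[str]]:
    """Build one pinned JVM launch command per core.

    cores_per_node[i] lists the CPU ids that belong to NUMA node i
    (an assumed layout; check `numactl --hardware` on a real machine).
    """
    commands = []
    shard = 0
    for node, cores in enumerate(cores_per_node):
        for core in cores:
            commands.append([
                "numactl",
                f"--physcpubind={core}",   # run only on this CPU
                f"--membind={node}",       # allocate only from the local node
                # Hypothetical shard process: substitute the real launcher.
                "java", "-jar", "shard-server.jar",
                f"--port={base_port + shard}",
            ])
            shard += 1
    return commands


if __name__ == "__main__":
    # Assumed layout: two NUMA nodes with two cores each.
    for cmd in numactl_commands([[0, 1], [2, 3]]):
        print(" ".join(cmd))
```

Each command could then be handed to subprocess.Popen, giving one single-shard JVM per core.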

That said, I’m aware that a phenomenal amount of optimization effort has gone into Lucene,
and I’d also be interested in hearing about things that worked well.


From: Dorian Hoxha <dorian.hoxha@gmail.com>
Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
Date: Friday, January 20, 2017 at 8:12 AM
To: "dev@lucene.apache.org" <dev@lucene.apache.org>
Subject: How would you architect solr/lucene if you were starting from scratch for them to be 10X+ faster/efficient ?

Hi friends,
I was thinking about how the scylladb architecture<http://www.scylladb.com/technology/architecture/>
works compared to cassandra, which gives them 10x+ performance and lower latency. If you were
starting lucene and solr from scratch, what would you do to achieve something similar?

Different language (rust/c++?) for better SIMD<http://blog-archive.griddynamics.com/2015/06/lucene-simd-codec-benchmark-and-future.html>?
Use a GPU with an SSD for posting-list intersection? (not out yet)
Make it in-memory and use better data structures?
Shard on cores like scylladb (so 1 shard for each core on the machine)?
External cache (like keeping n redis-servers with big ram/network & slow cpu/disk just for cache)?
Use better data structures (like algolia autocomplete radix<https://blog.algolia.com/inside-the-algolia-engine-part-2-the-indexing-challenge-of-instant-search/>)?
Distributing documents by term instead of id<http://research.microsoft.com/en-us/um/people/trishulc/papers/Maguro.pdf>?
Using ASIC / FPGA?

Regards,
Dorian