lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shawn Heisey <>
Subject Re: Performance Implications of Different Routing Schemes
Date Sat, 03 Mar 2018 00:51:26 GMT
On 3/2/2018 11:43 AM, Stephen Lewis wrote:
> I'm wondering what information you may be able to provide on performance
> implications of implicit routing VS composite ID routing. In particular,
> I'm curious what the horizontal scaling behavior may be of implicit routing
> or composite ID routing with and without the "/<bits>" param appended on.

The hash calculations should probably introduce so little overhead that
you'd never notice it.

I once implemented a hash algorithm using two hash classes built into
Java.  I'm pretty sure that it was NOT a fast implementation ... and it
could calculate over a million hashes per second on input strings of
about 20 characters.

The hash algorithm used by CompositeId (one of the MurmurHash
implementations) is supposed to be one of the fastest algorithms in the
world.  Unless your uniqueId field values are extremely huge, I really
doubt that hash calculation is a significant source of overhead.

The use of implicit doesn't automatically mean there's no overhead for
routing.  The implicit router can still redirect documents to different
shards, it just does it explicitly, usually with a shard name in a
particular field, rather than by hash calculation.

> A relatively simple assessment I've done belowleads me to believe the
> following is likely the case: if we have S shards and B as our "/bits"
> param, then resource usage would Big O scale as follows (note: Previously
> I've received the advice that any shard should be capped at a max of 120M
> documents, which is where the cap on docs/shard-key comes from)

Query performance should be about the same for any routing type.  It
does look like when you use compositeId and actually implement shard
keys, you can limit queries to those shards, but a *general* query is
going to hit all shards.

If your query rate is very low (or shards are distributed across a lot
of hardware that has significant spare CPU capacity) performance isn't
going to be dramatically different for a query that hits 2 shards versus
one that hits 6 shards.  If your query rate is high, then more shards
probably will be noticeably slower than fewer shards.

For the maximum docs to allow per shard:  It depends.  For some indexes
and use cases, a million documents per shard might be way too big.  For
others, 500 million per shard might have incredible performance.  There
are no hard rules about this.  It's entirely dependent on what you're
actually doing.


View raw message