1. So, explicitly setting phoenix.query.rowKeyOrderSaltedTable to true should be done, right?

I forgot to mention that this config is now deprecated. The one to override is phoenix.query.force.rowkeyorder. Note that setting it to true asks Phoenix to do a client-side merge sort so that rows come back in row key order. Leaving it at false avoids that sort, and for non-aggregate queries this gives a large performance boost because Phoenix can then apply an optimization; see PHOENIX-1779 for details. I would recommend providing an explicit ORDER BY on the row key columns instead, since that is the contract we will support going forward.
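
To make the cost of the merge sort concrete, here is a minimal Python sketch (not Phoenix code). It assumes each salt bucket returns its rows already sorted by row key; the bucket contents and function names are made up for illustration.

```python
import heapq

# Hypothetical per-bucket scan results: each salt bucket returns rows sorted
# by row key, but the buckets are not sorted relative to one another.
bucket_scans = [
    ["row03", "row07", "row11"],
    ["row01", "row08"],
    ["row02", "row05", "row09"],
]


def concatenated(scans):
    """Rows in arrival order when bucket results are simply appended."""
    return [row for scan in scans for row in scan]


def merge_sorted(scans):
    """Rows after a k-way merge of the already-sorted bucket streams."""
    return list(heapq.merge(*scans))


print(concatenated(bucket_scans))  # bucket order, not row key order
print(merge_sorted(bucket_scans))  # global row key order
```

The merge has to compare the head row of every bucket stream before it can emit anything, which is the extra client-side work you sign up for when rows must come back in row key order.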

2. So, a single region server has to be able to hold multiple salt buckets. Is that correct?

Salt buckets are nothing but pre-split regions. So yes, a region server will handle load for multiple regions/salt buckets if the number of salt buckets is greater than the number of region servers.
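
As a quick illustration of the counting argument, here is a small Python sketch. The round-robin placement and the server names are assumptions made up for the example; the real placement is decided by the HBase balancer, not by this logic.

```python
from collections import defaultdict


def assign_round_robin(num_buckets, servers):
    """Spread bucket numbers over servers in round-robin order (illustrative only)."""
    assignment = defaultdict(list)
    for bucket in range(num_buckets):
        assignment[servers[bucket % len(servers)]].append(bucket)
    return dict(assignment)


# 8 salt buckets on 3 hypothetical region servers.
layout = assign_round_robin(8, ["rs1", "rs2", "rs3"])
for server, buckets in layout.items():
    print(server, buckets)
```

With more buckets than servers, at least one server necessarily hosts more than one bucket, whatever the placement policy is.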

3. Where does Phoenix maintain the mapping of salt buckets to region server given that the two are orthogonal to each other?

There is no such mapping. With salted tables, data is randomly distributed across regions, so Phoenix issues parallel scans against all the regions.
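
A rough Python sketch of that fan-out, under stated assumptions: the in-memory `regions` dictionary, the bucket count, and the function names are hypothetical stand-ins for illustration, not Phoenix internals.

```python
from concurrent.futures import ThreadPoolExecutor

SALT_BUCKETS = 4  # assumed value for this example

# Hypothetical table contents: salt bucket number -> row keys sorted in that bucket.
regions = {
    0: ["row01", "row05", "row09"],
    1: ["row02", "row06"],
    2: ["row03", "row07"],
    3: ["row04", "row08"],
}


def scan_region(bucket, start, stop):
    """Return the row keys of one bucket that fall in [start, stop)."""
    return [key for key in regions[bucket] if start <= key < stop]


def parallel_range_scan(start, stop):
    """Run the same range scan against every bucket, one task per bucket."""
    with ThreadPoolExecutor(max_workers=SALT_BUCKETS) as pool:
        futures = [
            pool.submit(scan_region, bucket, start, stop)
            for bucket in range(SALT_BUCKETS)
        ]
        return [future.result() for future in futures]


results = parallel_range_scan("row03", "row08")
print(results)
```

Since any bucket may hold rows from the requested range, every bucket gets scanned; no lookup table from bucket to region server is needed on the client side.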

HTH.




On Thu, Oct 8, 2015 at 4:02 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:
Thank you Samarth and Ravi.

1. So, explicitly setting phoenix.query.rowKeyOrderSaltedTable to true should be done, right?

Also, thanks for clarifying salting. However, tables are split at salt byte boundaries and within a region (salt byte, that is), all rows are sorted. This means that different portions of the table land in different region servers. 

2. So, a single region server has to be able to hold multiple salt buckets. Is that correct?
3. Where does Phoenix maintain the mapping of salt buckets to region server given that the two are orthogonal to each other?

Best regards,
Sumit


From: Samarth Jain <samarth@apache.org>
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Cc: Sumit Nigam <sumit_only@yahoo.com>
Sent: Wednesday, October 7, 2015 10:53 PM
Subject: Re: Salting and pre-splitting

- Default value of phoenix.query.rowKeyOrderSaltedTable is true and that ensures that a LIMIT clause returns data in rowkey order

This is no longer the case starting with Phoenix 4.4. You need to provide an explicit ORDER BY on the row key columns if you need the rows to be returned in row key order.



On Wed, Oct 7, 2015 at 9:59 AM, Ravi Kiran <maghamravikiran@gmail.com> wrote:
Hi Sumit,
   
The PhoenixInputFormat gets the number of splits based on the region boundaries. However, if guideposts are configured (https://phoenix.apache.org/update_statistics.html), you might not see a 1-to-1 mapping. @James please correct me if I am wrong here.

   You are right on the salting behavior.

Regards
Ravi 

On Wed, Oct 7, 2015 at 2:03 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:
I did some homework and got some answers. Now open questions that remain:

1. Is the number of buckets = the number of task splits that the Phoenix InputFormat uses?
2. Salting uses the first byte of a stable hash of the rowkey, and it is this byte that is prefixed. Is this correct?

Answers, I could get:

1. Pre-splitting is not needed with salting. Salting pre-splits at salt byte boundaries anyway.
2. SALT_BUCKETS can be set to a higher value than the number of region servers to allow for future growth.
3. Adding a new region server does not matter to existing records, as the mod is with SALT_BUCKETS and not with the number of region servers.
4. Default value of phoenix.query.rowKeyOrderSaltedTable is true and that ensures that a LIMIT clause returns data in rowkey order
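
Answers 1 to 3 can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the salt byte is a stable hash of the row key taken modulo SALT_BUCKETS; `zlib.crc32` is a stand-in hash chosen for the example (not necessarily what Phoenix uses), and the function names are made up.

```python
import zlib

SALT_BUCKETS = 8  # assumed value, fixed when the table is created


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Stable hash of the row key, reduced modulo the bucket count."""
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the row key with its one-byte salt."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


key = salted_key(b"user42")
print(key[0], key[1:])
```

The number of region servers never appears in the computation, so adding a server leaves the salted key of every existing row unchanged.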

Thanks,
Sumit


From: Sumit Nigam <sumit_only@yahoo.com>
To: Users Mail List Phoenix <user@phoenix.apache.org>
Sent: Wednesday, October 7, 2015 12:41 PM
Subject: Salting and pre-splitting

Hi,

I am somewhat confused by salting and pre-splitting. Would be grateful if any of you can clarify the following:

1. Do I need to use pre-splitting along with salting to get the performance benefit? Or can I still have single region server hot-spotting until there is enough data for the region to split in two?
2. Is it true that SALT_BUCKETS should be set to (number of region servers) * (number of cores per region server) ?
3. I cannot modify salt buckets after table is created. If so, what happens when I add a new region server to the mix?
4. Is number of buckets = number of task splits that Phoenix InputFormat uses?
5. Does salting create a hex rowkey as is recommended?
6. With salting, can I still perform range scans with LIMIT clause?

Thanks,
Sumit