late reply...
We are trying to run Drill on 200 nodes, but we keep randomly losing
connectivity with certain nodes, which fails the query completely. It happens
maybe 50% of the time.
It depends on how many files get queried, basically on how heavy the query is.
It looks exactly like this problem:
ForemanException: One or more nodes lost connectivity during query
https://issues.apache.org/jira/browse/DRILL-4325
Until we find a solution, we are sticking to a dedicated dozen-node cluster.
It would be nice to have a query recover from disconnected nodes
and keep gathering results from the valid nodes.
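
As a client-side stopgap, a failed query could simply be retried. Below is a
minimal sketch of that idea, not a Drill feature: `run_query` is a hypothetical
callable that submits the query through whatever client is in use, and
`NodeLostError` is a hypothetical exception standing in for the
"lost connectivity" failure. It reruns the whole query; it does not recover
partial results from the nodes that stayed up.

```python
import time


class NodeLostError(Exception):
    """Hypothetical stand-in for a 'nodes lost connectivity' query failure."""


def run_with_retry(run_query, max_attempts=3, delay_s=0.0):
    """Call run_query() until it succeeds or max_attempts is exhausted.

    run_query is a hypothetical zero-argument callable that submits the
    query and returns its result, raising NodeLostError on a lost node.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except NodeLostError as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_s)
    raise last_error
```

With a failure rate around 50% per attempt, a few retries would make overall
failure much less likely, at the cost of rerunning heavy queries.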
On Thu, Jul 14, 2016 at 11:22 PM, scott <tcots8888@gmail.com> wrote:
> Curious what the biggest is. Has anyone configured more than 100 drillbits
> in a cluster before?
>
> Scott
>
> On 07/14/2016 10:27 AM, Ted Dunning wrote:
>
>> On the right distribution, you can restrict the subset of the cluster that
>> has the data you need to avoid locality variation when Drill only runs on a
>> subset of nodes.
>>
>>
>>
>> On Thu, Jul 14, 2016 at 6:48 AM, François Méthot <fmethot78@gmail.com>
>> wrote:
>>
>>> We have observed that if the number of drillbits is lower than the number
>>> of nodes in our cluster, some minor fragments take longer to complete
>>> their query. (We hypothesize that this is because they can't take
>>> advantage of data locality: a fragment has to reach out for data on a
>>> different node.) One drillbit per node, with evenly spread data, is the
>>> best scenario.
>>>
>>> These results may also vary depending on your hardware, I think.
>>>
>>> On Thu, Jul 7, 2016 at 7:06 PM, Ashish Goel <
>>> ashish.kumar.goel1@gmail.com>
>>> wrote:
>>>
>>>> That's an interesting question. I would also be curious to learn more
>>>> about this. Did anyone run any benchmarks around this? It would be
>>>> helpful to understand.
>>>>
>>>> On Thu, Jul 7, 2016 at 11:13 AM, scott <tcots8888@gmail.com> wrote:
>>>>
>>>>> Abdel,
>>>>> I didn't ask about having more than one drillbit per node. I asked
>>>>> about the number of drillbits per cluster. For instance, if I had a
>>>>> 1000 node Hadoop cluster, should I install drillbits on each node? Or,
>>>>> is there some point at which the interaction of 1000 drillbits causes
>>>>> contention resulting in a plateau or decline of performance?
>>>>>
>>>>> Thanks,
>>>>> Scott
>>>>>
>>>>> On Thu, Jul 7, 2016 at 5:00 PM, Abdel Hakim Deneche
>>>>> <adeneche@maprtech.com> wrote:
>>>>>
>>>>>> I'm not sure you'll get any performance improvement from running more
>>>>>> than a single drillbit per cluster node.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 9:47 AM, scott <tcots8888@gmail.com> wrote:
>>>>>>
>>>>>>> Follow up question: Is there a sweet spot for DRILL_MAX_DIRECT_MEMORY
>>>>>>> and DRILL_HEAP settings?
>>>>>>>
>>>>>>> On Wed, Jul 6, 2016 at 2:42 PM, scott <tcots8888@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Does anyone know if there is a maximum number of drillbits
>>>>>>>> recommended in a Drill cluster? For example, I've observed that in a
>>>>>>>> Solr Cloud, the performance tapers off for ingest at around 16 JVM
>>>>>>>> instances. Is there a similar practical limitation to the number of
>>>>>>>> drillbits I should cluster together?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Abdelhakim Deneche
>>>>>>
>>>>>> Software Engineer
>>>>>>
>>>>>> <http://www.mapr.com/>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Ashish
>>>>
>>>>
>