trafficserver-users mailing list archives

From Pavel Kazlenka <pavel.kazle...@measurement-factory.com>
Subject Re: Tuning ATS for performance testing.
Date Thu, 17 Oct 2013 18:15:21 GMT
Thank you, Igor.

I've rebuilt ATS with hwloc and things are a bit better now. I can see 
that load is balanced fairly across the configured number of threads 
during the test. If I understood correctly, this number is either 
proxy.config.exec_thread.autoconfig.scale * the number of CPU cores when 
proxy.config.exec_thread.autoconfig is set to '1', or 
proxy.config.exec_thread.limit otherwise.
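For example (this is just my reading of the docs, so please correct me 
if the names or the math are off), on this quad-core box

    CONFIG proxy.config.exec_thread.autoconfig INT 1
    CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1.5

should give 1.5 * 4 = 6 [ET_NET] threads, while with autoconfig set to 
'0' the count is taken directly from proxy.config.exec_thread.limit.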

I also found that about 30 threads are needed to sustain a load of 4-5k 
requests per second. But there are still plenty of CPU cycles to spare 
(the threads take about 80-90% CPU out of the 400% available).
One possible problem I observe is that threads migrate between CPU 
cores, and each migration incurs a performance penalty. Is there any 
mechanism to bind each thread to a CPU core at startup?
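The only workaround I've come up with so far is pinning the individual 
thread ids from the shell after startup (a rough sketch, assuming a 
single traffic_server process; the round-robin mapping to cores 0-1 is 
just an example):

    # pin each traffic_server thread to core 0 or 1, round-robin
    PID=$(pidof traffic_server)
    i=0
    for tid in /proc/$PID/task/*; do
        taskset -p -c $((i % 2)) "$(basename "$tid")"
        i=$((i + 1))
    done

But that feels fragile across restarts, so a startup-time knob would be 
much nicer.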

Also, maybe there are other useful libraries ATS should be built 
against for better performance? My config.log can be found here: 
http://pastebin.com/u5sV61rR
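For reference, this is roughly how the hwloc rebuild went on this box 
(the prefix is just an example; as far as I can tell configure picks 
hwloc up automatically once the development headers are installed):

    sudo apt-get install libhwloc-dev
    ./configure --prefix=/opt/ats
    make && sudo make install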

TIA,
Pavel

On 10/17/2013 04:22 PM, Igor Galić wrote:
>
> ----- Original Message -----
>> On 10/17/2013 12:30 AM, Igor Galić wrote:
>>> ----- Original Message -----
>>>> Hi gentlemen,
>>> Hi Pavel,
>>>    
>>>> I'm trying to test the performance of ATS v.4.0.2.
>>>>
>>>> The server under test has a quad-core CPU with HT disabled. During the test (1k
>>> Given the range of platforms we support it's /always/ good to explicitly
>>> state which platform (OS, version, kernel) you're running on.
>>>
>>> But also, exactly how you compiled it.
>> It's Ubuntu 12.04 LTS (32-bit). ATS is configured with default options
>> (except for --prefix).
> there is no such thing as default, when everything is being discovered ;)
> Are you compiling with hwloc? (And if not, can you try to, and report
> how it changes the behaviour?)
>
>>>> user-agents, 1k origin servers, up to 6k requests per second with
>>>> an average size of 8 kB) at the mark of 2-2.5k requests per second I see the
>>> Given the range of configurations we support it's always good to explicitly
>>> state if this is a forward, reverse or transparent proxy. (You only mention
>>> later that caching is fully disabled.)
>> Right. This is the forward proxy case, with reverse proxy mode explicitly
>> disabled.
>>>> signs of overloading (growing delay times, missed responses). The problem
>>>> is that according to top output, the CPU is not under heavy load (which is
>>>> strange for an overloaded system). All the other resources (RAM, I/O,
>>>> network) are far from saturation too. Top shows load at about 50-60% of
>>>> one core for the [ET_NET 0] process. traffic_server threads seem to be
>>>> spread across all the cores, even when I try to force-bind them to one or
>>>> two of the cores using taskset.
>>> (at this point I can now guess with certainty that you're talking about
>>> Linux, but I still don't know which distro/version, etc..)
>>>    
>>>> My alterations to the default ATS configuration (mostly following this
>>>> guide: http://www.ogre.com/node/392):
>>>>
>>>> Cache is fully disabled:
>>>> CONFIG proxy.config.http.cache.http INT 0
>>>> Threads:
>>>> CONFIG proxy.config.exec_thread.autoconfig INT 0
>>>> CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1
>>>> CONFIG proxy.config.exec_thread.limit INT 4
>>>> CONFIG proxy.config.accept_threads INT 2
>>>> CONFIG proxy.config.cache.threads_per_disk INT 1
>>>> CONFIG proxy.config.task_threads INT 4
>>>>
>>>> So my questions are the following:
>>>> 1) Is there any known strategy to distribute ATS processes/threads across
>>>> CPU cores? E.g. bind all the traffic_server threads to cpu0 and cpu1, all
>>>> traffic_manager threads to cpu2, and networking interrupts to cpu3?
>>>> 2) If so, how can this be done? I see some threads ignore 'taskset -a -p
>>>> 1,2 <traffic_server pid>' and are being executed on any CPU core. Maybe
>>>> via configuration directives?
>>>> 3) What is the best strategy for thread-per-core configuration? Should the
>>>> sum of task, accept and network threads equal the number of CPU cores + 1?
>>>> Or something else? Maybe it's better to use 40 threads in total on a
>>>> quad-core device?
>>>> 4) Are the *thread* config options taken into account if
>>>> proxy.config.http.cache.http is set to '1'?
>> Here I copied the wrong option. I meant
>> 'proxy.config.exec_thread.autoconfig' set to '1'.
>>>> 5) What other options influence system performance in the case of a
>>>> cache-off test?
>>>>
>>>> TIA,
>>>> Pavel

