trafficserver-users mailing list archives

From Brian Geffon <bri...@apache.org>
Subject Re: Look Mom, no locks!
Date Sat, 10 Mar 2012 01:10:50 GMT
Hi James,
Did you ever open a ticket in Jira for this so it can be tracked?

Brian

On Fri, Mar 9, 2012 at 4:59 PM, Whitaker, James X. -ND <
James.X.Whitaker.-ND@disney.com> wrote:

> Yes….  It is a pretty interesting find.  But WAY too much tension has
> built up around it.
>
> From: Jason Giedymin <jason.giedymin@gmail.com>
> Reply-To: "users@trafficserver.apache.org" <users@trafficserver.apache.org>
> Date: Fri, 9 Mar 2012 16:57:35 -0800
> To: "users@trafficserver.apache.org" <users@trafficserver.apache.org>
> Subject: Re: Look Mom, no locks!
>
> Sorry, was using this as an example of an exciting report vs a dull one at
> work.
>
> -Jason
>
> On Mar 9, 2012, at 7:55 PM, Jason Giedymin <jason.giedymin@gmail.com>
> wrote:
>
> Came in last week, check out this awesome bug report.
>
> -Jason
>
> On Mar 2, 2012, at 10:29 PM, "Whitaker, James X. -ND" <
> James.X.Whitaker.-ND@disney.com> wrote:
>
> I imagine that it is probably humanly impossible to see THIS bug with code
> inspection.  Or even by stepping through it in a debugger.
>
> I must have scoured this code about a dozen times looking for problems,
> and I missed it.  I had to utterly pulverize the ATS source code with
> instrumentation-generating ruby scripts to zero in on it.  What a mess I
> made.
>
> On a RedHat or CentOS box, go into lib/ts, edit the Makefile to change the
> gcc optimizer flags to -E, run make in that lib/ts directory, and look at
> the preprocessor output in the files ink_queue.o and Allocator.o.
>
> You will have to stare, and stare, and stare before you see it.  And you
> gotta have the right kind of brain.
>
> ink_freelist_new()
>
> f->head can get grabbed by simultaneous threads/cores that call
> ink_freelist_new() at the same instant in time -- which will grab the same
> memory location on each thread.
>
> The remaining code, with all those atomic operators, neglects to couple
> the assignment of item = f->head to the increments of head within a single
> lock.
>
> Allocator.o shows that there are no locks.
>
> The more cores used by ATS, the more unstable ATS will be.
>
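> To make the race concrete: this is NOT the ATS code, just a boiled-down
> sketch (my own hypothetical names) of what goes wrong if the read of the
> head and the advance of the head are not tied together in one atomic step:
>
>   // hypothetical illustration only -- not ink_queue.cc
>   struct Node { Node *next; };
>   Node *head;                  // shared freelist head
>
>   Node *naive_pop() {
>     Node *item = head;         // thread A and thread B can both read the
>                                // same value here...
>     head = item->next;         // ...and both "advance" the head, so both
>                                // return the SAME node.
>     return item;
>   }
>
> The versioned compare-and-swap loop in ink_freelist_new is supposed to
> close exactly that window -- the cas only succeeds if f->head is still
> what you read -- but if that protection is not actually in effect, you are
> right back at the sketch above: two threads handing out the same memory
> slot.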
>
> ---
>
> I would imagine that the best of the best of the world's developers have
> been chasing the truly baffling side effects of this bug for years, and
> probably just plain giving up.  There are all sorts of odd mitigations in
> the code.  Lots of mystery crash reports.  I have to wonder if it is all
> due to this.
>
> ---
>
> Credit for this bug find goes to the Foundational Technologies Team at
> Disney; Cameron Stevens, my supervisor, who pushed me (and paid me) and
> cleared my to-do list to make room to find this thing; of course moi, who
> spent the last several weeks wracking my brains coming up with one
> (sometimes absurd) hypothesis after another and one debugging and code
> scanning technique after another trying to zero in on it...  and I
> suppose Lisa, providing the incentive of stressing the need for financial
> stability before we get married... a serious incentive to make a name for
> myself.
>
> ---
>
> I created some ruby scripts to embed instrumentation in the source code
> (and lots of hand cleanup to make it compile... sigh).
>
> Among other hacks (a rough sketch of the stamp follows the list):
>
>    each continuation is assigned a unique ID (jimw_stamp) when it is
> created
>
>    each INKContInternal::init call creates another unique ID
>        (because the freepool allocator does a memcpy without constructor
> invocation)
>
>    each time an event was scheduled, the event inherits the jimw_stamp
> from the continuation of "origin"
>
>    every schedule_* macro invocation, TSContSchedule, and a few others
> kept track of the __FILE__ and __LINE__ that called them, logging that in
> new member variables of the event object
>
>    the configure script was altered by replacing "-O2" with "-O0 -ggdb3"
> for the release version
>
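> The stamp itself is nothing fancy.  Roughly this (jimw_stamp and the
> counter are my own additions, not upstream ATS; mostly shown as comments
> because the real changes are scattered across several classes):
>
>   // global, monotonically increasing counter for the stamps
>   static volatile int64_t next_jimw_stamp = 0;
>
>   // added to INKContInternal (and to Event):
>   //   int64_t     jimw_stamp;
>   //   const char *jimw_file;
>   //   int         jimw_line;
>
>   // in the constructor AND again in INKContInternal::init, because the
>   // freepool's memcpy of the prototype object bypasses the constructor:
>   //   jimw_stamp = ink_atomic_increment64(&next_jimw_stamp, 1);
>
>   // in every schedule_* macro / TSContSchedule wrapper, the event
>   // inherits the stamp and records its caller:
>   //   e->jimw_stamp = cont->jimw_stamp;
>   //   e->jimw_file  = __FILE__;
>   //   e->jimw_line  = __LINE__;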
>
> When it crashes, the debugger shows me:
>
>     the Event::jimw_stamp taken from the continuation whose associated
> instruction stream originated the event
>
>     the INKContInternal::jimw_stamp of the receiving continuation
>
>     the line and file number in the source code that originated the event
> that triggered the crash
>
> The core dump showed an event generated for a continuation that had been
> created by TSContCreate in (let's call it) path A through the code.  The
> event handler that received the event was the code for handling a
> completely different continuation, path B.
>
> Both Path A and Path B were based upon continuations that were created
> with TSContCreate.  Usually, but not always, the handler function whose
> code triggered the crash would be code for a "helper" continuation
> intended to support the HttpSM class or other similar classes.  This Path
> B code was utterly unprepared to handle the events from Path A, would do
> something freaky, and quite naturally trigger an assert.
>
> It took lots of convoluted setup to allow RHEL to generate core dumps of
> setuid programs...
>
> The big boogeyman, HOWEVER, is that both the incoming EVENT and the
> receiving mismatched CONTINUATION had the same "jimw_stamp" tag in them.
>
> WOAH!
>
> Think about that for a moment.
>
> An event that had been assigned the unique tag from the continuation
> allocated by TSContCreate in Path A is received by the code for another
> continuation created by TSContCreate in Path B.  They share the same
> guaranteed-by-the-constructor-to-be-unique ID tag.
>
> The only possible explanation is that two continuations somehow had the
> same jimw_stamp!  The same unique tag.  Physically impossible.  That
> means that TWO LIVE CONTINUATIONS had to be occupying the SAME MEMORY
> SLOT IN THE FREELIST.  One of the continuations is receiving all the
> events for both of them!
>
> "YOU HAVE GOT TO BE KIDDING ME!"  I said out loud.  Banged my ankle on
> something really hard.  Still nursing the wound.
>
> Um... that would be bad.  Real bad.  Unlikely, I thought.  That would
> mean a bug in a freepool allocator...  most likely a missing mutex....  I
> could not imagine that kind of thing ever getting through a code review
> gauntlet.  Hundreds of developers have inspected THAT code, right?
>
> Code inspection shows that there are locks and mutexes and spinlocks,
> atomic increments, atomic cas.....  There is NO WAY that could fail,
> right?  It looks like it should be bulletproof, but look at all those
> preprocessor directives!
>
> Well, ink_freelist_new is not the easiest code to figure out.  Take a
> look at it....  mind-boggling....  After an hour or more I just plain
> gave up trying to figure out what it is ACTUALLY DOING.  I went digging
> through gcc manuals and figured out how to turn on preprocessor output.
> If this code, so full of preprocessor directives, and so full of locks,
> is putting two things into the same memory location: WHAT ON EARTH IS
> ACTUALLY GETTING COMPILED?  I edited the Makefile and put -E in place of
> the optimizer flags, and let the preprocessor do its thing.  What on
> earth does the actual C++ code that is getting compiled look like?
>
> ---
>
> ink_queue.o, preprocessor output:
>
> void *
> ink_freelist_new(InkFreeList * f)
> {
> # 234 "ink_queue.cc"
>   head_p item;
>   head_p next;
>   int result = 0;
>
>   do {
>     *((uint64_t*)&(item)) = *((uint64_t*)&(f->head));
>     if (((void*)(((void*)(((((intptr_t)(item).data)<<16)>>16) | (((~((((intptr_t)(item).data)<<16>>63)-1))>>48)<<48))))) == __null) {
>       uint32_t type_size = f->type_size;
>       uint32_t i;
> # 255 "ink_queue.cc"
>       void *newp = __null;
>
>       if (f->alignment)
>         newp = ink_memalign(f->alignment, f->chunk_size * type_size);
>       else
>         newp = ink_malloc(f->chunk_size * type_size);
>       if (newp)
>         ink_atomic_increment64(&freelist_allocated_mem, (int64_t) (f->chunk_size * type_size));;
>
>       (item).data = ((((intptr_t)(newp))&0x0000FFFFFFFFFFFFULL) | (((0)&0xFFFFULL) << 48));
> # 291 "ink_queue.cc"
>       ink_atomic_increment((int *) &f->allocated, f->chunk_size);
>       ink_atomic_increment64(&fastalloc_mem_total, (int64_t) f->chunk_size * f->type_size);
>
>       for (i = 0; i < f->chunk_size; i++) {
>         char *a = ((char *) ((void*)(((((intptr_t)(item).data)<<16)>>16) | (((~((((intptr_t)(item).data)<<16>>63)-1))>>48)<<48)))) + i * type_size;
>
>         ink_freelist_free(f, a);
> # 315 "ink_queue.cc"
>       }
>
>       ink_atomic_increment((int *) &f->count, f->chunk_size);
>       ink_atomic_increment64(&fastalloc_mem_in_use, (int64_t) f->chunk_size * f->type_size);
>
>     } else {
>       (next).data = ((((intptr_t)(*((void **)((char *)((void*)(((void*)(((((intptr_t)(item).data)<<16)>>16) | (((~((((intptr_t)(item).data)<<16>>63)-1))>>48)<<48))))) + f->offset))))&0x0000FFFFFFFFFFFFULL) | ((((((intptr_t)(item).data)>>48) + 1)&0xFFFFULL) << 48));
>
>       result = ink_atomic_cas64((int64_t *) & f->head.data, item.data, next.data);
> # 345 "ink_queue.cc"
>     }
>   }
>   while (result == 0);
>   (void)(!((uintptr_t)((void*)(((void*)(((((intptr_t)(item).data)<<16)>>16) | (((~((((intptr_t)(item).data)<<16>>63)-1))>>48)<<48)))))&(((uintptr_t)f->alignment)-1)));
>
>   ink_atomic_increment((int *) &f->count, 1);
>   ink_atomic_increment64(&fastalloc_mem_in_use, (int64_t) f->type_size);
>
>   return ((void*)(((void*)(((((intptr_t)(item).data)<<16)>>16) | (((~((((intptr_t)(item).data)<<16>>63)-1))>>48)<<48)))));
> }
>
> Allocator.o section:
>
> template<class C> inline C * ClassAllocator<C>::alloc()
> {
>   void *ptr = ink_freelist_new(&this->fl);
>
>   if (sizeof(C) < 512) {
>     for (unsigned int i = 0; i < (((sizeof(C))+15)&~15) / sizeof(int64_t); i++)
>       ((int64_t *) ptr)[i] = ((int64_t *) &this->proto.typeObject)[i];
>   } else
>     memcpy(ptr, &this->proto.typeObject, sizeof(C));
>   return (C *) ptr;
> }
>
> InkAPI.cc:
>
> static ClassAllocator<INKContInternal> INKContAllocator("INKContAllocator");
>
> later in InkAPI.cc:
>
> ////////////////////////////////////////////////////////////////////
> //
> // Continuations
> //
> ////////////////////////////////////////////////////////////////////
>
> TSCont
> TSContCreate(TSEventFunc funcp, TSMutex mutexp)
> {
>   // mutexp can be NULL
>   if (mutexp != NULL)
>     sdk_assert(sdk_sanity_check_mutex(mutexp) == TS_SUCCESS);
>
>   INKContInternal *i = INKContAllocator.alloc();
>
>   i->init(funcp, mutexp);
> // printf ("CREATINGCONT: %p  marker: %d\n", i, i->jimw_stamp);
>
>   return (TSCont)i;
> }
>
> LOOK MOM!  NO LOCKS!!!!!
>
> ---------------------------------
>
> Now, my guess is that for who knows how many years, anyone reviewing the
> code saw this in ink_queue.h and assumed the locks were in place.
>
> INK_USE_MUTEX_FOR_FREELISTS is not in actuality defined; it is only
> defined when IS_TS_MICRO_BUILD (??!?) is 1, which it is NOT.
>
> If you do a "grep", it shows the lines in ink_queue.h where
> INK_USE_MUTEX_FOR_FREELISTS is checked and where the mutex-wrapped
> versions live:
>
> 161 #if !defined(INK_USE_MUTEX_FOR_FREELISTS)
> 162   inkcoreapi void *ink_freelist_new(InkFreeList * f);
> 163   inkcoreapi void ink_freelist_free(InkFreeList * f, void *item);
> 164 #else                           /* INK_USE_MUTEX_FOR_FREELISTS */
> 165   inkcoreapi void *ink_freelist_new_wrap(InkFreeList * f);
> 166   static inline void *ink_freelist_new(InkFreeList * f)
> 167   {
> 168     void *retval = NULL;
> 169
> 170     ink_mutex_acquire(&(f->inkfreelist_mutex));
> 171     retval = ink_freelist_new_wrap(f);
> 172     ink_mutex_release(&(f->inkfreelist_mutex));
> 173     return retval;
> 174   }
> 175
> 176   inkcoreapi void ink_freelist_free_wrap(InkFreeList * f, void *item);
> 177   static inline void ink_freelist_free(InkFreeList * f, void *item)
> 178   {
> 179     ink_mutex_acquire(&(f->inkfreelist_mutex));
> 180     ink_freelist_free_wrap(f, item);
> 181     ink_mutex_release(&(f->inkfreelist_mutex));
> 182   }
> 183 #endif /* INK_USE_MUTEX_FOR_FREELISTS */
>
> If you turn it on by defining INK_USE_MUTEX_FOR_FREELISTS at the top of
> ink_queue.h, the code does NOT compile.  This suggests a VERY old bug.
> Once you get it to compile by adding the missing header file, it
> deadlocks on itself in line 306 of ink_queue.cc, where it calls
> ink_freelist_free instead of ink_freelist_free_wrap....  another sign of
> a very old bug....  Well...  Fix that...  fire it up...
>
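> To spell out the deadlock: with the wrappers enabled, ink_freelist_new
> already holds f->inkfreelist_mutex when the chunk-initialization loop
> (visible in the preprocessor dump above) hands each new slot back to the
> freelist, and -- assuming the mutex is not recursive -- taking it a
> second time hangs.  Roughly (paraphrasing, not a verbatim diff):
>
>   // inside the mutex-wrapped allocation path, around line 306:
>   //   ink_mutex_acquire(&(f->inkfreelist_mutex));  // already held by the wrapper
>   //   ...
>   //   ink_freelist_free(f, a);       // re-acquires the same mutex: deadlock
>   //   ink_freelist_free_wrap(f, a);  // what it has to call instead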
>
> Stable...  REAL STABLE.  It has been running for over twenty hours now,
> hammered by my ruby script load generator, with no trace of errors where
> it used to barf and crash in seconds or minutes, or occasionally an hour.
>
> But I cannot get it to show over 56% of CPU with "top" (15-20% max on any
> given core), not even with 100 load generators pounding on ATS.  Before,
> it didn't take much of anything to make it max out all four virtual
> cores.  This is, of course, on my Mac VMware Fusion with the version from
> 2009...  the inexplicable delays could be some kind of queuing glitch
> from what I suspect is the probable absence of a monotonically
> increasing, possibly double-buffered clock on my VM/Linux kernel.  So I
> tried it with the 3.0.3 version with Brian's queuing fix, and it does the
> same thing, with query times starting at around 30 ms instead of 3
> milliseconds...
>
> Turn on the locks, and I get a 5 to 10X performance hit!
>
> I am not sure yet what is up with the degraded performance from turning
> on these mutex locks...  I would expect a hit because of locking
> delays... but this is REALLY severe...  This is why MALLOC in a
> multi-core box is a challenge: thread locks are a real serious pain.
> What has me slightly baffled is that f->inkfreelist_mutex appears to be
> local to each freelist...  they are NOT global locks, and yet it slows
> down to a snail's pace.
>
> Cameron says it's time to have some other brains look at this one.
> That's probably a good judgement call, because I am pretty wired.  Hard
> to think straight this Friday afternoon.
>
> ----------------------------------
>
> How to fix this?
>
> Well, tcmalloc has an interesting approach...
>
> Gets me thinking anyway....
>
> Replacing the freepools outright with tcmalloc doesn't seem like an
> optimal solution.  We might wind up with a memory fragmentation
> problem...  Also, the current code has been debugged and is running using
> the freepool's memcpy of a prototype object in place of a constructor.
> Given how much code there is in ATS, I would not want to force every
> class that uses the freepools to suddenly call a constructor for every
> object...  There is no telling what that could break if there are bugs in
> any of the previously unexercised constructors.
>
> So I am thinking of per-thread free pools that share a global pool.  In
> that kind of design, if the per-thread free pool has anything available,
> the memory allocator doesn't need to lock.  It only needs to lock when
> grabbing a batch of, say, 100 items from the global shared pool, or when
> returning excessive numbers of freed items back to the global free pool.
> Every thread has to be able to return items back into its own free pool
> no matter which thread allocated them, so I am initially imagining that a
> global pool with locking batch operators feeding thread pools would be
> the best design...  though there could be better.
>
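> To make that concrete, here is the shape of what I have in mind.  A
> sketch only -- GlobalPool / ThreadPool and the batch sizes are made-up
> names and numbers, not existing ATS code, and it uses std::mutex purely
> for illustration:
>
>   #include <mutex>
>   #include <vector>
>
>   // One shared pool guarded by a mutex; only touched in batches.
>   struct GlobalPool {
>     std::mutex lock;
>     std::vector<void*> items;
>
>     size_t take_batch(void **out, size_t n) {   // grab up to n items under the lock
>       std::lock_guard<std::mutex> g(lock);
>       size_t got = 0;
>       while (got < n && !items.empty()) {
>         out[got++] = items.back();
>         items.pop_back();
>       }
>       return got;
>     }
>
>     void give_batch(void **in, size_t n) {      // return a batch under the lock
>       std::lock_guard<std::mutex> g(lock);
>       items.insert(items.end(), in, in + n);
>     }
>   };
>
>   // One per thread: the fast path never locks.
>   struct ThreadPool {
>     enum { BATCH = 100, HIGH_WATER = 200 };
>     GlobalPool *global;
>     std::vector<void*> cache;
>
>     void *alloc() {
>       if (cache.empty()) {                      // lock only when the cache runs dry
>         void *batch[BATCH];
>         size_t got = global->take_batch(batch, BATCH);
>         cache.assign(batch, batch + got);
>         if (cache.empty())
>           return 0;                             // global pool empty: caller carves a new chunk
>       }
>       void *p = cache.back();
>       cache.pop_back();
>       return p;
>     }
>
>     void release(void *p) {                     // frees always land in this thread's cache
>       cache.push_back(p);
>       if (cache.size() > HIGH_WATER) {          // overfull: push a batch back under the lock
>         global->give_batch(&cache[cache.size() - BATCH], BATCH);
>         cache.resize(cache.size() - BATCH);
>       }
>     }
>   };
>
> The watermark-and-dedicated-thread variant below is the same idea, just
> with the take_batch/give_batch calls moved off the hot threads.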
>
> One complicated option is a dedicated thread, triggered by watermarks
> detected by each thread that owns a pool, that would push or pull batches
> of thread-pool objects into and out of the global pools.  That is kind of
> complicated, but it would keep the thread pools full all the time with
> never a locking delay.  Assuming I do it right, of course.
>
> Ideas?
>
> ---------------------------------
>
> A couple of years ago, I worked for 21 months at Microsoft's bing.com UX
> Platform Team (since renamed the "Front Door Team", I have been told), on
> the front-line server team that handled all the search traffic for all of
> bing.com.  I am a grey-haired C++ server performance and stability WIZARD
> (for lack of a better term) with industry experience going back into the
> 1980's.  I am probably just what the ATS team needs as a committer...
>
> The "-ND" in my name field means "Not a Disney employee".  I am a
> contractor here, with career aspirations.
>
> I have heard from numerous sources that the ATS committer team members
> are regarded as celebrities in the field, especially if they hold onto
> that distinction for any length of time.  That someone who gets on the
> ATS committer team just does not wind up on unemployment....  That Disney
> management (who seem to like me.... and I just love this place and the
> people here...) would absolutely NEVER even think of letting me go.
>
> Some say that the best way to go is to show up with a huge bang and snow
> your doors in.
>
> One of the big motivators to take this contract position was something
> Cameron said: "we need someone who knows C++ to learn ATS inside and out,
> which might provide the opportunity for open source contributions and
> making a name for yourself".
>
> So... Umm.. HI THERE!
>
>
